Sunday, December 11, 2016

Oracle Linux - transform CSV into JSON with bash

Like it or not, a large number of outputs created by systems is still in CSV, comma separated value, file format. The amount of information that is created, and needs processing, that is represented in CSV format is large. And it is good to understand how you could script against CSV files in your Oracle Linux bash scripts. to use the information or to transform this into other formats. As an example we use the below which is a section of a output file generated by an application that logs the use of doors within a building.

[root@localhost tmp]# cat example.csv
100231,AUTHORIZED,11-DEC-2016,13:12:15,IN,USED,F2D001
100231,AUTHORIZED,11-DEC-2016.13:14:01,IN,USED,F2D023
100231,AUTHORIZED,11-DEC-2016,13:15:23,IN,TIMEOUT,F2D024
100231,AUTHORIZED,11-DEC-2016,13:15:59,IN,USED,F2D024
100562,AUTHORIZED,11-DEC-2016,13:16:01,IN,USED,F1D001
100562,AUTHORIZED,11-DEC-2016,13:16:56,IN,USED,F1D003
100562,AUTHORIZED,11-DEC-2016,13:20:12,OUT,USED,F1D003
100562,AUTHORIZED,11-DEC-2016,13:20:58,IN,USED,F1D004
100231,AUTHORIZED,11-DEC-2016,13:20:59,OUT,USED,F2D024
[root@localhost tmp]#

As you can see from the above data we have some information in a per-line format using a CSV format to seperate the data. In this example the fields have the following meaning:

  • The ID of the access card used
  • The status of the authorization request by the card for a certain door
  • The date the authorization request was made
  • The time the authorization request was made
  • The direction of the revolving door the request was made for
  • The usage status, this can be USED or can be TIMEOUT in case the door was not used
  • The ID for the specific revolving door

The amount of things you might want to do from a data mining or security point of view are endless, however, having a CSV file on the file system of your Oracle Linux server is not making it useful directly. You will have to do something with it. To show you how you can use bash scripting to understand the CSV file itself is shown in the below example script;

#!/bin/bash
INPUT=/tmp/example.csv
OLDIFS=$IFS
IFS=,
[ ! -f $INPUT ] && { echo "$INPUT file not found"; exit 99; }
while read cardid checkstatus checkdate checktime doordirection doorstatus doorid
do
        echo "card used : $cardid"
        echo "check outcome : $checkstatus"
        echo "date : $checkdate"
        echo "time : $checktime"
        echo "direction : $doordirection"
        echo "usage : $doorstatus"
        echo "door used : $doorid"
        echo "----------------------"
done < $INPUT
IFS=$OLDIFS

As you can see in the above example we use the IFS variable to read and separate the values in the CSV file and place them in their own respective variables. The $IFS variable is a special shell variable and stands for Internal Field Separator. The Internal Field Separator (IFS)  is used for word splitting after expansion and to split lines into words with the read builtin command. Whenever trying to split lines into words you will have to look into the $IFS variable and how to use this.

The above example is quite simple and just prints the CSV file in a different way to the screen. More interesting is how you could transform a CSV file into something else, for example a JSON file. In the below example we will transform the CSV file into a correctly formatted JSON file.

#!/bin/bash
# NAME:
#   CSV2JSON.sh
#
# DESC:
#  Example script on how to convert a .CSV file to a .JSON file. The
#  code has been tested on Oracle Linux, expected to run on other
#  Linux distributions as well.
#
# LOG:
# VERSION---DATE--------NAME-------------COMMENT
# 0.1       11DEC2016   Johan Louwers    Initial upload to github.com
#
# LICENSE:
# Copyright (C) 2016  Johan Louwers
#
# This code is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This code is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this code; if not, write to the Free Software
# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
# 02110-1301, USA.
# *
# */

 inputFile=/tmp/example.csv
 OLDIFS=$IFS


 IFS=,
  [ ! -f $inputFile ] && { echo "$inputFile file not found"; exit 99; }

# writing the "header" section of the JSON file to ensure we have a
# good start and the JSON file will be able to work with multiple
# lines from the CSV file in a JSON array.
 echo "
  {
   \"checklog\":
   ["

# ensuring we have the number of lines from the input file as we
# have to ensure that the last part of the array is closed in a
# manner that no more information will follow. (not closing the
# the section with "}," however closing with "}" instead to
# prevent incorrect JSON formats. We will use a if check in the
# loop later to ensure this is written correctly.
 csvLength=`cat $inputFile | wc -l`
 csvDepth=1

 while read cardid checkstatus checkdate checktime doordirection doorstatus doorid
  do
     echo -e "   {
      \"CARDCHECK\" :
       {
        \"CARDID\" : \"$cardid\",
        \"CHECKSTATUS\" : \"$checkstatus\",
        \"CHECKDATE\" : \"$checkdate\",
        \"CHECKTIME\" : \"$checktime\",
        \"DIRECTION\" : \"$doordirection\",
        \"DOORSTATUS\" : \"$doorstatus\",
        \"DOORID\" : \"$doorid\"
       }"
     if [ "$csvDepth" -lt "$csvLength" ];
      then
        echo -e "     },"
      else
        echo -e "     }"
     fi
   csvDepth=$(($csvDepth+1))
  done < $inputFile

# writing the "footer" section of the JSON file to ensure we do
# close the JSON file properly and in accordance to the required
# JSON formating.
 echo "   ]
  }"

 IFS=$OLDIFS

As you can see, and test, you will now have a valid JSON output format. As JSON is much more the standard at this moment than CSV you can more easily use this JSON format in the next step of the process. You could for example use this to send it to a REST API endpoint for storing in a database, you can use it in a direct manner to upload it to Elasticsearch, or......

However, as stated, the entire way of working is using the $IFS variable available to you as a integrated part of the Linux shell. The above scv2json.sh code is uploaded to github and in case of any changes they will be made only to github.com, feel free to fork it and use it for your own projects. 

No comments: