Speeding Up Shell Scripts

Recently I wrote a BASH shell script to collect messages from a RabbitMQ queue and process them into a MySQL database. It seemed pretty simple: call an app that returns a file with the message in it, then pull apart the message, which was in “collectd” format and looked something like:

collectd.host.process.load-ave 1.2 1407801422

Three values separated by spaces. I used a loop and then the old favourites “cut” and “tr” to pull the line apart. The script looked like this:


for FILE in `ls mq-*.data`
do
    LINE=`cat $FILE`
    # strip stray whitespace (space, tab, LF, CR) from the timestamp
    TimeStamp=`echo $LINE|cut -d' ' -f3|tr -d '\040\011\012\015'`
    Val=`echo $LINE|cut -d' ' -f2`
    Host=`echo $LINE|cut -d' ' -f1`
    SOURCE=`echo $Host|cut -d'.' -f1`
    HOST=`echo $Host|cut -d'.' -f2`
    TYPE=`echo $Host|cut -d'.' -f3`
    NAME=`echo $Host|cut -d'.' -f4`
    INS="insert into metrics (source,host,metric_type,metric_name,metric_val,metric_timestamp) values ('${SOURCE}','${HOST}','${TYPE}','${NAME}','${Val}',${TimeStamp});"
    InsertIntoDB "${INS}"
    rm -f $FILE
done

Everything went fine until I substantially increased the number of files to process. Processing then became slower than the delivery rate, and the queue ballooned to 2,000,000 records waiting to be processed. Clearly the processing time was far too long.

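Each call to tr and cut spawned a new process, and the first version ran cut seven times and tr once for every file. I needed to remove as many of these calls as possible to bring the processing time down.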
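To get a rough feel for that cost, the sketch below times the same field extraction done two ways: through a cut pipeline inside a command substitution, and with the shell's built-in read. The sample line is the one shown earlier; the function names and the iteration count are made up for illustration, and the absolute timings will vary from machine to machine.

```shell
#!/bin/bash
# Rough comparison of an external `cut` call against the `read` builtin.
LINE="collectd.host.process.load-ave 1.2 1407801422"
N=200   # arbitrary iteration count, chosen only for illustration

with_cut() {
    # Forks a subshell plus a cut process on every call.
    VAL_CUT=$(echo "$LINE" | cut -d' ' -f2)
}

with_read() {
    # read is a builtin, so no new process is created.
    read -r _ VAL_READ _ <<< "$LINE"
}

echo "cut:"
time for ((i = 0; i < N; i++)); do with_cut; done
echo "read:"
time for ((i = 0; i < N; i++)); do with_read; done
```

Both functions end up with the same value (1.2), so any difference in the reported times comes from how the split is done rather than from what it produces.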

Reworking the script

Version 2 now looks like this:


for FILE in `ls mq-*.data`
do
    LINE=`cat $FILE`
    STR=(`echo ${LINE}|tr " " "\n"`)
    Host=${STR[0]}
    Val=${STR[1]}
    TimeStamp=${STR[2]}
    PARAM=(`echo ${Host}|tr "." "\n"`)
    SOURCE=${PARAM[0]}
    HOST=${PARAM[1]}
    TYPE=${PARAM[2]}
    NAME=${PARAM[3]}
    INS="insert into metrics (source,host,metric_type,metric_name,metric_val,metric_timestamp) values ('${SOURCE}','${HOST}','${TYPE}','${NAME}','${Val}',${TimeStamp});"
    InsertIntoDB "${INS}"
    rm -f $FILE
done

By using BASH shell arrays, with a single tr to split the line on spaces and a second one to split the first field on “.”, the number of cut and tr calls per file has dropped from eight to two. The processing time is now sustainable, at least until the task needs to be rewritten in “C” or “C++”.
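The remaining cat, echo and tr calls could probably be dropped as well, since the read builtin can split on any delimiter through IFS. What follows is a minimal sketch of that idea, not the script that actually ran. parse_metric and process_files are hypothetical names, and the sketch assumes each file holds exactly one line in the three-field format shown earlier, with a four-part dotted metric path. InsertIntoDB is the function from the script above and is not defined here.

```shell
#!/bin/bash
# Hypothetical helper: split one collectd-style line using only builtins.
# Sets SOURCE, HOST, TYPE, NAME, Val and TimeStamp, and builds INS.
parse_metric() {
    local line=$1 path
    # Split the three space-separated fields.
    IFS=' ' read -r path Val TimeStamp <<< "$line"
    # Keep only the digits of the timestamp (stands in for the tr -d clean-up).
    TimeStamp=${TimeStamp//[^0-9]/}
    # Split the dotted metric path into its four parts.
    IFS='.' read -r SOURCE HOST TYPE NAME <<< "$path"
    INS="insert into metrics (source,host,metric_type,metric_name,metric_val,metric_timestamp) values ('${SOURCE}','${HOST}','${TYPE}','${NAME}','${Val}',${TimeStamp});"
}

# Loop sketch: the glob replaces `ls`, and `read < file` replaces `cat`.
process_files() {
    local file line
    for file in mq-*.data; do
        [ -e "$file" ] || continue   # the glob matched nothing
        read -r line < "$file"
        parse_metric "$line"
        InsertIntoDB "$INS"          # assumed to exist, as in the original script
        rm -f "$file"
    done
}

parse_metric "collectd.host.process.load-ave 1.2 1407801422"
echo "$INS"
```

One difference to be aware of: if a metric path ever had more than four dotted parts, read would put the remainder into NAME, whereas cut -f4 would return only the fourth part.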

References:

http://shortrecipes.blogspot.com.au/2010/02/bash-split-string-to-array.html