1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
SCRIPT="gen_${1}.sql" echo "--gen_${1}.sql" > $SCRIPT typeset -i counter counter=0 echo "Get all lines that start with D|4| (this may take a while)" LINES=$(grep "D|4|" $1) echo "Generate insert statements to $SCRIPT" for LINE in $LINES; do REFNO=$(echo $LINE | awk '{split($0,a,"|");print a[8]}') echo "insert into CHEQUE (datecomptable, noreference) values (to_date('2007/11/30','YYYY/MM/DD'),'${REFNO}');" >> $SCRIPT counter=counter+1 done echo "commit;" >> $SCRIPT echo "${counter} records created in $SCRIPT"
Refactorings
No refactoring yet !
pascal.charest
February 8, 2008, February 08, 2008 05:23, permalink
Not sure you'll get that great of a speed up, but you should modify line 7 to be like following. The partern then require a start of line.
1
LINES=$(grep "^D|4|" $1)
Eineki
February 8, 2008, February 08, 2008 10:39, permalink
Just a quick thought
Maybe cut is more efficent than awk for a simple split.
If I'm right, line 11 REFNO=$(echo $LINE | awk '{split($0,a,"|");print a[8]}') should change in
NOTE: Assuming than awk start counting fields from 1
1
REFNO=$(echo $LINE | cut -d\| -f8)
furtive
February 8, 2008, February 08, 2008 19:49, permalink
It didn't make a difference performance wise but I really like the clarity of the cut command, I didn't know it existed. Thanks!
Eineki
February 10, 2008, February 10, 2008 00:57, permalink
Hi, the following code don't fire a lot of subshell and this should do a big difference.
With cut you don't have to process line per line the result of the grep and sed "decorates" the single lines
creating the sql statements you need.
Let me know if this solution works faster than the original one
Just A test I've done:
harvesting from a file of near 500000 lines tooks about 5 seconds
on a athlon64 3800 with 1Gb Ram as you can see in the time output
$ time ./test database
Generate insert statements to gen_database.sql (this may take a while)
469573 records created in gen_database.sql
real 0m5.633s
user 0m4.996s
sys 0m0.380s
1 2 3 4 5 6 7 8 9 10 11 12
SCRIPT="gen_${1}.sql" echo "--gen_${1}.sql" > $SCRIPT typeset -i counter counter=0 INSERT_STATEMENT="insert into CHEQUE (datecomptable, noreference) values (to_date('2007\/11\/30','YYYY\/MM\/DD'),'&');" echo "Generate insert statements to $SCRIPT (this may take a while)" $(grep "D|4|" $1 | cut -d\| -f8 | sed -e "s/.*/${INSERT_STATEMENT}/" >> $SCRIPT) counter=$(expr `wc -l ${SCRIPT} | cut -f1 -d\ ` - 1) echo "commit;" >> $SCRIPT echo "${counter} records created in $SCRIPT"
Andrew Gilmartin
February 11, 2008, February 11, 2008 16:34, permalink
The only bash change I would make is to replace the use of the back-tick with a nested expression.
1
counter=$(expr $(wc -l $SCRIPT | cut -f1 -d\ ) - 1)
grrrreg
February 19, 2008, February 19, 2008 14:44, permalink
I think you should replace the for loop with an awk command. Awk knows how to grep.
note: I use a "while read" loop with "echo" because printing ' inside awk is such a mess.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
SCRIPT="gen_${1}.sql" echo "--gen_${1}.sql" > $SCRIPT typeset -i counter counter=0 echo "Generate insert statements to $SCRIPT (this may take a while)" awk -F \| ' /D\|4\|/ { print $8} ' $1 | while read DATE;do echo "insert into CHEQUE (datecomptable, noreference) values (to_date('2007/11/30','YYYY/MM/DD'),'${DATE}');" >> $SCRIPT done counter=$(grep -v -- "--${SCRIPT}" ${SCRIPT} | wc -l) echo "commit;" >> $SCRIPT echo "${counter} records created in $SCRIPT"
Here's a quick and dirty bash script that takes a file and looks for a particular line, then loops through those lines and splits the contents to get the 8th column and create an sql insert statement with it. I'm not looking to make it more generic, but the for loop is unbearably slow. How to improve it?