Quick Split to Fix Data Silliness

We have a vendor sending us daily updates on shipping info. We have a well-known, well-defined structure for each type of data, and those types map neatly to tables in our database. About 9 tables need to be updated each day to give us the complete picture from this vendor's point of view.

After months of trying to get them to send the data, we finally received it: in 1 file. *sigh* They jammed all of the new records for every table, in random order, into 1 unorganized file. The only saving grace is that the 2nd column defines the record (and therefore table) type.
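Before splitting anything, it can help to see which record types actually showed up and how many records each one has. This is a minimal sketch, assuming the fields are separated by the \x01 (SOH) byte and the record type is the 2nd field. `count_record_types` is a hypothetical helper name, not something from our actual scripts.

```shell
# Hypothetical helper: report how many records of each type a file holds.
# Assumes \x01 (SOH) field separators and the record type in field 2.
count_record_types() {
  sep=$(printf '\001')                 # build the SOH delimiter byte portably
  cut -d "$sep" -f 2 "$1" | sort | uniq -c
}
```

Running `count_record_types INFILE.dat` prints one line per type with its count, which makes it easy to confirm that the roughly 9 expected types are all present.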

After I pondered this for a few moments, I started working on a quick and “simple” solution. I came up with this:

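# One output file per record type: <type>.txt gets every line whose 2nd \x01-delimited field is exactly <type>.
for x in $(cut -d $'\x01' -f 2 INFILE.dat | sort -u); do grep $'^[^\x01]*\x01'"${x}"$'\x01' INFILE.dat > "${x}.txt"; done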

Grab all of the unique table types, then loop over the data once per type, grepping out the matching records. Because this rereads the whole file for every type, there are probably more efficient ways, but it runs pretty fast on our smallish data set.
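One of those more efficient ways is a single pass with awk. This is a different technique from the grep loop above and not what we actually run; it is a minimal sketch, assuming \x01 (SOH) field separators and the record type in field 2. `split_by_record_type` is a hypothetical helper name.

```shell
# Hypothetical helper: split a file into one <type>.txt per record type
# in a single pass, instead of re-reading the file once per type.
# Assumes \x01 (SOH) field separators and the record type in field 2.
split_by_record_type() {
  sep=$(printf '\001')                 # build the SOH delimiter byte portably
  # awk reads each line once and writes it to the file named after field 2.
  # The first write truncates that file; later writes in the same run append.
  awk -F "$sep" '{ print > ($2 ".txt") }' "$1"
}
```

Call it as `split_by_record_type INFILE.dat` from the directory where the output files should land. awk keeps every output file open until it exits, which is harmless with about 9 types; some awk implementations limit how many files can be open at once, so a feed with hundreds of types would need `close()` calls and `>>` append mode. As a sanity check in a directory with no other .txt files, `cat *.txt | wc -l` should match `wc -l < INFILE.dat`.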


Grease Monkey ~~ GM

About Grease Monkey

Computer nerd since the 80s. Data nerd since the 90s. Generic nerd for a lifetime.
This entry was posted in Uncategorized.