Quick Split to Fix data silliness

We have a vendor sending us daily updates on shipping info. We have a well known and defined structure for each type of data and those types map neatly to tables in our database. We have about 9 tables that need updated each day to give us the complete picture from this vendor point of view.

After months of trying to get them to send the data, it finally showed up; in 1 file. *sigh* They jammed, in random order, all of the new tables records into 1 unorganized file. The only saving grace is that the 2nd column defines the record — and table — type.

After I pondered this for a few moments, I started working on a quick and “simple” solution. I came up with this:

for x in `cat INFILE.dat | cut -f 2 -d $'\x01' | sort | uniq`; do cat INFILE.dat | grep $'\001'${x}$'\001' > ${x}.txt; done

Grab all of the unique table types and loop over the data for each one grep’ing as needed. There are probably more efficient ways, but this works pretty fast on our smallish data set.

About Grease Monkey

30+ Years of IT Geekiness, Linux Fanboy and Open Source patriot.
This entry was posted in Administration, Data and tagged , , . Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>