At work we're taking up the slack from people who've left under odd circumstances, leaving deadlines unmet. I've been working on a parser to import data from mainframe flat files into SQL Server, where the data is processed and put into Excel reports for viewing.

My first parser was web-based, because it only merges two files. Yesterday I wrote another parser, this time as a standalone EXE. It asks for one input: the folder where the data is stored. It iterates through four subdirectories, renames files where necessary (because the StreamReader can't read files with more than one dot in the name, for some reason), parses them one at a time, and writes out one CSV file per folder. It then removes duplicate rows using a hash table (the source data isn't completely clean), sorts the CSV, and imports it into SQL. On 50MB of source data it takes around 15 seconds, and there's 1GB to process.
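In outline, the loop looks something like the sketch below. Treat it as a minimal illustration rather than the actual code: the class name, the rename rule, and the ParseRecord stub are all made up here, and the SQL import step is left out entirely.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class FlatFileImporter
{
    static void Main()
    {
        Console.Write("Data folder: ");
        string root = Console.ReadLine();

        // One pass per subdirectory; each produces its own CSV.
        foreach (string dir in Directory.GetDirectories(root))
        {
            // A HashSet plays the role of the hash table: adding a duplicate row is a no-op.
            var rows = new HashSet<string>();

            foreach (string file in Directory.GetFiles(dir))
            {
                // Rename files that have more than one dot before opening them.
                string path = file;
                string name = Path.GetFileName(file);
                if (name.Split('.').Length > 2)
                {
                    string fixedName = Path.GetFileNameWithoutExtension(name).Replace('.', '_')
                                       + Path.GetExtension(name);
                    path = Path.Combine(dir, fixedName);
                    File.Move(file, path);
                }

                using (var reader = new StreamReader(path))
                {
                    string line;
                    while ((line = reader.ReadLine()) != null)
                        rows.Add(ParseRecord(line));
                }
            }

            // Sort the deduplicated rows and write one CSV per folder.
            var sorted = new List<string>(rows);
            sorted.Sort();
            File.WriteAllLines(Path.Combine(root, Path.GetFileName(dir) + ".csv"), sorted);
        }
    }

    // Placeholder for the real record parsing; the actual field layout isn't shown in the post.
    static string ParseRecord(string line)
    {
        return string.Join(",", line.Split('\t'));
    }
}
```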

This isn't that impressive in terms of speed, since SAS was doing the same job in about the same time. However, we're trying to automate the reporting process, which means phasing out SAS and other programs where possible and sticking with SQL and ASP.

My standalone EXE can be run from a SQL stored procedure, in case you thought I hadn't considered that! The web interface was too slow for running the merge on this much data (12 minutes for 50MB).
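For anyone wondering how that works: the usual way to launch an external EXE from SQL Server is xp_cmdshell. Something along these lines, where the procedure name and the path to the EXE are placeholders I've invented for the example:

```sql
-- Hypothetical wrapper procedure; the name and path are illustrative only.
CREATE PROCEDURE dbo.RunFlatFileImport
    @DataFolder VARCHAR(255)
AS
BEGIN
    DECLARE @cmd VARCHAR(500);
    SET @cmd = 'C:\Tools\FlatFileImporter.exe "' + @DataFolder + '"';
    EXEC master..xp_cmdshell @cmd;  -- xp_cmdshell must be available on the server
END
```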

So, I've been enjoying work. It's the challenge I've been looking for whilst pursuing my teaching degree.
