A utility to parse fixed length flat files in C# using XML templates

Tags: Zach Smith, C#, FieldName, XML

Takeaway: Zach Smith explains how to use XML templates to create a generic flat file parsing utility that can be reused for many different file layouts. A sample application is included which demonstrates the ideas described.

This article is also available as a TechRepublic download, which includes the sample application and the accompanying code.

Fixed-length flat files hold data without delimiters between fields. The layout of such a file is usually described as a series of columns, each with a starting position and a length. This lets developers parse the records of the file and separate the columns. Often the parsing routines are hard-coded into the application, which means that every time the file layout changes or a new file needs to be imported, you must change the source and recompile.

A recent project I was working on required me to parse many different flat file layouts. The file layouts were going to have frequent changes and additions. There would also be new files added that would need to be imported. What I needed was a parsing library that could import any number of different file layouts, without changing the source code for the parser.

The solution

After working with the problem, I came up with the idea of using XML mapping files to tell a parsing library how to parse the files. The library would load the mapping file, parse it, and build an internal map of how the flat file is laid out. It would then read the flat file and use the internal map to convert it into a List of records, with each record itself being a List of Field objects (a collection of collections: List<List<Field>>).

The first hurdle to this solution is coming up with the XML data needed to parse a flat file. As I said earlier, fixed length flat file fields are usually defined by their starting position and the length of characters they occupy. The XML shown in Listing A is what the example application uses as a template:

Listing A

<?xml version="1.0" encoding="utf-8" ?>
<FileMap>
  <Field Name="FirstName" Start="0" Length="5"/>
  <Field Name="LastName" Start="5" Length="10"/>
  <Field Name="Address" Start="15" Length="15"/>
  <Field Name="City" Start="30" Length="14"/>
  <Field Name="State" Start="44" Length="2"/>
  <Field Name="Zip" Start="46" Length="5"/>
</FileMap>

As you can see, the XML file simply defines the starting point, length, and name for each field specified in the flat file's format. An example record of this flat file is shown below:

DavidSmith     502 Gilgore AveJeffersonvilleIN47130

Using the XML template, this line would be parsed as:

FirstName: "David"
LastName: "Smith     "
Address: "502 Gilgore Ave"
City: "Jeffersonville"
State: "IN"
Zip: "47130"
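
Each value above is just the slice of the line that starts at the field's Start offset and runs for Length characters. A minimal sketch of that step, using a hypothetical SliceDemo helper that is not part of the sample application:

```csharp
using System;

public static class SliceDemo
{
    // Hypothetical helper: returns the characters of 'line' that the
    // template assigns to one field (zero-based start, fixed length).
    public static string Slice(string line, int start, int length)
    {
        return line.Substring(start, length);
    }

    // Same slice with the trailing space padding removed, which is
    // usually what you want for text fields such as LastName.
    public static string SliceTrimmed(string line, int start, int length)
    {
        return Slice(line, start, length).TrimEnd(' ');
    }
}
```

For the sample record, SliceDemo.Slice(record, 5, 10) returns "Smith" followed by five spaces, exactly as listed above, while SliceDemo.SliceTrimmed(record, 5, 10) returns "Smith".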

Loading the XML template

Now that we have our XML template set up, we need to load it and parse it into usable mapping data. To parse the XML template we will use an XmlDocument object. After the template is loaded into the XmlDocument, we loop through the "Field" nodes and create a List<Field> object to hold our mapping information. The code for this is shown in Listing B.

Listing B

    //Requires the System, System.Collections.Generic, and System.Xml
    // namespaces. mappingFile is assumed to be a class-level string
    // holding the path to the XML template.
    private List<Field> GetFields()
    {
        List<Field> fields = new List<Field>();
        XmlDocument map = new XmlDocument();

        //Load the mapping file into the XmlDocument
        map.Load(mappingFile);

        //Load the field nodes.
        XmlNodeList fieldNodes = map.SelectNodes("/FileMap/Field");

        //Loop through the nodes and create a field object
        // for each one.
        foreach (XmlNode fieldNode in fieldNodes)
        {
            Field field = new Field();

            //Set the field's name
            field.Name = fieldNode.Attributes["Name"].Value;

            //Set the field's length
            field.Length =
                Convert.ToInt32(fieldNode.Attributes["Length"].Value);

            //Set the field's starting position
            field.Start =
                Convert.ToInt32(fieldNode.Attributes["Start"].Value);

            //Add the field to the Field list.
            fields.Add(field);
        }

        //Return the fields. This is now our map of how the flat
        // file is laid out.
        return fields;
    }

This method returns a List<Field> collection which contains all the information we need to parse the flat file into usable fields. Now we need to loop through each line of the flat file and parse it using the List<Field> collection as a map.
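
The listings rely on a Field class that is not shown in this article. A minimal sketch of what it might look like, with the property names inferred from how the listings use them; the class in the sample download may well differ:

```csharp
using System;

// Assumed shape of the Field class. It serves two roles in the
// listings: as a map entry (Name, Start, Length) and as a parsed
// value (Name, Value).
public class Field
{
    // Field name, taken from the template's Name attribute.
    public string Name { get; set; }

    // Zero-based position of the field's first character in a line.
    public int Start { get; set; }

    // Number of characters the field occupies.
    public int Length { get; set; }

    // Text extracted from the flat file; null for map entries.
    public string Value { get; set; }
}
```

Using one class for both roles keeps the example short. Splitting it into a map-entry type and a parsed-value type would be a reasonable alternative in a larger code base.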

Parsing the flat file

To parse the file we simply open the file and loop through each line, parsing every line as we go. The code for this is shown in Listing C, with comments describing what we are doing at each step:

Listing C

    //Requires the System.Collections.Generic and System.IO namespaces.
    private List<List<Field>> ParseFile(string inputFile)
    {
        //Get the field mapping.
        List<Field> fields = GetFields();
        //Create a List<List<Field>> collection of collections.
        // The main collection contains our records, and the
        // sub collection contains the fields each one of our
        // records contains.
        List<List<Field>> records = new List<List<Field>>();

        //Open the flat file using a StreamReader.
        using (StreamReader reader = new StreamReader(inputFile))
        {
            //Load the first line of the file.
            string line = reader.ReadLine();

            //Loop through the file until there are no lines
            // left.
            while (line != null)
            {
                //Create our record (field collection)
                List<Field> record = new List<Field>();

                //Loop through the mapped fields
                foreach (Field field in fields)
                {
                    Field fileField = new Field();

                    //Use the mapped field's start and length
                    // properties to determine where in the
                    // line to pull our data from. Note that
                    // Substring throws if the line is shorter
                    // than Start + Length.
                    fileField.Value =
                        line.Substring(field.Start, field.Length);

                    //Set the name of the field.
                    fileField.Name = field.Name;

                    //Add the field to our record.
                    record.Add(fileField);
                }

                //Add the record to our record collection
                records.Add(record);

                //Read the next line.
                line = reader.ReadLine();
            }
        }

        //Return all of our records.
        return records;
    }

At this point we have parsed all the records in the flat file and separated the field values into Field objects within a List collection. The code below loops through this collection and extracts the Field values; in effect it prints out the information that was extracted from the flat file:

    foreach (List<Field> record in records)
    {
        foreach (Field field in record)
        {
            this.txtResults.Text += field.Name + ": " +
                                    field.Value + "\r\n";
        }

        this.txtResults.Text += "----end of record----\r\n";
    }
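
One caveat about the parsing step: String.Substring throws an ArgumentOutOfRangeException when a line is shorter than a field's Start plus Length, which can happen with a truncated record or a blank line at the end of the file. One possible guard is sketched below as a hypothetical SafeSlice helper. It is not part of the sample application, and returning a short or empty value rather than rejecting the record is an assumption you may not want in your own project:

```csharp
using System;

public static class LineSlicer
{
    // Hypothetical replacement for line.Substring(start, length) that
    // tolerates short lines: it returns whatever part of the field is
    // present, or an empty string if the line ends before the field.
    public static string SafeSlice(string line, int start, int length)
    {
        if (start < 0 || length < 0)
        {
            throw new ArgumentOutOfRangeException(
                "start", "Start and length must be non-negative.");
        }

        if (line == null || start >= line.Length)
        {
            return string.Empty;
        }

        // Clamp the length to the characters that actually exist.
        int available = Math.Min(length, line.Length - start);
        return line.Substring(start, available);
    }
}
```

With this helper, a line containing only "DavidSmith" yields "Smith" for the LastName field and empty strings for every field after it, instead of an exception.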

Extending this solution

Feel free to use these ideas in your own projects and extend them. The final solution for my project was very dynamic and configurable, but it is built on the ideas shown in this article. The system I created can handle new file types, file formats, and file pick-up locations without changing a single line of code; it is all handled dynamically in XML configuration/logic files. This allows the code base to remain unchanged, which eases deployment and compliance concerns.

This same type of solution is also used to write out flat files from a database. However, this is more complicated because different field types need different padding. For example, numeric fields may be zero-padded on the left, while character fields would be space-padded on the right. To solve this you could add extra attributes to the XML template. In my case I went to the trouble of creating a separate XML file that describes how different field types should be formatted.
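
The two padding rules described above can be sketched as follows. The FieldFormatter class and its method names are hypothetical and not taken from my project. Truncating over-long text and rejecting negative or over-long numbers are assumptions made only for this illustration:

```csharp
using System;
using System.Globalization;

public static class FieldFormatter
{
    // Numeric fields: zero-padded on the left to the field length.
    public static string FormatNumeric(long value, int length)
    {
        string text = value.ToString(CultureInfo.InvariantCulture);

        // Assumption: negative or over-long numbers are rejected,
        // because silently truncating digits would corrupt data.
        if (value < 0 || text.Length > length)
        {
            throw new ArgumentException(
                "Value does not fit in the field.", "value");
        }

        return text.PadLeft(length, '0');
    }

    // Character fields: space-padded on the right to the field length.
    public static string FormatText(string value, int length)
    {
        string text = value ?? string.Empty;

        // Assumption: over-long text is cut to the field length.
        if (text.Length > length)
        {
            text = text.Substring(0, length);
        }

        return text.PadRight(length, ' ');
    }
}
```

For example, FieldFormatter.FormatNumeric(42, 5) produces "00042", and FieldFormatter.FormatText("Smith", 10) produces "Smith" followed by five spaces, matching the LastName field in the sample record.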

If you have any questions or issues you need help with, feel free to post a comment on this article with your question and I'll be glad to help.
