A utility to parse fixed length flat files in C# using XML templates
Takeaway: Zach Smith explains how to use XML templates to create a generic flat file parsing utility that can be reused for many different file layouts. A sample application is included which demonstrates the ideas described.
This article is also available as a TechRepublic download, which includes the sample application and the accompanying code.
Fixed-length flat files hold data without delimiters between the fields. The layout of a fixed-length flat file is usually described as a series of columns, each with a starting position and a length. This lets developers walk through the records of the file and separate out the columns. Often the parsing routines are hard-coded into the application, which means that every time the file layout changes or a new file needs to be imported, you must change the source and recompile.
A recent project I was working on required me to parse many different flat file layouts. The layouts were going to change frequently, and new files that needed to be imported would be added over time. What I needed was a parsing library that could import any number of different file layouts without changing the source code for the parser.
The solution
After working with the problem I came up with the idea to use XML mapping files to instruct a parsing library how to parse the files. The library would load the mapping file, parse it, and create an internal map of how the flat file was laid out. It would then pick up the flat file and use the internal map to parse it and convert it into a List<T> of records, with each record being a List<T> itself (a collection of collections – List<List<T>>).
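None of the listings below shows the Field class itself. Judging from how Listings B and C use it, it needs at least Name, Start, Length and Value properties. The following is a minimal sketch under that assumption, not the class from the sample download; the End property is a hypothetical convenience added here.

```csharp
// A minimal sketch of the Field type the listings rely on. The property
// names are inferred from Listings B and C; End is a hypothetical extra.
public class Field
{
    // Column name, taken from the template's Name attribute.
    public string Name { get; set; }

    // Zero-based character offset where the column begins.
    public int Start { get; set; }

    // Number of characters the column occupies.
    public int Length { get; set; }

    // Raw text cut from a record; only set on parsed fields.
    public string Value { get; set; }

    // Zero-based offset one past the column's last character.
    public int End => Start + Length;
}
```

A parsed file is then simply a List<List<Field>>: one inner list per line, with one Field per mapped column.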
The first hurdle to this solution is coming up with the XML data needed to parse a flat file. As I said earlier, fixed length flat file fields are usually defined by their starting position and the length of characters they occupy. The XML shown in Listing A is what the example application uses as a template:
Listing A
<?xml version="1.0" encoding="utf-8" ?>
<FileMap>
  <Field Name="FirstName" Start="0" Length="5"/>
  <Field Name="LastName" Start="5" Length="10"/>
  <Field Name="Address" Start="15" Length="15"/>
  <Field Name="City" Start="30" Length="14"/>
  <Field Name="State" Start="44" Length="2"/>
  <Field Name="Zip" Start="46" Length="5"/>
</FileMap>
As you can see, the XML file simply defines the starting point, length, and name for each field specified in the flat file's format. An example record of this flat file is shown below:
DavidSmith     502 Gilgore AveJeffersonvilleIN47130
Using the XML template, this line would be parsed as:
FirstName: "David"
LastName: "Smith     "
Address: "502 Gilgore Ave"
City: "Jeffersonville"
State: "IN"
Zip: "47130"
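The following sketch checks the template's arithmetic. It hard-codes the six (name, start, length) entries from Listing A and cuts the sample record apart with String.Substring, the same call Listing C uses. SampleRecordDemo, Layout, Record and Slice are illustrative names, not part of the sample application. Because LastName is 10 characters wide, "Smith" is followed by five spaces of padding.

```csharp
using System.Collections.Generic;

public static class SampleRecordDemo
{
    // The (name, start, length) entries from Listing A, hard-coded only so
    // the example runs without the XML file.
    public static readonly (string Name, int Start, int Length)[] Layout =
    {
        ("FirstName", 0, 5),
        ("LastName", 5, 10),
        ("Address", 15, 15),
        ("City", 30, 14),
        ("State", 44, 2),
        ("Zip", 46, 5),
    };

    // The sample record, with LastName space-padded to its 10-character width.
    public const string Record = "DavidSmith     502 Gilgore AveJeffersonvilleIN47130";

    // Cuts each column out of the record with Substring(start, length).
    public static Dictionary<string, string> Slice(string record)
    {
        var values = new Dictionary<string, string>();
        foreach (var (name, start, length) in Layout)
        {
            values[name] = record.Substring(start, length);
        }
        return values;
    }
}
```

Each field starts where the previous one ends (0 + 5 = 5, 5 + 10 = 15, and so on), so the six widths add up to a 51-character record.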
Loading the XML template
Now that we have our XML template set up, we need to load it and parse it into usable mapping data. To parse the XML template we will use an XmlDocument object. After the template is loaded into the XmlDocument, we loop through the "Field" nodes and create a List<Field> object to hold our mapping information. The code for this is shown in Listing B.
Listing B
private List<Field> GetFields()
{
    // Requires: using System; using System.Collections.Generic; using System.Xml;
    // mappingFile is assumed to be a class-level string holding the
    // path to the XML template.
    List<Field> fields = new List<Field>();
    XmlDocument map = new XmlDocument();
    //Load the mapping file into the XmlDocument
    map.Load(mappingFile);
    //Load the field nodes.
    XmlNodeList fieldNodes = map.SelectNodes("/FileMap/Field");
    //Loop through the nodes and create a field object
    // for each one.
    foreach (XmlNode fieldNode in fieldNodes)
    {
        Field field = new Field();
        //Set the field's name
        field.Name = fieldNode.Attributes["Name"].Value;
        //Set the field's length
        field.Length =
            Convert.ToInt32(fieldNode.Attributes["Length"].Value);
        //Set the field's starting position
        field.Start =
            Convert.ToInt32(fieldNode.Attributes["Start"].Value);
        //Add the field to the Field list.
        fields.Add(field);
    }
    //Return the fields. This is now our map of how the flat
    // file is laid out.
    return fields;
}
This method returns a List<Field> collection which contains all the information we need to parse the flat file into usable fields. Now we need to loop through each line of the flat file and parse it using the List<Field> collection as a map.
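Listing B trusts the template completely, so a typo that makes two fields overlap will silently produce wrong values. The following is a sketch of an alternative loader, assuming LINQ to XML (System.Xml.Linq) is available. It reads the template from a string, which makes it easy to unit test, and rejects fields that overlap or have a non-positive length. MapLoader.LoadFields is a hypothetical name; for a file path, XDocument.Load(path) could replace XDocument.Parse.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Same shape as the Field class the listings use (Name, Start, Length, Value).
public class Field
{
    public string Name { get; set; }
    public int Start { get; set; }
    public int Length { get; set; }
    public string Value { get; set; }
}

public static class MapLoader
{
    // Builds the field map from template XML supplied as a string (hypothetical helper).
    public static List<Field> LoadFields(string xml)
    {
        XDocument map = XDocument.Parse(xml);

        // Keep the template's order, as Listing B does. The explicit (int)
        // conversion on XAttribute throws if the attribute is missing or
        // is not a valid integer.
        List<Field> fields = map.Root
            .Elements("Field")
            .Select(e => new Field
            {
                Name = (string)e.Attribute("Name"),
                Start = (int)e.Attribute("Start"),
                Length = (int)e.Attribute("Length"),
            })
            .ToList();

        // Validate on a copy sorted by Start so any overlap is between neighbours.
        List<Field> byStart = fields.OrderBy(f => f.Start).ToList();
        for (int i = 0; i < byStart.Count; i++)
        {
            Field current = byStart[i];
            if (current.Start < 0 || current.Length <= 0)
                throw new FormatException($"Field '{current.Name}' has an invalid Start or Length.");
            if (i > 0)
            {
                Field previous = byStart[i - 1];
                if (current.Start < previous.Start + previous.Length)
                    throw new FormatException($"Field '{current.Name}' overlaps field '{previous.Name}'.");
            }
        }
        return fields;
    }
}
```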
Parsing the flat file
To parse the file we simply open the file and loop through each line, parsing every line as we go. The code for this is shown in Listing C, with comments describing what we are doing at each step:
Listing C
private List<List<Field>> ParseFile(string inputFile)
{
    // Requires: using System.Collections.Generic; using System.IO;
    //Get the field mapping.
    List<Field> fields = GetFields();
    //Create a List<List<Field>> collection of collections.
    // The main collection contains our records, and the
    // sub collection contains the fields each one of our
    // records contains.
    List<List<Field>> records = new List<List<Field>>();
    //Open the flat file using a StreamReader.
    using (StreamReader reader = new StreamReader(inputFile))
    {
        //Load the first line of the file.
        string line = reader.ReadLine();
        //Loop through the file until there are no lines
        // left.
        while (line != null)
        {
            //Create our record (field collection)
            List<Field> record = new List<Field>();
            //Loop through the mapped fields
            foreach (Field field in fields)
            {
                Field fileField = new Field();
                //Use the mapped field's start and length
                // properties to determine where in the
                // line to pull our data from.
                fileField.Value =
                    line.Substring(field.Start, field.Length);
                //Set the name of the field.
                fileField.Name = field.Name;
                //Add the field to our record.
                record.Add(fileField);
            }
            //Add the record to our record collection
            records.Add(record);
            //Read the next line.
            line = reader.ReadLine();
        }
    }
    //Return all of our records.
    return records;
}
At this point we have parsed through all records in the flat file and have the field values segregated out into Field objects within a List collection. Looping through this collection and extracting the Field values is shown below. What we're basically doing here is printing out the information that was extracted from the flat file:
foreach (List<Field> record in records)
{
    foreach (Field field in record)
    {
        this.txtResults.Text += field.Name + ": " +
            field.Value + "\r\n";
    }
    this.txtResults.Text += "----end of record----\r\n";
}
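One caveat with Listing C: String.Substring throws an ArgumentOutOfRangeException whenever a line is shorter than a field's Start + Length. This is common with a blank last line, or when another tool has trimmed trailing spaces. The following is a minimal sketch of a more forgiving per-line parser, assuming that right-padding short lines with spaces is acceptable for your data (for some files, rejecting the line would be the better choice). LineParser.ParseLine and its trimValues flag are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

// Same shape as the Field class the listings use.
public class Field
{
    public string Name { get; set; }
    public int Start { get; set; }
    public int Length { get; set; }
    public string Value { get; set; }
}

public static class LineParser
{
    // Parses one record against the field map (hypothetical helper).
    public static List<Field> ParseLine(string line, List<Field> map, bool trimValues)
    {
        // Furthest position any field reaches; 0 for an empty map.
        int width = map.Select(f => f.Start + f.Length).DefaultIfEmpty(0).Max();

        // Right-pad short lines with spaces so Substring never runs past the end.
        // PadRight returns the line unchanged if it is already long enough.
        string padded = line.PadRight(width);

        var record = new List<Field>(map.Count);
        foreach (Field field in map)
        {
            string value = padded.Substring(field.Start, field.Length);
            record.Add(new Field
            {
                Name = field.Name,
                Start = field.Start,
                Length = field.Length,
                // Optionally strip the space padding fixed-length fields carry.
                Value = trimValues ? value.Trim() : value,
            });
        }
        return record;
    }
}
```

Inside Listing C's while loop, the inner foreach could then be replaced by records.Add(LineParser.ParseLine(line, fields, true));. Trimming is a design choice: it makes values easier to use, but it discards padding that some fields may treat as significant.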
Extending this solution
Feel free to use these ideas in your own projects and build on them. The final solution for my project is very dynamic and configurable, but it is built on the ideas shown in this article. The system I created can handle new file types, file formats, and file pick-up locations without changing a single line of code; all of this is handled dynamically in XML configuration and logic files. Because the code base stays unchanged, deployment and compliance concerns are eased.
This same type of solution is also used to write flat files from a database. However, this gets complicated because different field types need different padding. For example, numeric fields may be zero-padded on the left, while character fields are space-padded on the right. To solve this you could add extra attributes to the XML template; in my case, I even went to the trouble of creating a separate XML file that describes how each field type should be formatted.
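The padding rules just described can be sketched as follows. This assumes a hypothetical IsNumeric flag on each field, which might come from an extra template attribute such as Type="Numeric" that Listing A does not define, and it takes the values as name/value strings. Numeric values are zero-padded on the left with PadLeft, text is space-padded on the right with PadRight, and a value longer than its column is rejected rather than truncated. OutputField and RecordWriter.FormatRecord are illustrative names.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical output-side field; IsNumeric would come from an extra
// template attribute that Listing A does not define.
public class OutputField
{
    public string Name { get; set; }
    public int Start { get; set; }
    public int Length { get; set; }
    public bool IsNumeric { get; set; }
}

public static class RecordWriter
{
    // Builds one fixed-length line from named values.
    public static string FormatRecord(IList<OutputField> layout, IDictionary<string, string> values)
    {
        // Start from a line of spaces wide enough for every field, so any
        // gaps between columns stay blank.
        int width = 0;
        foreach (OutputField field in layout)
            width = Math.Max(width, field.Start + field.Length);
        char[] buffer = new string(' ', width).ToCharArray();

        foreach (OutputField field in layout)
        {
            // Missing values are written as empty, i.e. all padding.
            if (!values.TryGetValue(field.Name, out string value) || value == null)
                value = string.Empty;

            if (value.Length > field.Length)
                throw new ArgumentException(
                    $"Value for '{field.Name}' is longer than {field.Length} characters.");

            // Numbers: zero-padded on the left. Text: space-padded on the right.
            string padded = field.IsNumeric
                ? value.PadLeft(field.Length, '0')
                : value.PadRight(field.Length, ' ');

            // Copy the padded value into its column.
            padded.CopyTo(0, buffer, field.Start, field.Length);
        }
        return new string(buffer);
    }
}
```

This naive sketch does not handle signs or decimal places; a value such as "-5" would come out as "000-5". That is one reason a separate formatting file, like the one described above, can be worth the effort.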
If you have any questions or issues you need help with, feel free to post a comment on this article with your question and I'll be glad to help.