Easily parse string values with .NET
Takeaway: The .NET Framework simplifies processing and formatting data with the String class and its Split and Join methods or regular expressions. Learn more about using these methods in your application.
Processing string values is an integral aspect of most application development projects. This often involves parsing strings into separate values. For instance, data received from an external source such as a spreadsheet often arrives in a common format like comma-separated values. The .NET String class simplifies the process of extracting the individual values between the commas.
Extracting values
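The Split method of the String class breaks a string into an array of substrings at each occurrence of a separator character. Here are its two variations: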
- String.Split(char[]) in C# or String.Split(Char()) in VB.NET
- String.Split(char[], int) in C# or String.Split(Char(), Integer) in VB.NET
The following C# snippet populates an array with values contained in a comma-separated string value:
string values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";
string[] sites = values.Split(',');
foreach (string s in sites) {
    // Trim removes the space that follows each comma in the source string
    Console.WriteLine(s.Trim());
}
The following output is generated:
TechRepublic.com
CNET.com
News.com
Builder.com
GameSpot.com
The equivalent VB.NET code follows:
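Dim values As String
values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com"
Dim sites As String() = Nothing
' ","c is a Char literal, which matches the Split(Char()) signature
sites = values.Split(","c)
Dim s As String
For Each s In sites
    ' Trim removes the space that follows each comma in the source string
    Console.WriteLine(s.Trim())
Next s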
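Real-world input is rarely as tidy as the sample string: values may be padded with spaces, and two separators may sit side by side. The following is a minimal C# sketch of one way to cope, assuming .NET 2.0 or later for the StringSplitOptions overload of Split. The class name SplitExample, the helper ParseList, and the sample input are illustrative only.

```csharp
using System;

public static class SplitExample
{
    // Hypothetical helper: splits a comma-separated string, drops
    // zero-length entries, and trims whitespace around each value.
    public static string[] ParseList(string input)
    {
        string[] parts = input.Split(new char[] { ',' },
                                     StringSplitOptions.RemoveEmptyEntries);
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Trim();
        }
        return parts;
    }

    public static void Main()
    {
        // Note the doubled comma, which would otherwise yield an empty entry
        string values = "TechRepublic.com, CNET.com,, News.com";
        foreach (string s in ParseList(values))
        {
            Console.WriteLine("[" + s + "]");
        }
    }
}
```

The brackets in the output make any leftover whitespace visible; with the input above, each of the three site names should appear without padding.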
You may specify multiple separator characters, which are contained in a character array. The following code splits a string of values separated by a comma, semicolon, or colon. In addition, it uses the optional second parameter to limit the number of items returned to four.
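// The separators: comma, colon, and semicolon
char[] sep = { ',', ':', ';' };
string values = "TechRepublic.com: CNET.com, News.com, Builder.com; GameSpot.com";
// At most four elements are returned; the fourth holds the unsplit remainder
string[] sites = values.Split(sep, 4);
foreach (string s in sites) {
    // Trim removes the space that follows each separator
    Console.WriteLine(s.Trim());
}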
The following output is generated (notice that the second parameter places the remainder of the string in the last array element):
TechRepublic.com
CNET.com
News.com
Builder.com; GameSpot.com
The equivalent VB.NET code follows:
Dim values As String
values = "TechRepublic.com: CNET.com, News.com, Builder.com; GameSpot.com"
Dim sites As String() = Nothing
' Dim sep(2) declares three elements (indexes 0 to 2)
Dim sep(2) As Char
sep(0) = ","c
sep(1) = ":"c
sep(2) = ";"c
sites = values.Split(sep, 4)
Dim s As String
For Each s In sites
    ' Trim removes the space that follows each separator
    Console.WriteLine(s.Trim())
Next s
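The count parameter is also handy when only the first separator should be honored, for example when the value itself may contain the separator character. The sketch below is an illustration rather than part of the original samples: the class SplitLimitExample, the helper SplitNameValue, and the "name: value" input are all assumed for the demonstration.

```csharp
using System;

public static class SplitLimitExample
{
    // Hypothetical helper: splits "name: value" at the first colon only,
    // so any later colons stay inside the value.
    public static string[] SplitNameValue(string line)
    {
        string[] parts = line.Split(new char[] { ':' }, 2);
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Trim();
        }
        return parts;
    }

    public static void Main()
    {
        string[] parts = SplitNameValue("Site: http://TechRepublic.com");
        Console.WriteLine(parts[0]); // Site
        Console.WriteLine(parts[1]); // http://TechRepublic.com
    }
}
```

If the input contains no colon at all, the returned array has a single element, so callers should check its Length before reading the second entry.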
While the Split method allows you to easily work with individual elements contained in a string value, you may need to format values according to a predefined format like comma-separated values. The String class makes it easy to assemble a properly formatted string.
Putting it together
The Join method of the String class accepts the string to be used as the separator as its first parameter. The values to be concatenated are passed as the second parameter in the form of a string array. It has one overloaded method signature that accepts integer values as the third and fourth parameters. The third parameter specifies the index of the first array element to use, and the last parameter is the total number of elements to use.
The following C# code sample demonstrates assembling the values used in the previous example:
string sep = ", ";
string[] values = new String[5];
values[0] = "TechRepublic.com";
values[1] = "CNET.com";
values[2] = "News.com";
values[3] = "Builder.com";
values[4] = "GameSpot.com";
string sites = String.Join(sep, values);
Console.Write(sites);
The following output is generated:
TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com
The equivalent VB.NET follows:
Dim sep As String
sep = ", "
Dim values(4) As String
values(0) = "TechRepublic.com"
values(1) = "CNET.com"
values(2) = "News.com"
values(3) = "Builder.com"
values(4) = "GameSpot.com"
Dim sites As String
sites = String.Join(sep, values)
Console.Write(sites)
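Split and Join also work well as a pair, for instance to rewrite a list that mixes separators into one consistent format. The following C# sketch assumes that goal; the class NormalizeExample and the helper Normalize are hypothetical names chosen for the illustration.

```csharp
using System;

public static class NormalizeExample
{
    // Hypothetical helper: rewrites a list that mixes ',', ':' and ';'
    // as a list separated by ", ".
    public static string Normalize(string input)
    {
        string[] parts = input.Split(',', ':', ';');
        for (int i = 0; i < parts.Length; i++)
        {
            parts[i] = parts[i].Trim();
        }
        return String.Join(", ", parts);
    }

    public static void Main()
    {
        string mixed = "TechRepublic.com: CNET.com, News.com; Builder.com";
        // Prints: TechRepublic.com, CNET.com, News.com, Builder.com
        Console.WriteLine(Normalize(mixed));
    }
}
```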
We could use the overloaded version to specify where to begin and how many elements to include in the result. The following VB.NET sample begins with the third element (index 2, because element numbering begins at zero) and joins three elements, producing News.com, Builder.com, GameSpot.com:
Dim sep As String
sep = ", "
Dim values(4) As String
values(0) = "TechRepublic.com"
values(1) = "CNET.com"
values(2) = "News.com"
values(3) = "Builder.com"
values(4) = "GameSpot.com"
Dim sites As String
sites = String.Join(sep, values, 2, 3)
Console.Write(sites)
The starting index and the element count must be valid for the string array being used. If the range they describe falls outside the array (for example, a negative value, or a start plus count greater than the array length), an ArgumentOutOfRangeException is thrown. For this reason, it is a good idea to utilize a try/catch block to handle any problems.
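As a rough illustration of that advice, the C# sketch below wraps the four-parameter Join in a try/catch block. The class JoinRangeExample and the helper TryJoin are hypothetical, and returning null on failure is just one possible policy.

```csharp
using System;

public static class JoinRangeExample
{
    // Hypothetical helper: joins count elements starting at startIndex,
    // returning null instead of throwing when the range is invalid.
    public static string TryJoin(string sep, string[] values, int startIndex, int count)
    {
        try
        {
            return String.Join(sep, values, startIndex, count);
        }
        catch (ArgumentOutOfRangeException)
        {
            // The requested range does not fit inside the array
            return null;
        }
    }

    public static void Main()
    {
        string[] values = { "TechRepublic.com", "CNET.com", "News.com",
                            "Builder.com", "GameSpot.com" };

        // Valid range: prints News.com, Builder.com, GameSpot.com
        Console.WriteLine(TryJoin(", ", values, 2, 3));

        // Invalid range (3 + 3 exceeds the 5 elements): prints True
        Console.WriteLine(TryJoin(", ", values, 3, 3) == null);
    }
}
```

An alternative is to check that startIndex + count does not exceed values.Length before calling Join, which avoids raising the exception in the first place.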
While the String class provides the necessary methods, it isn't the only way to handle the parsing of a string value. Another common approach takes advantage of regular expressions.
Parsing with regular expressions
The .NET Framework provides the Regex class contained in the System.Text.RegularExpressions namespace for using regular expressions within a .NET application. Parsing is only one of the many applications of regular expressions.
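Let's examine the parsing of our sample string using regular expressions. The following ASP.NET page uses C# to parse a comma-delimited list of sites into an array. The pattern matches only commas that lie outside double-quoted text, so a quoted value that contains a comma is kept in one piece: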
<%@ Page Language="C#" Debug="true" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<script language="C#" runat="server">
private void Page_Load(object sender, System.EventArgs e) {
    if (!IsPostBack) {
        string values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com";
        string pattern = ",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))";
        Regex r = new Regex(pattern);
        string[] sites = r.Split(values);
        foreach (string s in sites) {
            Response.Write(s);
            Response.Write("<br>");
        }
    }
}
</script>
The equivalent VB.NET code follows. Notice that the quotation marks inside the pattern string need different handling: VB.NET does not treat the backslash as an escape character, so each quotation mark within the string literal is escaped by placing two of the characters adjacent to each other.
<%@ Page Language="VB" Debug="true" %>
<%@ Import Namespace="System.Text.RegularExpressions" %>
<script language="VB" runat="server">
Sub Page_Load()
    If Not (IsPostBack) Then
        Dim values As String
        values = "TechRepublic.com, CNET.com, News.com, Builder.com, GameSpot.com"
        ' A doubled "" stands for one quotation mark; no backslashes are needed
        Dim pattern As String
        pattern = ",(?=(?:[^""]*""[^""]*"")*(?![^""]*""))"
        Dim r As Regex
        r = New Regex(pattern)
        Dim sites As String()
        sites = r.Split(values)
        Dim s As String
        For Each s In sites
            Response.Write(s)
            Response.Write("<br>")
        Next s
    End If
End Sub
</script>
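The sample string contains no quoted values, so it does not show what the pattern adds over a plain Split call. The console sketch below, offered as an illustration only, runs the same pattern against a line with a quoted field that contains a comma. The class RegexSplitExample, the helper SplitCsv, and the input line are assumptions made for the demonstration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexSplitExample
{
    // Same pattern as in the ASP.NET pages: a comma that is followed by
    // an even number of quotation marks, i.e. a comma outside quotes.
    private static readonly Regex CommaOutsideQuotes =
        new Regex(",(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))");

    // Hypothetical helper: splits one line on unquoted commas.
    public static string[] SplitCsv(string line)
    {
        return CommaOutsideQuotes.Split(line);
    }

    public static void Main()
    {
        string line = "\"Patton, Tony\",TechRepublic.com,.NET";
        foreach (string field in SplitCsv(line))
        {
            Console.WriteLine(field);
        }
        // Prints three lines:
        // "Patton, Tony"
        // TechRepublic.com
        // .NET
    }
}
```

The surrounding quotation marks remain part of the first field; strip them with Trim('"') if they are not wanted.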
Easily work with data
The .NET Framework makes it easy to work with data regardless of its format. A string containing values separated by a specific character is easily processed via the String class or possibly regular expressions. The method that you decide to use will depend on your specific application.
Miss a column?
Check out the .NET Archive, and catch up on the most recent editions of Tony Patton's column.