How do I... Transfer and store data from an XML document in a relational database?
Takeaway: XML really makes it easy to move data from here to there and from there to here. However, eventually the data needs to be stored somewhere, most likely in a relational database. Here is how Edmond Woychowsky gets data from an XML document into a relational database.
This article is also available as a TechRepublic download, which includes the code listings in a more manageable text format.
Between AJAX, Web services and just plain XML there is a plethora of information going to and fro. Yes, XML really makes it easy to move data from here to there and from there to here. While this is really great it totally ignores the fact that eventually the data needs to be stored somewhere, most likely in a relational database. This leaves but a single question, how to get the information from the XML document into the relational database?
In a perfect world this type of procedure would be transparent; unfortunately this isn't a perfect world. If it were I'd look like Brad Pitt and have a Bill Gates type checkbook, alas. In order to make it as close to perfect as we can here are our choices:
- Plastic surgery
- Unleash my killer robot army
Oops, wrong list. Let's try that again.
- A plain vanilla looping approach using a single SQL statement per insert
- A plain vanilla looping approach building a number of delimited SQL statements, which can be executed at once.
- A bizarre mad scientist approach using XSL to create SQL
Care to venture a guess as to which approach I'm going to be using?
Yeah, number 3 is definitely the way to go. So let's take a look at the XML that we'll be dealing with which is shown in Listing A. Nothing flashy, just what is necessary for a proof of concept.
Listing A - Input XML document
<?xml version="1.0" ?>
<!-- Edited with the Butterfly XML Editor (http://www.butterflyxml.org) -->
<root>
<row>
<state_id>PA</state_id>
<state_name>Pennsylvania</state_name>
</row>
<row>
<state_id>NJ</state_id>
<state_name>New Jersey</state_name>
</row>
</root>
Next let's take a look at the table that we'll be creating the inserts for, shown in Table A.
Table A - Target table
|
state_id |
VARCHAR2(2) |
|
state_name |
VARCHAR2(50) |
Now with that information there are two possible courses of action that we can take. The first is to create an XSL style sheet that simulates the first approach, "A plain vanilla looping approach using a single SQL statement per insert". This approach has the advantages of speed and being pretty generic, after all XSL is pretty much a world wide standard everywhere except for possibly Redmond.
The XSL necessary for this task, shown in Listing B, looks similar to most other XSL with a couple of differences which I'll point out. First there is the xsl:output element, specifically the media-type attribute. In this style sheet the media-type is set to text/sql instead of the default text/xml or the common text/html. The other big differences are the inclusion of in.xsl and sqlApostrophe.xsl, which are used to determine which elements are text or date as opposed to numeric and replace single apostrophes with double apostrophes to prevent issues with the SQL. In addition, there is also the shear number of xsl:text and xsl:value-of elements used to create the output.
Listing B - XSL style sheet
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="table"/>
<xsl:param name="textColumns"/>
<xsl:param name="dateColumns"/>
<xsl:includehref="in.xsl"/>
<xsl:includehref="sqlApostrophe.xsl"/>
<!--
Stylesheet: sample.xsl
Creation Date: October 25, 2006
Programmer: Edmond Woychowsky
Purpose: The purpose of this XSL style sheet is to generate multiple SQL insert statements.
Template: match="/"
Creation Date: October 25, 2006
Programmer: Edmond Woychowsky
Purpose: The purpose of this template is to create the outer sql element and invoke the template for the individual INSERT statements.
Update Date:Programmer: Description:
-->
<xsl:template match="/">
<xsl:element name="sql">
<xsl:apply-templates select="//row"/>
</xsl:element>
</xsl:template>
<!--
Template: match="row"
Programmer: Edmond Woychowsky
Purpose: The purpose of this template is to control the creation of the INSERT statements.
Update Date:Programmer: Description:
-->
<xsl:template match="row">
<xsl:element name="statement">
<xsl:value-of select="concat('INSERT INTO ',$table,' (')"/>
<xsl:apply-templates select="*" mode="name"/>
<xsl:text>) VALUES (</xsl:text>
<xsl:apply-templates select="*" mode="value"/>
<xsl:text>)</xsl:text>
</xsl:element>
</xsl:template>
<!--
Template: match="*" mode="name"
Programmer: Edmond Woychowsky
Purpose: The purpose of this template is to list the column names.
Update Date:Programmer: Description:
-->
<xsl:template match="*" mode="name">
<xsl:if test="position() != 1">
<xsl:text>, </xsl:text>
</xsl:if>
<xsl:value-of select="name(.)"/>
</xsl:template>
<!--
Template: match="*" mode="value"
Programmer: Edmond Woychowsky
Purpose: The purpose of this template is to list the column values.
Update Date:Programmer: Description:
-->
<xsl:template match="*" mode="value">
<xsl:variable name="text">
<xsl:call-template name="in">
<xsl:with-param name="list" select="$textColumns"/>
<xsl:with-param name="value" select="name(.)"/>
</xsl:call-template>
</xsl:variable>
<xsl:variable name="date">
<xsl:call-template name="in">
<xsl:with-param name="list" select="$dateColumns"/>
<xsl:with-param name="value" select="name(.)"/>
</xsl:call-template>
</xsl:variable>
<xsl:if test="position() != 1">
<xsl:text>, </xsl:text>
</xsl:if>
<xsl:choose>
<xsl:when test="$text = 'true'">
<xsl:text>'</xsl:text>
<xsl:call-template name="sqlApostrophe">
<xsl:with-param name="string" select="."/>
</xsl:call-template>
<xsl:text>'</xsl:text>
</xsl:when>
<xsl:when test="$date = 'true'">
<xsl:value-of select="."/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Of course this approach still leaves us with the problem of having to loop through the XML document in order to execute the SQL statements one at a time, which is not one of my favorite things. There is, however, an alternative, creating one multi-part SQL statement with the individual statements delimited by semicolons.
I don't consider myself lazy, I consider myself efficient. After all this approach is no more prone to errors that the last one. So with some minor changes, mostly moving things about, I produced the XSL style sheet shown in Listing C.
Listing C - Efficient XSL Style Sheet
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:param name="table"/>
<xsl:param name="textColumns"/>
<xsl:param name="dateColumns"/>
<xsl:includehref="in.xsl"/>
<xsl:includehref="sqlApostrophe.xsl"/>
<!--
Stylesheet: sample.xsl
Creation Date: October 25, 2006
Programmer: Edmond Woychowsky
Purpose: The purpose of this XSL style sheet is to generate multiple SQL insert statements.
Template: match="/"
Creation Date: October 25, 2006
Programmer: Edmond Woychowsky
Purpose: The purpose of this template is to create the outer sql element and invoke the template for the individual INSERT statements.
Update Date:Programmer: Description:
-->
<xsl:template match="/">
<xsl:element name="sql">
<xsl:apply-templates select="//row"/>
</xsl:element>
</xsl:template>
<!--
Template: match="row"
Programmer: Edmond Woychowsky
Purpose: The purpose of this template is to control the creation of the INSERT statements.
Update Date:Programmer: Description:
-->
<xsl:template match="row">
<xsl:value-of select="concat('INSERT INTO ',$table,' (')"/>
<xsl:apply-templates select="*" mode="name"/>
<xsl:text>) VALUES (</xsl:text>
<xsl:apply-templates select="*" mode="value"/>
<xsl:text>)</xsl:text>
</xsl:template>
<!--
Template: match="*" mode="name"
Programmer: Edmond Woychowsky
Purpose: The purpose of this template is to list the column names.
Update Date:Programmer: Description:
-->
<xsl:template match="*" mode="name">
<xsl:if test="position() != 1">
<xsl:text>, </xsl:text>
</xsl:if>
<xsl:value-of select="name(.)"/>
</xsl:template>
<!--
Template: match="*" mode="value"
Programmer: Edmond Woychowsky
Purpose: The purpose of this template is to list the column values.
Update Date:Programmer: Description:
-->
<xsl:template match="*" mode="value">
<xsl:variable name="text">
<xsl:call-template name="in">
<xsl:with-param name="list" select="$textColumns"/>
<xsl:with-param name="value" select="name(.)"/>
</xsl:call-template>
</xsl:variable>
<xsl:variable name="date">
<xsl:call-template name="in">
<xsl:with-param name="list" select="$dateColumns"/>
<xsl:with-param name="value" select="name(.)"/>
</xsl:call-template>
</xsl:variable>
<xsl:if test="position() != 1">
<xsl:text>, </xsl:text>
</xsl:if>
<xsl:choose>
<xsl:when test="$text = 'true'">
<xsl:text>'</xsl:text>
<xsl:call-template name="sqlApostrophe">
<xsl:with-param name="string" select="."/>
</xsl:call-template>
<xsl:text>'</xsl:text>
</xsl:when>
<xsl:when test="$date = 'true'">
<xsl:value-of select="."/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="."/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
</xsl:stylesheet>
Conclusion
I have been using this approach, or one similar to it, for the last several years with good result. While it's speed of execution, from a SQL perspective isn't that of stored procedures it does offer better results than the more common string concatenation method of creating SQL. I would also like to point out that I said that I use this approach or one similar to it, basically the only change that I'd add would be error handling with commit and or rollback statements. But, these modifications are only a matter of both personal taste and personal paranoia.
Print/View all Posts Comments on this article
SponsoredWhite Papers, Webcasts, and Downloads
- The Economist: A new mandate for IT SAP
- Watts and Volt-Amps: Powerful Confusion American Power Conversion (APC)
- CRM Without Compromise: A Strategy for Profitable Growth SAP
Article Categories
- Security
- Security Solutions, IT Locksmith
- Networking and Communications
- E-mail Administration NetNote, Cisco Routers and Switches
- CIO and IT Management
- Project Management, CIO Issues, Strategies that Scale
- Desktops, Laptops & OS
- Windows 2000 Professional, Microsoft Word, Microsoft Excel, Microsoft Access, Windows XP,
- Data Management
- Oracle, SQL Server
- Servers
- Windows NT, Linux NetNote, Windows Server 2003
- Career Development
- Geek Trivia
- Software/Web Development
- Web Development Zone, Visual Basic, .NET


