Parse XML documents with JAXP
Takeaway: You can implement a JAXP parser using the Apache Xerces-2 parser. Here's how.
This article originally appeared as an XML e-newsletter.
By Brian Schaffner
There are many ways to parse XML documents with Java. The two standard parsing techniques are DOM and SAX. The Java API for XML Processing (JAXP) gives you a standard way to use either of them without tying your code to one particular parser.
JAXP is a Java API that provides a standard approach to parsing XML documents. Let's look at how you can implement a JAXP parser using the Apache Xerces-2 parser.
Factory patterns
JAXP supports both the DOM and the SAX approach to processing XML documents, and the factory class you use determines which approach you get. The factory is a standard design pattern: instead of constructing a concrete class directly, you ask a factory object to create instances for you as needed.
With JAXP, you use either a DocumentBuilderFactory to create DocumentBuilder objects or a SAXParserFactory to create SAXParser objects. The difference is that DOM parsers read the entire document into memory and let you traverse it in a random-access way, while SAX parsers call handler methods as each piece of XML data is encountered in the document. We'll concentrate on DocumentBuilder for now.
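For contrast with the DOM approach, here is a minimal sketch of the SAX side of JAXP. It uses only standard JDK classes (SAXParserFactory, SAXParser, DefaultHandler); the class name SaxCountDemo and the sample XML are hypothetical. The handler is called once per start tag, and no document tree is ever built in memory.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

final class SaxCountDemo {
    // Counts start-element events; SAX never builds an in-memory tree.
    static int countElements(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        int[] count = {0}; // mutable holder captured by the anonymous handler
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attributes) {
                count[0]++; // called once per element, in document order
            }
        };
        parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(countElements("<order><item/><item/></order>")); // prints 3
    }
}
```

Because SAX only reports events, any state you need (here, a counter) has to be kept by your handler.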
DocumentBuilder
A DocumentBuilder object is created by calling the newDocumentBuilder method of a DocumentBuilderFactory. You obtain the factory itself by calling the static newInstance method of the DocumentBuilderFactory class, and you can create as many factories as you want.
For example, to start you'll want to create a new DocumentBuilderFactory, like this:
DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
Once you have a handle to the factory, you can create an instance of the actual DOM parser using the following code:
DocumentBuilder builder = dbfactory.newDocumentBuilder();
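This creates a new DocumentBuilder instance backed by the underlying parser. To parse a document, you call the DocumentBuilder's parse method, which accepts sources such as a File, an InputStream, an InputSource, or a URI string. The parse method returns a Document object, which you can use to process the XML document. Note that newDocumentBuilder can throw a ParserConfigurationException, and parse can throw a SAXException or an IOException.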
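Putting those steps together, a minimal sketch might look like the following. The class name DomParseDemo and the sample XML are hypothetical; the XML is read from a string through an InputSource so the example needs no file on disk (to parse a file instead, pass a java.io.File to parse).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class DomParseDemo {
    // Parses an XML string into a W3C DOM Document using only JAXP calls.
    static Document parse(String xml) throws Exception {
        // 1. Get a factory; JAXP decides which parser implementation backs it.
        DocumentBuilderFactory dbfactory = DocumentBuilderFactory.newInstance();
        // 2. Ask the factory for a parser (may throw ParserConfigurationException).
        DocumentBuilder builder = dbfactory.newDocumentBuilder();
        // 3. Parse; the whole document is loaded into memory as a tree.
        return builder.parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<customer><name>Ada</name></customer>");
        System.out.println(doc.getDocumentElement().getNodeName()); // prints customer
    }
}
```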
Listing A shows a simple implementation using the DocumentBuilderFactory and DocumentBuilder classes.
The DocumentBuilder class is really just a DOM parser. The advantage of using the JAXP DocumentBuilder class is portability to other underlying XML parser implementations. The DocumentBuilderFactory and DocumentBuilder classes provide an abstraction layer that removes the dependency on a specific parser from your code.
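One way to see this abstraction at work is to print which concrete factory JAXP picked. The sketch below (class name ParserPortabilityDemo is hypothetical) depends only on javax.xml.parsers and org.w3c.dom types. The commented-out lines show how you could request Apache Xerces-2 through the standard javax.xml.parsers.DocumentBuilderFactory system property; they assume a Xerces-2 jar is on the classpath, otherwise newInstance throws a FactoryConfigurationError. The printed class name depends on your JDK and classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

final class ParserPortabilityDemo {
    // Application code names only the JAXP abstract types, never a parser class.
    static Document parseWithWhateverParser(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The concrete factory class is chosen at run time, not compiled in.
        System.out.println("Using " + factory.getClass().getName());
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // Optional: request Xerces-2 explicitly (assumes Xerces is on the classpath).
        // System.setProperty("javax.xml.parsers.DocumentBuilderFactory",
        //         "org.apache.xerces.jaxp.DocumentBuilderFactoryImpl");
        Document doc = parseWithWhateverParser("<config/>");
        System.out.println(doc.getDocumentElement().getTagName());
    }
}
```

Swapping parsers this way changes a configuration setting, not the parsing code.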
Real document
When you use DOM through a DocumentBuilder, the parser returns a Document object. This is important because the Document interface (org.w3c.dom.Document) is defined by the W3C DOM specification, which means you can work with it exactly as you would if you were using any other DOM parser.
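As a small illustration, the sketch below walks a parsed document using only org.w3c.dom interfaces (Element, Node, NodeList), so the same traversal code would work with any DOM-compliant parser. The class name DomWalkDemo, the method childElementNames, and the sample XML are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class DomWalkDemo {
    // Collects the tag names of the root element's child elements
    // using only standard W3C DOM interfaces.
    static List<String> childElementNames(Document doc) {
        List<String> names = new ArrayList<>();
        Element root = doc.getDocumentElement();
        NodeList children = root.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip whitespace text nodes, comments, and other non-element nodes
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                names.add(child.getNodeName());
            }
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<order id=\"42\">\n  <item/>\n  <total/>\n</order>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        System.out.println(doc.getDocumentElement().getAttribute("id")); // prints 42
        System.out.println(childElementNames(doc)); // prints [item, total]
    }
}
```

The node-type check matters: the whitespace between tags is returned as text nodes alongside the elements.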
For example, you can retrieve an element's value using the following method:
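import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

String getXMLValue(Document doc, String name)
{
    // Find every element with this tag name, anywhere in the document
    NodeList nlist = doc.getElementsByTagName(name);
    if (nlist.getLength() == 0)
        return null; // no element with that name
    // The first child of the first match is normally its text node
    Node first = nlist.item(0).getFirstChild();
    return (first == null) ? null : first.getNodeValue();
}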
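This method finds the first element anywhere in the document whose tag name matches the string passed into the method, and returns the value of that element's first child node. For an element such as <title>XML Basics</title>, that child is a text node, so the method returns "XML Basics".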
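To try the lookup end to end, here is a self-contained harness. It is a sketch: the class name XmlValueDemo, the parse helper, and the sample XML are hypothetical, and getXMLValue is made static and given null checks so that a missing or empty element returns null instead of throwing a NullPointerException.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

final class XmlValueDemo {
    // First element with this tag name; value of its first child (usually text).
    static String getXMLValue(Document doc, String name) {
        NodeList nlist = doc.getElementsByTagName(name);
        if (nlist.getLength() == 0) {
            return null; // no element with that name
        }
        Node first = nlist.item(0).getFirstChild();
        return (first == null) ? null : first.getNodeValue();
    }

    // Hypothetical helper: parse an XML string with the default JAXP parser.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<book><title>XML Basics</title><isbn/></book>");
        System.out.println(getXMLValue(doc, "title"));  // prints XML Basics
        System.out.println(getXMLValue(doc, "isbn"));   // prints null (empty element)
        System.out.println(getXMLValue(doc, "author")); // prints null (no such element)
    }
}
```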
Brian Schaffner is an associate director for Fujitsu Consulting. He provides architecture, design, and development support for Fujitsu's Technology Consulting practice.