Chapter 183: XML Parsing using the JAXP APIs

Remarks

XML parsing is the interpretation of XML documents in order to manipulate their content using sensible constructs, be they "nodes", "attributes", "documents", "namespaces", or events related to these constructs.

Java has a native API for XML document handling, called JAXP, or Java API for XML Processing. JAXP and a reference implementation have been bundled with every Java release since Java 1.4 (JAXP v1.1) and have evolved since. Java 8 shipped with JAXP version 1.6.

The API provides three ways of interacting with XML documents:

• The DOM interface (Document Object Model)
• The SAX interface (Simple API for XML)
• The StAX interface (Streaming API for XML)

Principles of the DOM interface

The DOM interface aims to provide a W3C DOM compliant way of interpreting XML. Various versions of JAXP have supported various DOM Levels of specification (up to Level 3).

Under the Document Object Model interface, an XML document is represented as a tree, starting with the "Document Element". The base type of the API is the Node type. It allows navigation from a Node to its parent, its children, or its siblings (although not all Nodes can have children: Text nodes, for example, are leaves of the tree and never have children). XML tags are represented as Elements, which notably extend Node with attribute-related methods.

The DOM interface is very useful since it allows "one line" parsing of XML documents into trees, easy modification of the constructed tree (node addition, removal, copying, ...), and finally its serialization (back to disk) after modification. This comes at a price, though: the whole tree resides in memory, so DOM trees are not always practical for huge XML documents. Furthermore, building the tree is not always the fastest way of dealing with XML content, especially if one is not interested in all parts of the XML document.
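The parse, modify, serialize cycle described above can be sketched as follows. This is only an illustration under stated assumptions: the class name DomEditSketch and the method addBook are hypothetical, and serialization is done with an identity Transformer from javax.xml.transform, which is one of several possible ways to write a DOM tree back out.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JAXP
class DomEditSketch {

    /** Parses the XML into a tree, appends one <book> to the root, and serializes the tree. */
    static String addBook(String xml, String id, String title) throws Exception {
        // "One line" parsing: the whole document becomes an in-memory tree
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document document = builder.parse(new InputSource(new StringReader(xml)));

        // Modification: create a new element and attach it under the root
        Element book = document.createElement("book");
        book.setAttribute("id", id);
        book.setTextContent(title);
        document.getDocumentElement().appendChild(book);

        // Serialization: an identity transformation copies the tree to the output
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(addBook(
                "<library><book id='1'>Effective Java</book></library>",
                "2", "Java Concurrency In Practice"));
    }
}
```

Because the entire tree is held in memory between the parse and the transform, this style suits small and medium documents, which is the trade-off noted above.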
Principles of the SAX interface

The SAX API is an event-oriented API to deal with XML documents. Under this model, the components of an XML document are interpreted as events (e.g. "a tag has been opened", "a tag has been closed", "a text node has been encountered", "a comment has been encountered").
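To make these events concrete, here is a minimal sketch that records them in the order the parser reports them. It assumes the standard org.xml.sax.helpers.DefaultHandler as the receiver of the events; the class name SaxEventsSketch and the method recordEvents are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, not part of JAXP
class SaxEventsSketch {

    /** Returns a textual trace of the element and text events found in the XML. */
    static List<String> recordEvents(String xml) throws Exception {
        List<String> events = new ArrayList<>();

        // The handler only stores what it is told; the parser drives the calls
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName, Attributes attributes) {
                events.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                events.add("end:" + qName);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // A parser is allowed to deliver one text node in several chunks
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    events.add("text:" + text);
                }
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return events;
    }

    public static void main(String[] args) throws Exception {
        recordEvents("<library><book id='1'>Effective Java</book></library>")
                .forEach(System.out::println);
    }
}
```

Nothing is retained beyond what the handler chooses to store, here a list of strings.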

The SAX API uses a "push parsing" approach, where a SAX Parser is responsible for interpreting the XML document and invokes methods on a delegate (a ContentHandler) to deal with whatever event is found in the XML document. Usually, one never writes a parser; one provides a handler that gathers all needed information from the XML document.

The SAX interface overcomes the DOM interface's limitations by keeping only the minimum necessary data at the parser level (e.g. namespace contexts, validation state). Therefore, only the information kept by the ContentHandler, for which you, the developer, are responsible, is held in memory. The tradeoff is that there is no way of "going back in time/the XML document" with such an approach: while DOM allows a Node to go back to its parent, there is no such possibility in SAX.

Principles of the StAX interface

The StAX API takes a similar approach to processing XML as the SAX API (that is, event driven), the only significant difference being that StAX is a pull parser (where SAX is a push parser). In SAX, the Parser is in control and uses callbacks on the ContentHandler. In StAX, you call the parser, and you control when (and whether) to obtain the next XML "event".

The API starts with XMLStreamReader (or XMLEventReader), which are the gateways through which the developer asks for the next event (with next() and nextEvent() respectively), in an iterator-style way.
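The iterator style can be sketched with XMLEventReader. In this illustration the caller decides how far to read and simply stops pulling once it has enough; the class name PullEventsSketch, the method firstElementNames and its limit parameter are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class, not part of JAXP
class PullEventsSketch {

    /** Returns the local names of at most `limit` start tags, in document order. */
    static List<String> firstElementNames(String xml, int limit) throws Exception {
        XMLEventReader reader =
                XMLInputFactory.newFactory().createXMLEventReader(new StringReader(xml));
        List<String> names = new ArrayList<>();

        // The caller is in control: no event is produced unless nextEvent() is called
        while (reader.hasNext() && names.size() < limit) {
            XMLEvent event = reader.nextEvent();
            if (event.isStartElement()) {
                names.add(event.asStartElement().getName().getLocalPart());
            }
        }
        reader.close();
        return names;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book id='1'>A</book><book id='2'>B</book></library>";
        System.out.println(firstElementNames(xml, 2));
    }
}
```

With a push parser the same early exit would typically require throwing an exception from the handler; with a pull parser the loop just ends.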
Examples

Parsing and navigating a document using the DOM API

Considering the following document:

<?xml version='1.0' encoding='UTF-8' ?>
<library>
    <book id='1'>Effective Java</book>
    <book id='2'>Java Concurrency In Practice</book>
</library>

One can use the following code to build a DOM tree out of a String:

import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.StringReader;

public class DOMDemo {
    public static void main(String[] args) throws Exception {
        String xmlDocument = "<?xml version='1.0' encoding='UTF-8' ?>"

                + "<library>"
                + "<book id='1'>Effective Java</book>"
                + "<book id='2'>Java Concurrency In Practice</book>"
                + "</library>";

        DocumentBuilderFactory documentBuilderFactory = DocumentBuilderFactory.newInstance();
        // Namespace awareness is needed for getLocalName() below: without it,
        // getLocalName() returns null, even though this XML declares no namespaces
        documentBuilderFactory.setNamespaceAware(true);
        DocumentBuilder documentBuilder = documentBuilderFactory.newDocumentBuilder();
        // There are various options here, to read from an InputStream, from a file, ...
        Document document = documentBuilder.parse(new InputSource(new StringReader(xmlDocument)));

        // Root of the document
        System.out.println("Root of the XML Document: " + document.getDocumentElement().getLocalName());

        // Iterate the contents
        NodeList firstLevelChildren = document.getDocumentElement().getChildNodes();
        for (int i = 0; i < firstLevelChildren.getLength(); i++) {
            Node item = firstLevelChildren.item(i);
            System.out.println("First level child found, XML tag name is: " + item.getLocalName());
            System.out.println("\tid attribute of this tag is : " + item.getAttributes().getNamedItem("id").getTextContent());
        }

        // Another way would have been
        NodeList allBooks = document.getDocumentElement().getElementsByTagName("book");
    }
}

The code yields the following:

Root of the XML Document: library
First level child found, XML tag name is: book
    id attribute of this tag is : 1
First level child found, XML tag name is: book
    id attribute of this tag is : 2

Parsing a document using the StAX API

Considering the following document:

<?xml version='1.0' encoding='UTF-8' ?>
<library>
    <book id='1'>Effective Java</book>
    <book id='2'>Java Concurrency In Practice</book>
    <notABook id='3'>This is not a book element</notABook>
</library>

One can use the following code to parse it and build a map of book titles by book id.

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

public class StaxDemo {
    public static void main(String[] args) throws Exception {
        String xmlDocument = "<?xml version='1.0' encoding='UTF-8' ?>"
                + "<library>"
                + "<book id='1'>Effective Java</book>"
                + "<book id='2'>Java Concurrency In Practice</book>"
                + "<notABook id='3'>This is not a book element</notABook>"
                + "</library>";

        XMLInputFactory xmlInputFactory = XMLInputFactory.newFactory();
        // Various flavors are possible, e.g. from an InputStream, a Source, ...
        XMLStreamReader xmlStreamReader = xmlInputFactory.createXMLStreamReader(new StringReader(xmlDocument));
        Map<Integer, String> bookTitlesById = new HashMap<>();

        // We go through each event using a loop; next() advances the parser
        // and returns the type of the event it is now positioned on
        while (xmlStreamReader.hasNext()) {
            switch (xmlStreamReader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    System.out.println("Found start of element: " + xmlStreamReader.getLocalName());
                    // Check if we are at the start of a <book> element
                    if ("book".equals(xmlStreamReader.getLocalName())) {
                        int bookId = Integer.parseInt(xmlStreamReader.getAttributeValue("", "id"));
                        String bookTitle = xmlStreamReader.getElementText();
                        bookTitlesById.put(bookId, bookTitle);
                    }
                    break;
                // A bunch of other things are possible: comments, processing instructions, whitespace...
                default:
                    break;
            }
        }
        System.out.println(bookTitlesById);
    }
}

This outputs:

Found start of element: library
Found start of element: book
Found start of element: book
Found start of element: notABook
{1=Effective Java, 2=Java Concurrency In Practice}

In this sample, one must be careful of a few things:

1. The use of xmlStreamReader.getAttributeValue works because we have checked first that the parser is in the START_ELEMENT state. In every other state (except ATTRIBUTE), the parser is

mandated to throw IllegalStateException, because attributes can only appear at the beginning of elements.
2. The same goes for xmlStreamReader.getElementText(): it works because we are at a START_ELEMENT and we know that, in this document, the <book> element has no child elements.

For parsing more complex documents (deeper, nested elements, ...), it is good practice to "delegate" the parser to sub-methods or other objects, e.g. have a BookParser class or method, and have it deal with every element from the START_ELEMENT to the END_ELEMENT of the book XML tag.

One can also use a Stack object to keep important data around while moving up and down the tree.
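The delegation idea can be sketched as follows. The document shape used here (a <book> with <title> and <author> child elements) is an assumption made for the illustration and differs from the flat document above; the class name DelegatingStaxSketch and the methods parseLibrary and parseBook are likewise hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of JAXP
class DelegatingStaxSketch {

    /** Walks the top level and hands every <book> element over to parseBook. */
    static List<String> parseLibrary(String xml) throws XMLStreamException {
        XMLStreamReader reader =
                XMLInputFactory.newFactory().createXMLStreamReader(new StringReader(xml));
        List<String> books = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT
                    && "book".equals(reader.getLocalName())) {
                books.add(parseBook(reader));
            }
        }
        reader.close();
        return books;
    }

    /** Expects the reader on the START_ELEMENT of <book>; returns with it on the matching END_ELEMENT. */
    private static String parseBook(XMLStreamReader reader) throws XMLStreamException {
        // Attributes are only readable while positioned on START_ELEMENT
        String id = reader.getAttributeValue(null, "id");
        String title = null;
        String author = null;
        while (reader.hasNext()) {
            int event = reader.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                switch (reader.getLocalName()) {
                    case "title":
                        title = reader.getElementText();
                        break;
                    case "author":
                        author = reader.getElementText();
                        break;
                    default:
                        break;
                }
            } else if (event == XMLStreamConstants.END_ELEMENT
                    && "book".equals(reader.getLocalName())) {
                break; // the sub-parser's responsibility ends with </book>
            }
        }
        return id + ": " + title + " by " + author;
    }

    public static void main(String[] args) throws XMLStreamException {
        String xml = "<library>"
                + "<book id='1'><title>Effective Java</title><author>Joshua Bloch</author></book>"
                + "<notABook id='3'>This is not a book element</notABook>"
                + "</library>";
        System.out.println(parseLibrary(xml));
    }
}
```

Each sub-method owns one element from its start tag to its end tag, so the outer loop never needs to know what a book contains.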

Chapter 184: XML XPath Evaluation

Remarks

XPath expressions are used to navigate and select one or more nodes within an XML document tree, such as selecting a certain element or attribute node.

See the W3C XPath recommendation for a reference on this language.

Examples

Evaluating a NodeList in an XML document

Given the following XML document:

<documentation>
    <tags>
        <tag name="Java">
            <topic name="Regular expressions">
                <example>Matching groups</example>
                <example>Escaping metacharacters</example>
            </topic>
            <topic name="Arrays">
                <example>Looping over arrays</example>
                <example>Converting an array to a list</example>
            </topic>
        </tag>
        <tag name="Android">
            <topic name="Building Android projects">
                <example>Building an Android application using Gradle</example>
                <example>Building an Android application using Maven</example>
            </topic>
            <topic name="Layout resources">
                <example>Including layout resources</example>
                <example>Supporting multiple device screens</example>
            </topic>
        </tag>
    </tags>
</documentation>

The following retrieves all example nodes for the Java tag. (Use this method if XPath is evaluated against the XML only once. See the other example for when multiple XPath calls are evaluated against the same XML file.)

XPathFactory xPathFactory = XPathFactory.newInstance();
XPath xPath = xPathFactory.newXPath(); //Make new XPath
InputSource inputSource = new InputSource("path/to/xml.xml"); //Specify XML file path

NodeList javaExampleNodes = (NodeList) xPath.evaluate("/documentation/tags/tag[@name='Java']//example", inputSource, XPathConstants.NODESET); //Evaluate the XPath
...
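What happens after the evaluation is left open by the snippet above. One possible continuation, sketched here, walks the returned NodeList and collects the text of each node. The class name XPathNodeListSketch and the method textsOf are hypothetical, and the XML is read from a String rather than from a file so that the sketch is self-contained.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JAXP
class XPathNodeListSketch {

    /** Evaluates the expression once against the XML and returns the text of every selected node. */
    static List<String> textsOf(String xml, String expression) throws Exception {
        XPath xPath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xPath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);

        // NodeList is index-based; it is not an Iterable
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<documentation><tags>"
                + "<tag name='Java'><topic name='Arrays'><example>Looping over arrays</example></topic></tag>"
                + "<tag name='Android'><topic name='Layout resources'><example>Including layout resources</example></topic></tag>"
                + "</tags></documentation>";
        System.out.println(textsOf(xml, "/documentation/tags/tag[@name='Java']//example"));
    }
}
```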

Parsing multiple XPath Expressions in a single XML

Using the same XML document as in Evaluating a NodeList in an XML document, here is how you would make multiple XPath calls efficiently. The document is parsed into a DOM tree once, and every expression is then evaluated against that tree:

XPath xPath = XPathFactory.newInstance().newXPath(); //Make new XPath
DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new File("path/to/xml.xml")); //Specify XML file path

NodeList javaExampleNodes = (NodeList) xPath.evaluate("/documentation/tags/tag[@name='Java']//example", doc, XPathConstants.NODESET); //Evaluate the XPath
xPath.reset(); //Resets the xPath so it can be used again
NodeList androidExampleNodes = (NodeList) xPath.evaluate("/documentation/tags/tag[@name='Android']//example", doc, XPathConstants.NODESET); //Evaluate the XPath
...

Parsing single XPath Expression multiple times in an XML

In this case, you want to have the expression compiled before the evaluations, so that each call to evaluate does not compile the same expression again. The simple syntax would be:

XPath xPath = XPathFactory.newInstance().newXPath(); //Make new XPath
XPathExpression exp = xPath.compile("/documentation/tags/tag[@name='Java']//example");

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
Document doc = builder.parse(new File("path/to/xml.xml")); //Specify XML file path

NodeList javaExampleNodes = (NodeList) exp.evaluate(doc, XPathConstants.NODESET); //Evaluate the XPath from the already-compiled expression
NodeList javaExampleNodes2 = (NodeList) exp.evaluate(doc, XPathConstants.NODESET); //Do it again

Two calls to XPathExpression.evaluate() compile the expression only once, whereas two calls to XPath.evaluate() with the same expression string compile it twice. The saving grows with the number of evaluations.
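A compiled expression is not tied to one document, so the same idea also applies when one expression is run against several documents. The following sketch assumes that scenario; the class name CompiledXPathSketch, the method countExamples and the expression count(//example) are choices made for this illustration only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of JAXP
class CompiledXPathSketch {

    /** Counts the <example> elements of each document using one compiled expression. */
    static int[] countExamples(String... xmlDocuments) throws Exception {
        // Compiled once, outside the loop
        XPathExpression exampleCount =
                XPathFactory.newInstance().newXPath().compile("count(//example)");
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

        int[] counts = new int[xmlDocuments.length];
        for (int i = 0; i < xmlDocuments.length; i++) {
            Document doc = builder.parse(new InputSource(new StringReader(xmlDocuments[i])));
            // With XPathConstants.NUMBER the result is returned as a Double
            Double count = (Double) exampleCount.evaluate(doc, XPathConstants.NUMBER);
            counts[i] = count.intValue();
        }
        return counts;
    }

    public static void main(String[] args) throws Exception {
        int[] counts = countExamples(
                "<topic><example>a</example><example>b</example></topic>",
                "<topic/>");
        System.out.println(java.util.Arrays.toString(counts));
    }
}
```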

Chapter 185: XOM - XML Object Model

Examples

Reading a XML file

In order to load XML data with XOM, you need to make a Builder, which you then use to build the data into a Document.

Builder builder = new Builder();
Document doc = builder.build(file);

To get the root element, the highest parent in the XML file, call getRootElement() on the Document instance.

Element root = doc.getRootElement();

The Element class has a lot of handy methods that make reading XML really easy. Some of the most useful are listed below:

• getChildElements(String name) - returns an Elements instance that acts as an array of elements.
• getFirstChildElement(String name) - returns the first child element with that tag.
• getValue() - returns the value inside the element.
• getAttributeValue(String name) - returns the value of an attribute with the specified name.

When you call getChildElements() you get an Elements instance. You can loop over it and call the get(int index) method to retrieve each of the elements inside.

Elements colors = root.getChildElements("color");
for (int q = 0; q < colors.size(); q++){
    Element color = colors.get(q);
}

Example: Here is an example of reading an XML file.

XML File: an <example> root element containing <person> elements, each with a <name> element (holding <first> and <last> children), an <age> element with a unit attribute, and a <fav_color> element.

Code for reading and printing it:

import java.io.File;
import java.io.IOException;
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Elements;
import nu.xom.ParsingException;

public class XMLReader {
    public static void main(String[] args) throws ParsingException, IOException{
        File file = new File("insert path here");
        // builder builds xml data
        Builder builder = new Builder();
        Document doc = builder.build(file);

        // get the root element <example>
        Element root = doc.getRootElement();

        // gets all elements with tag <person>
        Elements people = root.getChildElements("person");

        for (int q = 0; q < people.size(); q++){
            // get the current person element
            Element person = people.get(q);

            // get the name element and its children: first and last
            Element nameElement = person.getFirstChildElement("name");
            Element firstNameElement = nameElement.getFirstChildElement("first");
            Element lastNameElement = nameElement.getFirstChildElement("last");

            // get the age element

            Element ageElement = person.getFirstChildElement("age");

            // get the favorite color element
            Element favColorElement = person.getFirstChildElement("fav_color");

            String fName, lName, ageUnit, favColor;
            int age;

            try {
                fName = firstNameElement.getValue();
                lName = lastNameElement.getValue();
                age = Integer.parseInt(ageElement.getValue());
                ageUnit = ageElement.getAttributeValue("unit");
                favColor = favColorElement.getValue();

                System.out.println("Name: " + lName + ", " + fName);
                System.out.println("Age: " + age + " (" + ageUnit + ")");
                System.out.println("Favorite Color: " + favColor);
                System.out.println("----------------");
            } catch (NullPointerException ex){
                // a required element is missing from this <person>
                ex.printStackTrace();
            } catch (NumberFormatException ex){
                // the <age> value is not an integer
                ex.printStackTrace();
            }
        }
    }
}

This will print out in the console:

Name: Smith, Dan
Age: 23 (years)
Favorite Color: green
----------------
Name: Autry, Bob
Age: 3 (months)
Favorite Color: N/A
----------------

Writing to a XML File

Writing to an XML file using XOM is very similar to reading it, except that in this case we create the instances instead of retrieving them from the root.

To make a new Element use the constructor Element(String name). You will want to make a root element so that you can easily add it to a Document.

Element root = new Element("root");

The Element class has some handy methods for editing elements. They are listed below:

• appendChild(String text) - appends text to the element as a text node, which in effect sets the value of an otherwise empty element.
• appendChild(Node node) - adds node as the last child of the element. (Elements are nodes so you

can pass elements).
• addAttribute(Attribute attribute) - adds an attribute to the element.

The Attribute class has a couple of different constructors. The simplest one is Attribute(String name, String value).

Once you have added all of your elements to your root element, you can turn it into a Document. Document takes an Element as an argument in its constructor.

You can use a Serializer to write your XML to a file. You will need to make a new output stream to pass to the constructor of Serializer.

FileOutputStream fileOutputStream = new FileOutputStream(file);
Serializer serializer = new Serializer(fileOutputStream, "UTF-8");
serializer.setIndent(4);
serializer.write(doc);

Example Code:

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import nu.xom.Attribute;
import nu.xom.Document;
import nu.xom.Element;
import nu.xom.Serializer;

public class XMLWriter{
    public static void main(String[] args) throws UnsupportedEncodingException, IOException{
        // root element <example>
        Element root = new Element("example");

        // make an array of people to store
        Person[] people = {new Person("Smith", "Dan", "years", "green", 23),
                           new Person("Autry", "Bob", "months", "N/A", 3)};

        // add all the people
        for (Person person : people){
            // make the main person element <person>
            Element personElement = new Element("person");

            // make the name element and its children: first and last
            Element nameElement = new Element("name");
            Element firstNameElement = new Element("first");
            Element lastNameElement = new Element("last");

            // make age element
            Element ageElement = new Element("age");

            // make favorite color element
            Element favColorElement = new Element("fav_color");

            // add value to names
            firstNameElement.appendChild(person.getFirstName());
            lastNameElement.appendChild(person.getLastName());

            // add names to name
            nameElement.appendChild(firstNameElement);
            nameElement.appendChild(lastNameElement);

            // add value to age
            ageElement.appendChild(String.valueOf(person.getAge()));

            // add unit attribute to age
            ageElement.addAttribute(new Attribute("unit", person.getAgeUnit()));

            // add value to favColor
            favColorElement.appendChild(person.getFavoriteColor());

            // add all contents to person
            personElement.appendChild(nameElement);
            personElement.appendChild(ageElement);
            personElement.appendChild(favColorElement);

            // add person to root
            root.appendChild(personElement);
        }

        // create doc off of root
        Document doc = new Document(root);

        // the file it will be stored in
        File file = new File("out.xml");
        if (!file.exists()){
            file.createNewFile();
        }

        // get a file output stream ready
        FileOutputStream fileOutputStream = new FileOutputStream(file);

        // use the serializer class to write it all
        Serializer serializer = new Serializer(fileOutputStream, "UTF-8");
        serializer.setIndent(4);
        serializer.write(doc);
    }

    private static class Person {
        private String lName, fName, ageUnit, favColor;
        private int age;

        public Person(String lName, String fName, String ageUnit, String favColor, int age){
            this.lName = lName;
            this.fName = fName;
            this.age = age;
            this.ageUnit = ageUnit;
            this.favColor = favColor;

        }

        public String getLastName() { return lName; }
        public String getFirstName() { return fName; }
        public String getAgeUnit() { return ageUnit; }
        public String getFavoriteColor() { return favColor; }
        public int getAge() { return age; }
    }
}

The resulting "out.xml" contains an <example> root element with one <person> element per entry of the people array, indented by four spaces per level.

