Sunday, October 29, 2017

XML Parsers

What is XML Parsing?

XML parsing refers to going through an XML document in order to access or modify its data.
What is an XML Parser?
An XML parser provides a way to access or modify data in an XML document. Java provides multiple options for parsing XML documents. The following types of parsers are commonly used:
·        DOM Parser − Parses an XML document by loading the complete contents of the document and creating its complete hierarchical tree in memory.
·        SAX Parser − Parses an XML document on event-based triggers. Does not load the complete document into memory.
·        JDOM Parser − Parses an XML document in a similar fashion to the DOM parser, but with an API that is easier to use from Java.
·        StAX Parser − Parses an XML document as a stream, like the SAX parser, but lets the application pull the data it needs instead of receiving callbacks.
·        XPath Parser − Selects data from an XML document based on path expressions and is used extensively in conjunction with XSLT.
·        DOM4J Parser − A Java library to parse XML, XPath, and XSLT using the Java Collections Framework. It provides support for DOM, SAX, and JAXP.
JAXB and XSLT APIs are also available to handle XML in an object-oriented way.

Java DOM Parser
The DOM parser parses the entire XML document and loads it into memory, modelling it as a tree structure for easy traversal or manipulation.

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

//Get the DOM Builder Factory
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();

//Get the DOM Builder
DocumentBuilder builder = factory.newDocumentBuilder();

//Load and parse the XML document from the classpath;
//document then contains the complete XML as a tree.
Document document = builder.parse(ClassLoader.getSystemResourceAsStream("xml/employee.xml"));
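Once the tree is in memory it can be walked freely. The following is a minimal sketch of reading values out of the Document. The layout of xml/employee.xml is not shown in this post, so the element names (employees, employee, name), the id attribute and the class name DomReadSketch are all assumptions made for illustration, and the XML is supplied inline instead of being loaded from the classpath.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class DomReadSketch {
    // Hypothetical sample data; the real employee.xml may look different.
    static final String SAMPLE_XML =
          "<employees>"
        + "<employee id=\"1\"><name>Ann</name></employee>"
        + "<employee id=\"2\"><name>Bob</name></employee>"
        + "</employees>";

    // Builds the DOM tree and collects "id:name" for every <employee> element.
    static List<String> readEmployees(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document document = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

            List<String> result = new ArrayList<>();
            NodeList employees = document.getElementsByTagName("employee");
            for (int i = 0; i < employees.getLength(); i++) {
                Element employee = (Element) employees.item(i);
                String id = employee.getAttribute("id");
                // Take the text of the first <name> element below this employee.
                String name = employee.getElementsByTagName("name").item(0).getTextContent();
                result.add(id + ":" + name);
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("DOM parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readEmployees(SAMPLE_XML));
    }
}
```

Because the whole tree is available, the loop could just as well visit the employees in reverse order or jump between unrelated parts of the document.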

Advantages

1) It supports both read and write operations and the API is very simple to use.
2) It is preferred when random access to widely separated parts of a document is required.
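
To illustrate the write side, here is a rough sketch that adds a new element to a DOM tree and serializes the result with the JDK's Transformer API. The element names, the id attribute, and the class and method names (DomWriteSketch, addEmployee) are assumptions for illustration only.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

class DomWriteSketch {
    // Parses the XML, appends a new <employee> to the root, and returns the new XML text.
    static String addEmployee(String xml, String id, String name) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document document = builder.parse(new InputSource(new StringReader(xml)));

            // Create <employee id="..."><name>...</name></employee> in memory.
            Element employee = document.createElement("employee");
            employee.setAttribute("id", id);
            Element nameElement = document.createElement("name");
            nameElement.setTextContent(name);
            employee.appendChild(nameElement);
            document.getDocumentElement().appendChild(employee);

            // Serialize the modified tree back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(document), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("DOM update failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<employees><employee id=\"1\"><name>Ann</name></employee></employees>";
        System.out.println(addEmployee(xml, "2", "Bob"));
    }
}
```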

Disadvantages

1) It is memory inefficient. (It consumes more memory because the whole XML document needs to be loaded into memory.)
2) It is comparatively slower than the streaming parsers (SAX and StAX), especially for large documents.

SAX (Simple API for XML)

The SAX parser differs from the DOM parser in that it does not load the complete XML document into memory. Instead, it reads the document sequentially from start to end and triggers events as it encounters constructs such as an opening tag, a closing tag, character data and so on. This is why the SAX parser is called an event-based parser.


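import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
//SAXHandler is assumed to be a user-defined class that extends org.xml.sax.helpers.DefaultHandler
SAXParserFactory parserFactory = SAXParserFactory.newInstance();
SAXParser parser = parserFactory.newSAXParser();
SAXHandler handler = new SAXHandler();
//The parser calls the handler's methods while it reads the document
parser.parse(ClassLoader.getSystemResourceAsStream("xml/employee.xml"), handler);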
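
The handler class itself is not shown in this post. As a rough sketch of what such a handler could look like, the example below extends DefaultHandler and collects the text of every name element. The handler name (NameHandler), the element name and the inline XML are assumptions for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class SaxHandlerSketch {
    // Hypothetical handler: remembers the text of each <name> element it sees.
    static class NameHandler extends DefaultHandler {
        final List<String> names = new ArrayList<>();
        private StringBuilder text; // non-null only while inside a <name> element

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            if ("name".equals(qName)) {
                text = new StringBuilder();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver one text node in several chunks, so append.
            if (text != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("name".equals(qName)) {
                names.add(text.toString().trim());
                text = null;
            }
        }
    }

    // Runs the SAX parser over the XML and returns the collected names.
    static List<String> readNames(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            NameHandler handler = new NameHandler();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
            return handler.names;
        } catch (Exception e) {
            throw new IllegalStateException("SAX parsing failed", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readNames(
            "<employees><employee><name>Ann</name></employee>"
          + "<employee><name>Bob</name></employee></employees>"));
    }
}
```

Note that the handler has to keep its own state (here the text field) between callbacks, because the parser never hands over more than the current event.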

Features of SAX Parser

It does not create any internal structure.
Clients do not drive the parsing themselves; they override the callback methods of the handler API and place their own code inside them.
It is an event-based parser and works like an event handler in Java.

Advantages

1) It is simple and memory efficient.
2) It is very fast and works for huge documents.

Disadvantages

1) It is event-based so its API is less intuitive.
2) Clients never see the document as a whole, because the data arrives in pieces; any context must be tracked by the handler itself.






Java JDOM Parser
JDOM is an open-source, Java-based library for parsing XML documents. Its API is designed to be friendly to Java developers: it is optimized for Java and uses Java collections such as List.
JDOM works with the DOM and SAX APIs and aims to combine the best of the two. It is described as having a low memory footprint and being nearly as fast as SAX.

StAX Parser

StAX stands for Streaming API for XML. The StAX parser differs from DOM in the same way the SAX parser does: it does not load the whole document into memory. It also differs from the SAX parser in a subtle way.
·         The SAX parser pushes data to the application, whereas the StAX parser lets the application pull the data it requires from the XML.
·         The StAX parser maintains a cursor at the current position in the document and allows the content at the cursor to be extracted, whereas the SAX parser issues events as and when certain data is encountered.
XMLInputFactory and XMLStreamReader are the two types used to load an XML file. As we read through the XML file using XMLStreamReader, events are reported as integer values, which are then compared with the constants in XMLStreamConstants. The code below shows how to create a StAX parser for an XML file:

//Create the factory, then a cursor-style reader over the XML file
XMLInputFactory factory = XMLInputFactory.newInstance();
XMLStreamReader reader = factory.createXMLStreamReader(ClassLoader.getSystemResourceAsStream("xml/employee.xml"));
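
Creating the reader only positions the cursor; the application still has to advance it. A minimal sketch of such a pull loop is given below, comparing each integer event with XMLStreamConstants as described above. The element name, the inline XML and the class name StaxCursorSketch are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class StaxCursorSketch {
    // Pulls events one by one and collects the text of every <name> element.
    static List<String> readNames(String xml) {
        List<String> names = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            while (reader.hasNext()) {
                int event = reader.next(); // move the cursor to the next event
                if (event == XMLStreamConstants.START_ELEMENT
                        && "name".equals(reader.getLocalName())) {
                    // Reads the text content and leaves the cursor on the closing tag.
                    names.add(reader.getElementText());
                }
            }
            reader.close();
        } catch (Exception e) {
            throw new IllegalStateException("StAX parsing failed", e);
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(readNames(
            "<employees><employee><name>Ann</name></employee>"
          + "<employee><name>Bob</name></employee></employees>"));
    }
}
```

Unlike the SAX version, the loop is ordinary application code, so it can stop early or skip ahead whenever it has what it needs.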



JAXB (Java Architecture for XML Binding)

JAXB, the Java Architecture for XML Binding, is a technology for converting XML to and from Java objects. It is mostly used to create Java classes from XML in Java web services. JAXB provides two general-purpose operations:

Marshalling – Converts a Java object into XML.
Unmarshalling – Converts XML into a Java object.

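JAXB was bundled with the JDK from Java SE 6 through Java SE 10, so on those versions we don’t need to download or add anything extra to start. On older versions, and on Java SE 11 and later (where JAXB was removed from the JDK), you need to add a JAXB API library and an implementation, such as ‘jaxb-api.jar’ and ‘jaxb-impl.jar’, to the classpath.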

What is XPath XML Parsing?

XPath is a more advanced technique for selecting and extracting required data from XML. It provides extensive support for query-based extraction, so that exactly the required data can be retrieved from XML documents.

XPath is similar to SQL in that it is a query language: it provides a powerful set of expressions to select and extract data from XML. Java has built-in support for XPath; all the required classes can be found in the ‘javax.xml.xpath’ package, so we don’t need to download or add anything else.
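
As a small sketch of what such a query looks like in practice, the example below evaluates XPath expressions with the classes from javax.xml.xpath. The sample XML, its element and attribute names, and the class name XPathSketch are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

class XPathSketch {
    // Hypothetical sample data used only for this illustration.
    static final String SAMPLE_XML =
          "<employees>"
        + "<employee id=\"1\"><name>Ann</name></employee>"
        + "<employee id=\"2\"><name>Bob</name></employee>"
        + "</employees>";

    // Evaluates the expression and returns the result as a string.
    static String evaluate(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    // Evaluates a numeric expression such as count(...).
    static double evaluateNumber(String xml, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Double) xpath.evaluate(
                expression, new InputSource(new StringReader(xml)), XPathConstants.NUMBER);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        // Select the name of the employee whose id attribute is 2.
        System.out.println(evaluate(SAMPLE_XML, "/employees/employee[@id='2']/name"));
        // Count all employee elements.
        System.out.println(evaluateNumber(SAMPLE_XML, "count(/employees/employee)"));
    }
}
```

The predicate in square brackets plays a role comparable to a WHERE clause in SQL: it filters the employee elements before the name is selected.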


