What are the ways of parsing an XML document?
The XML parser sits between the XML document and the application that wants to use it. The parser exposes a set of well-defined interfaces that the application can use to add, delete, and modify the contents of the XML document. These interfaces must be standard; otherwise, each vendor would devise its own custom way of interacting with XML documents.
There are two standard specifications that are very common and that an XML parser should follow:
DOM: Document Object Model.
The DOM is the W3C-recommended way of treating XML documents. With DOM, the whole XML document is loaded into memory, which allows the application to manipulate the structure and data of the document.
SAX: Simple API for XML.
SAX is an event-driven way of processing XML documents. With DOM, the entire XML document is loaded into memory and the application then manipulates it. This is not always the best way to process large XML documents that contain huge data elements. For example, if you want only one element from the entire document, or you only want to check whether the XML is well-formed, loading the whole document into memory is quite resource intensive. A SAX parser reads the XML document sequentially and emits events such as the start and end of the document, the start and end of elements, text content, and so on. Applications interested in these events register implementations of callback interfaces, and the SAX parser invokes those callbacks as the events occur. The application then acts only on the events it has asked for.
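The "is this XML proper" check mentioned above can be sketched with Python's standard `xml.sax` module. This is only an illustration under stated assumptions: the function name `is_well_formed` is hypothetical, and the check covers well-formedness only, not validation against a DTD or schema. The handler passed in ignores every event, so nothing is accumulated in memory while the parser walks the input.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def is_well_formed(xml_bytes):
    """Return True if xml_bytes parses without a fatal SAX error.

    Hypothetical helper for illustration; it does not validate
    against a DTD or schema.
    """
    try:
        # The base ContentHandler implements every callback as a no-op,
        # so the parser streams through the input and keeps no tree.
        xml.sax.parseString(xml_bytes, ContentHandler())
    except xml.sax.SAXParseException:
        return False
    return True
```

For instance, `is_well_formed(b"<a><b/></a>")` succeeds, while a document with a mismatched closing tag such as `b"<a><b></a>"` is rejected.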
Figure: DOM parser loading the XML document
The figure above is a pictorial representation of how a DOM parser works. The application queries the DOM parser for the "quantity" field, and the DOM parser loads the complete XML file into memory.
Figure: Returning the quantity value to the application
The DOM parser then picks the "quantity" tag out of the XML file loaded in memory and returns its value to the application.
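The DOM flow described above can be sketched with Python's standard `xml.dom.minidom` module. The invoice document, its lowercase element names, and the helper `read_quantity` are assumptions made for illustration; the original figures do not show the actual XML.

```python
from xml.dom.minidom import parseString

# Hypothetical invoice document, assumed for this example.
INVOICE_XML = (
    "<invoice>"
    "<quantity>12</quantity>"
    "<amount>340.50</amount>"
    "</invoice>"
)


def read_quantity(xml_text):
    """Load the whole document into memory, then look up <quantity>."""
    # parseString builds the complete in-memory tree before returning.
    document = parseString(xml_text)
    nodes = document.getElementsByTagName("quantity")
    if not nodes or nodes[0].firstChild is None:
        return None  # element missing or empty
    # The first child of <quantity> is the text node holding its value.
    return nodes[0].firstChild.data


quantity = read_quantity(INVOICE_XML)
```

Note that the full tree, including the `amount` element the application never asked for, is built before the lookup happens. That cost is what the SAX approach avoids.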
Figure: SAX parser in action
The SAX parser does not load the entire document into memory; it takes an event-based approach instead. While parsing the XML file, the SAX parser emits events. For example, in the figure above it has emitted the Invoice start tag event, the Quantity tag event, the Amount tag event, and the Invoice end tag event. Our application, however, is interested only in the quantity value. It therefore registers with the SAX parser a handler that reacts to the quantity field and to no other field or element of the XML document. Depending on what the application is interested in, only those events are acted upon and the rest are effectively suppressed. In the figure above, for example, only the quantity tag event reaches the application and the remaining events are suppressed.
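A minimal sketch of this event-based flow, using Python's standard `xml.sax` module, might look like the following. The class `QuantityHandler`, the function `find_quantities`, and the lowercase element name `quantity` are assumptions for illustration. In this API the parser invokes the handler's callbacks for every element, and the handler itself discards everything except the quantity events.

```python
import xml.sax


class QuantityHandler(xml.sax.ContentHandler):
    """Hypothetical handler that reacts only to <quantity> elements."""

    def __init__(self):
        super().__init__()
        self.in_quantity = False
        self.parts = []       # text chunks of the current <quantity>
        self.quantities = []  # collected values, in document order

    def startElement(self, name, attrs):
        if name == "quantity":
            self.in_quantity = True
            self.parts = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect them all.
        if self.in_quantity:
            self.parts.append(content)

    def endElement(self, name):
        if name == "quantity":
            self.quantities.append("".join(self.parts))
            self.in_quantity = False


def find_quantities(xml_bytes):
    handler = QuantityHandler()
    # The parser streams through the input and calls the handler;
    # no document tree is built.
    xml.sax.parseString(xml_bytes, handler)
    return handler.quantities
```

Calling `find_quantities` on an invoice such as `b"<invoice><quantity>12</quantity><amount>340.50</amount></invoice>"` collects only the quantity text; the invoice and amount events pass through the handler without any effect.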