Question regarding XML Parsing
I've recently received an assignment to parse an XML file using the SAX Parser.
I've already sub-classed the DefaultHandler class to take control of the startElement etc. methods and it works fine.
My only question is, what is the general practice used to parse XML files?
i.e:
- Is it normal to simply write the parser to match the contents of the XML file (XML structure known)? For example, if the file contains the 4 elements of an invoice, I take those 4 elements and pass them to the correct parameters, etc.
OR
- Should it be developed so that it works with whatever XML file I throw at it where the actual structure isn't known?
Apologies if it's a rather dopey question; I just wouldn't want to feel like my code was "cheating" if I did it the first way when the second is the way to go about it.
(Please bear in mind it is an assignment and nothing professional.)
Kind regards, Newbie.
Re: Question regarding XML Parsing
Working with whatever sort of XML you throw at it is in part the job of the XML parser, which is independent of the contents of the XML. So I guess the answer depends upon what you want to do with the contents in the end. You can readily generalize code that saves the data in another format, but generalizing, say, GUI code to display any XML data could get complicated. Typically there is a specific goal in mind, and with project goals and deadlines to meet, generalization is not the point. My .02
Re: Question regarding XML Parsing
When using a SAX parser, the parser will step through the XML, calling back to your 'start' and 'end' parsing methods for the start and end of each element in the XML input. In those methods, you check which element has been passed to you (usually by tag name) and take appropriate action. So although the parser is handling the XML in a generic way, your parsing methods are specific to the XML definition you're using. If the XML is structured hierarchically (nested elements & lists, etc.), you often need to keep track of the current state between callbacks, which takes some care.
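To make the callback idea concrete, here is a minimal sketch of a handler that is specific to one XML layout. The invoice layout, the element names (invoices, invoice, customer, total), the id attribute and the class names are all assumptions made up for illustration; they are not taken from the actual assignment. The fields in the handler are the "state between callbacks" mentioned above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class InvoiceSaxDemo {

    // Hypothetical handler for an assumed layout:
    // <invoices><invoice id=".."><customer>..</customer><total>..</total></invoice></invoices>
    static class InvoiceHandler extends DefaultHandler {
        final List<String> invoices = new ArrayList<>();

        // State kept between callbacks.
        private final StringBuilder text = new StringBuilder();
        private String id;
        private String customer;
        private String total;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            text.setLength(0); // start collecting text for the new element
            if ("invoice".equals(qName)) {
                id = atts.getValue("id");
                customer = null;
                total = null;
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            // The parser may deliver text in several chunks, so append rather than assign.
            text.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            switch (qName) {
                case "customer":
                    customer = text.toString().trim();
                    break;
                case "total":
                    total = text.toString().trim();
                    break;
                case "invoice":
                    // All parts of one invoice have been seen; record it.
                    invoices.add(id + ":" + customer + ":" + total);
                    break;
                default:
                    break; // ignore elements this handler does not know about
            }
        }
    }

    static List<String> parse(String xml) throws Exception {
        InvoiceHandler handler = new InvoiceHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.invoices;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<invoices>"
                + "<invoice id=\"1\"><customer>Acme</customer><total>12.50</total></invoice>"
                + "<invoice id=\"2\"><customer>Globex</customer><total>7.00</total></invoice>"
                + "</invoices>";
        System.out.println(parse(xml));
    }
}
```

The parser itself knows nothing about invoices; only the three callback methods do. That is the usual split: generic parser, structure-specific handler.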
The alternative, a DOM (Document Object Model) parser, reads all the XML input in one go and gives you random access to any element, which makes it easier to use, but it takes time and needs enough memory to hold the whole XML document.
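For comparison, a rough DOM sketch under the same assumed invoice layout (again, the element names and the class and method names are hypothetical). The whole document is loaded first, and you can then jump straight to the element you want without tracking any state yourself.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class InvoiceDomDemo {

    // Returns the text of the <total> child of the invoice at the given position.
    static String totalOf(String xml, int index) throws Exception {
        // The whole document is read into memory here.
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Random access: pick any invoice directly by position.
        NodeList invoices = doc.getElementsByTagName("invoice");
        Element invoice = (Element) invoices.item(index);
        return invoice.getElementsByTagName("total").item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<invoices>"
                + "<invoice id=\"1\"><customer>Acme</customer><total>12.50</total></invoice>"
                + "<invoice id=\"2\"><customer>Globex</customer><total>7.00</total></invoice>"
                + "</invoices>";
        System.out.println(totalOf(xml, 1)); // total of the second invoice
    }
}
```

Note that this code is still tied to the expected structure; DOM only changes how you reach the data, not whether you need to know what you are looking for.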