XPATH vs DOM

12-12-2005

Just another useless benchmark... It all started when I was about to write some Java code to process an XML file. I was going to use the XPathAPI class, but I noticed that its JavaDoc warns about poor performance. Oops!! Since performance is quite often an issue when I write Java, I decided to try an alternative approach using just DOM. The resulting code was ugly, complex and shameful. But I was glad to believe I had at least taken care of performance...

But it kept bugging me: how much more or less performant could it be, and would it be better to use the low-level XPATH API? So I spent some of my own time coding a few samples to check it out... Nothing great, but hopefully a way to test the different APIs. Both are tested on my Mac OS X box, with JRE 1.4.2_09-232, using just JAXP 1.2 (that is the right one for J2SE 1.4, isn't it?).

These are my three classes:

XPATHParser.java: just a simple class that uses XPathAPI to count some tag occurrences. I know it will all look dumb, but come on, that is the reason to use XPATH: simple things are simple.

DOMParser.java: tries to implement the same functions using just DOM objects. Some things are still simple, but others are a little more complex. Still, quite simple code.

XMLPerfTest.java: just the class that runs the whole test. It executes both classes and times each of them.

The XML file that I use is not very complex, but it is about 600 KB and quite full of tags. I repeated each execution 10 times to annoy the garbage collector.
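To give an idea of what such a test runner does, here is a minimal sketch of a timing harness. It is not the original XMLPerfTest.java: the class name `TimingSketch` and the method `timeMillis` are made up for illustration, and it uses lambdas and `System.nanoTime()`, which need a much newer Java than the 1.4.2 JRE used for the benchmark.

```java
// Hypothetical harness for illustration only; not the original XMLPerfTest.java.
final class TimingSketch {

    /** Runs the task the given number of times and returns the total elapsed milliseconds. */
    static long timeMillis(String label, int runs, Runnable task) {
        long start = System.nanoTime();          // monotonic clock, safe for measuring intervals
        for (int i = 0; i < runs; i++) {
            task.run();
        }
        long elapsedMs = (System.nanoTime() - start) / 1_000_000L;
        System.out.println("Testing " + label + " (" + runs + " executions)");
        System.out.println("Execution Time in Milliseconds:" + elapsedMs);
        return elapsedMs;
    }

    public static void main(String[] args) {
        int[] calls = {0};                        // stand-in workload: just count the invocations
        timeMillis("dummy task", 10, () -> calls[0]++);
    }
}
```

Timing the whole loop rather than each run keeps the measurement simple, but it also means that JIT warm-up and garbage collection pauses are folded into the single total.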

For a very, very dumb case (counting all the occurrences of a tag in the file) I use either XPATH or Document.getElementsByTagName(). I know there is no reason to use XPATH for such a silly task, but I wanted to see how much overhead it adds to the simplest tasks... The results are:

Testing DOM parsing (10 executions)
Execution Time in Milliseconds:2511
Testing XPATH parsing (10 executions)
Execution Time in Milliseconds:3482	
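For reference, the two ways of counting a tag can be sketched as below. This is an illustration, not the code that produced the timings above: it uses the standard `javax.xml.xpath` API (available since Java 5) instead of Xalan's XPathAPI class, and the class name `TagCountSketch`, its methods and the tiny inline XML are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical sketch; not the original XPATHParser.java / DOMParser.java.
final class TagCountSketch {

    /** Parses an XML string into a DOM Document. */
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Plain DOM: ask the document for every element with the given tag name. */
    static int countWithDom(Document doc, String tag) {
        return doc.getElementsByTagName(tag).getLength();
    }

    /** XPath: select every element with the given name anywhere in the document. */
    static int countWithXPath(Document doc, String tag) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList hits = (NodeList) xpath.evaluate("//" + tag, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath expression", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<lib><book><title/></book><book><title/></book></lib>");
        System.out.println("DOM count:   " + countWithDom(doc, "title"));
        System.out.println("XPath count: " + countWithXPath(doc, "title"));
    }
}
```

Both methods should report the same count for the same document; the only difference is how much machinery sits between the call and the tree.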
	

Now something more complex: I wanted only the tags that were inside a certain tag that fulfilled a specific condition. With XPATH this was very easy!! With DOM I again used Document.getElementsByTagName(), plus Node.getParentNode() (and recursion) to walk up the tree and verify the condition. (I believe the approach is quite elegant and should be quite fast.)
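To make the comparison concrete, here is a minimal sketch of the two approaches. It is not the original code: the element names (`title`, `book`), the condition (a `lang` attribute equal to `en`), the class name `ConditionalCountSketch` and its methods are all invented for illustration, and the XPath side uses the standard `javax.xml.xpath` API (Java 5 and later) rather than Xalan's XPathAPI class.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch; element names and the condition are assumptions.
final class ConditionalCountSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    /** Recursively climbs through the parents looking for a <book lang="en"> ancestor. */
    static boolean hasEnglishBookAncestor(Node node) {
        Node parent = node.getParentNode();
        if (parent == null) {
            return false;                         // walked past the Document node: no match
        }
        if (parent.getNodeType() == Node.ELEMENT_NODE) {
            Element e = (Element) parent;
            if ("book".equals(e.getTagName()) && "en".equals(e.getAttribute("lang"))) {
                return true;
            }
        }
        return hasEnglishBookAncestor(parent);
    }

    /** DOM: fetch every <title>, then check each one's ancestors. */
    static int countWithDom(Document doc) {
        NodeList titles = doc.getElementsByTagName("title");
        int count = 0;
        for (int i = 0; i < titles.getLength(); i++) {
            if (hasEnglishBookAncestor(titles.item(i))) {
                count++;
            }
        }
        return count;
    }

    /** XPath: the whole condition fits in one expression. */
    static int countWithXPath(Document doc) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate("//book[@lang='en']//title", doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("bad XPath expression", e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse("<lib><book lang='en'><chapter><title/></chapter><title/></book>"
                + "<book lang='fr'><title/></book><title/></lib>");
        System.out.println("DOM count:   " + countWithDom(doc));
        System.out.println("XPath count: " + countWithXPath(doc));
    }
}
```

Note how the DOM version needs a helper and a loop for what XPath expresses in a single string. The timings measured for this kind of query in the original test were: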

Testing DOM parsing (10 times)
Execution Time in Milliseconds:80791
Testing XPATH parsing (10 times)
Execution Time in Milliseconds:3445
	

What a surprise!! Now my DOM implementation is completely unacceptable (80791 ms against 3445 ms, more than 23 times slower), while XPATH performs almost exactly as before. Either my implementation is awful (so far I have not found many flaws in it) or DOM navigation, when it gets complex... makes things go wrong.

My summary would be: the XPATH API seems to scale well, and it copes nicely with complex tasks. The code will be simpler, and giving up XPATH does not buy such a huge performance gain (even in the simplest case XPATH took only about 39% longer than DOM). Additionally, there is always the chance to use the low-level XPATH API, or the new XPATH API access in the more recent XALAN implementations. (Well, this means I will probably have to redo my initial code at work, since it is probably doing poorly).

(I have been reviewing my DOM algorithm and I have seen that it takes the fewest steps needed to perform its task. So it does not seem to be a matter of the algorithm, but rather of using DOM methods to navigate the tree. Bad news for me again... I need to figure out now how to use less DOM).

