SXC - Simple XML Compiler: JAXB runtime, streaming XPath implementation, and more
March 16th, 2007
Last year I started a project called SXC, or Simple XML Compiler. It has gone through a couple of revisions (and name changes), but I think it's finally ready to talk about. The core of SXC is an API that lets you declaratively state what kind of XML you expect to parse or write. You can then attach actions to perform when you encounter (or write) that XML. For instance, you can say “I’m expecting the customer element, and I want to do new Customer() when I get there.” SXC then compiles an optimized XML parser for you.
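To make the “expect an element, run an action” idea concrete, here is a minimal sketch that interprets such expectations at runtime over StAX (javax.xml.stream, which ships with the JDK). The `expectElement` name echoes the outline further down, but this registry class and the `Customer` class are hypothetical and are not SXC's API. SXC's point is to compile these expectations into a specialized parser instead of doing a map lookup for every element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class ExpectationDemo {
    // Hypothetical domain class standing in for the Customer in the text.
    static class Customer { }

    // Hypothetical registry: expected element name -> action to run on its start tag.
    private final Map<QName, Supplier<Object>> actions = new HashMap<>();

    void expectElement(QName name, Supplier<Object> action) {
        actions.put(name, action);
    }

    // Streams through the document once and runs the action for each expected element.
    List<Object> parse(String xml) throws XMLStreamException {
        XMLStreamReader reader = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        List<Object> results = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                Supplier<Object> action = actions.get(reader.getName());
                if (action != null) {
                    results.add(action.get());
                }
            }
        }
        reader.close();
        return results;
    }

    public static void main(String[] args) throws XMLStreamException {
        ExpectationDemo demo = new ExpectationDemo();
        demo.expectElement(new QName("customer"), Customer::new);
        List<Object> out = demo.parse("<customers><customer/><customer/></customers>");
        System.out.println(out.size() + " Customer objects created"); // prints "2 Customer objects created"
    }
}
```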
SXC includes 3 frontends in addition to its core APIs, and these are probably what you really care about:
- JAXB: SXC uses the JAXB annotations to drive runtime compilation of an optimized parser/writer for you. Why would you do this? Compared with the JAXB RI, it is 2-3x faster on the reading side and about 20-90% faster on the writing side (see below for some important caveats). We do not claim JAXB compliance yet, but we support the most common JAXB functionality. It has been tested with both CXF and XFire. Note: we aren’t trying to replace the JAXB RI here; we’re augmenting it. You still need to use XJC to compile your schemas and keep the RI around at runtime.
- XPath: SXC can compile XPath expressions down into an optimized streaming XPath parser. Through an XPath event API, you can listen for multiple events as you scan a document. Currently we only support a very limited subset of XPath expressions, but I hope it will grow significantly over time. On a VERY ROUGH test (did I mention this was very rough?) it performed about 100x faster than Jaxen for repeated evaluations on a DOM with expressions like “/foo/bar[@bleh='biz']/baz[text()]”.
- Drools: Now that we have a streaming XPath parser, what’s the next logical step? Efficient content-based message routing using rules! With Drools and SXC you can build a very efficient XPath-based message router: rules fire as your XPath criteria are met, and you can take action. I’m sure there are many other uses for Drools+XPath, but this is the one at the forefront of my mind.
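The core idea behind a streaming XPath evaluator can be sketched with StAX alone: keep one flag per open element recording whether the chain from the root matches the expression so far, and emit a result as soon as the last step matches. This is a minimal, hypothetical sketch for absolute child-axis paths with an optional attribute-equality predicate, such as /foo/bar[@bleh='biz']/baz. It is not SXC's implementation, and it approximates the [text()] test by simply returning each selected element's text (which must be text-only).

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingPath {
    // One child-axis step: an element name plus an optional [@attr='value'] predicate.
    record Step(String name, String attr, String value) {
        boolean matches(XMLStreamReader r) {
            return r.getLocalName().equals(name)
                    && (attr == null || value.equals(r.getAttributeValue(null, attr)));
        }
    }

    // Collects the text of every element selected by the path, in one forward pass with no DOM.
    static List<String> select(String xml, List<Step> path) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        // One flag per open element: does the chain root..element match the path prefix?
        Deque<Boolean> open = new ArrayDeque<>();
        open.push(true); // the document node trivially matches the empty prefix
        List<String> results = new ArrayList<>();
        while (r.hasNext()) {
            int event = r.next();
            if (event == XMLStreamConstants.START_ELEMENT) {
                int depth = open.size(); // 1 for the root element
                boolean ok = open.peek() && depth <= path.size()
                        && path.get(depth - 1).matches(r);
                if (ok && depth == path.size()) {
                    // Full match: getElementText() reads text-only content and also
                    // consumes the END_ELEMENT, so nothing is pushed here.
                    results.add(r.getElementText());
                } else {
                    open.push(ok);
                }
            } else if (event == XMLStreamConstants.END_ELEMENT) {
                open.pop();
            }
        }
        r.close();
        return results;
    }

    public static void main(String[] args) throws XMLStreamException {
        List<Step> path = List.of(
                new Step("foo", null, null),
                new Step("bar", "bleh", "biz"),
                new Step("baz", null, null));
        String xml = "<foo><bar bleh='biz'><baz>one</baz><baz>two</baz></bar>"
                + "<bar bleh='other'><baz>skipped</baz></bar></foo>";
        System.out.println(select(xml, path)); // prints [one, two]
    }
}
```

Because the flags live on a stack only as deep as the current nesting, memory use does not grow with document size. That is also why expressions that look backwards, like a parent step (..), are hard to support without buffering part of the document.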
Going Deeper…
That may be a lot to take in, so how about some code to help? Check out the parser example on the website. Here is a quick outline of how building a parser works:
- Create a SXC ElementParserBuilder
- Call: elementParserBuilder.expectElement(new QName("customer"));
- Tell SXC to create a new Customer object and return it. This uses the excellent Sun CodeModel API, which lets you create Java source files declaratively.
- Call builder.compile(). This will generate Java files and compile them in the background at runtime. It will then pass you a Context class.
- Use the Context class to get a Reader optimized for your XML needs
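The compile step above turns generated Java source into classes at runtime. The sketch below shows one way to do that with the JDK's standard javax.tools compiler API and a URLClassLoader. It is an assumption for illustration only: the source is a hard-coded string standing in for what CodeModel would generate, the class names are hypothetical, and SXC's own compilation mechanism may differ. It needs a full JDK, since a plain JRE has no system compiler.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.Supplier;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class RuntimeCompileDemo {
    // Hypothetical generated source; a code generator such as CodeModel would emit this.
    static final String CLASS_NAME = "GeneratedCustomerReader";
    static final String SOURCE = String.join("\n",
            "public class GeneratedCustomerReader",
            "        implements java.util.function.Supplier<Object> {",
            "    public static class Customer { }",
            "    public Object get() { return new Customer(); }",
            "}");

    // Writes the source to a temp dir, compiles it, loads it, and asks it for a new object.
    static Object compileAndCreate() throws Exception {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            throw new IllegalStateException("No system compiler: run on a JDK, not a JRE");
        }
        Path dir = Files.createTempDirectory("sxc-sketch");
        Path src = dir.resolve(CLASS_NAME + ".java");
        Files.writeString(src, SOURCE);
        // run(in, out, err, args...) returns 0 on success; null streams mean System.in/out/err.
        int status = javac.run(null, null, null, "-d", dir.toString(), src.toString());
        if (status != 0) {
            throw new IllegalStateException("Compilation failed with status " + status);
        }
        try (URLClassLoader loader = new URLClassLoader(
                new URL[] { dir.toUri().toURL() }, RuntimeCompileDemo.class.getClassLoader())) {
            Class<?> cls = loader.loadClass(CLASS_NAME);
            Supplier<?> reader = (Supplier<?>) cls.getDeclaredConstructor().newInstance();
            return reader.get(); // Customer gets loaded here, before the loader is closed
        }
    }

    public static void main(String[] args) throws Exception {
        Object customer = compileAndCreate();
        System.out.println(customer.getClass().getSimpleName()); // prints Customer
    }
}
```

The generated class implements a JDK interface (Supplier) so the caller can use it without knowing the generated type at compile time, which is the same reason a generated parser would be handed back behind a fixed interface.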
SXC opens up a realm of possibilities with respect to databinding. One of the things I like most about SXC is that it makes it really easy to try different approaches to databinding. The JAXB-related code is a total of 2,500 lines. Pretty much any databinding toolkit could be rewritten to use it, too.
Another cool thing, slated for a future release, is inline validation. Think about how much performance is lost during validation: navigating a DOM, converting strings to ints all over again, and so on. It is a lot. We should be able to inline most, if not all, validation right into the parser. For instance, if you were limiting a value to between 1 and 10, we could simply add an if statement: if (num < 1 || num > 10) throw new ValidationException(); Of course it gets a lot trickier than that thanks to the lovely specification known as XML Schema (*cough*), but the idea is there, and with some work it should be doable.
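Here is a minimal sketch of what inlined validation could look like in a StAX-based reader: the range check runs right where the value is parsed, so there is no second pass and the string is converted to an int only once. The &lt;quantity&gt; element, the 1 to 10 bounds (in the spirit of XML Schema's minInclusive/maxInclusive facets) and this ValidationException class are hypothetical; real schema validation needs far more than one if statement.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class InlineValidation {
    // Hypothetical exception type; only the name ValidationException comes from the text.
    static class ValidationException extends RuntimeException {
        ValidationException(String message) { super(message); }
    }

    // Reads <quantity>N</quantity> and checks 1 <= N <= 10 as part of parsing.
    static int readQuantity(String xml) throws XMLStreamException {
        XMLStreamReader r = XMLInputFactory.newInstance()
                .createXMLStreamReader(new StringReader(xml));
        try {
            r.nextTag(); // advance to the root start tag
            if (!"quantity".equals(r.getLocalName())) {
                throw new ValidationException("expected <quantity>, got <" + r.getLocalName() + ">");
            }
            int num = Integer.parseInt(r.getElementText().trim()); // single conversion
            if (num < 1 || num > 10) { // the inlined range check
                throw new ValidationException("quantity must be between 1 and 10: " + num);
            }
            return num;
        } finally {
            r.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(readQuantity("<quantity>7</quantity>")); // prints 7
        try {
            readQuantity("<quantity>42</quantity>");
        } catch (ValidationException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```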
JAXB Performance
I’ve prepared a few initial benchmarks: read performance and write performance.
As with all benchmarks, you may want to keep several things in mind:
- Depending on your own document types, performance will vary. If you have a lot of textual content (as opposed to small textual values), you are likely to benefit less from SXC.
- This was done on my 2GHz T60 Core Duo laptop with a 667 MHz FSB. With a faster bus, I suspect SXC performance would improve.
- The JAXB RI does have a faster output mode, used when you write directly to an OutputStream with UTF-8 encoding. I haven’t benchmarked it against SXC yet, though.
Future
There are many things I’d like to look at for the future revisions:
- Simplifying and documenting the SXC APIs
- Passing the JAXB TCK
- Better XPath support
- XSLT support?
- Inline schema validation
If any of this sounds up your alley, or you have other ideas for SXC, be sure to join the mailing list or contact me at dan AT netzooid dot com.
Enjoy and report back on your experiences!
June 19th, 2007 at 4:56 pm
Thanks for developing SXC.
We want to develop a customizable application where users can add or remove custom fields and tables from a configuration user interface, and can also define rules for fields when the business logic changes.
Can we rely on SXC for our application?
Thanks and Regards
Alauddin
July 2nd, 2007 at 1:40 pm
Hi Dan,
Is the XPath evaluation based on the XML streaming API, or is it still based on the DOM? Is there any project out there that uses XPath with the XML streaming API?
July 3rd, 2007 at 5:17 am
Hiya, it’s based on StAX, not DOM. It is fully streaming. At the moment it doesn’t support XPath expressions that require caching the document in memory (e.g. something like //a/../b).