Archive for July, 2008

New Mule Performance Benchmark: Yup, we come out on top.

Monday, July 21st, 2008

WSO2 has felt the need over the past few months to make many false claims about Mule’s performance. For instance:

Mule CE 2.0.1 couldn’t handle the cases where we used a concurrency level of 80; while other ESB’s scaled to support to over 2500 concurrent connections. This was after tuning the maximum active thread count to 100 from its default value, which limited Mule to a very few concurrent connections.

I ran their benchmarks. Sure enough, with their configuration, Mule performance was crappy. There were a couple of fatal flaws with their benchmark, though:

  • It used the stock HTTP transport instead of the Jetty transport, which is NIO based. Switching fixed the concurrency issues.
  • It turns out there is a bug/feature in Linux kernels before 2.6.17 that requires you to turn on the tcpNoDelay switch in Mule. This affects performance on Linux-based systems significantly for many of the tests; differences of up to 200-300% were noted. In essence, this switch controls whether a TCP message is sent before the buffer is full. Because the number of concurrent users is low in a lot of the tests, the system is operating far under 100% load. This means it takes longer for a buffer to fill up and hence longer for the message to be sent.
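For anyone wondering what that switch means below Mule: at the socket level it corresponds to the standard TCP_NODELAY option, which turns off Nagle's algorithm so small writes are sent immediately instead of being held back and coalesced. Here is a minimal plain-Java sketch of setting it. This is not Mule configuration, and the class and method names are made up for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

public class NoDelayDemo {
    /** Enables TCP_NODELAY on a not-yet-connected socket and reports the resulting setting. */
    public static boolean enableNoDelay() {
        try (Socket socket = new Socket()) {
            // Disable Nagle's algorithm: small writes are no longer held back
            // while the stack waits to coalesce them into a fuller segment.
            socket.setTcpNoDelay(true);
            // A real client would call socket.connect(...) here and start writing.
            return socket.getTcpNoDelay();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("TCP_NODELAY enabled: " + enableNoDelay());
    }
}
```

The trade-off is more, smaller packets on the wire, which is usually what you want for request/response traffic at low concurrency.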

Results

We released a paper with pretty graphs. Here are the relevant conclusions:

With a proper configuration Mule was able to process many more transactions per second than WSO2’s ESB in all three of their scenarios at almost every load level. Mule was on average 28% faster for [proxying HTTP endpoints], 77% faster for [XPath based] content based routing, and 286% faster for [XSLT] transformations. The only tests where Mule did not exceed WSO2 were with small XML messages and very light loads. Here the difference was less than 2% and is not statistically significant.

My hunch is that we can beat the proprietary ESB in many scenarios as well if the system is properly tuned.

Content Based Routing

While I was looking into this I decided we might as well not just beat them, but significantly widen the lead. You may remember a while back I announced SXC, an XML parser compiler. It has a streaming XPath engine. Mule now supports it and it outperforms anything else out there by a wide margin. For small messages (0.5K), we were up to 25% faster. For medium sized messages (5K) we were 200-300% faster with large loads. Take out all the HTTP overhead and I think we can safely assume that SXC is about 10x faster than anything else.

On the other hand we have AXIOM + Jaxen. Jaxen is fundamentally DOM based. Even though AXIOM is a “streaming DOM”, Jaxen is very often going to trigger a full load of the document into memory. Not to mention that SXC actually compiles the whole XPath expression down to a series of Java functions/statements in the most optimized form possible.
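To illustrate the difference in approach, here is a rough sketch of streaming evaluation using the JDK's StAX API. It is not SXC's API and not the code SXC generates; it is a hand-written stand-in for a fixed path, /order/symbol, where both the path and the class name are assumptions for the example. The point is that the method returns as soon as the match is found, without building a tree of the whole message.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class StreamingRouteDemo {
    /**
     * Returns the text of the first /order/symbol element, or null if there is none.
     * Only the events up to the match are pulled from the parser.
     */
    public static String firstSymbol(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                int depth = 0;
                boolean inOrder = false;
                while (reader.hasNext()) {
                    int event = reader.next();
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        depth++;
                        String name = reader.getLocalName();
                        if (depth == 1) {
                            inOrder = name.equals("order");
                        } else if (depth == 2 && inOrder && name.equals("symbol")) {
                            // Match found: return without pulling any further events.
                            return reader.getElementText();
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        depth--;
                    }
                }
                return null;
            } finally {
                reader.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse message", e);
        }
    }

    public static void main(String[] args) {
        String message = "<order><symbol>IBM</symbol><qty>100</qty></order>";
        String symbol = firstSymbol(message);
        // A content based router would pick an endpoint based on this value.
        System.out.println("Routing on symbol: " + symbol);
    }
}
```

A DOM based evaluator would typically parse and allocate nodes for the entire message before answering the same question, which is where the gap on larger messages comes from.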

(Surely someone will object and say that SXC doesn’t support all of XPath. Yes, that is true. However, in that case you can just use the Jaxen routing filter and then performance is equal. But rarely do you route on such complicated expressions. If SXC is missing something, file a JIRA and I’ll try to add it.)

In addition to all this goodness, I get the added satisfaction of knowing that to equal our performance WSO2 will have to adopt code that I’ve written (SXC) or write something like it from scratch, which I would consider quite funny.

Metadata on the outside: AtomPub + OSGi with Galaxy

Sunday, July 6th, 2008

Bill de hÓra writes:

At a different layer but with passing similarities – can I suggest that OSGi and Maven port their jar/bundle metadata to Atom?

You can already do the OSGi part with Galaxy.

Step 1: Start Galaxy:

java -jar galaxy-web-standalone-1.0.jar

Step 2: Add a new OSGi bundle to an AtomPub collection [1]:

curl -v --data-binary @slf4j-api-1.5.0.jar -u admin:admin -H "Content-Type: application/octet-stream" -H "X-Artifact-Version: 1.5.1" -H "Slug: slf4j-api-1.5.0.jar" http://localhost:8080/api/registry/Default%20Workspace

Step 3: Do either:

curl -v -u admin:admin \
"http://localhost:8080/api/registry?q=select%20artifact%20where%20jar.osgi.Import-Package.packages%20=%20'org.slf4j.impl'"

Or:

curl -v -u admin:admin \
"http://localhost:8080/api/registry/Default%20Workspace/slf4j-api-1.5.0.jar;atom"

For the former command you receive (snipped for brevity):

<?xml version='1.0' encoding='UTF-8'?>
<feed xmlns="http://www.w3.org/2005/Atom">
...
<entry>
 <link href="/api/registry/test/slf4j-api-1.5.0.jar;atom" rel="edit" />
 <id>urn:galaxy:artifact:70c344d2-5bba-4686-bdd4-6a83432ef8fd</id>
 <title type="text">slf4j-api-1.5.0.jar</title>
 <updated>2008-07-07T22:44:36.959Z</updated>
 <author>
 <name>Galaxy</name>
 </author>
 <summary type="xhtml"></summary>
 <metadata xmlns="http://galaxy.mule.org/1.0">
  ....
  <property name="jar.osgi.Import-Package.packages" locked="true" visible="true">
   <value>org.slf4j.impl</value>
  </property>
  <property name="jar.osgi.Export-Package.packages" locked="true" visible="true">
   <value>org.slf4j</value>
   <value>org.slf4j.spi</value>
   <value>org.slf4j.helpers</value>
  </property>
 </metadata>
 ...
 <content type="application/java-archive" src="/api/registry/test/slf4j-api-1.5.0.jar" />
 <link href="/api/registry/test/slf4j-api-1.5.0.jar" rel="edit-media" />
 </entry>
</feed>

For the second command you receive just the individual entry.
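If you want to consume that feed from code rather than eyeball it, something like the following should do. It is a minimal sketch that assumes the response has exactly the shape shown above (Atom entries carrying a metadata element in the http://galaxy.mule.org/1.0 namespace); the class and method names are made up for illustration, and only JDK XML APIs are used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class GalaxyFeedDemo {
    // Namespace of the metadata element as it appears in the feed above.
    static final String GALAXY_NS = "http://galaxy.mule.org/1.0";

    /** Collects all <value> texts of the properties with the given name. */
    public static List<String> propertyValues(String feedXml, String propertyName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // required for the NS-aware lookups below
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(feedXml.getBytes(StandardCharsets.UTF_8)));
            List<String> values = new ArrayList<>();
            NodeList properties = doc.getElementsByTagNameNS(GALAXY_NS, "property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                if (!propertyName.equals(property.getAttribute("name"))) {
                    continue;
                }
                NodeList valueNodes = property.getElementsByTagNameNS(GALAXY_NS, "value");
                for (int j = 0; j < valueNodes.getLength(); j++) {
                    values.add(valueNodes.item(j).getTextContent().trim());
                }
            }
            return values;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
    }
}
```

Calling propertyValues(feed, "jar.osgi.Export-Package.packages") on the response above would give you the exported packages of every matching artifact in one list.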

I will also note that this is completely extensible. The OSGi headers are indexed via a simple Groovy script that comes bundled with Galaxy. You can add your own Groovy scripts as well.
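I'm not reproducing the bundled Groovy script here, but the underlying idea is simple enough to sketch in plain Java: read a header such as Import-Package from the JAR manifest and pull out the package names. The class below is an illustration, not Galaxy's indexer. It also simplifies the OSGi header grammar by taking only the first name in each clause.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.Manifest;

public class OsgiHeaderDemo {
    /** Extracts package names from a header such as Import-Package or Export-Package. */
    public static List<String> packages(Manifest manifest, String header) {
        List<String> result = new ArrayList<>();
        String value = manifest.getMainAttributes().getValue(header);
        if (value == null) {
            return result;
        }
        StringBuilder clause = new StringBuilder();
        boolean inQuotes = false;
        for (char c : value.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;
            }
            // Commas inside quotes belong to version ranges like "[1.5,2)".
            if (c == ',' && !inQuotes) {
                addPackage(clause.toString(), result);
                clause.setLength(0);
            } else {
                clause.append(c);
            }
        }
        addPackage(clause.toString(), result);
        return result;
    }

    // Keeps the part before the first ';' (drops attributes and directives).
    private static void addPackage(String clause, List<String> out) {
        String name = clause.split(";", 2)[0].trim();
        if (!name.isEmpty()) {
            out.add(name);
        }
    }

    /** Parses manifest text; for a real JAR you would use JarFile.getManifest() instead. */
    public static Manifest parse(String manifestText) {
        try {
            return new Manifest(new ByteArrayInputStream(
                    manifestText.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The values returned for Import-Package are what end up in a property like jar.osgi.Import-Package.packages in the feed above.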

1. It took me the better part of an hour to figure out that one must use --data-binary, not -d, with curl. Argh…

Mixing JARs and OSGi Bundles with Maven

Thursday, July 3rd, 2008

Seeing that I’m on an OSGi roll here, I want to elaborate on why Maven and OSGi don’t mix well.

One of the things I like about Maven is that it has created a repository layout to store JARs/applications/etc for folks to use. There is a central repository, but you can also create your own repositories easily as well. Whether it’s log4j or Mule or Jetty or Spring, I can find it in the Maven repository.

The thing I want to discuss in this entry is how to create an application, with Maven, when dependencies are a mix of OSGi enabled bundles and just regular ol’ JARs. (Things such as how to create an OSGi bundle from Maven or how to create a POM around an existing OSGi bundle are a separate discussion…)

Flawed Approaches

SpringSource went down the route of giving everything a new name in the Maven repository, or more specifically a new groupId which starts with com.springsource. This is a fundamentally flawed model as it breaks everything that Maven does. Now, I have some projects which depend on com.springsource.org.hibernate:hibernate and some which depend on org.hibernate:hibernate. It requires a complete clean room implementation of the Maven repository, which is a time consuming and losing battle. Every time somebody adds a project, it has to be added to the SpringSource repository.

Also, what happens as projects add OSGi headers to their JARs? Will everything in the repository be instantly switched back to the original groupId? If so, it’s probably going to break all sorts of stuff in consumers’ POMs like dependency exclusions or usage of the assembly plugin. I could go on about how bad this approach is long term…

Next option: what if I used BND to modify JARs inside a repository, adding the headers? Now you avoid all the issues around conflicting artifact/group IDs, but you open up another can of worms. If the dependency is in your local repository, then you have the issue of having a different JAR from the regular Maven repository. Checksums and signatures: broken.

If you modify a JAR that sits in a public repository, then we have issues. First, who gives you the right to add the OSGi headers to somebody else’s JAR? It’s their official distribution, not yours. Then, how are you going to re-sign it? What are you going to do about the fact that some repositories might have the original JAR which you didn’t modify? You could end up with the non-OSGi enabled version of the JAR in your repository quite easily.

You could try to convince everybody to add OSGi headers upstream. I’m not waiting around for that to happen though.

A Way Forward?

I see a couple of approaches which can be utilized.

First, you can use the mixed approach that David Savage advocates in a comment on my blog:

If you are building some new OSGi application that depends on classes from a plain old jar and those classes never need to be passed between OSGi code in different bundles. Then in this case you can embed the jar containing the classes directly into your bundle and add it to the local classpath of your bundle via the Bundle-ClassPath header.

This gives you a deployable unit of code that automatically contains all code needed to function. Compared to a standard J2SE approach where you need to supplement the local classpath with all the dependencies of a given jar.
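Concretely, the embedding approach comes down to one extra manifest header. Here is a minimal sketch that builds such a manifest with the JDK's java.util.jar.Manifest class. The bundle name, version and embedded JAR path are placeholders I made up, and in practice a tool like BND or a Maven plugin would normally generate this for you.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class EmbeddedJarManifestDemo {
    /** Builds a bundle manifest whose class path includes a JAR embedded in the bundle. */
    public static Manifest bundleManifest(String symbolicName, String version,
                                          String embeddedJarPath) {
        Manifest manifest = new Manifest();
        Attributes main = manifest.getMainAttributes();
        // Manifest.write() emits nothing useful without Manifest-Version.
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        main.putValue("Bundle-ManifestVersion", "2");
        main.putValue("Bundle-SymbolicName", symbolicName);
        main.putValue("Bundle-Version", version);
        // "." is the bundle's own classes; the embedded JAR is searched after them.
        main.putValue("Bundle-ClassPath", ".," + embeddedJarPath);
        return manifest;
    }

    /** Renders the manifest as the text that would go into META-INF/MANIFEST.MF. */
    public static String render(Manifest manifest) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            manifest.write(out);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical bundle that carries lib/legacy-1.0.jar inside itself.
        System.out.print(render(bundleManifest("com.example.app", "1.0.0", "lib/legacy-1.0.jar")));
    }
}
```

Note that nothing from the embedded JAR is exported here, so its classes stay private to this one bundle, which is exactly the limitation discussed next.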

The biggest problem with this is that you end up not gaining any of the strongly touted benefits of OSGi. If I’m going to do this, why even bother with OSGi?

The second approach that I can see is to use different version names. Instead of 2.0, you can use 2.0-osgi. This allows you to get around some of the disadvantages that I outlined above, as it’s clear that this isn’t the original version of the JAR; it’s a new one. It also gets around the SpringSource problems, as switching to an OSGi version of the JAR is as simple as adding the dependency (with the updated version) to the <dependencyManagement> section of your POM. Even better, as projects decide to add OSGi headers, you can remove these dependency listings from your <dependencyManagement> section, allowing your POM to shrink and become more manageable as time goes on.

(Along these lines, I wouldn’t mind seeing a Maven repository proxy which could dynamically create bundles in a wiki-like fashion. I as a user could submit an updated set of MANIFEST headers. Then the proxy would dynamically download the original dependency, add the headers, change the version and provide it to me as a user.)

Another approach that I wouldn’t mind seeing (and this is more long term) is to equip OSGi containers to retrieve OSGi metadata from a server to supplement JARs. On a final note, I’ll leave you with a tool that popped up on Steve’s blog that can do this – Pax URL Wrap:

Pax URL Wrap is an OSGi URL handler that can process your legacy jar at runtime and transform it into an OSGi bundle.

Cool stuff. I have yet to try it out though.

OSGi is a PITA: Bundles

Wednesday, July 2nd, 2008

I feel the need to rant about OSGi for a bit. Too many people are hailing it as the solution to everything and anything, and it can be a serious PITA sometimes.

Today’s topic (one of many that are on my mind) is bundles.

Bundles are a PITA.

The recommended OSGi practice is that you turn every JAR into a “bundle” so that it is OSGi compliant. I have to echo what Steve Loughran said this week about this: tool choices should not be transitive [1]. I shouldn’t have to force every upstream project to get on the OSGi bandwagon. A JAR should not have to be OSGi compliant; a JAR is something that somebody gives to me and that I should not have to change. Ideally it is signed, so I wouldn’t even want to change it.

Requiring the OSGi headers in a bundle manifest also breaks the way we work with many tools [2]. Take Maven or Ivy for instance. They download JARs for my application from a common public repository. What happens when a JAR is not OSGi enabled? You end up having to do crazy things like having your own version of a Maven repository.

So why should I have to modify a JAR to be able to deploy things in an OSGi container? Or why should a software provider have to supply this information upstream? It seems to me that a better way to go about this would be to have an OSGi supplemental metadata repository. Given JAR X we could download the missing OSGi information. An OSGi container could just assemble these two things on the fly.
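To show what I mean by assembling the two on the fly, here is a rough sketch. No such repository or container hook exists as far as I know, so everything here is hypothetical: the supplemental headers are just a map that some lookup for JAR X would return, and the merge happens in memory so the original (possibly signed) JAR is never touched.

```java
import java.util.Map;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class SupplementalMetadataDemo {
    /**
     * Returns a copy of the original manifest with supplemental headers added.
     * Headers that upstream already supplies win; the original is not modified.
     */
    public static Manifest merge(Manifest original, Map<String, String> supplemental) {
        Manifest merged = new Manifest(original); // copy, leaves the original alone
        Attributes main = merged.getMainAttributes();
        for (Map.Entry<String, String> header : supplemental.entrySet()) {
            if (main.getValue(header.getKey()) == null) {
                main.putValue(header.getKey(), header.getValue());
            }
        }
        return merged;
    }
}
```

A container could then resolve the bundle against the merged manifest while still loading classes from the untouched JAR, and once upstream ships its own headers the supplemental entries simply stop having any effect.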

Am I missing something here? Why would people develop such an invasive model?

1. While I did not investigate in depth, I do have to slightly disagree with the stance taken on his blog. ODE should make it easier for Maven users and distribute the transitive dependency information. But, that doesn’t mean Maven has to be the build system for ODE. Just means they have to distribute POMs.

2. Don’t give me crap about how OSGi has been around longer or how we should just ensure that everyone has OSGi headers. We don’t live in an OSGi only world.

Open Repository API

Tuesday, July 1st, 2008

There have been several calls for an open repository API based on AtomPub over the past few months, starting with Anne Thomas Manes’ and the latest being Glen Daniels’.

I think now is probably the time. From a personal point of view, I’ve been holding off until we got Galaxy 1.0 out the door so we could learn more about how it should be done and the scope of it.

This is all of course my own personal opinion at this point and is likely to change, but I outline it here so other people can have an idea of what I’m thinking.

Scope

  • Should be built on AtomPub
  • Define a way to deal with hierarchical collections of versioned resources. That is, we need a standard way to model folders and files in AtomPub. This would be of benefit to a lot of people.
  • Define a standard way to query for resources. I’m not sure how much specifying will need to be done here; it should be based on OpenSearch and AtomPub. A good use case is being able to search for a WSDL from an IDE and generate a client for it. Another use case might be to be able to download an application (= set of resources) using a query and start it.
  • This may not need to be a spec like AtomPub is a spec. Since we will be using a lot of standard tools, this may be as simple as saying “yes we’re ALL using these things (AtomPub, OpenSearch, etc) in the same manner and here’s how the big picture comes together”

Not in scope

Or at least a separate optional specification:

  • Dealing with metadata about artifacts (although I’m tempted to roll this into the above)
  • Lifecycle management
  • Dependency management

Participants

Besides WSO2 who I’ve talked with about this before, I would think the following open source projects may be interested as well:

  • JBoss Drools BRMS team
  • Sonatype’s Nexus team – they’ve expressed interest in having an AtomPub API to browse their repository/metadata
  • IDE developers (NetBeans, Eclipse) – we definitely could use some IDE plugins.
  • Anyone else who wants to have a say

Next Steps

We should probably set up a mailing list and wiki. Maybe at the Codehaus? We can then figure out where this thing goes from there.

Mule Galaxy 1.0

Tuesday, July 1st, 2008

I’m happy to announce that Mule Galaxy 1.0 is out!

The Mule Galaxy team is pleased to announce the availability of the 1.0 final release. Mule Galaxy is an open source governance platform with an integrated registry/repository. It includes versioning, lifecycle management, dependency management and policy enforcement features which enable you to effectively govern your applications and services. Support for a wide range of products is included, including support for Mule, Apache CXF, Spring and Maven. Custom integration may also be written via Galaxy’s simple Atom Publishing Protocol HTTP API.

Enterprise Version

You may also notice that we now have an enterprise version*. There are some features which may or may not be interesting to you:

  • Out of the box clustering support. We’ve built some tools to make it easy to set up Galaxy in a cluster.
  • Free text search with support for MS Office documents. We developed this feature for a customer who writes their service documentation in Word documents. They wanted the ability to associate the docs with services/applications so people could easily find services and figure out how to use them.
  • Easy-to-use UIs for defining new artifact types and indexes. (This can be done in the community version as well, but there it requires an XML descriptor.)

* Some people may want you to think that because we have a commercial version that we are not committed to open source and are truly evil. Yes, we do need to make some money and cannot give away everything for free. I don’t think this is evil though. We develop Galaxy Community in the open. And certain features will probably migrate to the open source version as we move along. If you have feedback on what we should be doing, please drop us a line as we’re happy to listen.

How are you using Reg/Reps?

As we move into planning our next releases, we’d definitely like to hear how you’re using Galaxy (or any one of our competitors’ products). If you have features which you find useful or features which you really need, we’d like to know.