Archive for the 'Tech Ramblings' Category

Reliability and Performance: WS-RM vs. RESTful HTTP

Monday, January 29th, 2007

I was doing a bit of a thought experiment recently about the effects of reliability on performance. Basically I wanted to know a) will WS-RM affect the performance of my application and b) how does it compare to a RESTful HTTP approach. The quick answer is that it depends (like most complex questions in life).

WS-ReliableMessaging

In the RM case the first question that needs to be asked is what type of message exchange patterns am I working with?

  • One way?
  • Request/Response?
  • Small messages? (< 1K)
  • Large messages?

With WS-RM we receive a <SequenceAcknowledgement> for every N messages. If we are using request/response, this gets combined into the SOAP response headers. If we have one-way messages, we’ll receive a new SOAP envelope every so often with just some acknowledgement headers. There will also be a <Sequence> header with each request message. And if you aren’t already using WS-Addressing, headers will be inserted for it on both the request and response.

In a typical service invocation we have the following items which consume our processing time:

  • Processing of WS-RM + Addressing headers. These end up being around 600 bytes total.
  • Processing of request and/or response messages
  • Time spent in transport. For this example we’ll use HTTP.
  • Time spent in service

Right away, we can see that if we’re dealing with large messages, RM will have very little effect on the performance of our service. What’s another 600 bytes if you’re already sending around 20K?

So let’s look at what might be the worst case: one-way messages which are < 1K. For 1K one-way messages HTTP ends up being about 30% of the processing time (roughly). Let’s also assume for now that the time spent in the service (i.e. doing database stuff) is negligible.

Relative cost = (1K*1.3 + .6K) / (1K*1.3) = 146%, i.e. roughly a 46% degradation

The “*1.3” is to factor in HTTP processing time, which ends up being equivalent to the time spent processing a .3K message.

It’s important to remember that this completely disregards server-side processing time, so I think the worst case for a 1K one-way message is probably closer to a 30% degradation in performance. For larger messages it’s probably safe to assume a performance impact of 5% or less.
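The back-of-the-envelope formula above is easy to play with in code. The sketch below is only an illustration: it takes the 1.3 HTTP factor and the 0.6K header figure from this post, and it assumes (for the sake of the sketch, not from any measurement) that the 1.3 factor applies uniformly to every message size. The class and method names are made up for the example.

```java
public class RmOverheadEstimate {
    // Figures taken from the post: HTTP adds roughly 30% on top of message
    // processing, and the WS-RM + WS-Addressing headers are about 0.6K.
    static final double HTTP_FACTOR = 1.3;
    static final double RM_HEADERS_KB = 0.6;

    /** Cost with WS-RM relative to the cost without it (1.0 means no overhead). */
    static double relativeCost(double messageKb) {
        double baseline = messageKb * HTTP_FACTOR; // message plus HTTP, no RM
        return (baseline + RM_HEADERS_KB) / baseline;
    }

    public static void main(String[] args) {
        // Assumed sample sizes: the 1K worst case, a mid-size message, and 20K.
        for (double kb : new double[] {1, 5, 20}) {
            System.out.printf("%.0fK message: %.0f%% of baseline cost%n",
                    kb, relativeCost(kb) * 100);
        }
    }
}
```

With these inputs the 1K case comes out at about 146% of baseline, while the 20K case is roughly 102%, which is in line with the "5% or less" guess for large messages. Server-side processing time is still ignored here, just as in the formula.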

Acknowledgements

You may have noticed this completely ignores acknowledgements. For now I’ve assumed those will only occur every so often, so they won’t have a huge performance impact. If we look at the extreme situation, I can see where we might have 100 clients, each sending 1-2 messages per second. How many messages will the server want to buffer here? If these are small messages and we end up acknowledging every 3rd or 4th message, this could have a big impact.
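To put rough numbers on that extreme case, here is a tiny sketch. It uses only the figures from the paragraph above (100 clients, 1-2 messages per second each, an acknowledgement every 3rd or 4th message) and assumes every acknowledgement is a standalone envelope, as it would be for one-way messages. The names are hypothetical.

```java
public class AckTrafficEstimate {
    /** Standalone acknowledgement envelopes the server sends per second. */
    static double acksPerSecond(int clients, double messagesPerClientPerSecond, int ackEvery) {
        return clients * messagesPerClientPerSecond / ackEvery;
    }

    public static void main(String[] args) {
        // Quietest combination: 1 message/s per client, ack every 4th message.
        System.out.printf("low:  %.1f acks/s%n", acksPerSecond(100, 1.0, 4));
        // Busiest combination: 2 messages/s per client, ack every 3rd message.
        System.out.printf("high: %.1f acks/s%n", acksPerSecond(100, 2.0, 3));
    }
}
```

That works out to somewhere between 25 and about 67 extra envelopes per second on top of the 100-200 incoming messages, so for small messages the acknowledgements are a noticeable share of the traffic.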

HTTP with Idempotent Methods

We should also be able to achieve reliability through idempotent methods. PUT and DELETE are idempotent which just means we can “attempt to transfer our state” (aka submit our data) as many times as we’d like - until we receive a response code. Any time we’re retrieving data, we can issue GET as many times as we’d like as GET does not change the server side state.

Right away it’s clear that if our application is only retrieving data, the RESTful HTTP approach does not have any impact. We simply issue GETs and don’t do any extra work. Of course, if you’re only retrieving data you probably wouldn’t enable WS-RM because it wouldn’t add much value there either, so this really isn’t an interesting case.

Let’s say I want to submit an order for some widgets. This will become a two-step process with HTTP. First, a POST to /widgets/order. This will return a URL where we can PUT our order, e.g. /widgets/order/abc-123. The server will listen at that URL for new orders and only accept one order there. The client will submit as many times as it needs to.
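The two-step exchange above could look something like the following on the client side. This is a minimal sketch, not a reference implementation: it uses the JDK's java.net.http client (Java 11+), assumes the server hands back the new order URL in a Location header (the post only says the POST "returns a URL"), and the class name, method names, and retry limit are all invented for the example.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.concurrent.Callable;

public class ReliableOrderClient {

    /** Repeats an idempotent call until it yields a status code or attempts run out. */
    static int retryUntilResponse(Callable<Integer> attempt, int maxAttempts) throws Exception {
        Exception last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (IOException e) {
                last = e; // no response seen; resending is safe because the call is idempotent
            }
        }
        throw last != null ? last : new IllegalArgumentException("maxAttempts must be positive");
    }

    static int submitOrder(HttpClient http, URI base, String orderBody) throws Exception {
        // Step 1: POST asks the server for a fresh order URL. If this request is
        // lost, the worst outcome is an order URL that nobody ever uses.
        HttpRequest post = HttpRequest.newBuilder(base.resolve("/widgets/order"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpResponse<Void> created = http.send(post, HttpResponse.BodyHandlers.discarding());
        // Assumption: the new URL arrives in the Location header.
        URI orderUri = base.resolve(created.headers().firstValue("Location").orElseThrow());

        // Step 2: PUT the order to that URL, resending until some response arrives.
        HttpRequest put = HttpRequest.newBuilder(orderUri)
                .PUT(HttpRequest.BodyPublishers.ofString(orderBody))
                .build();
        return retryUntilResponse(
                () -> http.send(put, HttpResponse.BodyHandlers.discarding()).statusCode(), 5);
    }
}
```

The retry loop only resends when no response came back at all. Any status code, success or failure, ends the loop, which matches the "until we receive a response code" rule for PUT described earlier.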

Performance does degrade here as it involves an extra POST. The degradation is probably comparable to the one-way small message case in WS-RM, but I don’t have any numbers handy to prove that. The nice part is that you only take this hit when you’re actually transferring state. If your application is 90% GETs then you probably won’t see a huge impact here regardless of your message sizes. On the other hand, if your application is continually submitting data, this approach will probably have an impact.

Conclusion

Each approach can be made to work, I think. Which one you choose probably depends on your application and use cases. Are you sending lots of data? Are you only retrieving data? Also, what about interoperability, ease of use, and integration with existing architectures? And we haven’t even touched the possibilities of using something like JMS or OpenWire here. It seems performance is just one question of many for service builders.

ApacheCon EU talks

Sunday, January 28th, 2007

Much to my surprise, it seems that somehow I managed to get two talks accepted to ApacheCon EU (May 1-4) this year. The best part is that they're on completely different sides of the spectrum.

The first talk is Navigating WS-death^H^H^H^H^H*. (It seems not everyone liked the name, so that may be changed - feel free to give feedback in the comments). There are many WS specs out there with many different versions (like WS-Addressing). When should I as a user consider using them? What benefits will they bestow on my project? How interoperable is a particular specification? What does the spec roadmap look like?

The second is entitled Building scalable, reliable, and secure RESTful services. This talk is intended to be practical advice on how to build RESTful services. I aim to illustrate scalability, reliability, and security through a series of practical examples using different toolkits/frameworks.

Hope to see many of you in Amsterdam this year, I'm sure it'll be a great time!

Component Discovery with Spring

Saturday, January 27th, 2007

One of the things that really bugs me is when people go about reinventing configuration, discovery, or wiring for their Java components. We have plenty of containers out there which can do this for us. By reusing them we end up with higher quality code and can focus more on features specific to our problem domain. Spring has been the most popular container for a while now, and it’s pretty obvious how to do configuration or wiring together of components. But how do you do discovery?

I’ve seen many examples which advocate this:

<bean id="widgetManager">
  <constructor-arg>
    <list>
      <ref bean="widget1"/>
      <ref bean="widget2"/>
    </list>
  </constructor-arg>
</bean>

The problem with this though is that you need to explicitly list all your widgets in your configuration file. In many cases what you’d like instead is to be able to just add acme-widget.jar to your classpath and have Spring automatically pick it up!

To get around this issue in CXF I wrote a SpringBeanMap class. What this will do is look through all your ApplicationContexts and find beans which implement a specific interface. Here is a small example of how it would be applied to the above case:


<bean id="org.widgetthingy.WidgetManager">
  <constructor-arg>
    <bean class="org.apache.cxf.configuration.spring.SpringBeanMap">
      <property name="type" value="org.widgetthingy.Widget"/>
      <property name="idsProperty" value="widgetIds"/>
    </bean>
  </constructor-arg>
</bean>

This will invoke the getWidgetIds method on any bean which implements the Widget interface. For each id supplied, it will use it as the key in your Map with the widget as the value. Once that is done the resulting Map gets supplied to your WidgetManager via a constructor or property.
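To make the mapping step concrete, here is a stripped-down sketch of the idea in plain Java. It is not the SpringBeanMap source and does not use Spring at all: the beans collection stands in for whatever the ApplicationContexts hold, and Widget, AcmeWidget, and mapByIds are hypothetical names made up for the illustration.

```java
import java.lang.reflect.Method;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BeanMapSketch {
    /** Hypothetical marker interface standing in for org.widgetthingy.Widget. */
    public interface Widget { }

    /** Hypothetical widget that reports the ids it wants to be registered under. */
    public static class AcmeWidget implements Widget {
        public Collection<String> getWidgetIds() {
            return List.of("acme", "acme-v2");
        }
    }

    /**
     * Picks out the beans implementing the given type, calls the getter for
     * idsProperty on each one, and registers the bean under every id returned.
     */
    static <T> Map<String, T> mapByIds(Collection<?> beans, Class<T> type, String idsProperty)
            throws ReflectiveOperationException {
        // "widgetIds" becomes "getWidgetIds"
        String getter = "get" + Character.toUpperCase(idsProperty.charAt(0))
                + idsProperty.substring(1);
        Map<String, T> result = new LinkedHashMap<>();
        for (Object bean : beans) {
            if (!type.isInstance(bean)) {
                continue; // not a widget, ignore it
            }
            Method idsGetter = bean.getClass().getMethod(getter);
            for (Object id : (Collection<?>) idsGetter.invoke(bean)) {
                result.put(id.toString(), type.cast(bean));
            }
        }
        return result;
    }
}
```

Calling mapByIds(beans, Widget.class, "widgetIds") on a collection containing one AcmeWidget yields a map with the keys "acme" and "acme-v2", both pointing at that same widget instance. Note that the getter has to be invoked on a live bean, so every widget is instantiated up front.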

That only solves half the problem though. We also need to discover new configuration files that have been added to the classpath. This can be done by creating a ClassPathXmlApplicationContext that looks for all the **/widget-*.xml files. Now all one has to do when writing a new widget is define the bean in a widget-foo.xml file and it will be automatically added to the application.

The only issue that I’ve found with this approach is that the Widgets aren’t lazily loaded. This isn’t a huge deal for us at the moment as the things we’re loading aren’t resource intensive. There is at least one way around this for those so inclined. Instead of instantiating the bean and invoking getWidgetIds() we could instead look at the Spring BeanDefinition for what the “widgetIds” property holds. This means that you must include the widgetIds in your Spring bean definitions though, and they can’t be hard coded into your class.

If anyone has other approaches which get around this, I’d love to hear them. I’ve recently re-enabled comments so drop me a line…

UPDATE: Arjen Poutsma pointed out via email that if I want lazy initialization there is no other possibility than to put the ids in the XML. If I want to get ids via the bean, I’m going to have to instantiate it. (*kicks self*) I updated SpringBeanMap to support both methods. Now if there are ids specified in the XML they are used and the bean is not initialized. If there are no ids specified, we initialize the bean and grab them. I’m fairly happy with this solution now. It is completely non-invasive and very flexible!

iPhone thoughts

Tuesday, January 9th, 2007

The iPhone revealed today certainly looks very cool. It is thin, seems to have decent battery life, looks good, and has an interesting touch screen interface.

Two thoughts after being a long time smartphone person:

  • Touch screens suck while you’re driving
  • Touch screens don’t work if you’re trying to write an email or text someone

The main lesson here is that tactile feedback is important. The iPhone looks good enough that I might be wrong though. I will have to reserve judgment for now.

Also, EDGE sucks. Can’t we have UMTS please?

Sun T2000 Review Part 1: Sun Sales

Sunday, March 5th, 2006

We’re currently evaluating the Sun CoolThreads T2000 boxes for a project we’re working on. At Jonathan Schwartz’s request, I’m blogging about my experiences.

These machines are looking pretty sweet. For instance:

  • An 8 core 1.2 GHz UltraSPARC T1 processor
  • The best performance/watt in its class
  • They’re small
  • They’re reasonably priced, starting at $3K for the T1000 and $8K for the T2000
  • Did I mention they’re fast? Like double the speed of a dual 3.8 GHz Pentium system.

Sun has this wonderful try-before-you-buy idea going on. Except it’s not so wonderful in reality. Turns out the machines are in such high demand it takes forever to get ahold of one.

After I order my system, I get an email from one of the sales reps that says “thanks for the purchase, email Jane who will be your actual sales rep.” I think that’s kind of strange, but I email her asking about possibly getting a T1000 to try instead of a T2000. After a couple of days they determine that no, that’s not possible. Oh well.

About two days pass and I start thinking… hmmm… did they ship my machine? So I email. My sales representative is OUT OF THE OFFICE (in caps because that’s how it came in the email). I email again, and finally someone else responds. My machine is not going to ship for another week. They are “ramping up product inventory.” Oi.

I emailed them a couple of days ago asking for benchmarks relating to XML. No response. And once again my rep is OUT OF THE OFFICE. I also cc’d two others, and got no response from them either.

So now I’m in anxious anticipation of my machine. I’m hoping it actually shipped on Friday, but the comments here aren’t helping my confidence. All of them are full of hope, but the commenters are still left wanting their machines…

By now you’ve noticed this isn’t a review, just me protesting at the lack of a machine to review and the crappy sales process. Here’s hoping I get a machine and we can do part 2 next week!

(Ridiculous?) Predictions ‘06

Wednesday, January 4th, 2006

  • XML networking/gateways/routers will do even better this year.
  • WS-* support will continue to move out of the realm of application developers and into mediators/gateways/routers.
  • XFire 1.0 will ship.
  • Google ad growth will slow, but it will still be well loved.
  • Microsoft will really start moving again and start shipping products faster than Google. And their stock price will go up.
  • Windows Mobile Smartphones will finally start to take off thanks to some ultra cool phones coming out.
  • The new Powerbooks will be ultra successful.
  • Open Source will spread further into vertical markets.

Hmm. Nothing too crazy there. Maybe I need to get a little more wild. *sigh* - another day.

Mindreef Coral: Wow

Thursday, December 8th, 2005

Just read about what Tim Ewald has been up to at Mindreef: their new product called Coral. If you thought SOAPScope was cool, check out Coral. It adds in a whole new collaborative aspect that appears to be pure genius. And once again with a beautiful interface!

The cons of the Apache Software Foundation

Saturday, November 19th, 2005

Susie over at the Apache Marketing Blog (is this official???) writes some thoughts on why your project should be at the ASF. While I like the ASF and I work on some projects there, I feel like the article is both horribly idealistic and one-sided.

My main gripe is the idea that community results in a better product. Susie tries to convince the ActiveGrid project that Apache is the place to be.

Why not foster the kind of community that could create the kind of momentum you’d want to spread a message of this magnitude? A well marshalled community usually trumps one single voice (no matter how articulate it is) in beating the drum.

I don’t agree with this at all. One only needs to look at Apache to find this is not the case. Consider the Avalon project. Too many voices couldn’t create a coherent vision. As Elliotte Harold from XOM has demonstrated, cathedral development can result in a much cleaner API and much better software.

This isn’t to say I don’t like Apache. I use software from Apache all the time and am a committer there. It just isn’t for everyone and doesn’t necessarily result in better software.

Mossberg blasts GMail

Wednesday, September 21st, 2005

I’m sure Gmail will get better and better, and will eventually adopt the new programming techniques that allow desktop-like ease of use. But I’m not sure Google’s arrogance will ever make room for user preferences on things like folders or ads, or how emails are grouped. Yahoo’s new email program would blow Gmail away if it were widely released today. That’s partly due to its features, but also to its respect for user choice.

Full WSJ text here.

Coordination, SOA, and control

Tuesday, September 20th, 2005

I’ve been putting a lot of thought lately into coordination as it relates to software, people, and organizations.

On one end we have email, RSS/Atom, and the web. All very loosely coordinated. Nearly everyone has email (hey Warren Buffett, you listening?), nearly everyone uses the web, and more and more people are aware of RSS/Atom. People can go out and start sharing information in a very loosely coordinated fashion. This comes at the expense of the ability to capture domain-specific data or provide high security. When you move into the enterprise things fall apart.

Which brings me to the other end of the spectrum – I’ll loosely refer to it as SOA for now. With web services, industry groups, organizations, and even individuals can create domain-specific languages. We have a broad basis of web service specs now (SOAP, WSDL, WS-Addressing, WS-Security, WS-….). On top of this people are developing specs like the Physical Markup Language, which is a way to share RFID data. While RSS could embed this data, it isn’t really suited for it. There are a couple of problems with the SOA approach though…

Thought

Embarking on the SOA approach requires a lot of up-front thought. One must develop schemas, WSDLs, etc. One must get every department to agree on a schema. One must worry about extensibility and versioning. And then there is scalability. And then there is choosing the right software platforms.

Coordination

If you do anything outside your organization it requires a lot of coordination. For instance, in a very fragmented industry (e.g. logistics), no one player can establish The Standard for everyone else. Getting everyone on board for a standard can take years. And there are huge benefits to sharing the data NOW.

Agility

Of course once you’ve decided on a standard, it’s already nearly worthless. But you’ve probably already committed to the software as well for the foreseeable future. What happens when you need to extend? Or even worse – you need something radically different?

Data Control

All the above are issues, but one concerns me more than the others. Data control. Let’s take the example of RFID. This page says it all, but let me describe. An item tagged via RFID will pass from a supplier, to a warehouse, to a distributor, to a retailer. Each one of these parties holds a piece of information about the product as it passes through the supply chain.

Who owns this data? How do we get at it?

Central storage? As a manufacturer, are you going to require every warehouse that you work with to send their data to your database?

What about distributed query of all the partners? Will every warehouse that you use be willing to adopt the same interfaces so you can query the particular tag? Will you have to write software against each partner’s interface?

What about security? Limiting subsets of data to specific people is hard, but not insurmountable. But a twist is that there may be an implicit trust relationship involved. If I am a manufacturer, my transportation provider may be partnering with a warehouser. Since my transportation provider trusts me, can I access the warehouse database?

Does the manufacturer have an implicit ownership right to the RFID data generated at the warehouse? Even if they have a right to it, how do we get at it and how do we share it in a meaningful fashion? Can the warehouser charge extra for the data?

Data ownership and control brings up a lot more questions than it answers. Will SOA, ESBs, ERPs, middleware, and a host of technologies save us from data hell? I feel there must be a better way, but I have no real answers.