Deployment
January 24th, 2009
(Skipping week in review for a long entry today…)
I’m continually amazed at how hard of a problem deployment actually is. If you’re going to be deploying any reasonably sized application you have an endless list of things to worry about:
- Taking the cluster up and down so there is no downtime
- Managing the configuration of individual nodes
- Operating system setup
- Installation of required libraries/3rd party tools
- Managing dev, QA, staging and production deployments
- Schema migration/database updates
- How to do rollbacks
We’ve done a bit of work with Galaxy to support deployment, which addresses a small subset of these problems. Our NetBoot feature allows you to store your Mule application in a repository and have it downloaded on the fly by any number of nodes. To pick up an application update, you just trigger a restart on the node via JMX.
There are a few other interesting tools out there.
Capistrano: Allows you to create Ruby scripts which automate all the aspects of your deployment. It looks endlessly flexible.
SmartFrog: A system for describing and managing software components. Its goals seem much more ambitious than Capistrano. Check out Steve Loughran’s presentation on deploying Hadoop on a cluster (a non trivial task) with SmartFrog for a good overview. It even comes with a management console! Although it looks complex at first glance, I bet that once you get the hang of it, it can simplify things quite a bit.
Puppet: A declarative language for “automating system administration tasks.” This seems much more oriented toward automating sys admin tasks than toward actually deploying your application.
(I would love to hear about anyone’s experiences with any of these tools or any of the commercial vendors as well.)
Given the complexity of deployment, it certainly makes PaaS offerings appealing. No worrying about operating systems, databases, 3rd party libs, configuring individual nodes… Ideally it could be such that you just upload your application and push it out. Which is what it seems many large companies are doing – Amazon, Google, Yahoo, and LinkedIn to name a few.
I hope that we start to see more core infrastructure managed by the infamous cloud people. Just write your app, upload, and tell it where to deploy. Then we can focus on building applications, which is what we really want to do anyway.
On a related note, are there any managed Hadoop instances out there? This could be a very useful part of a hosted application infrastructure. It’ll be interesting to see if Amazon supports something like this as a core service someday.
January 24th, 2009 at 7:53 pm
for our cluster (jboss), we just use subversion for everything. the entire deployment is checked into subversion and then checked out to the production servers. updates are as simple as 'svn up'. we have scripted this to be 'stop app server; svn up; start app server'.
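The 'stop app server; svn up; start app server' sequence above might look roughly like the following Python sketch. The stop/start commands and the checkout path are hypothetical (the comment does not say how the app server is controlled); only `svn update` comes from the description. The sketch restarts the server even if the update fails, which mirrors what a `;`-separated shell script would do.

```python
import subprocess
from typing import Callable, List, Sequence

# Hypothetical commands: the original comment does not name the
# scripts used to stop and start the app server.
STOP_CMD = ["/etc/init.d/jboss", "stop"]
START_CMD = ["/etc/init.d/jboss", "start"]


def deploy(
    checkout_dir: str,
    stop_cmd: Sequence[str] = STOP_CMD,
    start_cmd: Sequence[str] = START_CMD,
    run: Callable[[List[str]], object] = subprocess.check_call,
) -> None:
    """Stop the server, update the working copy, start the server again."""
    run(list(stop_cmd))
    try:
        # Bring the checked-out deployment up to the latest revision.
        run(["svn", "update", checkout_dir])
    finally:
        # Start the server again even if the update raised, so a failed
        # update does not leave the node down.
        run(list(start_cmd))


if __name__ == "__main__":
    # Assumed path of the production checkout.
    deploy("/opt/app")
```

Passing a different `run` callable makes it possible to dry-run the sequence without touching a real server.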
another option (if you use a debian distro such as ubuntu) is to build .deb’s and deploy those. we started off with that, but found that svn was a much easier alternative to maintain.
January 25th, 2009 at 12:08 pm
It’s almost like cfengine never existed. People don’t even mention it. Or bcfg2, lcfg, or other pieces of free software that have been used or designed for massive clusters or large departments.
January 25th, 2009 at 3:54 pm
“anon” - if you do look at the slides Dan cites, I do mention these. As to why it isn’t mentioned - maybe not enough people are aware of it. It’s as if software automation stopped at the build tools, and you are down to drag and drop and hand machine configuration at the end. Which is so wrong.
Dan - are you going to be at ApacheCon EU?
January 25th, 2009 at 6:16 pm
Fabric is a simple, lightweight, trivially-easy-to-use version of Capistrano. I’ve had good luck with it: http://www.nongnu.org/fab/
January 25th, 2009 at 7:13 pm
Thanks for the suggestions. I’m really not a deployment guy, so everything here is completely new to me. I’ll look into cfengine, bcfg2, lcfg and Fabric. But deployment still seems like a PITA to deal with no matter what.
January 26th, 2009 at 11:42 pm
Now that I don’t work for Amazon any more, I can comment on the Hadoop thing… and my wild speculation is: very probably.
I don’t know of any concrete plans, but I do recall some turf-protection mentality regarding the idea of doing “things like Hadoop but better”, since apparently some team(s) were investigating offering Hadoop. Kinda calling dibs on map/reduce, I guess.
(plus: “why would you want to do that, Hadoop is already done; let’s rather work on XYZ”. Bah, Hadoop is nice, but it’s not the end-all-be-all solution, just the first generation of things to come)
January 27th, 2009 at 10:10 am
My experience is with capistrano & with puppet.
Puppet can be hacked to do deployment, but, having done it, I would not recommend it. It is not really its strength.
However – 3 of the 7 items on your list are well handled by puppet (the sysadmin-like tasks). I haven’t written custom tasks (which are in ruby) so I am not sure how that works. I know trying to use it as a procedural language is difficult. So getting it to a) fetch the latest build, b) unpack, c) link, d) restart is difficult.
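For comparison, the procedural sequence (unpack, link, restart) is short when written as an ordinary script. The sketch below is only an illustration in Python, not anything Puppet or Capistrano provides. It assumes step a) has already fetched the build to a local tarball, that releases live in their own directories with a `current` symlink pointing at the active one, and that the restart command is supplied by the caller. All names are hypothetical.

```python
import os
import subprocess
import tarfile
from typing import Callable, List, Optional, Sequence


def deploy_release(
    archive_path: str,
    releases_dir: str,
    current_link: str,
    release_name: str,
    restart_cmd: Optional[Sequence[str]] = None,
    run: Callable[[List[str]], object] = subprocess.check_call,
) -> str:
    """Unpack a fetched build, repoint the 'current' symlink, restart."""
    release_dir = os.path.abspath(os.path.join(releases_dir, release_name))
    # Refuse to overwrite an existing release directory.
    os.makedirs(release_dir, exist_ok=False)

    # b) unpack (only do this with archives from a trusted build system)
    with tarfile.open(archive_path) as tar:
        tar.extractall(release_dir)

    # c) link: create the new symlink under a temporary name, then rename
    # it over the old one so the switch is a single step on POSIX systems.
    tmp_link = current_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(release_dir, tmp_link)
    os.replace(tmp_link, current_link)

    # d) restart, if a command was given
    if restart_cmd is not None:
        run(list(restart_cmd))
    return release_dir
```

Because old releases stay on disk, a rollback under this layout amounts to pointing `current` back at an earlier release directory and restarting.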
Personally I can’t say that I like capistrano anymore. Too much trouble with it. Dependent on the remote shell. We have many systems configured to use csh and capistrano does not like that. It’s nice at first that you don’t need to have cap installed on the machine deployed to but in the end I think this feature is not worth the trouble.
January 30th, 2009 at 12:21 am
Hey Dan,
As you know, we had some pretty old systems where I work. Recently, we all decided to upgrade to a Linux stack, and I have been working on the deploy problem.
We like Puppet. We like cobbler + koan. We like Linux clusters, paravirtualization, and OS migratable services. We like YUM & RPMs. I think intelligent use of these tools can lead to a manageable deployment problem. There are tradeoffs, but it’s a lot better than a Java-only vision where people are noncommittal about what platform things will actually run on.
Now if we can just get more 3rd party vendors to see the light ..