Book Review: Release It! Design and Deploy Production-Ready Software

Posted by WingedPanther73, 26 July 2011 · 1039 views

I know, you're shocked that I'm reading another book by The Pragmatic Programmers. This is a book about what types of things you should look at doing when you get ready to release a major software product. That should immediately alert you to what it is NOT about.

This is NOT a book that will help you get through your programming homework. It will not give you insight into how to design your program, or how to work as a team. The examples consistently involve huge e-tailer websites facing problems like getting 250,000 hits per hour instead of the planned 20,000 on the day of site launch, or a service company reducing its capacity on Black Friday at the same time marketing was drumming up traffic for that service.

So, this is a book about programming systems that use load balancers, failover systems for databases, etc. In addition, Michael T. Nygard, the author, talks primarily about Java and Java's features in this type of system. He acknowledges Ruby on Rails, and speaks frequently about Oracle for databases. You will also see a lot of commentary about Sun servers, since this was written in 2007, before Oracle bought Sun.

With all that said, it has a lot of useful information for anyone who is looking at writing big to huge systems. The book is broken into several major sections: Stability, Capacity, General Design Issues, and Operations. Each section covers some major concepts, and starts with an anecdote from his experience with a large system failing.

In Stability, he sets the example with an airline system that crashed because of a cascading failure. The database became overloaded, which blocked threads in other systems; that frustrated users on the web portals and locked airline personnel out of processing boarding. With this illustration in mind, he discusses ways to prevent failures in one subsystem (such as a database), or in the connection between two systems (such as a firewall that decides to block calls), from taking down the entire system. He presents these as a series of anti-patterns and patterns, and illustrates how to cope with problems. If you assume things will break, and prepare for it, then your system will be able to limp along instead of collapsing.
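To make the "limp along instead of collapsing" idea concrete, here is a minimal sketch of a circuit breaker, one well-known way to stop callers from piling up behind a dead dependency. This is my own illustration in Python, not code from the book (which is Java-centric); the class name, the thresholds, and the `flaky_database_query` function are all made up for the example.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses a call without trying it."""


class CircuitBreaker:
    """Hypothetical breaker: fail fast once a dependency looks dead."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # seconds to wait before a trial call
        self.clock = clock
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                raise CircuitOpenError("dependency marked as down; failing fast")
            # Cool-down has elapsed: let one trial call through (half-open).
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures in a row, opens the circuit.
            if self.opened_at is not None or self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result


def flaky_database_query():
    # Stand-in for a real call that hangs until its timeout expires.
    raise TimeoutError("database did not answer in time")


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)
for attempt in range(4):
    try:
        breaker.call(flaky_database_query)
    except CircuitOpenError:
        print("attempt", attempt, "-> failed fast, database not contacted")
    except TimeoutError:
        print("attempt", attempt, "-> waited on the database and timed out")
```

The first two attempts pay the full cost of the timeout; after that the breaker rejects calls immediately, so threads are not left blocked on a database that is not going to answer.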

In Capacity, he starts by looking at a launch of a new website. They were planning for a maximum of 20,000 concurrent sessions. Half an hour after launch, they reached 250,000 sessions and the site crashed. He then goes on to discuss Capacity anti-patterns and patterns, including proper handling of session length, handling of session data, etc. He also talks about how "users" will violate your expectations. A lot of web-crawlers do NOT act like normal users. Google, Yahoo, and Bing may behave themselves well, but there are always home-grown crawlers that can behave quite poorly, such as spawning hundreds of connections per second, each of which creates a new session object on the server. It's what you didn't plan for that will get you, and you have to plan for it.
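A quick back-of-the-envelope calculation shows why a badly behaved crawler hurts so much. The numbers here are my own assumptions, not figures from the book: a crawler opening 200 connections per second, a 30-minute session timeout, and 10 KB of data per session. Since the crawler never reuses a session, every one of them sits in memory until it times out.

```python
def idle_sessions_in_memory(new_sessions_per_second, session_timeout_seconds):
    """Steady-state count of abandoned sessions still waiting to time out."""
    return new_sessions_per_second * session_timeout_seconds


def session_memory_bytes(session_count, bytes_per_session):
    """Total memory held by those sessions."""
    return session_count * bytes_per_session


# All three inputs are assumed values chosen for illustration.
sessions = idle_sessions_in_memory(200, 30 * 60)
memory = session_memory_bytes(sessions, 10 * 1024)
print(f"{sessions:,} abandoned sessions holding about {memory / 1024**3:.1f} GiB")
# prints: 360,000 abandoned sessions holding about 3.4 GiB
```

Under those assumptions, a single crawler leaves more sessions lying around than the 250,000 that took the site down in the anecdote. Shortening the timeout or shrinking the session data cuts that number proportionally.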

General Design Issues covers miscellaneous topics related to networking, security, etc. In general, there are lots of little things you can do to help. One thing he trashes is the notion that "CPUs are cheap" and "RAM is cheap". When you're talking about enterprise-class hardware, this simply stops being true. High-end servers range from $200,000 to $1,000,000 for an empty shell. The CPUs are similarly expensive. Anything you can do to lighten that load will make a huge difference in costs to the team using the software. We saw something similar with the release of Windows Vista: it came out as bloatware, and couldn't run on the netbooks that were suddenly popular. They fixed it in Windows 7, but came close to giving Linux a huge commercial success. The Windows XP end of life got pushed back because Microsoft didn't anticipate hardware getting smaller instead of faster.

The final section is Operations. There are two major concerns in this section. First, make your software transparent. If a vendor suddenly throttles back availability of a service at the exact time you slam it, you'll never figure it out if all you know is that customers are complaining. You need the ability to see what parts of your software are getting stuck. Second, you need to work WITH the IT group. I've got a buddy who does IT as part of his job, and he hates it. Nothing causes him more grief than change. Any change causes problems. Opening a port causes problems. Tracing wires to figure out which jack is dead because things aren't documented causes problems. New software versions? Yeah, they cause problems too. You want your software to be as IT-friendly as possible. That includes logging to any location they want, gradual upgrades without outages, etc.

This book will NOT help you create a change counting program. What it will do is help you start making decisions on real software that are realistic. Can your software run without administrative privileges? Can your software handle an unresponsive database gracefully? Does your QA process test the same 0, 1, many relationships that production will have? Do errors get logged someplace? Do status messages get logged in that same place? Can grep parse the logs easily? Can you tune your software on the fly, so it can be made to limp along if something goes wrong?
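As a rough illustration of the logging questions above, here is a small sketch using Python's standard `logging` module. The logger name `orders`, the `key=value` message convention, and the `build_logger` helper are my own choices, not something the book prescribes. The idea is that errors and status messages go to the same destination, one record per line, and the destination is whatever stream the IT group hands you.

```python
import io
import logging


def build_logger(stream, name="orders"):
    """Hypothetical helper: one-line key=value records to a caller-chosen stream."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate lines if called twice
    logger.propagate = False  # keep records out of the root logger
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s level=%(levelname)s component=%(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


buffer = io.StringIO()  # stand-in for whatever file or pipe the IT group picks
log = build_logger(buffer)
log.info("event=startup pool_size=%d", 20)          # status message
log.error("event=db_timeout waited_ms=%d", 5000)    # error, same destination

# Equivalent of running: grep 'level=ERROR' app.log
errors = [line for line in buffer.getvalue().splitlines() if "level=ERROR" in line]
print(errors)
```

Because every record is a single line with fixed field names, a plain `grep 'level=ERROR'` or `grep 'event=db_timeout'` pulls out exactly what you need without any special tooling.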

Some things will not apply to the software you are getting ready to build. I see many things that apply to software I work on, and it even helped me figure out a weird bug in one of the packages I support.
