The last word on SCons performance

My previous look at SCons performance compared SCons and gmake on a variety of build scenarios — full, incremental, and clean. A few people suggested that I try the tips given on the SCons ‘GoFastButton’ wiki page, which are said to significantly improve SCons performance (at the cost of some accuracy, of course). Naturally, I felt that I had to do one last follow-up exploring this avenue. And since that meant I would already be running a bunch of builds, I figured I’d try out SCons’ parallel build features too. My findings follow.
Read the rest of this entry »

What’s new in GNU make 3.82

GNU make 3.82 hit the streets last week, the first new release of the workhouse build tool in over four years. Why so long between releases? To me the answer is obvious: the tool Just Works ™, so there’s no need to churn out new releases chasing the latest development fad. But as this release shows, there is still room to innovate, without compromising on the points that make the tool so great. The two improvements I find most interesting are .ONESHELL, and changes to pattern-search behavior:
Read the rest of this entry »

Making Automated Tests Truly Automatic

[A version of this article appeared on eWeek http://www.eweek.com/c/a/Application-Development/How-to-Make-Your-Automated-Software-Tests-Truly-Automatic/]

A recent poll of software development professionals showed that the majority would rather be doing their taxes than dealing with their company’s test infrastructure. The reason: automated software tests require tremendous amounts of manual time and energy to configure, run, and monitor. This can be a startling revelation to companies that have invested substantial engineering efforts into automated test frameworks specifically to reduce the human cost of continually running large regression suites.

What’s at the root of this disconnect?
Read the rest of this entry »

Bridging the IT – Development Gap

In the last year I have increasingly run into a new character in application development shops – IT.

IT is not taking over coding tasks, but they are certainly taking a much more active role regarding where application development tasks get run.   This is no surprise, as software development has an insatiable appetite for compute resources.  Whether it is vast clusters for testing, dozens of machines to run ALM tools, or the incessant requests for “just one more box,” development teams are always asking for something.  Multiply this by the number of development teams in your company, and it is easy to see why this is a big job for IT.

IT’s response is to increasingly look at centralizing where development tasks are run.  The logic is that with a cloud of development resources, they can get more efficiency in sharing them across groups and offer their customers (development) more resources than they would otherwise get.  However, IT’s goals are largely different than those of the development teams.  IT organizations measure their success by how efficient, smooth and cost effective their compute environment is. They want a large, identically configured, scalable, uninterruptible environment that consistently delivers established levels of performance to the downstream customer.  In other words, the more that they can make things the same, the more efficient they can be.

On the surface, these goals are at odds with development.   Development teams are measured on software output and quality, not on resource efficiency.  Each development group often has unique needs to achieve peek productivity and quality.  The matrix of test environments never shrinks, in conflict with IT’s desire to standardize.  If environment customizations and optimizations make development more effective (which they often do) they want their own resources, even if it means they get fewer of them.

How do you bridge these competing goals?  The wrong answer is compromise: it’s not about finding the midpoint that’s just useless enough to each party’s goals that everyone is unhappy.

The right answer is to define an environment that can deliver against both goals simultaneously.  Allow IT to provision the compute cloud – these are the areas where homogeneity and efficiency shine.  This allows IT to meet development’s needs for peak resource demands by sharing across large pools of compute resources while reducing cost.  Virtualization is an important ingredient in the solution because it meets IT’s need for homogeneity and development’s need for configuration specialization.   However, virtualization is not enough.  What is really needed to bridge the gap is a framework that allows development to maintain control of what processes get run, who runs them, and how and when they end up on the compute cloud.

Is this possible?  Our most successful customers have used ElectricCommander to do just this.

For IT, ElectricCommander enables software development to happen in one large, scalable, reliable development cloud.  For development, they get all of the control that they need, only with a heck of a lot more compute resources.

A second look at SCons performance

UPDATE: In response to comments here and elsewhere, I’ve done another series of SCons builds using the tips on the SCons ‘GoFastButton’ wiki page. You can view the results here


A few months ago, I took a look at the scalability of SCons, a popular Python-based build tool. The results were disappointing, to say the least. That post stirred up a lot of comments, both here and in other forums. Several people pointed out that a comparison with other build tools would be helpful. Some suggested that SCons’ forte is really incremental builds, rather than the full builds I used for my test. I think those are valid points, so I decided to revisit this topic. This time around, I’ve got head-to-head comparisons between SCons and GNU make, the venerable old workhorse of build tools, as I use each tool to perform full, incremental, and clean builds. Read on for the gory details — and lots of graphs. Spoiler alert: SCons still looks pretty bad.
Read the rest of this entry »

Designing for high performance

Here’s the thing about high performance: you can’t just bolt it on at the end. It’s got to be baked in from day one. No doubt those of you who are experienced developers are now invoking the venerable Donald Knuth, who once said, “Premature optimization is the root of all evil.” But look at it this way: with very rare exceptions, no amount of performance tuning will turn an average system into a world class competitor.

Of course, high performance is the entire raison d’être for ElectricAccelerator. We knew from the start that parallelism would be the primary means of achieving our performance goals (although it’s not the only trick we used). Thanks to Amdahl’s law, we know that in order to accelerate a build by 100x, the serialized portion cannot be more than 1% of the baseline time. Thus it’s critical that absolutely everything that can be parallelized, is parallelized. And I mean everything, even the stuff that you don’t normally think about, because anything that doesn’t get parallelized disproportionately saps our performance. Anything that isn’t parallelized is a bottleneck.
Read the rest of this entry »

Visualizing Build Processes

The other day I was waiting for my continuous integration build to finish. Think about that a minute. See anything wrong with that statement? I was waiting for a continuous integration build to finish. The whole point of continuous integration is to detect integration errors as quickly as possible. After years of growth, our builds are too long to really live up to that definition.

Obviously I have to do something about this, but what? Where do I start? The build is composed of several distinct phases: checkout, compile, unit test, package, install and system test. In addition, the complete process is actually replicated on several platforms simultaneously. In total, the process encompasses over 200 distinct steps, running in parallel on dozens of machines.
Read the rest of this entry »

How scalable is SCons?

The marquee feature in ElectricAccelerator 5.0 is Electrify, a new front-end to the Accelerator cluster that allows us to distribute work from a wide variety of processes in addition to the make-based processes that we have always managed. One example is SCons, an alternative build system implemented in Python that has a small (compared to make) but apparently growing (slowly) market share. It’s sometimes touted as an ideal replacement for make, with a long list of reasons why it is considered superior. But not everybody likes it. Some have reported significant performance problems. Even the SCons maintainers agree, SCons “Can get slow on big projects”.

Of course that caught my eye, since making big projects build fast is what I do. What exactly does it mean that SCons “can get slow” on “big” projects? How slow is slow? How big is big? So to satisfy my own curiosity, and so that I might better advise customers seeking to use SCons with Electrify, I set out to answer those questions. All I needed was some free hardware and some time. Lots and lots and lots of time.

Read the rest of this entry »

Automating around scarcity by using virtual resources

[posted on behalf of Usman Muzaffar, who is on a long flight with no WiFi]

Here’s a sobering truth that shows up often in software automation: people are way better at sharing stuff than computers are. For example: say you have a scarce resource, like a box with special hardware or a service with serial access. You’re tasked with automating a software build/test/release workflow, and part of it needs to talk to this One Big And Fancy Thing. Do you try to teach your build script good playground behavior, so it automatically knows when to wait politely (and when, as deadlines approach, it should bully its way to the top of the slide), or do you declare this problem out-of-scope, and just provide the hook to let the team manage access manually?

The default on that checkbox is: *don’t automate*, for two reasons. First: letting people handle it means no extra work. More importantly, because we’ve been doing it our whole life, we’re actually pretty good at adapting to environments where we have to share things, whether that’s roads or restrooms or rack space. A small number of people on the same team with similar goals will usual self-organize around a few ground rules with a minimum of fuss. One clear and crisply delivered directive at a weekly team meeting (“OK guys the new 32-way sol box is for the full server test suite, so give that priority and check with each other before you use it for other stuff”) is often all it takes.

Second, technically getting the semantics of shared simultaneous access right is a notorious pain in the neck. As in any software automation system, there’s no credit for a partial answer: it’s a net loss if your script still needs a babysitter for the corner cases. So that means your solution needs to take selection and queuing and load into account, and have mechanisms for priority and pre-emption and be smart about busted network connections. More fundamentally, at its core it usually boils down to something awfully close to multithreaded programming, with the usual challenges in that space around semaphores, locks, deadlocks, races. Great stuff in a CS course or maybe your server’s ConnectionPool class — rathole alert in your build and test system!

So, largely with good reason, the automation train comes to a screeching halt right here. It’s just not worth the effort to build a system that’s going to manage the synchronization for parallel access to scarce resources. In other words: when shared resources wind up in the software production system, people show up next to them, and that sucks all the fun (and potential efficiency gains) out of automation. What to do?

One thing worth investigating are tools that can handle this for you.  Solving this was a key goal for our ElectricCommander product. Commander lets you describe your job as a series of command line steps, and each step can be specified to run on a resource. A resource is simply a system that we’ll remotely execute commands on, and it comes with a sack full of infrastructure goodies you’d expect like pooling, exclusive reservation, broadcast, security, access control, load balancing, and fault tolerance. As a user of the system, you specify what you want to run, and where you want to run it, press the ‘Go’ button and Commander does the rest, queuing steps when resources are oversubscribed and efficiently scheduling around your other constraints. Nice!

Then one day a customer asked us how they could automatically control access to a piece of hardware that simulated network traffic critical to the product’s system test. This wasn’t a gadget we could install software on; indeed, we couldn’t directly connect to it at all, so Commander can’t treat it as a resource. But it soon became evident that we could solve this just as elegantly with a simple tweak to the approach. Fundamentally, we needed the ability to specify that a step 1) needed access, 2) must block when it wasn’t available, and 3) once acquired, hang on to it until it was done. If something could just take care of this synchronization and queuing, the test could connect to the traffic simulator directly and simply execute as if invoked manually.

In other words: the problem called for a *subset* of Commander resources;  ignore half the stuff in the goodie sack (remote login, execution, fault tolerance, etc.) and you’re left with a general purpose resource access and acquisition facility. We set up dummy resources (good old 127.0.0.1, always up and ready for this sort of game!), injected them into the workflow and configured the job to hang on to them as long as it was talking to the traffic simulator. It worked beautifully: each test run was guaranteed to get just the access it needed, and for the first time, the customer had safe, parallel end-to-end automation for the full test cycle.

More importantly, this design pattern, since dubbed Virtual Resources, opened a whole new realm of possibilities. Once you start looking for them, there are *lots* of shared things in a software system that aren’t compute hosts, and they’re all threatening or overcomplicating automation in some way or another.  We’ve used Virtual Resources to manage access database tables, SCM labels, virtual machines, filesystem repositories, flaky external systems that don’t like more than one client talking to them, and our customers keep showing us new ways. It’s a great example of how the core of a clean design — a resource is something a job can request and relinquish — was readily adapted to a wider set of problems around Software Production Automation.

How to quickly navigate an unfamiliar makefile

The other day, I was working with an unfamiliar build and I needed to get familiar with it in a hurry. In this case, I was dealing with a makefile generated by the Perl utility h2xs, but the trick I’ll show you here works any time you need to find your way around a new build system, whether it’s something you just downloaded or an internal project you just transferred to.

What I wanted to do was add a few object files to the link command. Here’s the build log, with the link command highlighted:

gcc -c  -I. -D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -O2 -g   -DVERSION=\"0.01\" -DXS_VERSION=\"0.01\" -fPIC "-I/usr/lib/perl/5.10/CORE"   mylib.c
rm -f blib/arch/auto/mylib/mylib.so
gcc  -shared -O2 -g -L/usr/local/lib mylib.o   -o blib/arch/auto/mylib/mylib.so   \
                \

chmod 755 blib/arch/auto/mylib/mylib.so

Should be easy, right? I just needed to find that command in the makefile and make my changes. Wrong. Read on to see how annotation helped solve this problem.

Read the rest of this entry »