Are Clusters a Dying Technology?

I happened across a blog today that made the claim that accelerating builds by distributing to a cluster of computers is “a dying technology.” Instead, they said, you should take advantage of increasing CPU density to enable increasing levels of parallelism in your builds — a single system with eight or more cores is pretty affordable these days. So is it true? Have we reached a critical point in the evolution of computer hardware? Are clusters soon to become obsolete? In a word: no. This notion is rubbish. If the massive push to cloud computing (which is all about clusters) isn’t convincing, there are (at least) three concrete reasons why clusters will always beat individual multicore boxes.
Read the rest of this entry »

ElectricAccelerator vs. distcc: samba reloaded

ElectricAccelerator vs distcc – samba reloaded

In an earlier post I compared the performance of ElectricAcclerator and distcc by building samba using each tool in turn on the same cluster. In that test I found that Accelerator bested distcc at suitably high levels of parallelism, but that distcc narrowly beat Accelerator at lower levels of parallelism. At the time I chalked the difference up to slightly higher overhead associated with Accelerator. But you must have known I couldn’t just leave it at that. I had to know where the overhead was coming from, and eliminate it, if possible. The exciting conclusion is after the break.
Read the rest of this entry »

Do more (computing) with less (hardware) (people) (money)

Imagine if you discovered your colleagues only work 4 hours a day. You thought everyone was working as hard as you until you started monitoring what they did all day. To your surprise, many of them were idle for hours at a time, just sitting still waiting for someone to give them work. And when they did work it was in 10 minute bursts separated by more waiting around. I think you would be upset if this were true. You should only get a full day’s pay if you do a full day’s work, right?
Read the rest of this entry »

Makefile performance: recursive make

Are you using the best method for invoking submakes in a recursive make based build? A simple change can turn on the afterburners by unlocking parallelism.
Read the rest of this entry »

ElectricAccelerator vs. distcc, round 3: samba

In this continuation of the ElectricAccelerator vs. distcc battle royale, I’ll compare the performance of these two tools when building samba, a suite of tools that provide file and print services to Windows clients from Unix-like servers. Samba is a particularly interesting package for this comparison because distcc was originally created in order to accelerate samba builds, and until recently, the distcc project was hosted by the samba organization. In some sense, samba is the poster child for distcc acceleration, so it should work quite well with distcc.
Read the rest of this entry »

ElectricAccelerator vs. distcc, round 2: MySQL

Previously we have compared the performance of ElectricAccelerator and distcc when building the Linux 2.6 kernel. In what I hope will be the first of several followups, I will repeat the experiment with different software packages, to determine whether that result was a one-off, or whether ElectricAccelerator really is consistently better performing than distcc. In this round, we’ll be building MySQL, a popular free-for-non-commercial-use database server.
Read the rest of this entry »

Building Linux 2.6 with ElectricAccelerator and distcc

There are lots of different parallel, distributed build systems in the world besides ElectricAccelerator. In this post, I’m going to share my recent experience with one popular alternative, GNU make combined with distcc.
Read the rest of this entry »

Untangling Parallel Build Logs

I spend most of my time with ElectricAccelerator working on the “big” features — performance, scalability, fault-tolerance. It’s easy to forget that there are a ton of “little” features that can themselves make a big difference in the value of the system. Case in point: the build log. If you have any experience with parallel build systems, you know what a mess the build log becomes because you have any number of parallel commands all dumping output to a single logfile simultaneously. The output from each command gets interleaved with the output from other commands. Worse, the error messages get jumbled up too, so it becomes difficult to tell which commands are producing the errors.

I was reminded of this issue when it popped up again on the GNU make-help mailing list. Take a look at this recent post:

Read the rest of this entry »