Using Markov Chains to Generate Test Input

One challenge that we’ve faced at Electric Cloud is how to verify that our makefile parser correctly emulates GNU Make. We started by generating test cases based on a close reading of the gmake manual. Then we turned to real-world examples: makefiles from dozens of open source projects and from our customers. After several years of this we’ve accumulated nearly two thousand individual tests of our gmake emulation, and yet we still sometimes find incompatibilities. We’re always looking for new ways to test our parser.

One idea is to generate random text and use that as a “makefile”. Unfortunately, truly random text is almost useless in this regard, because it doesn’t look anything like a real makefile. Instead, we can use Markov chains to generate random text that is very much like a real makefile. When we first introduced this technique, we uncovered 13 previously unknown incompatibilities — at the time that represented 10% of the total defects reported against the parser! Read on to learn more about Markov chains and how we applied them in practice.
Read the rest of this entry »

Makefile performance: built-in rules

Like any system that has evolved over many years, GNU Make is rife with appendages of questionable utility. One area this is especially noticeable is the collection of built-in rules in gmake. These rules make it possible to do things like compile a C source file to an executable without even having a makefile, or compile and link several source files with a makefile that simply names the executable and each of the objects that go into it.

But this convience comes at a price. Although some of the built-in rules are still relevant in modern environments, many are obsolete or uncommonly used at best. When’s the last time you compiled Pascal code, or used SCCS or RCS as your version control system? And yet every time you run a build, gmake must check every source file against each of these rules, on the off chance that one of them might apply. A simple tweak to your GNU Make command-line is all it takes to get a performance improvement of up to 30% out of your makefiles. Don’t believe me? Read on.
Read the rest of this entry »

Friday Fun: Generating Fibonacci Numbers with GNU Make

Nobody would ever claim that GNU Make is a general purpose programming language, but with a little work, we can coerce it into generating Fibonacci numbers for us. Why bother? Because we can.
Read the rest of this entry »

Rules with Multiple Outputs in GNU Make

I recently wrote an article for CM Crossroads exploring various strategies for handling rules that generate multiple output files in GNU make. If you’ve ever struggled with this problem, you should check out the article. I don’t want to spoil the exciting conclusion, but it turns out that the only way to really correctly capture this relationship in GNU make syntax is with pattern rules. That’s great if your input and output files share a common stem (eg, “parser” in parser.i, parser.c and parser.h), but if your files don’t adhere to that convention, you’re stuck with one of the alternatives, each of which have some strange caveats and limitations.

Here’s a question for you: if ElectricAccelerator had an extension that allowed you to explicitly mark a non-pattern rule as having multiple outputs, would you use it? For example:

#pragma multi
something otherthing: input
	@echo Generating something and otherthing from input...

What do you think? Comments encouraged.

ElectricAccelerator vs. distcc: samba reloaded

ElectricAccelerator vs distcc – samba reloaded

In an earlier post I compared the performance of ElectricAcclerator and distcc by building samba using each tool in turn on the same cluster. In that test I found that Accelerator bested distcc at suitably high levels of parallelism, but that distcc narrowly beat Accelerator at lower levels of parallelism. At the time I chalked the difference up to slightly higher overhead associated with Accelerator. But you must have known I couldn’t just leave it at that. I had to know where the overhead was coming from, and eliminate it, if possible. The exciting conclusion is after the break.
Read the rest of this entry »

Makefile performance: pattern-specific variables

If you’ve been using GNU make for some time, you are probably familiar with both pattern rules and target-specific variables. You may even be familiar with the intersection of these features: pattern-specific variables. But you may not be aware of a subtle change in gmake 3.81 which affects the processing of pattern-specific variables with potentially disastrous performance consequences.

Read the rest of this entry »

Makefile performance: $(shell)

One rookie performance mistake I’ve seen in GNU make makefiles is the use of $(shell) without := assignment. Of course I’m not the first person to write about this, but people are still making this mistake, and it’s so easy to fix, it’s really tragic that it’s still out there.

Read the rest of this entry »

Makefile performance: recursive make

Are you using the best method for invoking submakes in a recursive make based build? A simple change can turn on the afterburners by unlocking parallelism.
Read the rest of this entry »