[This post is unabashedly technical, and contains nothing that you absolutely need to be a good photographer. I couldn’t help myself.]
You’d have to be living in a cave to miss the big switch in personal computing from a single processor per chip to two or four, with eight coming soon. The power dissipation of single processor chips was going through the roof as the designers cranked up the clock rate to get more and more performance. Power supplies got beefier, computers needed bigger and noisier fans to keep puddles of molten silicon from forming at the bottom of the cases, the electric bill started to climb, and there’s that global warming thing. So in order to get the possibility of more performance with reasonable power consumption, the chip designers turned the clock frequency back down, put more than one processor on each chip, and left it to the software engineers to figure out how to turn the potential into performance improvements that users could see.
I’ve been reading articles saying how clever the hardware people were, and how the software folks are behind, and need to work hard so that computers can effectively use all those new processors. I have a different perspective. I think the hardware folks basically gave up and punted, and left the software people to sort it out.
For thirty years or more, it has been easier to get more instruction cycles per second by adding more processors than by getting a single processor to run faster. The difficulty has been, and continues to be, that many problems are difficult to parallelize, or cut up into little pieces that can be simultaneously executed. Because of that, for the first 25 years of the last 30, the only computers that employed many processors were what were then called supercomputers, designed for problems for which the utmost performance was necessary. For all other computers, the designers just kept finding ways to turn up the wick, and clock speeds went from a few megahertz to a few gigahertz.
Now we’ve reached the point where a) clock speeds are high enough that power dissipation is a real problem, and b) we can cram so many transistors onto a chip that it’s easy to use them to build multiple processors. The chip designers have taken the path of least resistance. What else could they have done? For starters, they could have rethought processor design with the idea of getting faster scalar (single-threaded) performance more energy-efficiently. That’s hard work; my favorite approach involves asynchronous circuitry, which is difficult to design and more difficult to test. Slapping more processors onto the chip is conceptually easy, although I admit that details like cache management get tricky.
Some problems have a good deal of inherent parallelism. Luckily for photographers, image processing is usually pretty easy to parallelize, since the operations done to one pixel often don’t affect the value of any other pixel (color space conversion, curves), or, if they do (unsharp masking, Gaussian blur), they don’t affect the value of sufficiently distant pixels. Even so, current implementations of most image editors and plugins never peg the CPU meter on my four-processor system (Lightroom is a notable exception – thank you, Adobe).
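To make the pixel-independence point concrete, here’s a minimal sketch (not anything from an actual image editor) of parallelizing a pixel-wise operation, using an illustrative gamma curve as the per-pixel function. Because each output pixel depends only on the same input pixel, the image can be cut into strips and the strips handed to separate workers; the function names and strip-splitting scheme are my own invention for illustration. (In CPython, threads share the global interpreter lock, so a real editor would do this with native threads or processes; the decomposition is the same either way.)

```python
# Sketch: parallelizing a pixel-wise operation by splitting an image
# (a list of rows of 0-255 values) into horizontal strips.
from concurrent.futures import ThreadPoolExecutor

def apply_curve(strip, gamma=2.2):
    # Pixel-wise: each output value depends only on the matching input
    # pixel, so any subset of rows can be processed independently.
    return [[(value / 255.0) ** (1.0 / gamma) * 255.0 for value in row]
            for row in strip]

def parallel_curve(image, workers=4):
    # Cut the rows into roughly equal strips, one per worker.
    n = max(1, len(image) // workers)
    strips = [image[i:i + n] for i in range(0, len(image), n)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        processed = pool.map(apply_curve, strips)
    # Reassemble the strips in their original order.
    return [row for strip in processed for row in strip]
```

A neighborhood operation like Gaussian blur complicates this only slightly: each strip must be handed a few rows of overlap from its neighbors, since a pixel’s new value depends on nearby (but not distant) pixels.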
This isn’t the first time that the hardware designers have given up and left it to the software folks to clean up the mess — I myself have been guilty, and I hereby apologize to Dave Ladd and Steve Plant for the complicated Connection Table constraints in the Rolm CBX. Sometimes the best systems designs require that the software deal with a lot of messiness – Dave and Steve, I’d do it all over again. This could be one of those times, but it’s not something of which hardware designers should be proud.
I don’t have a good feeling about this transition. For the first time in the history of the personal computer, scalar performance has decreased. It’s pretty easy to get enough parallel processing to get back ahead in a two-processor system, but faster clock rates don’t seem to be in the cards, at least for a while, so the route to better overall performance has to be more and more processors per workstation. In order for that potential performance to turn into something we users can appreciate, the software designers have to figure out a way to keep all those new processors busy.
So it is true that the software folks are behind in the multicore game. They’re behind because the hardware designers changed the rules on them. They deserve our sympathy, because it’s not going to be easy to get on top of this situation. Automatic parallelizing has not had great success in thirty-some-odd years of trying. The software tools for manually parallelizing problems are not highly evolved either. Now that multiprocessor systems are thoroughly mainstream, perhaps there will be some breakthroughs.