Sunday, November 4, 2012

Quality Only Breaks Once

At a recent party, we were discussing the poor quality of most kitchen appliances.  The topic began because a friend (Mike) complained that he was on his third or fourth bread machine.  I jokingly suggested that any appliance with "For Household Use Only" stamped on the bottom was a piece of junk, meant to hold together only long enough to get it off the showroom floor.  The quip was called to mind by my seemingly solid electric pasta machine, which nonetheless stopped working after it had produced a few dozen batches.  That experience provided the information needed to translate "For Household Use Only" correctly.

The first KitchenAid stand mixer I had stopped working after about a month of use.  It came with a lifetime warranty, so I took it to a local appliance repair center.  There, they replaced the broken annular gear (free of charge), and it has worked fine ever since -- providing almost two decades of faithful service before I gave it to my niece.  My joke was that it came from the factory with a cheap nylon annular gear, and that when I took it in for service, they replaced it with one of solid brass.

Whether or not that's true, it suggests a business model that could be quite successful:  Have a production model of your product that is manufactured rather cheaply.  Nonetheless, deliver the standard model with a lifetime guarantee, and charge a bit extra because you can claim superior quality.  The trick is that most owners will never use the product enough to discover its flaw.  Those who do will, with luck, return the unit for service or replacement.  The replacement unit or parts are top quality.  If the unit ever comes in for repair, *all* of the cheap critical components are exchanged for top-of-the-line replacements.

Among those who actually use the appliance, it will eventually earn a reputation for high reliability, while among the rest the claim will never be tested.  Meanwhile, the difference between the price you can charge for a top-quality model and the cost of the cheap models you actually produce is almost pure profit.  As long as the amortized cost of repairing (or replacing) the units that do come in is less than that margin, you're money ahead.

I suppose that selling crappy software along with the promise to fix any bugs that may arise follows pretty much the same business model....

Monday, October 15, 2012

On Parallel Programming

"Sequential programming is really hard, and parallel programming is a step beyond that." -- Andrew Tannenbaum, USENIX 2008 Keynote Speech

A colleague quoted this in a recent email, and I feel obliged to respond to it.  From practical experience, it is easy to nod in assent and let it go at that.  An application that is not designed to run in a multithreaded environment is quite likely to fail when simulated or actual parallelism is added.  And even some applications that are carefully designed to run under parallel execution fail because the designer or coder neglected an important detail.

But the fact that I am working on a programming language intended to make the job of programming parallel applications easier means that I cannot agree with this assessment wholeheartedly.  In no profession does the promise that automation can rescue practitioners from tedium dangle closer than in software engineering.  Early programmers struggled with machine code until the first symbolic assembler was created; later coders wrestled with assembly language until usable high-level languages were introduced.

Even though the concepts used to produce code that will run correctly in a parallel processing environment have been around for decades, it is reasonable to assume that parallel programming is still in its infancy, and that software tools will again come to the rescue.[1]  The main reason is that multicore processors have only become common within the last half-decade or so.  The redundant cores are currently an underused resource, but already applications must find ways to utilize this resource to remain competitive.

As with any change in the technology used to program computers, it will take some time for new programming models to become widely used.  There is a lot of legacy code and there are many entrenched coders.  The old code bases may need to be recoded or encapsulated to run efficiently in the new environment.  Experienced coders will have to learn a new skill set to be effective with new coding paradigms.  However, we have ample examples from the past in which this kind of paradigm shift has occurred.  The new programming model not only shapes how software is written to run on existing machines, it affects how the machines themselves are designed.  Hardware advances in turn impinge upon the tools and the programming model used, and the cycle continues.

In the present turn, the presence of parallel hardware is the motive force; tools and especially the programming model are obliged to catch up.  The programming model is central to the discussion, and carries the burden of the perceived difficulty.  Serial programming is hard, but in my view it is harder than it needs to be.  If the process of creating a parallel program involves first writing a serial program and then parallelizing it, it is no wonder that parallel programming is thought to be more difficult.  But the excess complexity of serial programming and the extra step of parallelization can both be dispensed with.

The inessential complexity in serial programming arises from the expression of a linear flow of control.  Early single-processor machines could execute only one instruction at a time, so a complete ordering of the instructions fed to the machine was an accurate representation of what it was doing.  The state of the machine between each instruction could be predicted with precision.

With the advent of superscalar architectures, however, the connection between instruction sequence and execution sequence became more tenuous.  Superscalar architectures can dispatch more than one instruction at a time, perhaps blurring the exact state across several instructions.  Such processors contain internal circuitry to ensure that the observable state of the machine matches what is expressed by the code.  A hazard occurs when two simultaneously-executing instructions intend to read and write (or both write) the same register or memory location.  The circuitry enforces a stricter ordering of the execution of instructions when hazards are present.  A programming model that fundamentally assumes parallel execution would not require this extra circuitry.
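
To make the notion of a hazard concrete, here is a minimal sketch (made-up variables, written as ordinary Python) contrasting a pair of statements whose order must be preserved with a pair that could be issued in either order:

    # A read-after-write hazard: the second statement reads a value the
    # first one writes, so their relative order must be preserved.
    a = 5
    b = a + 1      # reads 'a' after it is written
    c = b * 2      # reads 'b'; must follow the assignment to 'b'

    # No hazard: these two statements touch disjoint locations, so a
    # superscalar processor (or an optimizing compiler) is free to issue
    # them in either order, or simultaneously, with the same result.
    x = a + 1
    y = a * 2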

If, in the absence of hazards, instructions can be executed in any order, then it becomes clear that the serial expression of execution carries extra information that is not essential to the expression of the algorithm.  It is convenient to list the work that needs to be done linearly.  But often, the exact ordering of the statements in a program is incidental.  I use this flexibility to group statements that are related and thus enhance the readability of the code.  But the very fact that I can do this reflects the presence of ordering information that is unimportant.

In fact, some compilers will perform instruction reordering to reduce execution time.  They perform a dependency analysis among the variables referenced in the program, and determine the order in which operations must be performed to preserve the original intent.  In effect, that optimization throws out most of the ordering information supplied by the programmer, preserving only those partial orderings which are essential to the algorithm.  It then builds a new total ordering from scratch.
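
As a rough illustration of that analysis -- a toy Python sketch, not any particular compiler's algorithm -- the fragment below models each statement by the variables it reads and writes, and keeps an ordering edge only where a hazard forces one:

    # Each statement is modeled by the variables it reads and writes.
    statements = [
        {"id": "s1", "reads": set(),      "writes": {"a"}},   # a = input()
        {"id": "s2", "reads": {"a"},      "writes": {"b"}},   # b = a + 1
        {"id": "s3", "reads": {"a"},      "writes": {"c"}},   # c = a * 2
        {"id": "s4", "reads": {"b", "c"}, "writes": {"d"}},   # d = b + c
    ]

    def essential_edges(stmts):
        """Keep only the orderings forced by data hazards."""
        edges = []
        for i, earlier in enumerate(stmts):
            for later in stmts[i + 1:]:
                raw = earlier["writes"] & later["reads"]    # read-after-write
                war = earlier["reads"] & later["writes"]    # write-after-read
                waw = earlier["writes"] & later["writes"]   # write-after-write
                if raw or war or waw:
                    edges.append((earlier["id"], later["id"]))
        return edges

    print(essential_edges(statements))
    # Prints [('s1', 's2'), ('s1', 's3'), ('s2', 's4'), ('s3', 's4')].
    # There is no edge between s2 and s3: their original order was
    # incidental, and a scheduler may run them in parallel.

The edges are the essential partial order; any total order (or parallel schedule) consistent with them serves the algorithm as well as the one the programmer happened to write down.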

Given that much of the ordering in a serial program representation is incidental, we are encouraged to consider a programming model that avoids this overhead.  Such a language must provide the means to express the essential sequence constraints while ignoring or discouraging the inclusion of incidental ordering information.

It turns out that a number of programming languages already exhibit this property to some extent.  The goal-dependency representation in makefiles is a familiar example.  The goals express the dependence of a given target on a number of source files, which is in effect a sequence constraint:  A given source file must be present before the target can be produced.  In a serial processing environment, a feasible total ordering can be derived from an initial goal and the constraints provided.  In a parallel processing environment, each goal represents a task; a task can be added to the ready queue as soon as all of its constraints are satisfied; any available processor can execute any task from the ready queue at any time; completion of a task changes the state of each file touched by the task.
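
That execution rule is short enough to sketch directly.  The toy example below (hypothetical goal names, Python's standard thread pool standing in for real make) submits every goal whose prerequisites are satisfied, and uses task completions to discover newly ready goals:

    import concurrent.futures

    # Each goal lists the goals that must complete before it can start,
    # just as a makefile target lists its prerequisites.
    goals = {
        "parse":   [],
        "lex":     [],
        "analyze": ["parse", "lex"],
        "emit":    ["analyze"],
    }

    def run(goal):
        print("running", goal)
        return goal

    def schedule(goals):
        done = set()
        pending = {}                      # goal name -> Future
        with concurrent.futures.ThreadPoolExecutor() as pool:
            while len(done) < len(goals):
                # A goal joins the ready queue as soon as all of its
                # constraints (prerequisites) are satisfied.
                for goal, deps in goals.items():
                    if (goal not in done and goal not in pending
                            and all(d in done for d in deps)):
                        pending[goal] = pool.submit(run, goal)
                # Completion of a task changes the visible state, which
                # may in turn make further goals ready.
                finished, _ = concurrent.futures.wait(
                    pending.values(),
                    return_when=concurrent.futures.FIRST_COMPLETED)
                for fut in finished:
                    done.add(fut.result())
                pending = {g: f for g, f in pending.items() if g not in done}

    schedule(goals)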

This declarative (as opposed to "procedural") style of programming is represented by many languages -- some of which have been around for decades.  Prolog is an early example, XSLT a more recent contribution.  What these languages have in common is that -- at least at a high level -- they specify the "what", not the "how".  This approach allows the implementation considerable latitude in scheduling tasks so long as the explicit sequence constraints are met.  In fact, the algorithm for executing on a single processor is just the degenerate case of the algorithm used for parallel execution.

If declarative languages are more suitable for parallel applications and implementations exist today, why haven't they enjoyed more widespread use?  One reason is historical: When declarative languages were introduced, available computers were almost exclusively single-processor.  Quite a bit of overhead is involved in scheduling the tasks to be run, and all of this had to be simulated in software.  Once scheduled, the tasks were run on a single processor, so no speedup was possible.  It was impossible for an algorithm specified in a declarative language to run faster than the same algorithm specified in a procedural language.  Hence, declarative languages were labelled as "slow" and dismissed.

With an abundance of processors sitting idle and hardware-assisted task scheduling, the landscape has changed.  Perhaps it is time to give declarative programming another chance.  In future posts, I'll provide a more thorough review of existing programming languages, with an eye to their applicability in a parallel execution environment.  Then if need be, I'll describe the characteristics of a new language that is well suited to bridging the gap between serial and parallel programming.

[1] Myths About Scalable Parallel Programming Languages Part 6: Performance of Higher-Level Languages

Tuesday, January 24, 2012

Dirty Trick #6 -- Fun With Ice

From time to time, random chance works out in your favor -- especially if you nudge the odds in the right direction.  The last winter we lived in Allentown, it snowed a lot.  Like 65 inches.  It soon got to be too much trouble to clear all the snow off the top of our Dodge van, seemingly every time we wanted to go out.  And so it happened that when temperatures climbed above freezing in the spring, there was still an inch-thick layer of ice on top of the van.
I knew I had this load of ice on the roof as I got onto US 22.  My idea was that sometime before I got up to speed, the wind would lift the ice off the roof and let it smash harmlessly on the roadway.  All I had to do was time my acceleration so that no one was following too closely behind.  I drove with a good bit of my attention directed toward the rear-view mirror.
It was also the case that -- even though the I-78 bypass had recently opened -- most of the truck traffic still went through on US 22.  And they drove as if they owned all 4 lanes.  Passenger cars were not welcome.  (I'll bet you can guess where this story is going.)  Well, the ice proved more tenacious than I had anticipated.  And in the meantime, one of those obnoxious trucks came up behind and completely filled the rear-view.  I was already doing 60, and there he was -- not 16 feet off my rear bumper.  He was, in fact, driving in my slipstream -- as proven by what happened next.  I had switched from thinking "I hope it doesn't go now" to "Oh please oh please oh please!"
The little bump going onto the deck of the Lehigh River bridge and a little gust together were enough to tilt the leading edge of the ice sheet up, and away it went!  I watched it start to fall toward the pavement, but then the slipstream carried it up and let it smash right across the hood of the truck.  Score!  Well what do you think? A minute later he was back on my tail blowing the horn like crazy.  OK, fine.  I'd be upset too.  I slowed down a bit, but he continued following for at least five miles.  Somewhere in there it became apparent he wanted me to pull over.  But there was nothing I could do because the shoulder was all filled with snow.  When I did finally pull over, the trucker came up and wanted my insurance information.  He said that the ice from my car had cracked his windshield and wanted my insurance to cover the damage.  OK, fine.  We exchanged insurance and contact information.  He wasn't angry like I expected him to be.  He was also apparently clueless that his tailgating played a major part in getting his windshield smashed.  Maybe he got a clue later.
When I returned home, I received a call from my insurer, wanting to know the circumstances of the claim.  I stressed that the trucker had been following me very closely; otherwise, the ice from my roof would have fallen harmlessly to the ground.  An hour later, the trucking company called and I gave them the same story.  My guess is that the claims adjuster for my insurance company called the trucking company and at least gave them a hard time before paying up.  I can only hope that the trucking company gave their trucker the same treatment.