Tuesday, September 22, 2009

Complexity and [Over]Simplification

Pardon my absence (if anyone noticed) while I recovered from a bout of depression brought on by repeated exposure to thought streams defined by buzzwords and marketing hype. Ulysses (James Joyce) was memorable for stream-of-consciousness paragraphs that ran on page after page. I did not find navigating someone else's thought stream enjoyable in the least, and that is exactly the feeling I have been experiencing of late. The process of mapping the path from stimulus to response (or problem to decision) in humans is poorly understood at best.

My thought was to try to zero in on one kind of problem to see if I could cast some light on the decision-making process and find out how it gets caught in the deep ruts so often.

If you are a technophile who buys in at the bleeding edge and speaks in acronymese (let's say it's pronounced uh-kron'-uh-meez), a language built from acronyms and initialisms pronounced as words, for example SQL (Structured Query Language), pronounced see-kwel, then you may as well find something else to read, because I'm about to ask you to take some real responsibility. Actually, I'm going to suggest that those who have to listen to you should demand that you take responsibility.

Many of you will be familiar with the Ishikawa diagram, though you might recognize it as the fishbone or cause-and-effect diagram. If you have ever used this tool to identify the cause(s) of a problem, you may remember how quickly the diagram can become unmanageably complicated. A few years ago S. M. Casey wrote a book titled Set Phasers on Stun: And Other True Tales of Design, Technology, and Human Error, which examines some of the biggest man-made disasters of the past twenty years in an attempt to identify the cause(s).

While human nature demands that we be able to blame someone for anything that goes wrong, this research pointedly shows that each time we identify what we believe to be the cause, it is always possible to say, "But if x had been alert, the damage would have been minimal or avoided entirely." In short, the cause is always a related set of events, perhaps initially set in motion by a "proximate cause" (http://en.wikipedia.org/wiki/Proximate_cause). If we're interested in ensuring that the damage is not repeated, legality is irrelevant, as are the demands of human nature. The only way to guarantee that a certain problem never arises is to guarantee that nothing that could contribute to it is allowed to happen.

So, "What does this have to do with me?" you ask. A brief example being better than a long explanation, here is an actual exchange that happened at a DAMA meeting recently. The chapter President called for suggestions for small-group discussion following the main presentation. One of the suggestions was, "Why do we keep making the same mistakes?" The group noted that

  1. it may not be possible to answer the question and
  2. an answer might well be useless in avoiding the mistakes.

Discussion proceeded to other topics. A specific question was asked about response-time performance in a database application. Many possible causes for extended response times were trotted out without shedding any light. Someone asked whether a single query, executed repeatedly, might be the point of "failure," and suggested that the DBA (database administrator) should be able to say whether this was the case. The response: "We don't have a DBA. We thought we could get along without that additional expense."
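This is exactly the kind of question a DBA answers from the engine's own statistics rather than by guesswork. As a minimal sketch, assuming a PostgreSQL database with the pg_stat_statements extension enabled (column names as of version 13; other engines have analogous views), the repeat offender surfaces in a few lines:

    -- Rank statements by cumulative execution time; a single query run
    -- thousands of times will float straight to the top of this list.
    SELECT query,
           calls,                -- number of executions
           total_exec_time,      -- cumulative time, in milliseconds
           mean_exec_time,       -- average time per call, in milliseconds
           total_exec_time / SUM(total_exec_time) OVER () AS share_of_load
    FROM   pg_stat_statements
    ORDER  BY total_exec_time DESC
    LIMIT  10;

Without someone whose job it is to look at a view like this, the group is reduced to trotting out possible causes.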

Another example: hammering a nail to hold together two pieces of wood is a simple, straightforward operation that almost anyone can accomplish. A group of eight-year-old boys could hammer many nails in a relatively short time. If the goal is a habitable dwelling, though, or even a serviceable garage or potting shed, no sane person would entrust the job to the boys. Note that the issue isn't motivation or the lack of it, nor is it tools or skill per se; it's basic knowledge concerning the desired result.

Some theoretical knowledge concerning material characteristics and structural formats is required to deliver a result that will be in service for more than a few minutes.

In the information systems world, the analogy is more apt than we might be comfortable admitting. It isn't possible to be involved in a database design discussion (or even a data modeling one) without the term "denormalization" popping up. I'm going to use denormalization as a placeholder for a host of other bits of conventional wisdom in the discussion that follows. Relational data design is an application of set theory, which, in turn, is a branch of mathematics. Normalization rules (or forms) were developed to ensure that set operations would be applicable and would produce the expected result when applied to a database. Denormalization, in practice, means avoiding normalization rather than undoing it; it is rarely an activity in its own right, and there is no methodology for it.
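To make the distinction concrete, here is a hypothetical sketch (the tables and columns are my own invention, purely for illustration): first a flat design that violates third normal form, then the same facts normalized.

    -- Flat design: customer_name and customer_city depend on customer_id,
    -- not on the order key: a transitive dependency that 3NF forbids.
    CREATE TABLE orders_flat (
        order_id       INTEGER PRIMARY KEY,
        customer_id    INTEGER NOT NULL,
        customer_name  TEXT    NOT NULL,  -- repeated on every order
        customer_city  TEXT    NOT NULL,  -- repeated on every order
        order_date     DATE    NOT NULL
    );

    -- Third normal form: every non-key column depends on the key,
    -- the whole key, and nothing but the key.
    CREATE TABLE customer (
        customer_id    INTEGER PRIMARY KEY,
        customer_name  TEXT    NOT NULL,
        customer_city  TEXT    NOT NULL
    );

    CREATE TABLE customer_order (
        order_id       INTEGER PRIMARY KEY,
        customer_id    INTEGER NOT NULL REFERENCES customer,
        order_date     DATE    NOT NULL
    );

In the flat table a customer's city is repeated on every order and can silently drift out of sync; in the normalized design it is stated exactly once.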

My advice to the supervisor, manager, or project manager who is told that the database will be (or has been) denormalized is to ask to see the normalized design and to ask how far the normalization was taken. My guess is that you will get a lot of verbal tap dancing and arm waving. Ask that normalization be explained to you at least through third normal form. Remember that you will never be able to use the full power of your relational database engine's set operations on a non-normalized database; you will always need extra programming to get at the data.
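Continuing the hypothetical schema sketched above: on the normalized design, the question "which customers have never placed an order?" is a pure set operation, while against the flat table a customer with no orders has no row anywhere, so the question cannot even be asked without extra programming.

    -- A relational difference (EXCEPT) on the normalized tables:
    -- every customer_id, minus those that appear on an order.
    SELECT customer_id FROM customer
    EXCEPT
    SELECT customer_id FROM customer_order;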

I have participated in many conversations in which people stated their feelings about denormalization and normalization unequivocally yet could not articulate any of the normal forms. The same holds true for statements about standards, methods, tools... It is apparently necessary today to have a strong opinion about any topic that arises. I haven't heard "I don't know enough about that to have an opinion yet" in several years.

George Santayana may have been the first to say that those who cannot remember history are doomed to repeat it. He certainly won't be the last. Data and information design theory has not evolved much in the past twenty years, nor have software engineering or project management. Despite that, every day brings dozens of exciting press releases trumpeting the newest (always trademarked) approach to data management, project management, system development...

In the words of the Wizard, "Don't look behind the curtain!" The same difficult, complex work must be done today as twenty (or thirty, or forty) years ago. We do have better ways of dealing with simple repetitive tasks, but all that really means is that what is left is more difficult, more complex, more rigorous. This is no place for amateurs or lone rangers. Every system will be the result of a team of experts working toward a common goal and relying on one another completely for their individual expertise. The manager or project manager had better be able to recognize when tap dancing, arm waving, and smoke emission are taking place.

Or maybe all you want is cheap and/or quick.