
Sunday, September 9, 2012

Reach the Unreachable Goal

Information quality, like many information and technology subspecialties, seems to drift in and out of focus.  Sometimes I find it difficult to understand why the subspecialty exists.

A recent epiphany concerning words and language has sent me on an outward-bound trajectory.  I used to chafe at the "definition" of quality in a data context--the one that defines quality as "fit for purpose."  I am now ready to break with it completely. 

My epiphany had to do with the fact that meaning is not packaged up in words and phrases.  Rather, meaning is hiding behind and beneath them.  Well-chosen words can be used as markers to stake out the boundaries of meaning or even, if we're careful, to constrain meaning much like a fence.  When we focus on the stakes or the enclosure, we risk losing the meaning that's inside.

We have all sat in high school (or college) literature classes and debated what the author or poet meant--what was the meaning that he or she had captured within the fence of language they had constructed.  What we have failed to see in our information technology context is that business leaders, managers, consultants and others are exactly like those authors (only usually not nearly as careful in their use of language).  We always have to ask what they meant and when we investigate, we invariably find that they didn't really know.

For more than 30 years we have labored to constrain the use of terms and encourage (or even enforce) the standardization of terminology.  The issue seems to have taken on even more importance with the widespread adoption of newer reporting (business intelligence) technology that brings the data full circle.  The leaders who were unable to stay focused long enough to generate good requirements are now launching their BI desktop and seeing the result.

I think that if we are honest with ourselves, we will admit that, though we now have better titles for what we do and are getting paid better to do it, our goal is not attainable.  We are being held accountable for The Quality of Data because that's how we have labeled ourselves.  We haven't even learned from the tribulations suffered by the other quality disciplines.  At least we should begin calling it data quality control or data quality assurance.  We have to focus on improvement in processes rather than improvement in data.

Look inside the fence I have built here and see if you can find some meaning that will help you.

Friday, June 22, 2012

Language, Information, Data and Quality

Hat in a Puddle
Language is a slippery thing.  For many years I conducted my life in the firm belief that if I were only precise enough in my selection of vocabulary, clear enough in my choice of syntax, I could convey an idea without ambiguity to any audience.

I have frequently been disappointed with the result. There is a force at work that allows people in the audience to navigate their own way through the meaning of my language.
Those in the data and information industry are accustomed to thinking of meaning as the semantic of the data.  Nothing could be further from the truth.

The dictionaries are full of semantics.  We can choose from a rich set of words (semantic tokens) to describe the situation represented in the picture above.  Note that we can change the meaning of the set of semantic tokens by several non-verbal (without words) methods including tone and inflection.  For example, "hat in a puddle" means one thing with the stress on "hat," another with the stress on "in," and yet another with the stress on "puddle."

When we talk about semantics, we mean the meaning denoted by the words.  Alas, as humans we must also deal with the connoted meaning that each of us associates with the words.  Words and collections of words evoke in us memories, hopes, and desires that are not part of the semantic but are part of the meaning.

The situation is very much like the puddle above.  Most people will accept the puddle at "face value" and simply avoid it so as not to get wet or muddy.  Others will make assumptions and develop expectations based on their personal experience with puddles.  Some of these will not change course, especially if their experience and their current situation allow them to expect that they won't get muddy.  Note that a new dimension was just introduced--a temporal dimension that allows us to react differently now than we might have an hour ago.

Appearance is Not Meaning
All Meaning Not Apparent
A man walking along a road saw a hat in a puddle and recognized the hat as one usually worn by his neighbor.  He thought to pick it up and return it.  When he picked up the hat, however, he saw the face of his neighbor.  He asked whether the neighbor needed help.  "I'm all right.  I've got my horse under me," was the reply.

The face value of words (and appearances) is accepted by most people and used to support decisions of all kinds.

Poets understand that meaning is not conveyed by words.  "Wait a second!" you say, "Poetry is composed of words."  We're both correct.  The meaning of a poem (or a story) is created by all the images, memories, hopes, dreams and desires that those words evoke in us.  This is why everyone who makes rhymes is NOT a poet and why everyone who has a command of vocabulary and syntax is NOT an acclaimed author.

This is the world in which we attempt to improve data quality.  While we may aspire to improve the quality of the semantics, it seems clear that we will never influence the quality of meaning.  This is, perhaps, what "fit for intended purpose" tries to convey.  What if the semantic tokens were musical notes instead?  What if they were colors or smells?  Would we be as confident?

What if we ceased our attempts to control the perceptions of an audience and instead created ways for our audience to explore the boundary between semantics and meaning?

Monday, May 14, 2012

Whole Cloth or Patchwork

Data Quality is too big to conquer. 

Customer mailing addresses or patient demographics are big enough challenges for most.  What is the common thread that, once followed, will allow for a holistic approach to data and information quality?

The picture at right (captured from Wikipedia; the source is unknown, though the file name was in German) gives an idea of what I mean.  The weft (or woof) is the continuous thread while the warp is composed of individual threads such as tools, methods, metadata, process, culture, management, governance and so on.

The whole cloth can smother any initiative while the individual threads of the warp provide gainful employment and career advancement for (at least) thousands.  Anyone with thread or yarn experience knows that, wound on a spool or into an organized and well-designed ball, it is useful and can be applied effectively to many purposes.  Off the spool and out of control, it is trash that must be thrown away.

If the whole cloth is data quality, then what is the weft that makes it more than a collection of individual fibers?  When we wish to weave data quality, we don't get to choose our warps.  We have to take everything as we find it and somehow turn it into the blanket that everyone wants.  Sometimes we find that we have a small patch of that blanket but no way to merge it with other patches because the weft is too specific and insufficient.

Sometimes we find a way to take several such patches and turn them into a quilt.  Rarely, the quilt provides the security needed.

Some questions that occur to me:
  • Can data quality be a patchwork?
  • What are the candidates for the weft that will bring all the variations of warp together into a consistent fabric?
  • Can we content ourselves with becoming masters of a specific warp thread?  If enough such masters emerge, can they collaboratively create data quality?
  • Is computer science part of the warp?  How about MIS?  Psychology?  Anthropology?  Or are these the kinds of things that can be twisted into the yarn of the weft?
  • How do management, governance, leadership, software development, system design and architecture, data design and architecture, process design and architecture... fit?
  • How do (data) modeling, metadata management and various other data-specific technologies fit?
  • How do commercial products and tools fit?
I'm on the lookout for new questions, but if I ever come across an answer, I'll certainly scoop it up and add it to my basket.