
Sunday, November 13, 2011


We don't value now.  We talk about it all the time.  We use it for emphasis.  We mostly use it to separate the past from the future.

Often when we talk about now we have only the fuzziest of notions, the flimsiest of definitions in mind.  "Now is the time for men and women of conscience..."  The concept of now in this instance could mean anything from a generation to a session of Congress to a particular crisis.  Sometimes we use now to mean very soon or as soon as humanly possible (the future) as in, "I need it now!"  Occasionally we use it to bound some time in the past as in, "Now in those days..."

Now has power that, for most people, is unrealized because it is unrecognized.  For our purposes now is the moment of decision.  If we are able to grasp that moment, that now, and use it, we can change our own life and the lives of those around us.  We can use now to create a new future.

In the gap between stimulus and response there is a piece of eternity--now--in which we can decide what the future will look like and decide on the response that will launch that future.  Please be aware of now and use it as it is intended to be used. 

When we are present in our life we are conscious of each now and we use them to create a future that matches our vision.  What is your vision, your ideal?  What will you use to direct the decisions you make in each of your nows?

Wednesday, November 9, 2011

All Governance (like Politics) is Local

Tip O'Neill, the Speaker of the House of Representatives during the Carter and Reagan years, is famously said to have offered the advice that "All politics is local."  If there is anyone out there who doesn't understand that Data Governance is politics, just wait.

If we're to gain any advantage from the former Speaker's wisdom, we're going to have to pull it apart and take a look at all the pieces.  Clearly he wasn't denying the existence of national and even international politics.  He had participated in politics at every possible level, so what did he mean and how can we benefit?

First of all, the context (which is always eliminated from "sound bites") is that of successful politics.  Which of us doesn't dream of successful data governance?  If we can accept that DG is political rather than technological or administrative or managerial, then we're ready to make use of political wisdom in our quest for successful data governance.

Successful politics is about getting enough people to come with you so that you can accomplish a vision.  Because we're human, we look for shortcuts.  We start by assuming that if we can convince the right person then that person will bring everyone else along.  So we start with our elevator speech in case we find ourselves confined with an influential person for any period of time. 

We also adopt the position that money will equate to support.  We pursue funding which requires approval at the executive level.  In short, we focus much if not most of our efforts on the critical few in the blind hope that all others are followers.

For sheep this may work.  Substantial research has been done on flock or herd behavior in an attempt to understand how humans are influenced to move one way or the other.  We have all seen a flock of starlings or sparrows or a school of fish suddenly change direction--apparently with a single will.  What magic would get people to act that way?  Leaving aside the question of goal or vision, which may or may not involve the common good, if we could master this magical force, think of all the effort that could be put to better use.

I have read some of this research and at the risk of oversimplification the answer lies not in identifying the leader but in identifying the first followers.  When one bird or fish or wildebeest, in motion, changes direction it may be for any reason or no reason at all.  If no one comes with them, they will very quickly rejoin the mass.  If another individual comes along then two going in the same direction exert some "gravitational" attraction that acts to influence others in the vicinity.

In the human world we divide people into leaders and followers.  More generally, we try to create leaders by assigning titles or creating org charts.  As mentioned above, we intend to leverage leadership by devoting our efforts to affecting the leaders' path, trusting that they will bring with them enough followers to make our effort successful.  The problem with all of this is that titles do not confer leadership. 

What lesson can we learn, then, from Tip's advice?  My own take is that, rather than search for a leader, we might do better to be a leader, campaigning locally and helping our neighbors and those in need.  When we have one or more others with us because they are benefiting from the relationship, we become much more effective in changing the direction of the herd.  Tip understood that grand political movements arise from individual voters recognizing common goals. No legislation is effective when the governed choose not to obey.  Devote your efforts locally and pay attention to what your neighbors in the next block are saying.

Politics is local.

Friday, July 1, 2011

How Would Jesus Do Data Governance (Part 3)

Having tried the preaching route, attracting big crowds, name recognition and a following, (are you with me?) Jesus recognized that the results he came to provide were not being realized. People listened, cheered and went home to the same life they were living before. Anyone who has been involved in the data industry for any length of time will immediately recognize this situation.

Data-driven development, Data Administration, Information Engineering, Data Architecture, Data Quality, Data Management, Data Governance... Cheering followed by more of the same.

What now? Even if you are God and are omnipotent, you still have to get humans to change in order to bring them to the new world--one in which they can rely on one another and in which each acts in light of a defining principle and for the greater good. How long have you been a human? I have more than 60 years of experience in being and being with humans. Jesus was obviously smarter than I. He realized after only 30 years that preaching and teaching just wasn't going to work--even though the Ideal was attractive and clearly in the best interests of all. He was getting, "Well, even if..." and "But everyone would have to buy in..." and "That's not how we understand the policies..."

Maybe this is what you're hearing as well. Maybe you are beset on all sides by scribes and pharisees, nit-pickers and policy wonks. Maybe they are constantly trying to trap you in a bit of heresy or a policy violation. Maybe you've thought about giving up because for some reason the people would rather listen to them than to you.

What did Jesus do? He changed his emphasis from preacher/teacher to minister. He modeled the changes he was talking about and he did it consistently with each and every person he met, meeting each one where they were. He showed them that the past was irrelevant and the future was not a given. He gave them the present and they experienced for themselves that their lives were better.

He did not give them rules; instead he gave them hope and someone to come to. He gave them someone who understood them and who showed them how, by subtly changing their perception, they could obtain victory over the troubles that plagued their lives. He never showed them judgment or condemnation. He always cared about their welfare and he planted seeds of change.

He gave himself for the people long before he gave his life for them.

If you believe that better policies, better standards, better rules, better laws will force people to care about data, I'd like to help you and I hope that I may already have helped you by planting this seed. Sitting in an ivory tower and sending out criers to inform the people of the duke's latest whim will never (as in NEVER) be productive. Maybe you'll think about giving up some day and you'll remember this seed.

Tuesday, June 21, 2011

How Would Jesus Do Data Governance (part 2)

First of all, as I mentioned last time, I am NOT comparing data governance to the Salvation of humanity--I am simply examining methods in light of the fact that both ideas resist description, can't be marketed and are not for sale.

So, I've got an idea that will make the lives of many easier to get through, make the company more profitable, reduce the level of risk involved in making use of information technology, and improve the ability to communicate. My problem is that the idea is based on something I can't describe. I can provide plenty of examples but that only seems to increase the confusion. So far I'm in good company. We've all been exposed to the parables of Jesus. He used them as everyday examples to shed light on some seriously abstract concepts. His disciples scratched their heads and asked each other, "What did he mean by that?" Two thousand years later, they're still being explained to us on Sunday mornings around the world.

Fortunately my idea has no expiration date. There is no competition to put the idea across--no race to the finish. The pain will continue until a critical mass of enlightenment is achieved. Enlightenment is essential because without it all we'll ever have is a collection of examples and stories that, to the average person, seem completely disconnected. Because people like for things to make sense and to be predictable, this disconnectedness leads to alienation.

When Jesus stuck to his idea, looking for better examples, better stories, he found himself on the outside. He was a different sort, something of a kook, but certainly not threatening. He had his small following who were largely content to be associated with him for the limited notoriety that the association provided. They knew him, liked him, respected him. The powers-that-be were not threatened and left him free to try to get his idea across.

Up to this point we have already made use of his methods and find ourselves in exactly the same position. We are tolerated and even receive minimal support from those who hear that others are doing it and seem to derive some non-quantifiable benefit. Unfortunately, as a band of disciples we lack much. We either lack a common cultural foundation or we ignore it. Perhaps Jesus' disciples were fortunate in that they weren't constantly bombarded with "fresh, new takes" on their central idea. They were able (indeed, forced) to discuss the examples and stories among themselves. This process no doubt kept them cohesive as a group and, in the end, was the springboard that helped them to launch the Idea out into the world.

Saturday, June 18, 2011

How Would Jesus do Data Governance?

Recently I've been led to a reassessment of my approach to this whole blogging thing. Originally I had the idea that people would be attracted to a common sense, git'r done view of data governance. This quaint notion was driven by the fact that, although a lot of money was changing hands under the label "data governance" not much was really getting done to realize the vision.

There have been several incidents of late that have honed this approach by grinding away some remaining misconceptions on my part. What is left is the sharp edge--no assumptions, no vain hopes, no vanity, no illusions or delusions--just a cutting edge that can be applied to any knotty issue in the data governance/data quality landscape. Actually this edge can be applied to any knot whatsoever.

First I had to let go of the idea that anyone would listen to and heed an idea just because it's good. We've come to expect marketing glitz. A wise person once said, "All that glitters is not gold." The best ideas are seeds that must be tended over time but which will, in their own time, produce fruit.

Then I had to abandon the notion that people would rally to an idea. As it turns out, people rally to people--to leaders. Here I would ask the reader to consider leaders in general and some specific leaders. What was the basis of their "leadership?" In most cases it was charisma. Too often their followers realized too late the direction in which they were being led. I'm not charismatic.

Finally, I have understood that people will follow reward (most often money) rather than ideas. The great idea of data governance has been crushed, chopped, sliced and diced in search of marketing leverage and greater monetary reward.

All of this honing has left me with a better appreciation of Jesus of Nazareth, known as the Christ. I'm not comparing his mission with data governance--that would just be ridiculous. I do want to look at his methods, however, since many of the problems are strikingly similar.
Stay tuned here for an exploration of "What Would Jesus Do?" applied to data governance.

Monday, February 14, 2011

Chapter One

This will be chapter one of the eventual book. Some things have to be laid out very explicitly. No one should ever be able to complain that they didn't get their money's worth or that they thought they were getting something different.

Some basic principles that will govern everything else that is said:
  • Data is part of language
  • All communication about data is, itself, data
  • Language consists of denoted meaning (denotation) and connoted meaning (connotation) and on top of these is layered implication and inference which involve human perspective
  • Nothing about language guarantees communication
  • Communication requires a minimum of two entities from the following set [human, machine, logical construct (e.g., software)]

If we are to get a grasp on data quality, we have the best chance of success if we restrict our discussion to ONLY the data that is part of communication between machines or between machine and logical construct. Of course this is neither exciting nor even very useful. We have many specifications (Ethernet, etc.) that guarantee that communication on some level will take place between machines and logical constructs. ASCII and EBCDIC are among the most basic communication specifications. It doesn't take very long for the alert observer to notice that unless a human is involved somewhere in the process, it doesn't really matter what the communication is.
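The ASCII/EBCDIC point can be made concrete. Below is a minimal Python sketch, assuming Python's built-in `cp037` codec as a representative EBCDIC code page (EBCDIC comes in several variants; picking this one is an assumption for illustration). The same five letters become different bytes under each agreement, and a receiver that assumes the wrong agreement still gets perfectly legal characters that mean nothing.

```python
# The same five letters under two different encoding agreements.
text = "HELLO"

ascii_bytes = text.encode("ascii")    # b'HELLO' (0x48 0x45 0x4C 0x4C 0x4F)
ebcdic_bytes = text.encode("cp037")   # b'\xc8\xc5\xd3\xd3\xd6' (EBCDIC, US/Canada)

# A receiver that shares the sender's agreement recovers the text.
assert ascii_bytes.decode("ascii") == text
assert ebcdic_bytes.decode("cp037") == text

# A receiver that assumes a different agreement gets valid characters that
# mean nothing: the bytes arrived intact, the communication did not.
print(ebcdic_bytes.decode("latin-1"))  # ÈÅÓÓÖ
```

Neither specification complains; the machines did their jobs. Whether "ÈÅÓÓÖ" is a problem only starts to matter once a human has to read it.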

"Matter" implies human involvement or at least we can infer human involvement from a statement that something does or does not matter. Matter is a value judgment couched in an emotional context. It's only when we start to peel back the layers, asking why or in what sense something matters that we begin to get to the idea of quality. Our exploration, then, will follow the trail of what matters.

As we follow this trail we're going to encounter the idea that what matters is, in many ways, distinct to the judge. What matters to the reader of a graphic novel (formerly comic book) may not be the same things that matter to a reader of War and Peace or a viewer of The Mona Lisa. What matters to someone watching Wile E. Coyote fail in yet another attempt at catching the Road Runner is not the same as what matters to someone watching Being John Malkovich or Inception.

How then do we determine whose perspective to assume? Whose view matters?

The answer, of course, is that, where communication is concerned, everyone's perspective matters. It would be a great feat of communication if we could present a context-free (perspective-free) discussion of data quality. In fact, it would be such a feat that we're not likely to ever see it and it certainly won't happen here. Our intent is to zero in (or home in but NOT hone in) on a very small number of perspectives to see what matters to them and then step back to see if there are any common themes that can be exploited. If we are successful in that, we may have created a springboard for the one who comes after.

In the next chapter we will nominate some key perspectives and begin to investigate what matters to them.

Saturday, February 12, 2011

DQ: More Than Meets The Eye

As we move forward toward a view of data quality that allows us to create and use a language specific to DQ issues, descriptions and solutions, let’s take a minute here to examine the behavior of data.

Certainly, one of the attributes of quality data is that it is well-behaved. In other words it consistently delivers value according to principles that are applicable because of its type, domain, range, relationships, maturity, purpose(s)…

It is useful at this point to differentiate between static and dynamic properties of data. Any DQL (data quality language) that we might define should work well where static properties are concerned. When we begin to consider dynamic properties, the task becomes much more complex. The greater the number of dynamic properties, the greater will be the complexity.

Our chances of designing a DQL will be significantly greater if we can restrict ourselves to static properties only. Before we can do that, we have to understand the dynamic properties and assess their relative importance. Can we carve them out of the discussion? Will excluding them compromise our DQL’s capabilities?

Looking back at the list in paragraph 2, the first three properties might be thought of as static. These are the focus of our modeling efforts or, if we only pretend to do modeling, of our programming efforts. There is a tangent here that we’ll resist for now, but at some point we have to come back to it. The question of how data is initially defined is huge and the effect of initial definition on the lifetime of a datum and in particular on its quality is not to be underestimated.
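To make "static" concrete, here is a minimal sketch of what a DQL rule restricted to the first three properties (type, domain, range) might look like in Python. The `StaticRule` class, the attribute names and the limits are hypothetical, invented for illustration; relationships, maturity and purpose are deliberately left out because they are the dynamic ones.

```python
from dataclasses import dataclass
from typing import Any, Optional

STATIC_PROPERTIES = ("type", "domain", "range")
DYNAMIC_PROPERTIES = ("relationships", "maturity", "purpose")

@dataclass(frozen=True)
class StaticRule:
    """Hypothetical DQL rule covering only the static properties."""
    name: str
    py_type: type
    domain: Optional[frozenset] = None      # enumerated valid values, if any
    value_range: Optional[tuple] = None     # inclusive (low, high), if any

    def violations(self, value: Any) -> list:
        # Note: bool is a subclass of int, so True/False pass an int type check.
        if not isinstance(value, self.py_type):
            return [f"{self.name}: expected {self.py_type.__name__}"]
        problems = []
        if self.domain is not None and value not in self.domain:
            problems.append(f"{self.name}: {value!r} not in domain")
        if self.value_range is not None:
            low, high = self.value_range
            if not low <= value <= high:
                problems.append(f"{self.name}: {value!r} outside {low}..{high}")
        return problems

age = StaticRule("patient_age_years", int, value_range=(0, 130))
sex = StaticRule("sex_code", str, domain=frozenset({"F", "M", "U"}))

print(age.violations(42))    # []
print(age.violations(-3))    # ['patient_age_years: -3 outside 0..130']
print(sex.violations("X"))   # ["sex_code: 'X' not in domain"]
```

Every check here can be decided from the value and the rule alone, with no reference to time, history or context. That is what makes the static properties tractable.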

For now, though, we’ll put that on the back burner. We expect the individual pieces of data to possess a definition (usually called a description), and our DBMS requires that we say what kind of data each one is: a variable-length text string, a fixed number of characters, an integer, a floating-point number, money, date/time, and so on. It is surprising how many data are defined to the DBMS as varchar. It shouldn’t be surprising, since all of our modeling tools allow us to set a default type and the default for the default is always varchar(n). This is popular because it guarantees that any value supplied will be accepted. Oops, another tangent almost sucked us in.
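A small illustration of why varchar(n) is so popular: as a type, it accepts nearly anything. This is a minimal Python sketch, not DBMS code; `accept_as_varchar` is a hypothetical stand-in that checks only length (the default of 50 is arbitrary), while `accept_as_date` uses the standard library's `date.fromisoformat` to stand in for a date column.

```python
from datetime import date

def accept_as_varchar(value: str, n: int = 50) -> bool:
    """Hypothetical varchar(n) stand-in: the only rule is a maximum length."""
    return len(value) <= n

def accept_as_date(value: str) -> bool:
    """Typed stand-in: the value must parse as a real calendar date (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True

samples = ["2011-02-12", "2011-02-30", "next Tuesday", "unknown"]
for s in samples:
    print(f"{s!r:16} varchar={accept_as_varchar(s)} date={accept_as_date(s)}")
```

Only the typed definition notices that February 30 does not exist or that "next Tuesday" is not a date; the varchar definition passes every value along to be discovered downstream.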

The final three items in the list are dynamic in the sense that their values can and will change, sometimes rapidly and usually unexpectedly. Let’s take the last first. Purpose, as “fit for…,” will change whenever we’re not paying attention. We hope that our stewards will be on top of this but pragmatically (everyone likes pragmatism), they may be too close to the business itself so that changing business needs or drivers loom so large that defined purpose fades to insignificance.

Maturity is also dynamic. We expect maturity to change over time. When we think of data maturity (if we do) we include stability (of all the other properties), quality metrics that have flattened out, recognition within the enterprise and probably several other aspects.

Finally, we have to face relationships. We’re not very good at relationship management. Some of us wouldn’t recognize a relationship if it sent us a valentine. Others pile all sorts of unwarranted expectations on top of our relationships and then wonder where the quality has gone.

It all starts in the modeling phase. Chen, when he invented a graphical notation for describing data, gave equal weight to entities and relationships. Both had a two-dimensional symbol and the opportunity to possess attributes. For many reasons, not least perhaps that tool developers didn’t grasp the importance of relationship, “data modeling” tools eventually turned a multidimensional, real thing into a single one-dimensional line. That line is present only as a clue to the schema generator: copy the identifier from one of the linked entities into the attribute list of the other and label it as a foreign key so that the database engine can build an index.

Although I find examples are often counter-productive in the discussion of data quality, one example may illustrate the role of relationship in completing the semantic of a data set. PATIENT is such a common entity in the health care marketplace that no one even bothers to define it. It is a set of “demographics,” by which we mean the attributes, and it has relationship with PHYSICIAN or PROVIDER. It probably also has relationship with VISIT or ADMISSION, ORDER, PROCEDURE, PRESCRIPTION, SPECIMEN and other entities of specific interest to the enterprise such as EDUCATION_SESSION, CLAIM…

It doesn’t take long to figure out that the relationship between patient and physician is more complex than can be accommodated by a single foreign key. A physician can “see” a patient, refer a patient, treat a patient, consult (with) a patient, admit a patient…the list goes on and on. Each of these relationships has real meaning or semantic value and may even be regulated by an outside body. Typically, these are implemented by a single foreign key attribute for each.
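One way to give these relationships back the weight Chen gave them is to model each one as a record in its own right, with a type and attributes of its own, rather than as one foreign-key column per relationship. A minimal Python sketch follows; the `CareRole` values come from the list above, while the class names, IDs and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional

class CareRole(Enum):
    """Kinds of patient/physician relationship named in the text."""
    SEES = "sees"
    REFERS = "refers"
    TREATS = "treats"
    CONSULTS = "consults"
    ADMITS = "admits"

@dataclass(frozen=True)
class PatientPhysicianRelationship:
    """Hypothetical relationship record: typed, dated, and able to carry attributes."""
    patient_id: str
    physician_id: str
    role: CareRole
    start: date
    end: Optional[date] = None  # None means the relationship is still in effect

    def active_on(self, day: date) -> bool:
        return self.start <= day and (self.end is None or day <= self.end)

def physicians_in_role(rels, patient_id: str, role: CareRole, day: date) -> list:
    """Physicians holding the given role with the patient on the given day."""
    return sorted({r.physician_id for r in rels
                   if r.patient_id == patient_id and r.role is role and r.active_on(day)})

rels = [
    PatientPhysicianRelationship("P001", "D17", CareRole.REFERS, date(2011, 1, 3), date(2011, 1, 3)),
    PatientPhysicianRelationship("P001", "D42", CareRole.TREATS, date(2011, 1, 10)),
    PatientPhysicianRelationship("P001", "D42", CareRole.ADMITS, date(2011, 1, 10), date(2011, 1, 14)),
]
print(physicians_in_role(rels, "P001", CareRole.TREATS, date(2011, 2, 1)))  # ['D42']
```

Because the kind of relationship and its dates are recorded as data, a question such as "who was treating this patient on February 1?" can be answered directly, something a bare foreign key cannot express.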

Now, imagine a situation in which an in-utero procedure is scheduled on a fetus. You may be aware that transfusions, heart valve repair and a host of other medical procedures are actually performed on the fetus while it is still within the mother’s womb. So, who is a patient? If the facility also terminates pregnancies for any reason you can see the conundrum. Medicine doesn’t allow for terminating the life of a patient (Dr. Kevorkian excepted). At the same time, we would sometimes like to treat the fetus as a patient, perhaps for reasons of safety. We also encounter missing values for attributes that we may have viewed as mandatory, e.g., DOB, SSN.

It is only when we explicitly talk about relationships that these issues emerge. Relationships cast light on the entity from all angles.

Relationships also represent the business processes that inform the purpose of the data. Often, undocumented meaning gets attached to data. Two analysts will get together and agree that for the purpose of this analytic, this combination of attribute values will be included (or excluded). For a given ETL job, we decide that an attribute value that isn’t on the approved list will be replaced with “&”. The adjustments to business processes are constant and usually undocumented and unnoticed. Until we can point to a documented process/relationship, we have no way of capturing and dealing with changes.
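The undocumented ETL substitution is easy to make visible. Below is a minimal Python sketch, assuming rows arrive as dictionaries; the `unit` column, the approved list and the `Substitution` record are hypothetical, and only the "&" placeholder comes from the example above. The step returns a record of every change it made alongside the cleaned data.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Substitution:
    """One documented change made by the ETL step."""
    row: int
    column: str
    original: str
    replacement: str
    reason: str

def replace_unapproved(rows, column, approved, placeholder="&"):
    """Replace values not on the approved list and record every replacement."""
    cleaned, log = [], []
    for i, row in enumerate(rows):
        value = row[column]
        if value not in approved:
            log.append(Substitution(i, column, value, placeholder,
                                    f"{value!r} not on approved list"))
            row = {**row, column: placeholder}  # copy; the input row is untouched
        cleaned.append(row)
    return cleaned, log

rows = [{"unit": "mg"}, {"unit": "mcg"}, {"unit": "milligrams"}]
cleaned, log = replace_unapproved(rows, "unit", approved={"mg", "mcg", "mL"})
for entry in log:
    print(entry)
```

The cleansing still happens, but now there is something to point to: a record of which values were changed, where and why, which a steward can review when the approved list or the business process changes.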

What’s the difference between an association and a relationship? Somewhere in there we’ll find clues about dynamic quality properties. One thing leaps out as a property of quality and a property of relationship—expectation. When we claim that something has quality, we establish an environment in which it is permitted to have certain kinds of expectations. The same is true of relationship. When two parties or entities enter into relationship they agree as to the expectations they will have of each other.

In our quest to define quality for data, we will be forced to document expectations and to monitor accountability with respect to those expectations.
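Documenting expectations and monitoring accountability can start very simply. This minimal Python sketch assumes each expectation is a named predicate over a record, paired with a named accountable party; the expectation names, stewards and records are hypothetical. The DOB expectation deliberately echoes the in-utero case, where a "mandatory" attribute may legitimately have no value.

```python
from dataclasses import dataclass
from typing import Any, Callable, Mapping

@dataclass(frozen=True)
class Expectation:
    """A documented expectation and the party accountable for it."""
    name: str
    accountable: str
    holds: Callable[[Mapping[str, Any]], bool]

def accountability_report(records, expectations):
    """List (expectation, accountable party, failure count) for every unmet expectation."""
    report = []
    for exp in expectations:
        failures = sum(1 for r in records if not exp.holds(r))
        if failures:
            report.append((exp.name, exp.accountable, failures))
    return report

expectations = [
    Expectation("patient has DOB", "Registration steward",
                lambda r: r.get("dob") is not None),
    Expectation("attending physician recorded", "Clinical steward",
                lambda r: bool(r.get("attending_id"))),
]
records = [
    {"dob": "1970-05-01", "attending_id": "D42"},
    {"dob": None, "attending_id": "D17"},
    {"dob": None, "attending_id": ""},
]
for name, who, count in accountability_report(records, expectations):
    print(f"{name}: {count} failure(s), accountable: {who}")
```

A failure count is not a verdict. Two missing birth dates may be errors, or they may be fetal patients for whom the expectation itself needs renegotiating; either way, the report names who is expected to find out.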

Monday, February 7, 2011

Data, Governance and Data Governance

There have been some great discussion threads on the IAIDQ LinkedIn group recently. One thread that attracted a lot of attention started with a question from a PhD candidate in Data Quality. It simply asked whether there is an accepted definition of data quality. 200 replies later, most people would say, "No." More recently a thread began by bemoaning the fact that there is no accepted definition of Data Governance. A lively discussion followed that continues even now. Yet another refers to an article on five reasons to cleanse downstream instead of preventing upstream.

I am about to shed some light, however feeble, on the subject. Allow me to start by admitting that I am a person who likes to do the analysis necessary to solve a problem. Though my patience is improving with practice, those who know me will back me up when I say that I want to solve a problem ONCE.

Previous posts here have explained the abstract nature of data and all that implies in terms of getting people on board when it comes to doing something about quality. People will listen to or read a horror story about some preventable data issue that cost Company ABC $umpteen million. They will nod sagely and say something like, "They should have seen that coming." They are simply unable to see that their own company is engaged in the exact same practices and completely at risk for the $umpteen million.

Friends, it isn't just our favorite whipping boys, Management. There is no more recognition within IT than there is in the boardroom. Our boxes and wires friends think of data in terms of DASD and RAID configurations or bandwidth and throughput. Our developer pals don't really think of data at all except as the fuel that activates their code. Architects appear to be concerned with the storage and throughput views overlaid with an access management filter. They seem more concerned with making developers and DBAs happy than with the quality of the asset.

Enter Data Governance, which in most instances wants to be about definitions, rules and "enforcement." Often Data Governance tries to heap another thick layer, called meta data, on top of all the data that is already being mismanaged in the organization. It's often the case that Data Governance fails to practice what it preaches.

Here's the revelation: Data Governance isn't about data. Data Governance is about process. It is the means to the Data Quality end. I have already said that Data Governance is that part of corporate governance that is dedicated to stewarding the corporation's data asset. It is exactly analogous to the role of Finance/Accounting with respect to the capital asset. Unfortunately, Finance has two things going for it that Data Governance doesn't have: GAAP and audits.

Generally Accepted Accounting Principles (GAAP) are a set of guidelines for money management processes that are, as the name implies, accepted and USED nationally and internationally. The use of these principles ensures that processes will be auditable. The audit process verifies that GAAP was used and, if there were exceptions, that they were clearly noted with enough information to allow the results to be brought back into alignment with GAAP. The underlying theme is that if the processes were sound then the result is believable.

Imagine if every company of any size whatsoever were able to devise and use its own bookkeeping structure and process. There could never be a stock market. Equity trading would be too risky for anyone and all business would essentially be sole proprietorships. Moreover, there would be no chance of oversight by outside bodies (Government).

This is a picture of the situation with respect to data today. When will it get better? Data Governance has no power to make the situation better. Without an externally defined data management framework and periodic audits by independent auditors, there will be no improvement. In the meantime, if data quality metrics improve, it's only because some particularly strong and charismatic personality is present.

No one questions the need for accounting nor the rigor of accounting procedures. Actually the same can be said for data governance and data management procedures. The difference is that in the case of money, the lack of question results in compliance, while in the case of data it results in apathy or confusion.

Does the data world have something like GAAP that could become the necessary process infrastructure to support data management audits? I don't see it. Data is still too personal, too subjective, too misunderstood to attract the attention of researchers. Data management is a black box to virtually everyone and they like it that way.

People prefer to cleanse downstream data because their customers feel their pain being relieved. Happy customers are the goal, after all. The bonus is that cleansing provides an unending source of employment for those doing the cleansing. It's win-win! People aren't going to be highly motivated to change a win-win scenario any time soon.