This blog addresses all things related to the safe handling of data and information. It is not for the faint of heart. Draw nigh all ye searchers. Learn and teach.
Monday, February 7, 2011
Data, Governance and Data Governance
I am about to shed some light, however feeble, on the subject. Allow me to start by admitting that I am a person who likes to do the analysis necessary to solve a problem. Though my patience is improving with practice, those who know me will back me up when I say that I want to solve a problem ONCE.
Previous posts here have explained the abstract nature of data and all that implies in terms of getting people on board when it comes to doing something about quality. People will listen to or read a horror story about some preventable data issue that cost Company ABC $umpteen million. They will nod sagely and say something like, "They should have seen that coming." They are simply unable to see that their own company is engaged in the exact same practices and completely at risk for the $umpteen million.
Friends, it isn't just our favorite whipping boys, Management. There is no more recognition within IT than there is in the boardroom. Our boxes-and-wires friends think of data in terms of DASD and RAID configurations or bandwidth and throughput. Our developer pals don't really think of data at all, except as the fuel that activates their code. Architects appear to be concerned with the storage and throughput views overlaid with an access-management filter. They seem more concerned with making developers and DBAs happy than with the quality of the asset.
Enter Data Governance, which in most instances wants to be about definitions, rules and "enforcement." Often Data Governance tries to heap another thick layer, called metadata, on top of all the data that is already being mismanaged in the organization. It's often the case that Data Governance fails to practice what it preaches.
Here's the revelation: Data Governance isn't about data. Data Governance is about process. It is the means to the Data Quality end. I have already said that Data Governance is that part of corporate governance that is dedicated to stewarding the corporation's data asset. It is exactly analogous to the role of Finance/Accounting with respect to the capital asset. Unfortunately, Finance has two things going for it that Data Governance doesn't have: GAAP and audits.
Generally Accepted Accounting Principles are a set of guidelines for money-management processes that are, as the name implies, accepted and USED nationally and internationally. The use of these practices ensures that processes will be auditable. The audit process verifies that GAAP was used and, if there were exceptions, that they were clearly noted with enough information to allow the results to be brought back into alignment with GAAP. The underlying theme is that if the processes were sound, then the result is believable.
Imagine if every company of any size whatsoever were able to devise and use its own bookkeeping structure and process. There could never be a stock market. Equity trading would be too risky for anyone, and all businesses would essentially be sole proprietorships. Moreover, there would be no chance of oversight by outside bodies (government).
This is a picture of the situation with respect to data today. When will it get better? Data Governance has no power to make the situation better. Without an externally defined data management framework and periodic audits by independent auditors, there will be no improvement. In the meantime, if data quality metrics improve, it's only because some particularly strong and charismatic personality is present.
No one questions the need for accounting or the rigor of accounting procedures. Actually, the same can be said for data governance and data management procedures. The difference is that in the case of money, the lack of question results in compliance, while in the case of data it results in apathy or confusion.
Does the data world have something like GAAP that could become the necessary process infrastructure to support data management audits? I don't see it. Data is still too personal, too subjective, too misunderstood to attract the attention of researchers. Data management is a black box to virtually everyone and they like it that way.
People prefer to cleanse downstream data because their customers feel their pain being relieved. Happy customers are the goal, after all. The bonus is that cleansing provides an unending source of employment for those doing the cleansing. It's win-win! People aren't going to be highly motivated to change a win-win scenario any time soon.
Thursday, October 22, 2009
The Problem With Quality
1. Meeting the spec[ification]
2. Documented adherence to established [process] norms
3. The product's effects are primarily positive (for example, it tastes good and doesn't make me ill)
4. I'll know it when I see it
Multiple choice: which of the above (choose only one) is the definition you want applied to your new car?
Does the answer change if we apply the definition to your morning cup of coffee?
Last question: which definition do you apply to the next example of "business intelligence" that comes before you?
I have two points in mind. First, defining quality is not an exact science even within a specific context. Second, #4 may be the deciding factor regardless of #s 1-3. In the end, the consumer/customer merely has to say, "That's not what I was looking for" to relegate a product to the trash heap. We all know that it does no good to say, "This is what you asked for" (meeting the spec) or "I did it just like you told me" (followed established procedure) or "One won't hurt you" (tastes good and not sick--yet).
So what is quality and especially, what is data quality? How we obtain data quality is completely dependent on the answer to this question.
I'd like to suggest that we divide the question in order to produce at least one useful answer. If we examine data quality from the perspective of a computer and its logic, we can come up with an answer that will allow us to progress. The second perspective is obviously the consumer/customer or the human perspective.
Recently I received an email with what at first glance seemed like an innocuous statement full of typographic and/or spelling errors, but when I actually looked at it, it was a nearly perfect illustration of a principle I have been talking about for years.
Teh hmuan mnid is cpalbae of miankg snese of amslot atynhnig taht ftis smoe bisac ptratnes.
This is the principle that draws the line between computer logic and human "logic". It is also what makes programmers (an outmoded term, I know, but best suited to the point I'm making) so vitally important. There is only one role in the continuum of roles involved in producing an information system product that must bear the full weight of responsibility for the integrity of the data quality at the boundary between computer and human. That role is best thought of as programmer.
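The effect is easy to reproduce. Here is a minimal Python sketch (my own illustration, not the text of the original email) that scrambles each word's interior while keeping the first and last letters fixed, producing text a human can usually read but exact-match computer logic cannot:

```python
import random

def scramble_interior(word, rng):
    """Shuffle a word's interior letters, keeping the first and last fixed."""
    if len(word) <= 3:
        return word  # nothing to shuffle
    interior = list(word[1:-1])
    rng.shuffle(interior)
    return word[0] + "".join(interior) + word[-1]

rng = random.Random(42)
sentence = "The human mind is capable of making sense of almost anything"
scrambled = " ".join(scramble_interior(w, rng) for w in sentence.split())
print(scrambled)
```

Note that `scrambled == sentence` will almost always be False, even though a human reads both versions the same way. That gap between what the machine accepts and what the human accepts is exactly the boundary the programmer guards.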
Unless you earn your living as a programmer, some alarms may be going off. In fact, if you are a programmer, some alarms should be going off. I earned a degree in computer science and made a living as a programmer. From there I moved into data modeling, data administration and database administration. Now I'm involved in data quality and governance. In all of that time, I have never come into contact with any training, education, book or even a job description that addressed my accountability for preserving data quality at the man/machine interface.
This may be a poor forum for this, but my intention is to change this situation right here. My next few posts will present some background on how a programmer might live up to this responsibility and some of the forces that will need to be fended off in order to make it a reality.
Next: Programmer as Data Quality Champion
Thursday, July 9, 2009
Can and Should
Do you want your future to be composed of cans or do you want a future of shoulds?
Should is closely related to could.
If you could do what you should do, would you do it? If you should and could but don't, what kind of future do you have before you?
Is your past characterized by "might have", "could have", "would have", "should have", or as my father was fond of saying, "mighta, woulda, coulda, shoulda?"
What's the difference between could and can? It might be knowledge or it might simply be practice. For many people, the biggest difference is the realization that there is something beyond "I can." Parents fill this role as do teachers, mentors and good friends. The process of revealing the new world of could is known as coaching.
What we should do is a function of goals, history and current context. Most of us get paid to know what should be done. Most of us also take the easy way out and do what we can rather than what we could or should. In fact, "Do what you can," has become a universally accepted surrender. When the boss says it, it means that
- they don't know what should be done
- they don't know what could be done
- they don't want to be bothered with knocking down roadblocks
- they don't really care about the outcome
When I say it ("I did what I could.") it means
- I know what should have been done
- I know that I could have done more
- I told them but they wouldn't listen
- I was not committed to a quality result
We nearly always allow ourselves to choose the familiar path. When faced with a choice between can and could, we choose to do what we have done in the past--can.
We cannot get the data quality we need unless we have the governance we need and we can have neither if we continue to do as we've always done. This is macro as well as micro advice. Governance is not committees and steering groups, though it may have need of such. Data quality is not one definition, though that may be helpful. Both are about contextual consistency and predictability. This goal could and should be achieved in whatever ways are appropriate to the context within which the consistency is desired.
Consistency is a product of process and the foundation of improvement. Once the process produces consistent output, you have freedom to classify and categorize its output in whatever ways are suitable to its customers. We are currently engaged in trying to classify, warehouse and use inconsistent products created by inconsistent processes.
What could we do? What should we do?
Tuesday, March 24, 2009
The Beginning (3)
From the standpoint of business intelligence and our four characteristics, we would want to pay special attention to what the programmers are doing or not doing with respect to definitions (semantics). The data architect will have spent considerable effort in researching and compiling information about the data. They will have learned about how various kinds of data relate to each other for different business functions and users and they will have defined quality rules for each kind of data.
The process standards, to be monitored by Quality Assurance and warrantied by Quality Control, will ensure that the programmers have those definitions and rules in a format that they can use and that they do, in fact, use them.
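As a rough sketch of what "a format that they can use" might look like, the architect's definitions and rules can be expressed as data that programs apply directly. The field names and rules below are hypothetical examples for illustration, not drawn from any real standard:

```python
import re

# Hypothetical quality rules, as a data architect might publish them:
# each field name maps to a predicate that returns True for valid values.
RULES = {
    "customer_id": lambda v: bool(re.fullmatch(r"C\d{6}", str(v))),
    "email":       lambda v: "@" in str(v) and "." in str(v).split("@")[-1],
    "age":         lambda v: isinstance(v, int) and 0 <= v <= 130,
}

def violations(record):
    """Return the names of fields present in the record that fail their rule."""
    return [field for field, check in RULES.items()
            if field in record and not check(record[field])]

record = {"customer_id": "C123456", "email": "ada@example.com", "age": 208}
print(violations(record))  # → ['age']
```

Because the rules live in one shared structure rather than being scattered through application code, Quality Assurance can verify that programmers actually use them, which is the point of the process standard.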
If your programmers work for someone else, the processes and standards will be about acquisition. They will ensure that the data definitions and relationship rules embodied in the application or system are compatible with those of your business. You are going to have to lean hard on your vendors, and they will squirm and plead "proprietary." The best advice that I can offer is to walk away from this vendor. Another vendor will be happy you asked, because it will allow them to really get close to you, and they will be proud of their quality processes. The ones who drag their feet do so because they aren't able to produce the assurance you need. "Proprietary" is a euphemism for "we don't know."
Become interested in these things. Ask questions. Expect answers that you can understand. Don't accept arm-waving and diversionary tactics. You will be well on the way to business intelligence from a high quality, reliable data resource.
Next time: the "business" role
Thursday, March 12, 2009
Measuring Governance
First, let's agree that data governance is like any other governance except that it focuses on data. A governance program directed at process or at competency or whatever would have the same characteristics. OK, I'll attempt a justification for that statement.
What do we ask of a data governance process? What are the objectives? By the way, I use the term process here in the sense of a set of activities that are ongoing and have a consistent purpose. The purpose of the data governance process is to:
Optimize the value of the data resource by ensuring that the capture, storage, retrieval, destruction and use of the resource are done in accordance with established policy, procedure and standards.
Do you buy it? If not, I'd be pleased to discuss alternative purposes, but the remainder of this discussion is based on this purpose.
Based on the purpose of data governance then, several perspectives on measurement suggest themselves. The most obvious one is the QA (quality assurance) perspective. How are we doing at following established standards? It is tempting to count the number of standards, policies and procedures because counting is easy to do and there is a tendency among the governors to equate many laws with good government. Strangely enough, among the governed the emphasis is on the quality of the laws rather than their quantity. A small number of effective and easily understood standards may deliver more benefit than a larger number of over-specialized or esoteric ones.
The most effective measurement will be part of the standard or process itself, but some organizations may find it useful, in getting governance going, to do retrospective analysis to see how well and how consistently processes are being applied. Health care makes extensive use of the "chart review" to gather this kind of data retrospectively. Measurement intrinsic to the process or standard has the potential to be much more nuanced and useful than that done retrospectively, simply because all of the context is available.
Clearly, though, the nature of the metric(s) is very much determined by the process or standard itself. For this reason, it makes no sense to discuss metrics or KPIs (key performance indicators), a special kind of metric, without first establishing the process context.
Other perspectives might differentiate among standard, process, and policy, or might measure in conjunction with the data life cycle, specific subject areas or specific usages.
One last point, should you be tempted to think in terms of measuring accountability:
Accountability in the absence of a standard is really approval.
No governance mechanism can exist for long based on approval. Each change in "leadership" will create massive turmoil as everyone seeks to reorient to a new approval model.