It is a bit of a mouthful but it does make a little more practical sense. Each element of the sentence can actually be broken down to make a judgement call on whether the data sitting in front of us gets a tick or a big cross beside it for that part.
Let’s ask the data management professionals.
The potential of stored data against “100% complete”.
It does allow for a judgement call on what “100% complete” means though. Are all data elements mandatory or critical to our data collection and business requirements?
If the purpose of our CRM is to store customer details for future marketing but we don’t collect their email address, would we count that data as complete? What if we collected first name and surname but not middle name?
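To make that judgement call concrete, here is a minimal sketch that scores completeness only against the fields you decide are mandatory. The `completeness` function, the field names and the sample records are all made up for illustration; which fields count as mandatory is a business decision, not something the code can tell you.

```python
from typing import Iterable


def completeness(records: list[dict], mandatory: Iterable[str]) -> dict[str, float]:
    """Return, per mandatory field, the share of records with a non-empty value."""
    total = len(records)
    scores = {}
    for field in mandatory:
        filled = sum(
            1
            for record in records
            if record.get(field) is not None and str(record.get(field)).strip() != ""
        )
        scores[field] = filled / total if total else 0.0
    return scores


# Made-up sample data: two CRM records with gaps.
customers = [
    {"first_name": "Alan", "surname": "Hylands", "middle_name": None, "email": "alan@example.com"},
    {"first_name": "Saul", "surname": "Goodman", "middle_name": "", "email": None},
]

# Assumed business rule: email is mandatory for marketing, middle name is not.
scores = completeness(customers, mandatory=["first_name", "surname", "email"])
```

Leaving `middle_name` out of the mandatory list means its gaps don’t drag the score down, while the missing email does.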
Data is valid if it conforms to the syntax of its definition.
Huh? In other words, make sure it fits the type and format you expect.
A date can’t be on the 35th of any month. A phone number shouldn’t have letters in it (555-BETTER-CALL-SAUL doesn’t count). If your credit card number should have 16 digits then a 14-digit number is invalid.
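Those three checks could be sketched roughly like this. The function names are hypothetical, the 7 to 15 digit range for phone numbers is my own assumption, and the card check only looks at length (it does not verify the number is real).

```python
import re
from datetime import date


def is_valid_date(year: int, month: int, day: int) -> bool:
    """A date is valid if the calendar actually contains it."""
    try:
        date(year, month, day)
        return True
    except ValueError:
        return False


def is_valid_phone(value: str) -> bool:
    """Assumed rule: 7 to 15 digits once spaces, hyphens, brackets and a leading + are removed."""
    stripped = re.sub(r"[\s\-()]", "", value)
    if stripped.startswith("+"):
        stripped = stripped[1:]
    return re.fullmatch(r"[0-9]{7,15}", stripped) is not None


def is_valid_card_length(value: str, expected_digits: int = 16) -> bool:
    """Check only that the number is all digits and has the expected length."""
    digits = re.sub(r"[\s\-]", "", value)
    return re.fullmatch(r"[0-9]+", digits) is not None and len(digits) == expected_digits
```

So `is_valid_date(2024, 1, 35)` and `is_valid_phone("555-BETTER-CALL-SAUL")` both come back `False`, as does a 14-digit card number when 16 digits are expected.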
The degree to which data correctly describes the "real world" object or event being described.
This is closely tied in with the Validity assessment.
If a phone number is not in a valid format then the data can never be accurate. If the phone number is in a valid format but one digit is incorrect then it meets the Valid criteria but still fails the Accuracy test.
Are there differences in the information shown by the same data point on different systems or data tables across your organisation?
Is Alan’s date of birth different on the Customer table than it is on the Staff table? Have you closed bank branches but your weekly sales report still shows new sales against that closed location?
Both can’t be right. You just failed your Consistency test.
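A consistency check like the date of birth one can be sketched as a comparison of the same field across two tables. This assumes both tables share a person identifier (the `P001` style keys here are invented, as are the dates), which is often the hard part in practice.

```python
def find_inconsistencies(left: dict, right: dict, field: str) -> list:
    """Return the sorted keys present in both tables whose values for `field` differ."""
    shared_keys = left.keys() & right.keys()
    return sorted(
        key for key in shared_keys if left[key].get(field) != right[key].get(field)
    )


# Made-up sample data keyed by a hypothetical shared person ID.
customer_table = {
    "P001": {"name": "Alan", "date_of_birth": "1980-04-12"},
    "P002": {"name": "Beth", "date_of_birth": "1975-09-30"},
}
staff_table = {
    "P001": {"name": "Alan", "date_of_birth": "1981-04-12"},
    "P002": {"name": "Beth", "date_of_birth": "1975-09-30"},
}

mismatches = find_inconsistencies(customer_table, staff_table, "date_of_birth")
```

The check tells you the two tables disagree about `P001`. It cannot tell you which one is right; that is an Accuracy question.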
Do we need to have all of these to make our data quality “good”?
No, it doesn’t really work like that. And it’s always dependent on the requirements set out by your business.
As we have seen above, data can pass one test yet fail another, or fail several and still pass one. You could have a customer’s email address which is consistent across multiple tables but still fails both the Validity and Accuracy tests (e.g. alan@hylands, which needs a dot and a domain extension at the end to be valid, for a start).
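A deliberately loose format check is enough to catch the email example. This pattern is my own simplification (something, an @, then a domain with at least one dot), not a full implementation of the email standards.

```python
import re

# Loose shape check: local part, "@", then at least two dot-separated domain labels.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s.]+(\.[^@\s.]+)+")


def looks_like_email(value: str) -> bool:
    """Return True if the value has the rough shape of an email address."""
    return EMAIL_PATTERN.fullmatch(value) is not None
```

`looks_like_email("alan@hylands")` returns `False`, while `looks_like_email("alan@hylands.com")` returns `True`. Passing only proves Validity: a well-formed address can still belong to nobody, which is where Accuracy comes in.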
DAMA also make the fair point that data may pass all six dimension tests with flying colours but still fail a basic usability test.
If the data is all in English and your business is based in rural China then it’s still bad data quality regardless of how consistent, complete, accurate etc. etc. it may be.
So if data quality is essentially subjective, how do we know what high quality data looks like?
DAMA’s six tests can certainly help you begin to judge the quality of your data. But we have to take things out of the theoretical realm to really find out how much quality there is lurking beneath the hood of our data systems.
If you are building a data report or analysis, can you trust that the data you are using is both complete and correct? Only one way to find out. Build some reports and start to see what it looks like.
I gave an example above of a bank closing local branches but still seeing those branches appear in their weekly sales reporting for years afterwards. We’ll call this a "hypothetical" example.
There may be nothing wrong with the source system or the data entry of the sales clerks who input the customer details. It might legitimately be a quirk of the way each individual customer is tagged to a certain branch location when they become a customer. And that’s fine.
But how does that impact our data quality score when this branch location is used for sales reporting further down the line?
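One way to put a number on it is to count the sales tagged to a branch on or after the date that branch closed. The branch names, dates and record layout below are invented for the sketch; a real version would read closure dates from wherever your organisation keeps them.

```python
from collections import Counter
from datetime import date


def sales_against_closed_branches(sales: list[dict], closed_on: dict) -> dict:
    """Count sales dated on or after the closure date of the branch they are tagged to."""
    counts = Counter()
    for sale in sales:
        closure = closed_on.get(sale["branch"])
        if closure is not None and sale["sale_date"] >= closure:
            counts[sale["branch"]] += 1
    return dict(counts)


# Made-up sample data: one branch closed in mid-2019.
closed_on = {"Main Street": date(2019, 6, 30)}
sales = [
    {"branch": "Main Street", "sale_date": date(2019, 5, 1)},
    {"branch": "Main Street", "sale_date": date(2021, 3, 15)},
    {"branch": "High Street", "sale_date": date(2021, 3, 16)},
]

flagged = sales_against_closed_branches(sales, closed_on)
```

Here only the 2021 sale against Main Street is flagged; the sale made before the closure is left alone.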
What are the implications of our low data quality?
It can erode confidence in the overall business intelligence being produced.
It can negatively impact the sales figures and potential bonuses for the branch staff who actually sold the new product.
And neither of those leads to positive results for your business.
Low data quality leads to multiple versions of the truth, conflicting reporting and increased support time and expense for the data factory who have to go around cleaning up the different messes.
And if it starts to impact the quality of reporting to government or regulatory bodies then it can become VERY costly indeed.
It sounds like good data quality is nearly impossible to achieve in the real world.
The simple response is: yes, it is very difficult to achieve, especially across a large organisation.
It has to become part of a full life-cycle quality control approach from the entire business to have any chance of becoming successful in the longer term.
That means everything from ensuring user interfaces don’t allow staff or customers to input garbage to implementing quality control measures and reporting at each stage of data production and storage. This can become VERY expensive and onerous VERY quickly.
I have foreshadowed above (in a very subtle fashion) what I don’t think you should do. But that will come as little surprise. I’m no fan of meaningless bureaucracy in any form.
To get it right, you will have to start off with as simple a plan as possible.
Audit your existing data and see where you stand on DAMA’s six dimensions. You can then start to see where your main problem areas are.
Planning to implement an email marketing campaign but find your customer details are spread across three systems that don’t talk to each other? Consolidate, cleanse, de-dupe and match your customer records into one master Customer database.
Ran an analysis to find your customer base’s product penetration levels are well below industry norms? Maybe you have customers sitting under multiple customer identifiers which again need to be matched and consolidated under one record.
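Matching records under multiple identifiers can start very crudely. This sketch groups customers on a normalised name and email; the matching key, the field names and the sample records are all assumptions, and real-world matching usually needs fuzzier rules than exact equality.

```python
from collections import defaultdict


def normalise(value: str) -> str:
    """Lower-case the value and collapse any runs of whitespace."""
    return " ".join(value.lower().split())


def group_duplicates(customers: list[dict]) -> list[list[str]]:
    """Return lists of customer IDs that share the same normalised name and email."""
    groups = defaultdict(list)
    for customer in customers:
        key = (normalise(customer["name"]), normalise(customer["email"]))
        groups[key].append(customer["customer_id"])
    return [ids for ids in groups.values() if len(ids) > 1]


# Made-up sample data: C1 and C2 are the same person entered twice.
customers = [
    {"customer_id": "C1", "name": "Alan Hylands", "email": "alan@example.com"},
    {"customer_id": "C2", "name": "alan  hylands ", "email": "ALAN@example.com"},
    {"customer_id": "C3", "name": "Beth Smith", "email": "beth@example.com"},
]

duplicates = group_duplicates(customers)
```

Each group that comes back is a candidate for consolidation under one master record, ideally after a human has eyeballed it.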
Found thousands of your business customers are tagged as sole traders and are availing of sole trader discounts but have Ltd or Limited in their company name? Run a simple query to identify them by name and get the tagging re-calibrated.
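The simple query might look something like this in Python. The `customer_type` values and field names are hypothetical; the one detail worth copying is matching on whole words, so that a name like “Unlimited Plumbing” is not flagged by accident.

```python
import re

# Whole-word match for "Ltd" or "Limited", in any letter case.
LIMITED_PATTERN = re.compile(r"\b(ltd|limited)\b", re.IGNORECASE)


def mis_tagged_sole_traders(customers: list[dict]) -> list[str]:
    """Return IDs of customers tagged as sole traders whose name suggests a limited company."""
    return [
        customer["customer_id"]
        for customer in customers
        if customer["customer_type"] == "sole_trader"
        and LIMITED_PATTERN.search(customer["company_name"])
    ]


# Made-up sample data.
business_customers = [
    {"customer_id": "B1", "company_name": "Hylands Ltd", "customer_type": "sole_trader"},
    {"customer_id": "B2", "company_name": "Unlimited Plumbing", "customer_type": "sole_trader"},
    {"customer_id": "B3", "company_name": "Smith Joinery Limited", "customer_type": "limited_company"},
    {"customer_id": "B4", "company_name": "ACME LTD.", "customer_type": "sole_trader"},
]

suspects = mis_tagged_sole_traders(business_customers)
```

The result is a review list, not a verdict: someone still has to confirm each one before the tagging and discounts are changed.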
None of these fixes is rocket science, but the problems behind them are daily data quality occurrences for tens of thousands of businesses of all sizes all over the world.
I’ve dealt personally with all of them. You don’t have to go looking too hard to find them, that’s for sure.
It’s not always someone’s fault. Sometimes the best of intentions to fix the situation can backfire and cause more data quality issues down the line.
Let’s say a bricks-and-mortar business collects customer data when customers buy a new consumer good in-store. And the business now wants to collect email addresses for future (cost-effective) marketing campaigns.
So they get their IT team to change input validation rules and enforce an email address being entered for each new record on the sales system.
Not every customer wants to give over their email address. Some don’t even have one.
So the customer support staff on the frontline have to input a dummy email address for every customer whose address they don’t legitimately have. This wreaks absolute havoc in terms of email bouncebacks and poor marketing reach metrics for the Marketing team further down the line.
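Dummy addresses tend to give themselves away because staff reuse the same one over and over. A rough way to surface them, assuming a repeat threshold of three (an arbitrary number you would tune), is to count how often each address appears:

```python
from collections import Counter


def suspected_dummy_emails(emails: list, min_repeats: int = 3) -> list[str]:
    """Return addresses shared by at least `min_repeats` records, ignoring case and blanks."""
    counts = Counter(
        email.strip().lower() for email in emails if email and email.strip()
    )
    return sorted(address for address, count in counts.items() if count >= min_repeats)


# Made-up sample data: the same placeholder keyed in for several customers.
emails = [
    "none@none.com",
    "none@none.com",
    "none@none.com",
    "NONE@none.com ",
    "alan@example.com",
    None,
]

suspects = suspected_dummy_emails(emails)
```

Legitimately shared addresses (a family or a small business) will show up too, so treat the output as a list to investigate.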
These decisions, day in, day out, have real, long-term implications for the quality of your business data. That’s why it needs a Total Quality Management approach to the full life-cycle of such decision making.
And it needs proper data people to be involved at all stages. People who understand the consequences of getting dirty, poor quality data through the pipes that can cost millions to clean up.
What should we do today to help improve the quality of our business data?
Sadly, it’s often easier (and quicker) to mess up a perfectly good quality data system than it is to tidy up a bad one. Such is life.
But that doesn’t mean we throw our hands up and just let our (data) world burn.
Taking simple steps to audit, monitor, clean and revitalise our data can have almost miraculous-looking results if we concentrate our focus.