According to a recent survey, 21% of senior IT executives believe that poor data quality costs their company between $10 million and $100 million per year, and fewer than 15% believe their data is high quality.
Data comes in through all sorts of processes: manual data entry, batch feeds from third parties, ecommerce, and the odd quick patch.
It's little wonder that inconsistencies creep in. These problems can have a significant impact on your business, its credibility and its bottom line.
Data Profiling can help you understand your data and find issues and inconsistencies quickly.
So here are our 10 steps to understanding your data with Data Profiling:
- Focus on your business goals. We all have limited time and resources, so understand the key questions you're asking of your data and focus your Data Profiling efforts on the data critical to those questions.
- What data do you actually have, and where is it?
- What matters, and what does it mean? Which attributes are critical to your business questions? Is everyone clear on the meaning of the key attributes or business terminology? For example, does everyone agree what constitutes a lapsed account?
- Attribute-level checks. This is a key area for Data Profiling, in part because Data Profiling tends to focus on a single dataset, but also because this is where you can get real returns for very little effort.
  - How many unique values does this attribute have compared to the whole dataset?
  - Do the min and max values look reasonable (for example, are there dates of birth in the future)?
  - Can you infer meaning from the values (e.g. "M" & "F"), and do the other values look reasonable in that context?
  - Is the same value represented in multiple ways ("UK", "United Kingdom", "US", "USA"...)?
  - If there are large numbers of unique values, look instead at the patterns or format of the data. Do postcodes and social security numbers conform to known formats?
- Cross-attribute checks. Slightly more complex, but here you are looking for, and validating, relationships between attributes. Does one attribute imply the allowed set of values for another?
- Duplicates & keys. Here you need to check known keys for duplicates, nulls and missing values. You should also look for attributes that may be keys (their values are nearly all unique). Additionally, check for any unexpected duplicates, for example duplicate zip codes or social security numbers (though this depends on your business rules, of course).
- Cross-database validation. While we've just said that data profiling is generally performed against a single dataset, there are obviously times when you will want to check data across databases, e.g. for a data load or migration project.
- Reuse and keeping up to date. Invariably, you will be asked to check a table or file more than once. At the very least you will want to check that issues raised during the Data Profiling exercise have been fixed. You therefore need to perform your Data Profiling in a way that lets you rerun it and compare the results at a later date.
- Action plan. Of course all these checks and reporting will be of little value if not actually put to some use. You should draw up an action plan to take your data analysis further and to generate a real return on the data profiling exercise.
- Do it now. Data Profiling is easy and cost-effective. If you have data, take a look at it today. I can guarantee that you will find something of interest.
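To make the attribute-level checks in the list above a little more concrete, here is a minimal Python sketch using only the standard library. The function names (`value_pattern`, `profile_attribute`) and the sample values are our own illustrative assumptions, not part of any particular tool. It counts missing and distinct values, reports min and max, and reduces each value to a simple pattern so that formats such as postcodes can be compared.

```python
import re
from collections import Counter


def value_pattern(value):
    """Reduce a value to its shape: letters become A, digits become 9."""
    text = re.sub(r"[A-Za-z]", "A", str(value))
    return re.sub(r"[0-9]", "9", text)


def profile_attribute(values):
    """Summarise one attribute (column) given as a list of values.

    Assumes the non-missing values are all of one comparable type.
    """
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "uniqueness": len(counts) / len(present) if present else 0.0,
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "top_values": counts.most_common(5),
        "patterns": Counter(value_pattern(v) for v in present).most_common(5),
    }


# Made-up sample data, for illustration only.
countries = ["UK", "United Kingdom", "US", "USA", "UK", None]
postcodes = ["SW1A 1AA", "M1 1AE", "12345", "", "B33 8TH"]

country_profile = profile_attribute(countries)
postcode_profile = profile_attribute(postcodes)

print(country_profile["top_values"])  # the same country spelled several ways
print(postcode_profile["patterns"])   # "99999" stands out among UK-style shapes
```

A short list of top values is often enough to spot the same thing represented in several ways, while the pattern counts help when an attribute has too many unique values to read one by one.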
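The cross-attribute and duplicates & keys steps can be sketched in a similar way. The example below is a rough illustration under our own assumptions: rows are plain dictionaries, and the column names (`customer_id`, `email`, `country`, `currency`), the helper names and the 95% uniqueness threshold are all hypothetical.

```python
from collections import Counter, defaultdict


def key_report(rows, key):
    """Check a known key for missing values and duplicates."""
    values = [row.get(key) for row in rows]
    present = [v for v in values if v not in (None, "")]
    duplicates = sorted(v for v, n in Counter(present).items() if n > 1)
    return {"missing": len(values) - len(present), "duplicates": duplicates}


def candidate_keys(rows, threshold=0.95):
    """List attributes whose non-missing values are nearly all unique."""
    columns = list(rows[0].keys()) if rows else []
    result = []
    for column in columns:
        present = [r[column] for r in rows if r.get(column) not in (None, "")]
        if present and len(set(present)) / len(present) >= threshold:
            result.append(column)
    return result


def implied_values(rows, determinant, dependent):
    """Map each value of one attribute to the values seen with it in another."""
    seen = defaultdict(set)
    for row in rows:
        seen[row.get(determinant)].add(row.get(dependent))
    return dict(seen)


# Made-up sample data, for illustration only.
rows = [
    {"customer_id": 1, "email": "a@example.com", "country": "UK", "currency": "GBP"},
    {"customer_id": 2, "email": "b@example.com", "country": "UK", "currency": "GBP"},
    {"customer_id": 2, "email": "c@example.com", "country": "US", "currency": "USD"},
    {"customer_id": None, "email": "d@example.com", "country": "UK", "currency": "USD"},
]

print(key_report(rows, "customer_id"))
print(candidate_keys(rows))

# If country should imply currency, any country with more than one
# currency is worth a closer look.
pairs = implied_values(rows, "country", "currency")
conflicts = {k: v for k, v in pairs.items() if len(v) > 1}
print(conflicts)
```

Whether a conflict or a duplicate is a genuine error depends on your business rules, so treat the output as a list of questions to ask rather than a list of defects.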
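For the reuse step, one simple approach (an assumption on our part, not a prescribed method) is to save each profiling run as JSON and compare it with the next run. The metric names and figures below are hypothetical.

```python
import json


def compare_profiles(previous, current):
    """Return the metrics whose values changed between two profiling runs."""
    changes = {}
    for metric in sorted(set(previous) | set(current)):
        before, after = previous.get(metric), current.get(metric)
        if before != after:
            changes[metric] = (before, after)
    return changes


# Hypothetical results from two runs over the same table.
first_run = {"rows": 1000, "missing_dob": 42, "duplicate_ids": 3}
second_run = {"rows": 1010, "missing_dob": 0, "duplicate_ids": 3}

# Storing each run as JSON makes it easy to keep and reload later.
saved = json.dumps(first_run, sort_keys=True)
changes = compare_profiles(json.loads(saved), second_run)
print(changes)
```

Here the comparison would show that the missing dates of birth have been fixed while the duplicate IDs are still outstanding.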
Citrus Technology provide Data Quality and Data Profiling tools to help you understand your data; to find patterns, issues and opportunities. Visit our website for a free trial of our software.