6 Ways to Improve Data Quality

Data is at the heart of every business decision, and it is central to competitive differentiation, innovation, process efficiency and powering new business models. The quantity and scale of data are rising exponentially, but the value of collected data is low if it is not fit for purpose. Worse still, as data use also scales, poor quality data can strip business value through the flawed decisions it drives.

Improving data quality is a focus for many organisations, and the first key step is a shared understanding and expectation among all stakeholders of the benefits good data quality delivers; without that agreement, data quality work rarely delivers on its promised value. What follows is a continuous cycle of improvement. In this blog I will share some of the key areas to focus on that will help improve your organisation’s data quality for better customer experience, business decisions and operational efficiency.

Poor Data Quality: Causes and Impacts

Businesses of all sizes are at risk of poor quality data, which can result from a myriad of causes such as human entry errors, system inconsistencies, physical data model inconsistencies and process variations the systems were never designed to handle. In our experience, organisations lacking a strategic approach to automation and application integration to keep systems in sync are the most likely to be at risk of poor quality data.

Naturally, the impact of poor data quality can be far reaching and affect your business in a number of ways, as summarised below:

  • Customer: reduced service levels, dissatisfaction, inconsistent experience
  • Revenue: loss through customer churn, missed revenue opportunities, inaccurate reporting
  • Compliance: inability to meet regulatory requirements, exposure to audit issues, flawed insights
  • Operational: increased cost of service, and general technical and labour costs in data management
  • Technical: setbacks when updating or migrating systems

Gartner puts it a bit more bluntly: “Poor data quality destroys business value”. Its recent research showed organisations estimated the average cost of poor data quality at $10.8 million per annum. This number is likely to rise as business environments become increasingly digitalised and complex.

Ways to Improve Data Quality

If data quality is an issue for your organisation, you might be feeling a bit overwhelmed about how to start addressing it because – let’s face it – there is no overnight fix. It’s a journey of continuous improvement that needs to be treated as such at the strategic level, and everyone who touches or relies on data needs to understand its importance.

Below are six practical methods that will provide a solid basis for data quality improvement for any organisation.

  • Access to the Data

The first step in increasing your data quality maturity is to gain access to, and visibility of, current data across your enterprise; what this looks like will depend on your IT architecture. Every business will have different challenges in getting visibility of its complete data estate, but below I have listed some strategies that can improve access to data throughout your organisation, with a small sketch of one approach after the list:

  • Centralise the back-end data querying access point (this could be physical federation)
  • Virtualisation (logical federation)
  • A centralised dashboard and reporting location
  • Making different physical data formats accessible in a single structured way, e.g. a data lake
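
A minimal sketch of the logical federation idea is shown below: two hypothetical sources (a CRM extract and a billing database) are surfaced through a single in-memory query point. The file names, table names and join key are illustrative only; in practice this layer would usually be a virtualisation product or a federated query engine rather than hand-rolled code.

```python
# Hypothetical sketch: surface two physical sources through one query point.
import sqlite3
import pandas as pd

# Two physical sources (names and schemas are assumptions for illustration).
crm = pd.read_csv("crm_customers.csv")        # CSV extract from a CRM
billing_conn = sqlite3.connect("billing.db")  # operational billing database

# Load both into a single in-memory access point.
hub = sqlite3.connect(":memory:")
crm.to_sql("crm_customers", hub, index=False)
billing = pd.read_sql("SELECT customer_id, email FROM customers", billing_conn)
billing.to_sql("billing_customers", hub, index=False)

# Analysts now query one place instead of each system separately.
combined = pd.read_sql(
    """
    SELECT c.customer_id, c.name, b.email
    FROM crm_customers c
    LEFT JOIN billing_customers b USING (customer_id)
    """,
    hub,
)
print(combined.head())
```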

  • Data and Process Modelling

Data and process modelling analyses what the business is currently doing, projects what it wants to be doing in the future, and discerns what data models are necessary to enable those processes. The data models in the various systems need to be assessed, deriving a logical model for each domain and then analysing the business processes around the key data entities to determine whether the logical model and understanding of the data work with those processes. This, too, reaps great results when it is managed in an ongoing capacity.

Modelling the data and processes forces data and business analysts to understand the data, its attributes and how it is used. This understanding of the data model is the first step towards knowing whether data is fit for purpose, i.e. whether it is good or bad quality data.

Data modelling without an understanding of how the business processes work with the data leads to lower quality data, or data that is not fit for purpose. One example I have seen is in the staff domain: a person’s first name attribute originated from the preferred name in the HR domain instead of the legal name, and the payroll process for tax then ran into issues because it requires the legal name, not the preferred name, to be sent to the ATO. This shows how data modelling and business processes are inherently tied together.
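
As a hedged illustration of that staff example, the sketch below shows a simple model-level guard: the payroll path may only ever use the legal name. The record shape and field names are assumptions, not the actual HR or payroll schema.

```python
# Hypothetical sketch of the staff-name example: payroll must report the
# legal name, never the HR preferred name.
from dataclasses import dataclass

@dataclass
class StaffRecord:
    staff_id: str
    preferred_name: str  # sourced from HR; fine for greetings and email
    legal_name: str      # the name tax reporting requires

def payroll_name(record: StaffRecord) -> str:
    """Return the only name the payroll process is allowed to report."""
    if not record.legal_name.strip():
        raise ValueError(
            f"Staff {record.staff_id}: legal name missing; the preferred "
            "name must not be substituted for tax reporting."
        )
    return record.legal_name

# A record modelled only on the preferred name fails fast here, instead of
# surfacing later as a rejected tax submission.
print(payroll_name(StaffRecord("E042", "Bob", "Robert Smith")))
```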

  • Data Profiling 

Once access is in place and the data’s purpose is squared away, the next action is to understand and rigorously ‘profile’ the data. This is where you can really start to uncover where the issues are. Data profiling can be tricky, as there are a few different ways for data analysts to approach it and some tools that help to automate certain aspects of it. I typically run through a list of questions to kickstart the process:

  • Do you have a proper understanding of the conceptual, logical and physical data models across the major systems in your environment? Are there discrepancies in how the data is modelled in the various systems, e.g. the definition of a region or the definition of a customer?
  • Who are the stakeholders of the data and what are their priorities? Does the data meet the end users’ needs?
  • Do you understand how the data is used in business processes? Are there conflicts between how the process works and what the data looks like, e.g. mandatory fields required by business processes?
  • Are there common priority issues that the users of the data encounter with processes and data?
  • Does the data conform to internationally accepted standards, e.g. ISO 3166 country codes?
  • Are there duplicates? 
  • Are there common integration errors or exceptions when integrating data between systems? 
  • Are there examples of ‘bad’ data? What are the trends in the bad data?

You have to understand your data model and the business processes that operate on it, including their exceptions and errors, to understand what the data quality challenges could be, and then explore and rectify them.
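
To make those questions concrete, below is a minimal profiling sketch in pandas, assuming a hypothetical customer extract with customer_id, email, country and source_system columns. It checks completeness, uniqueness, validity against ISO 3166 and where the ‘bad’ data is coming from; real profiling tools automate much of this.

```python
# Minimal profiling sketch over a hypothetical customer extract.
import pandas as pd

df = pd.read_csv("customers.csv")  # assumed extract for illustration

# Completeness: how often is each attribute actually populated?
print(df.isna().mean().sort_values(ascending=False))

# Uniqueness: duplicates on the business key.
print(f"Duplicate customer_ids: {df['customer_id'].duplicated().sum()}")

# Validity: compare against an accepted standard. Only a few ISO 3166
# alpha-2 codes are listed here; a real check would load the full set.
iso_3166_sample = {"AU", "NZ", "GB", "US"}
invalid = df[~df["country"].isin(iso_3166_sample)]
print(f"Rows with non-standard country codes: {len(invalid)}")

# Trends in 'bad' data: which source system contributes the most issues?
print(invalid.groupby("source_system").size())
```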

  • Data Quality Visibility

Creating reports, views or datasets that make the identified data quality issues visible is an important part of the data quality journey. To maintain data integrity, your organisation may appoint data stewards who consistently and continuously ensure the business is operating with high quality data. Automated data governance mitigates the risk of error; depending on your individual use case, you may look at automating manual processes.

When data profiling identifies data quality issues, these need to be managed until they are resolved, whether at the endpoint or in a “hub”. This could take the form of a Power BI report, a view in a virtualisation layer, or an automated notification driven by whether the data quality issues still exist.
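
As one hedged example of what that automation might look like, the sketch below assumes profiling results are written to a shared issues table and summarises anything still unresolved for a steward notification. The table shape and rule names are illustrative.

```python
# Sketch of an automated data quality notification over an assumed
# issues table produced by earlier profiling runs.
import pandas as pd

def open_issue_summary(issues: pd.DataFrame) -> str:
    """Summarise unresolved issues for a steward notification."""
    open_issues = issues[issues["status"] != "resolved"]
    if open_issues.empty:
        return "All tracked data quality issues are resolved."
    counts = open_issues.groupby(["domain", "rule"]).size()
    return "Unresolved data quality issues:\n" + counts.to_string()

# In practice this summary would feed a Power BI dataset, an email or a
# chat webhook; printing stands in for that delivery step here.
issues = pd.DataFrame(
    {
        "domain": ["customer", "customer", "staff"],
        "rule": ["missing_email", "invalid_country", "missing_legal_name"],
        "status": ["open", "resolved", "open"],
    }
)
print(open_issue_summary(issues))
```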

  • Data Cleansing 

Data cleansing is the crucial step where incomplete or inaccurate information can be corrected, such as incomplete fields, typos in data entries or out-of-date information. Data aggregated from different systems that use different data standards may contain inconsistencies, so it’s important that this step is adopted as an ongoing (and where possible automated) process to ensure continual consistency of data. Addressing and cleansing incorrect data ensures your business can be confident about the decisions it makes as a result of that information.

A few ways data cleansing can be achieved include:

  • Data engineering pipelines or virtualisation views that ‘clean’ datasets and provide curated datasets in a ‘hub’ golden record approach.
  • Updating or changing the business processes by which operational and functional staff enter data into systems, to control data quality as it enters source enterprise systems, e.g. the country for a new customer’s address is chosen from a dropdown list instead of being a free text field (sketched after this list).
  • Robotic Process Automation (RPA) tools can be leveraged to step through records in a source enterprise system and apply business data quality rules to clean up bad data.
  • Data engineering pipelines that provide ‘cleaned’ datasets which are uploaded into source enterprise systems, cleaning data in bulk against specific defined quality rules as a one-off or regular activity.
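
The sketch below illustrates the free-text country problem from the list above: legacy values are mapped to canonical codes, anything the rules cannot resolve is flagged for review rather than guessed at, and duplicates are dropped. The column names and mapping are assumptions for illustration.

```python
# Hypothetical cleansing sketch: standardise free-text country values and
# deduplicate on the business key.
import pandas as pd

COUNTRY_MAP = {
    "australia": "AU", "aus": "AU", "au": "AU",
    "new zealand": "NZ", "nz": "NZ",
}

def cleanse(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    out["country"] = out["country"].str.strip().str.lower().map(COUNTRY_MAP)
    # Flag rows the rules cannot resolve instead of guessing a value.
    out["needs_review"] = out["country"].isna()
    return out.drop_duplicates(subset="customer_id", keep="last")

dirty = pd.DataFrame(
    {"customer_id": [1, 1, 2], "country": ["Australia ", "AUS", "N.Z."]}
)
print(cleanse(dirty))
```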

  • Data Catalog

Adding meaning to data through metadata enrichment in a data catalog is a key part of having high quality data available across your organisation.

Some key initiatives around this activity would be (a small tagging sketch follows the list):

  • Understanding the physical data model
  • Defining and understanding the logical data model
  • Enriching the data models with a Business Glossary and classifications
  • Automating the scanning and application of glossary terms and classifications to data sets
  • Making these data sources, entities, attributes, glossary terms and classifications, as well as other metadata (owner, expert, etc.), easy to find for self-service by approved users in the organisation
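
As a toy illustration of automating glossary and classification tagging, the sketch below matches column names against simple glossary patterns. Real catalog products do this with managed scanners and far richer rules; the patterns, terms and classifications here are assumptions.

```python
# Toy sketch: apply glossary terms and classifications to a schema by
# matching column names against simple patterns.
import re

GLOSSARY_RULES = {
    r"email":      ("Email Address", "PII"),
    r"legal_name": ("Legal Name", "PII"),
    r"country":    ("Country Code", "Reference Data"),
}

def tag_columns(columns: list[str]) -> dict[str, tuple[str, str]]:
    """Return {column: (glossary term, classification)} for any matches."""
    tags = {}
    for col in columns:
        for pattern, tag in GLOSSARY_RULES.items():
            if re.search(pattern, col, re.IGNORECASE):
                tags[col] = tag
    return tags

# Scanning a hypothetical dataset's schema:
print(tag_columns(["customer_id", "Email", "legal_name", "country"]))
```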

Summary

Achieving and maintaining high quality data is a critical component of business success. Making data more reliable, accessible and usable presents a number of opportunities, such as: 

  • Informed decision making with accurate outcomes
  • Improved customer experience
  • Greater customer retention
  • Cross selling opportunities
  • Time and money saved on manual correction tasks

Pursuing data quality shouldn’t mean finding data problems and performing ad hoc fixes. Fostering a continuous improvement data quality mindset within all teams, and communicating the benefits of better data quality to departments on an ongoing basis, helps establish a new standard across the enterprise. In summary, good data quality helps a data driven organisation deliver better services at better operating costs and remain competitive in a disruptive and challenging landscape.


Author Details

Riaan Ingram
Riaan is a Principal Consultant with over 20 years’ IT architecture, design and development experience. Riaan has a strong technical, hands-on background and specialises in developing and integrating enterprise business solutions. He has in-depth knowledge of integration as well as cloud patterns and technologies, and is an expert in the planning, design and implementation of API-first, event-driven, microservices, and low/no code integration approaches.
