Frequently Asked Questions

Data governance is a collection of processes, roles, policies, standards, and metrics that ensure the effective and efficient use of information in enabling an organization to achieve its goals. It establishes the processes and responsibilities that ensure the quality and security of the data used across a business or organization. Data governance defines who can take what action, upon what data, in what situations, using what methods.
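
As a rough illustration of "who can take what action, upon what data, in what situations, using what methods", the sketch below models each rule as a five-part record and checks requests against it. The role names, data set names, situations, and tools are all hypothetical and are not taken from any specific governance product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PolicyRule:
    """One governance rule covering the five questions in the definition."""
    role: str       # who
    action: str     # can take what action
    dataset: str    # upon what data
    situation: str  # in what situation ("any" means no restriction)
    method: str     # using what method


# Hypothetical rules, for illustration only.
RULES = {
    PolicyRule("data_steward", "edit", "customer_records",
               "business_hours", "approved_etl_tool"),
    PolicyRule("analyst", "read", "customer_records",
               "any", "reporting_portal"),
}


def is_allowed(role, action, dataset, situation, method):
    """Return True if at least one rule permits the requested operation."""
    for rule in RULES:
        same_request = (rule.role, rule.action, rule.dataset, rule.method) == (
            role, action, dataset, method)
        if same_request and rule.situation in ("any", situation):
            return True
    return False
```

In this sketch anything not explicitly permitted is denied, which is one common design choice; a real policy engine might also support explicit deny rules and rule priorities.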

A well-crafted data governance strategy is fundamental for any organization that works with big data, and will explain how your business benefits from consistent, common processes and responsibilities. Business drivers highlight what data needs to be carefully controlled in your data governance strategy and the benefits expected from this effort. This strategy will be the basis of your data governance framework.

For example, if a business driver for your data governance strategy is to ensure the privacy of healthcare-related data, patient data will need to be managed securely as it flows through your business. Retention requirements (e.g. history of who changed what information and when) will be defined to ensure compliance with relevant government requirements, such as the GDPR.
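
The "who changed what information and when" history mentioned above could be kept in an audit trail along the following lines. This is a minimal in-memory sketch: the class and field names are assumptions, and the retention period is a placeholder parameter, not a value required by the GDPR or any other regulation.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class ChangeRecord:
    who: str        # user who made the change
    what: str       # field that was changed
    old_value: str
    new_value: str
    when: datetime  # timestamp of the change


class AuditTrail:
    """Keeps change history and drops records older than the retention period."""

    def __init__(self, retention_days):
        # retention_days is a placeholder; the real value comes from your policy.
        self.retention = timedelta(days=retention_days)
        self._records = []

    def record_change(self, who, what, old_value, new_value, when=None):
        when = when or datetime.now(timezone.utc)
        record = ChangeRecord(who, what, old_value, new_value, when)
        self._records.append(record)
        return record

    def history(self, what):
        """Return every recorded change to the given field, oldest first."""
        return [r for r in self._records if r.what == what]

    def purge_expired(self, now=None):
        """Remove records past the retention period; return how many were removed."""
        now = now or datetime.now(timezone.utc)
        kept = [r for r in self._records if now - r.when <= self.retention]
        removed = len(self._records) - len(kept)
        self._records = kept
        return removed
```

A production system would persist these records in tamper-resistant storage rather than a Python list.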

Data governance helps organizations save the money and time lost to bad data, improve customer relationships, and support revenue growth. With well-defined, high-quality data, companies are better positioned to acquire new customers and retain existing ones, which contributes to the overall health of the business.

Key benefits:

  • Clear understanding of data across the enterprise
  • Improved data quality
  • Master data management (golden record)
  • Regulatory compliance
  • Improved data management

A Data Owner is the individual who is ultimately accountable for the quality of one or more data sets. Data Owners are usually senior-level employees with the resources, budget, and authority to define, clean, and maintain the data they own. An important point is that the Data Owner is usually not the person who manages that data in day-to-day work.

Data owners are either individuals or teams who make decisions such as who has the right to access and edit data and how it’s used. Owners may not work with their data every day, but are responsible for overseeing and protecting a data domain.

A data catalog is a detailed inventory of all the data assets in a company, designed to help data professionals quickly find the most appropriate data for a business purpose. A data catalog also gathers and continually curates metadata about those assets.
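
To make the idea of a searchable inventory concrete, here is a small sketch of a catalog that stores metadata per data asset and supports keyword search. The entry fields and the asset names in the usage example are hypothetical; commercial catalogs track far richer metadata.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str         # name of the data asset
    description: str  # plain-language summary
    owner: str        # accountable person or team
    tags: set = field(default_factory=set)


class DataCatalog:
    def __init__(self):
        self._entries = {}

    def register(self, entry):
        # Registering an existing name again replaces (curates) its metadata.
        self._entries[entry.name] = entry

    def search(self, keyword):
        """Return names of assets whose name, description, or tags match."""
        kw = keyword.lower()
        return sorted(
            e.name for e in self._entries.values()
            if kw in e.name.lower()
            or kw in e.description.lower()
            or kw in {t.lower() for t in e.tags}
        )
```

For example, after registering `CatalogEntry("orders_2024", "Sales orders by region", "sales_ops", {"sales"})`, calling `search("sales")` would return that asset's name.
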
Data lineage tracks the lifecycle of a data element or data set: where it comes from, where it is transformed or stored, and how it is used. It is the process of recording, understanding, and visualizing data as it travels from its sources to consumption. Data lineage is useful for judging the trustworthiness of data, and it helps teams trace errors, implement process changes, and perform system migrations successfully.
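
One simple way to picture lineage is as a graph in which each transformation step links a source data set to a target. The sketch below records such steps and walks the graph backwards to find everything a data set depends on, which is the kind of query used when tracing an error to its origin. The class, method, and data set names are illustrative assumptions.

```python
from collections import defaultdict


class LineageGraph:
    def __init__(self):
        # Maps each target data set to its (source, transformation) pairs.
        self._parents = defaultdict(set)

    def add_step(self, source, target, transformation):
        """Record that `target` is produced from `source` by `transformation`."""
        self._parents[target].add((source, transformation))

    def upstream(self, dataset):
        """Return every data set that feeds `dataset`, directly or indirectly."""
        seen = set()
        stack = [dataset]
        while stack:
            current = stack.pop()
            for source, _ in self._parents.get(current, ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen
```

If `revenue_report` is built from `customers_clean`, which in turn comes from `crm_export`, then `upstream("revenue_report")` includes both, so a bad value in the report can be traced back through each step.
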
A business glossary establishes a common understanding of the meaning and proper usage of business terms. It improves communication between departments, employees, and third parties. It also establishes ownership, supports training for new and existing employees, and sustains data stewardship.

  • To prevent inconsistent data silos in different business units
  • To build a shared understanding of data and agree on common data definitions
  • To improve data quality by identifying and fixing errors in data sets
  • To provide reliable information for decision-makers
  • To implement policies that prevent data misuse and data errors
  • To ensure compliance with data privacy laws and other related regulations

A data dictionary is a set of tables that serves as a centralized repository for technical metadata; it is rarely used outside of information technology. A business glossary, by contrast, defines and gives context to the critical data and reporting elements for the entire enterprise. For greater clarity, glossary entries are written in plain language, are broadly accessible, and often cross-reference related terms.

The first step in building a business glossary is to map out the critical business processes. Ensure that each business term, KPI, and metric used in those processes is defined and documented. Include definitions of how data elements are formatted, where they are stored, and who manages them.
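
A glossary entry that captures the points above (definition, format, storage location, and manager) might be sketched as follows, together with a check that flags entries that are not yet fully documented. The field names are assumptions chosen for this example.

```python
from dataclasses import dataclass, fields


@dataclass
class GlossaryTerm:
    name: str              # business term, KPI, or metric
    definition: str        # plain-language meaning
    data_format: str       # how the data element is formatted
    storage_location: str  # where it is stored
    managed_by: str        # who manages it


def missing_fields(term):
    """Return the names of fields that are still empty for this term."""
    return [f.name for f in fields(term) if not getattr(term, f.name).strip()]
```

Running `missing_fields` over every term gives a simple to-do list of what still needs to be defined before the glossary is published.
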
Master Data Management (MDM) is the technology companies use to identify, collect, match, and merge data into a single view, or single source of truth. It helps organizations harness their data and generate valuable insights, increasing operational and business efficiency and reducing the operating costs of data management.
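
The match-and-merge idea can be sketched in a few lines. This example assumes records are dictionaries with `email` and `updated` keys, matches them on a normalized email address, and keeps the most recently updated non-empty value for each field. Both the matching key and the survivorship rule are simplifying assumptions; real MDM tools typically use fuzzy matching and configurable rules.

```python
def match_key(record):
    """Normalize the email so records for the same customer compare equal."""
    return record["email"].strip().lower()


def merge(records):
    """Build one golden record: the latest non-empty value wins for each field."""
    golden = {}
    # ISO-formatted date strings sort chronologically, oldest first.
    for record in sorted(records, key=lambda r: r["updated"]):
        for field_name, value in record.items():
            if value not in ("", None):
                golden[field_name] = value
    return golden


def build_golden_records(records):
    """Group matching records, then merge each group into a single view."""
    groups = {}
    for record in records:
        groups.setdefault(match_key(record), []).append(record)
    return {key: merge(group) for key, group in groups.items()}
```

Two records for the same customer, one holding only the name and a newer one holding only the phone number, would merge into a single record containing both.
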
Reference data is data used to categorize or classify other data. It is typically static or changes only slowly over time. Examples of reference data include country codes, units of measurement, calendar structures and constraints, corporate codes, and fixed conversion factors for measures such as temperature, weight, and length.
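
As a small illustration, the sketch below treats a short list of country codes and one fixed conversion factor as reference data and uses them to classify and standardize incoming records. The lookup table is a deliberately tiny subset, and the record layout is an assumption made for this example.

```python
# Reference data: a small subset of two-letter country codes (illustrative).
COUNTRY_CODES = {
    "DE": "Germany",
    "FR": "France",
    "JP": "Japan",
    "US": "United States",
}

# Reference data: fixed conversion factor (1 pound = 0.45359237 kg by definition).
KG_PER_LB = 0.45359237


def classify(records):
    """Split records into those with a known country code and those without."""
    valid, rejected = [], []
    for record in records:
        code = record["country"].strip().upper()
        if code in COUNTRY_CODES:
            valid.append({**record, "country": code,
                          "country_name": COUNTRY_CODES[code]})
        else:
            rejected.append(record)
    return valid, rejected


def pounds_to_kg(pounds):
    """Convert a weight using the fixed reference factor."""
    return pounds * KG_PER_LB
```

Because every system looks up the same table and the same factor, values such as "us", "US ", and "US" all end up classified identically.
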
A Data Steward is the individual responsible for managing one or more data sets on a day-to-day basis. The Data Steward reports to the Data Owner and works to maintain the security and quality of the data. However, a Data Steward may or may not have decision-making authority over that data.