Glossary
This glossary provides key definitions for terms that are relevant for data management. It consolidates the extensive body of knowledge developed by the Competence Center Corporate Data Quality (CC CDQ), in collaboration with data experts from 40+ Fortune 500 companies since 2006. We look forward to all thoughts and suggestions - please feel free to reach out to us at any time: cc@cdq.com
A
Advanced analytical data is a particular subtype of enterprise data. It is created by applying methods of data science that go beyond purely descriptive analytics (i.e., towards predictive/prescriptive analytics). It is used to identify patterns or correlations in complex (i.e., structured and unstructured) datasets, such as text, images, geospatial or sensor data.
Exclusively for CC CDQ members: Martin, F., Walter, V., Hasan, M.R., Legner, C. (2021): CC CDQ Data Quality Handbook
Related topics
Master data, Media data, Metadata, Reference data, Analytical data, Observational data, Transactional data, Enterprise data
Advanced analytical data products are a particular subtype of data product. They use machine learning methods to create predictive and prescriptive knowledge and improve self-learning capabilities. Examples include:
- Regression analyses such as sales forecasts that predict how many sales will take place over a certain period of time, so that sales teams can adapt their planning
- AI/ML products such as predictive maintenance that predicts upcoming machine problems and notifies the responsible person to conduct repairs
Learn more: CC CDQ Research Briefing - Data product
Academic source: Hasan, M.R. and Legner, C. (2023) ‘Understanding Data Products: Motivation, Definition, And Categories’, in Proceedings of the Thirty-first European Conference on Information Systems (ECIS2023). Kristiansand, Norway, pp. 1–17.
AI-enabled data management comprises data practices that are demonstrably influenced, enhanced, or made possible through the application of artificial intelligence.
Exclusively for CC CDQ members: Schulte, K. and Legner, C. (2024) ‘AI-enabled Data Management: A Framework‘, CC CDQ Work Report
Analytical data is a particular subtype of enterprise data. It is derived from business operations and transactional data and is mainly used to meet standard reporting and analytics requirements by applying descriptive analytics.
Exclusively for CC CDQ members: Martin, F., Walter, V., Hasan, M.R., Legner, C. (2021): CC CDQ Data Quality Handbook
Related topics
Master data, Media data, Metadata, Reference data, Advanced analytical data, Observational data, Transactional data, Enterprise data
Analytical data products are a particular subtype of data product. They are created using descriptive analytics and deliver key insights to support decision making. Examples include:
- Metrics such as email open rate for marketing teams to target potential new leads
- Dashboards such as tail-spend analysis that show the smallest vendors of a company for purchasing teams to reallocate purchase orders to larger vendors
- Reports such as carbon emission reporting that show how much CO2 has been produced for sustainability teams to identify areas of emission reduction
Learn more: CC CDQ Research Briefing - Data product
Related topics
Advanced analytical data product, Basic data product, Data product
The term "Artificial intelligence" (AI) was first coined by John McCarthy at the renowned Dartmouth workshop in 1956, defining it as “the science and engineering of making intelligent machines”.
Academic source: McCorduck, P., & Cfe, C. (2004). Machines who think: A personal inquiry into the history and prospects of artificial intelligence. AK Peters/CRC Press.
The terms related to Artificial intelligence are often used interchangeably and sometimes incorrectly in general discourse. To ensure clarity and precise communication, it is crucial to define these terms accurately. Below, the relationships between these key concepts are visualized.
Related topics
Machine learning, Neural network, Deep learning, Generative AI
B
Basic data products are a particular subtype of data product. They are ready-to-use datasets giving foundational insights into the domain(s) represented by the data. Examples include:
- Master data such as customer information, which contains the customer's name, address, location, etc.
- Aggregated datasets such as shop floor data that harmonizes data from different assembly lines
- Enriched datasets such as transportation data augmented with weather data from an external provider
Learn more: CC CDQ Research Briefing - Data product
Academic source: Hasan, M.R. and Legner, C. (2023) ‘Understanding Data Products: Motivation, Definition, And Categories’, in Proceedings of the Thirty-first European Conference on Information Systems (ECIS2023). Kristiansand, Norway, pp. 1–17.
Related topics
Advanced analytical data product, Analytical data product, Data product
Big data are data that are so large and diverse that they require cost-effective, innovative forms of data collection, storage, management, analysis and visualization. Big data are typically characterized by three V's:
- Velocity: the speed at which the data is created and the speed at which the data should be analyzed and used.
- Volume: the size of the data, typically in the range of terabytes to exabytes.
- Variety: the range of data types in scope, from traditional structured sources (spreadsheets, SQL database tables) to semi-structured data (XML, JSON, Semantic Web data) and unstructured data (images, texts, files).
In recent years, three more V's have been added to the traditional framework to characterize big data:
- Veracity: the different levels of reliability and truthfulness of big data sources.
- Variability: the high frequency of changes within a data source.
- Value: the fact that while single data points may not be of high value, value from big data comes from analyzing huge amounts of data and trends within and between datasets.
Business analytics is defined as the exploration and investigation of past business data to gain valuable insights and drive business planning. These activities depend on a sufficient volume of data as well as on a sufficient level of data quality. This requires data managers to integrate and reconcile data across various sources (i.e. from various business units, divisions, departments, branches, and information systems), with the goal of compiling a complete picture of the company’s past and current state for deriving future scenarios.
Learn more: CDQ Trend Study: Where data management is heading
A business object is the representation of a real-world concept. It describes recurring sets of information used in multiple business contexts and in at least one data domain. It is specified by attributes.
A business rule is a statement that defines or constrains some aspect of the business. It is intended to assert business structure or to control or influence the behavior of the business. Business rules may be defined as business definitions for business use (to represent policies, practices and procedures), or defined as executable business rule statements for use in rule-driven systems, or both.
Academic source: Business Rules Group. (2000). Defining business rules: What are they really?
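An executable business rule statement can be sketched as a small validation function that accompanies the business definition. The following Python sketch pairs both forms; the rule names, record fields, and thresholds are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable

# A business rule pairs a business definition (policy statement)
# with an executable check for use in rule-driven systems.
@dataclass
class BusinessRule:
    name: str
    definition: str                  # human-readable policy statement
    check: Callable[[dict], bool]    # executable rule statement

# Hypothetical rules for a customer record.
rules = [
    BusinessRule(
        name="postal_code_present",
        definition="Every customer record must contain a postal code.",
        check=lambda rec: bool(rec.get("postal_code")),
    ),
    BusinessRule(
        name="credit_limit_non_negative",
        definition="A customer's credit limit may not be negative.",
        check=lambda rec: rec.get("credit_limit", 0) >= 0,
    ),
]

def evaluate(record: dict) -> list[str]:
    """Return the names of all rules the record violates."""
    return [r.name for r in rules if not r.check(record)]

customer = {"postal_code": "", "credit_limit": 5000}
print(evaluate(customer))  # ['postal_code_present']
```

Keeping the textual definition next to the executable check makes the same rule usable both as business documentation and as input to a validation engine.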
Business value refers to the impact of data management on business with regard to financials, business processes, customers, and organizational growth.
Exclusively for CC CDQ members: Pentek, T., Legner, C., & Otto, B. (2020). Data Excellence Model: Reference Model for Managing Data Assets. CC CDQ Working Report.
C
The Chief Data Office is a strategic function responsible for defining and overseeing an organization's enterprise-wide data strategy, data governance framework, and data product portfolio.
Learn more: CC CDQ Research Briefing on Federated Data Governance and the Role of the Chief Data Office
Academic source:
Fadler, M., & Legner, C. (2021). Toward big data and analytics governance: Redefining structural governance mechanisms. Proceedings of the 54th Hawaii International Conference on System Sciences.
Related topics
Data management, Data strategy, Data governance, Chief Data Officer
The Chief Data Officer (CDO) is a data role and typically leads the Chief Data Office. The CDO is in charge of developing and implementing the data strategy and data governance framework required to execute the business strategy, and drives the data transformation and the development of data capabilities.
Exclusively for CC CDQ members: Reference Model for Data & Analytics Governance
Academic source:
Fadler, M., & Legner, C. (2021). Toward big data and analytics governance: Redefining structural governance mechanisms. Proceedings of the 54th Hawaii International Conference on System Sciences.
Related topics
Data management, Data strategy, Data governance, Chief Data Office
The central actors (business partners, customers, suppliers and employees), products (incl. materials) and operating materials (systems, etc.) of a company and its ecosystem. These objects are represented as master data for purposes of IT.
Academic source: Otto, B., & Oesterle, H. (2015). Corporate data quality: Prerequisite for successful business models
D
The data analyst is a data role who collects, processes and analyzes data to help the organization make informed decisions. The data analyst is responsible for the development and maintenance of KPIs, reports, and dashboards using business intelligence and data visualization tools.
Exclusively for CC CDQ members: Reference Model for Data & Analytics Governance
Academic source:
Fadler, M., & Legner, C. (2021). Toward big data and analytics governance: Redefining structural governance mechanisms. Proceedings of the 54th Hawaii International Conference on System Sciences.
Related topics
Business intelligence, Analytical data, Analytical data product
Data applications are software applications for managing data. Examples are applications for managing master data, managing data quality, and cataloging or curating data.
Exclusively for CC CDQ members: Pentek, T., Legner, C., & Otto, B. (2020). Data Excellence Model: Reference Model for Managing Data Assets. CC CDQ Working Report.
Related topics
Data quality management, Master data management, Data product, Data catalog, Data quality tool
The data architect is a data role who ensures that data is properly stored, integrated, and consumed across an organization's IT landscape. The data architect is responsible for designing, creating, deploying and managing conceptual and logical data models and for the mapping to physical data models.
Exclusively for CC CDQ members: Reference Model for Data & Analytics Governance
Data Catalogs are a set of electronic resources and associated technical capabilities for creating, searching and using enterprise data. The content of the data catalogs includes data, metadata that describe various aspects of the data (e.g. representation, creator, owner, reproduction rights), and metadata that consist of links or relationships to other data or metadata, whether internal or external to the data catalog. Data Catalogs are constructed, collected and organized by and for a community of users, and their functional capabilities support the information needs and uses of that community - ultimately matching data supply and demand.
Exclusively for CC CDQ members: Martin, F., Labadie, C., Korte, T., Eurich, M., Legner, C., Otto, B., Spiekermann, M. (2019): Data Catalogs: Integrated Platforms For Matching Data Supply And Demand
Related topics
Enterprise data, Data product, Data applications, Metadata, Data quality tool
A data citizen is an employee who uses data for their daily work. Data citizens have data-related rights (e.g., access to relevant data) and obligations (e.g., adherence to data policies and standards).
Exclusively for CC CDQ members: Reference Model for Data & Analytics Governance
A data community is a group of people who share a common domain of interest and engage in a process of collective learning to develop data practices. Three types of data communities can be distinguished: communities focused on developing skills around tools and methods; communities focused on a specific data object or data domain; and communities spreading general data awareness.
Learn more: CC CDQ Research Briefing Data Democratization
Academic source:
Lefebvre et al. (2023) 5 Pillars for Democratizing data at your organization. Harvard Business Review.
A data contract is a formal agreement between the producer(s) and consumer(s) of data products which guarantees their provision at a desired level of service in return for adhering to conditions to facilitate the products’ reliable usage. A data contract may contain metadata on the following:
- Structural: Captures the overall composition of the data products, e.g. schema, data models, field descriptions, etc.
- Administrative: Exhibits the general details about the data products, e.g. name, version, purpose, etc.
- Data Quality: Focuses on the fitness-for-use of the data, e.g. data quality rules, dimensions, scores, etc.
- Ownership: Outlines who owns and is accountable for the data product, e.g. data product owner name, business domain, contact details, etc.
- Pipeline: Captures the infrastructural aspects of the data product, e.g. server, platform, format, etc.
- Service Level Agreement: Highlights the service guarantees provided to the consumers, e.g. update frequency, latency, availability, etc.
- Licensing: Underpins the usage elements of the data product, e.g. restrictions, rights, terminations, etc.
- Pricing: Concerns the various pricing mechanisms of the data product, e.g. price amount, pricing model, billing frequency, etc.
- Access: Focuses on the security aspects of the data product, e.g. access type, authentication method, approver, etc.
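In practice, data contracts are often captured as machine-readable documents (e.g. in YAML or JSON). The following Python sketch shows how some of the metadata categories above could be represented; all field names and values are illustrative, not a prescribed format:

```python
import json

# Hypothetical data contract covering a subset of the metadata
# categories described above (all names and values are illustrative).
data_contract = {
    "administrative": {
        "name": "customer-orders",
        "version": "1.2.0",
        "purpose": "Provide curated order data for analytics teams.",
    },
    "structural": {
        "schema": {
            "order_id": "string",
            "customer_id": "string",
            "order_total": "decimal",
        },
    },
    "data_quality": {
        "rules": ["order_id must be unique", "order_total >= 0"],
        "minimum_score": 0.95,
    },
    "ownership": {
        "data_product_owner": "Sales Operations",
        "contact": "sales-data@example.com",
    },
    "service_level_agreement": {
        "update_frequency": "daily",
        "availability": "99.5%",
    },
}

# Serializing the contract makes it versionable and machine-checkable.
print(json.dumps(data_contract, indent=2))
```

Because the contract is structured data, producers and consumers can validate it automatically, for example by checking delivered datasets against the declared schema and data quality rules.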
Data democratization is an organization’s capability to motivate and empower a wider range of employees - not just data experts - to understand, find, access, use, and share data in a secure and compliant way. It is a new management paradigm in data-driven organizations and ensures that employees without “data” in their title, or “regular people,” feel comfortable enough to incorporate data into their daily activities, to become data citizens (with rights and obligations).
Data democratization initiatives are generally supported by five key enabling areas (or pillars):
- Broaden data access by rolling-out data catalogs and marketplaces,
- Stimulate data use and the generation of data-driven insights through self-service,
- Level up data literacy with specific curricula for personas or role families,
- Create data communities to advance data practices,
- Promote data value across the organization.
Learn more: CC CDQ Research Briefing Data Democratization
Academic source:
Lefebvre et al. (2023) 5 Pillars for Democratizing data at your organization. Harvard Business Review.
Related topics
Data citizen, Data management, Data literacy, Data catalog, Data community
Data excellence is an umbrella term that defines properties of data, comprising data quality (defined as “fitness for purpose”) but also additional dimensions, such as regulatory compliance, data security, or data privacy.
The Data Excellence Model (DxM) is a data management framework supporting companies in developing the capabilities to manage data as a strategic asset. The DxM has been developed by the Competence Center for Corporate Data Quality (CC CDQ) in a collaboration between researchers and Fortune 500 companies from different industries (including, but not limited to, the automotive, pharma, retail and consumer goods industries).
The DxM comprises three main building blocks:
- Goals (Why?) identify the data-informed business capabilities that are required, distinguishing which of these are already in place to some extent and need to be enhanced, and which need to be established from scratch.
- Enablers (How?) outline six areas for execution, as the main constituents (or domains) of data management, that need to be addressed to achieve the Goals: People, Roles and Responsibilities; Performance Management; Processes and Methods; Data Lifecycle; Data Applications; Data Architecture.
- Results (What?) indicate to what extent the Goals have been achieved in terms of two quantifiable aspects: Business Value and Data Excellence.
These categories are interlinked by a continuous improvement cycle that involves the monitoring of Goals, Enablers and Results and allows for the necessary adjustments. This acknowledges the dynamic nature of data management, which requires continuous effort and evolves over time.
Learn more: CC CDQ Research Briefing - Data Excellence Model
Academic source: Legner, Pentek & Otto (2020). Accumulating Design Knowledge with Reference Models: Insights from 12 Years’ Research into Data Management. Journal of the Association for Information Systems, 21(3), 735-770.
Data governance establishes an organization-wide framework for managing and using data as an enterprise asset and assures compliance with corporate strategy and regulations. Effective data governance frameworks encompass five key components:
- Standards, guidelines, and rules that ensure consistent data management and usage.
- Clearly defined roles and responsibilities, specifically data ownership and stewardship.
- Complete and consistent documentation of metadata, semantic definitions, glossaries, data flows, and data models.
- A metrics framework to define, monitor and improve data quality, usage, enablement, and the overall value derived from data.
- Definition of the critical components of the data and analytics application landscape, such as the data catalog, data quality management (DQM) tools, master data management (MDM) systems, data science workbenches, and self-service analytics tools.
Learn more: CC CDQ Research Briefing - Modern Data Governance
Data integration is the task of presenting a unified view of data owned by heterogeneous and distributed data sources. The need for data integration may stem from (1) technological heterogeneities (different database technologies), (2) schema heterogeneities (different data models and data representations) and (3) instance-level heterogeneities (conflicting values in different sources for the same data object). Data can be integrated physically or virtually; in the virtual case, the data remains in the source systems but is accessed through a uniform view.
Academic source: Data and Information Quality (2016), Carlo Batini, Monica Scannapieco
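Virtual integration can be illustrated with a minimal sketch: the data remains in its (hypothetical) source systems, while a uniform view resolves the schema heterogeneities at access time. Source names and field names below are invented for illustration:

```python
# Two hypothetical sources with schema heterogeneity: the same kind of
# customer data is stored under different field names and representations.
crm_source = [
    {"cust_id": "C1", "full_name": "Ada Lovelace", "country": "GB"},
]
erp_source = [
    {"id": "C2", "name": "Grace Hopper", "country_code": "US"},
]

def unified_view():
    """Present a uniform schema without copying data into a new store.

    Each source record is mapped to the common schema on the fly,
    which is the essence of virtual (as opposed to physical) integration.
    """
    for rec in crm_source:
        yield {"customer_id": rec["cust_id"],
               "name": rec["full_name"],
               "country": rec["country"]}
    for rec in erp_source:
        yield {"customer_id": rec["id"],
               "name": rec["name"],
               "country": rec["country_code"]}

for row in unified_view():
    print(row)
```

A physical integration would instead materialize the mapped records in a target store; the mapping logic itself would look the same.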
The data lifecycle comprises all steps required for the creation, maintenance, use, archiving, and deletion of data.
Exclusively for CC CDQ members: Pentek, T., Legner, C., & Otto, B. (2020). Data Excellence Model: Reference Model for Managing Data Assets. CC CDQ Working Report
Data literacy refers to the ability to work with data in a relevant context, most typically a professional context. It includes all relevant skills for the creation, curation, processing, and use of data. The level of proficiency for these skills depends on workplace expectations and broadly revolves around three key personas:
- Casual data users are individuals who use data occasionally in their work, typically consuming reports and data-driven insights produced by others.
- Data specialists are individuals for whom data is necessary to do their job. They create, curate, analyze and use data to ensure its quality and suitability for different business goals.
- Data experts are individuals for whom data is their profession. Their expertise spans data management as well as analytics and artificial intelligence, ensuring data is created, curated, and analyzed so as to be ready to use at scale.
Learn more: CC CDQ Research Briefing - Data Literacy
Academic source: Lefebvre, H., & Legner, C. (2024). Toward a Curriculum for Data Literacy in Enterprises.
Data management aims at the efficient usage of data in companies. It makes decisions and executes measures that affect the company-wide handling of data (whereas data governance creates the framework for such decisions through the definition of responsibilities and so forth). It comprises all tasks related to the data lifecycle on a strategic, governing, and technical level: the formulation of a data strategy, the definition of data management processes, standards, and measures, the assignment of roles and responsibilities, the description of the data lifecycle and architecture (covering data models and data modeling standards), and the management of applications and systems.
A data management framework provides a structured approach to develop data management in an organization. It synthesizes organizational, architectural and systems-related considerations and is typically formulated as a reference and/or capability model.
The most popular examples are:
- Data Management Body of Knowledge (DAMA-DMBoK®), a reference model developed by the community of experts in the Data Management Association (DAMA).
- Data Management Capability Assessment Model (DCAM), a capability model and maturity model for data management developed by the EDM Council via cross-industry collaboration.
- Data Excellence Model (DxM), comprising a capability and maturity model developed by the Competence Center for Corporate Data Quality (CC CDQ) in an industry-research collaboration.
The data owner is a core data & analytics role in the CC CDQ Reference Model for Data & Analytics Governance. Two different types of data owner are usually distinguished in practice: the data definition owner and the data content owner.
The data definition owner is a decentralized data governance role which is typically assigned to senior business executives with global outreach (e.g. a global head of sales). (S)he is accountable for the data definition in specific areas of responsibility (e.g. a specific data domain like product or customer). Here, (s)he ensures that business requirements are fulfilled and data is compliantly accessed and used. Her/his tasks include collecting/defining data requirements and delegating the detailing of a data definition to a data steward.
The data content owner is a decentralized data governance role which is assigned to local business executives/team leaders with operational responsibilities. (S)he is accountable for data creation and maintenance (data lifecycle) according to the data definition for a specific area of responsibility. (S)he coordinates the creation and maintenance of data by data editors.
Exclusively for CC CDQ members: CC CDQ Reference Model for Data & Analytics Governance
Data practices refer to the specific activities and processes that organizations engage in to manage, process, and utilize data effectively. This includes tasks such as data curation, storage, maintenance, and accessibility, as well as activities beyond the traditional data lifecycle that support and enable the effective use of data.
Exclusively for CC CDQ members: Schulte, K. and Legner, C. (2024) ‘AI-enabled Data Management: A Framework‘, CC CDQ Work Report.
A data product is a managed artifact that satisfies recurring information needs and creates value through transforming and packaging relevant data elements into a consumable form. Data products have particular subtypes: Basic data product, Analytical data product, Advanced analytical data product. Firms build data products in order to enhance access to and reuse of data, to improve data governance and ownership, and to reduce time-to-insight. Data products have five main characteristics:
- Data products fulfill recurring information needs
- Data products must have a well-defined consumer base
- Data products generate tangible value that can be tracked and measured
- Data products are built from data that comes from reliable sources
- Data products must be delivered in a consumable form
Learn more: CC CDQ Research Briefing - Data product
Academic source: Hasan, M.R. and Legner, C. (2023) ‘Understanding Data Products: Motivation, Definition, And Categories’, in Proceedings of the Thirty-first European Conference on Information Systems (ECIS2023). Kristiansand, Norway, pp. 1–17.
Related topics
Data product lifecycle, Data product portfolio management, Data product canvas, Data product owner, Data product manager
A data product canvas is a visual tool that supports organizations in designing and documenting data products. An example is the CC CDQ Data Product Canvas that addresses three key dimensions in the design of data products: desirability (do consumers want it), feasibility (can we deliver it) and viability (is it worth it).
Academic source: Hasan, M. R., & Legner, C. (2023). Data Product Canvas: A visual inquiry tool supporting data product design. In International Conference on Design Science Research in Information Systems and Technology (pp. 191-205). Cham: Springer Nature Switzerland.
Related topics
Data product, Basic data product, Analytical data product, Advanced analytical data product
The data product lifecycle is an end-to-end approach which oversees the evolution of data products from cradle to grave within organizations. It consists of six phases:
- Ideation & qualification,
- Data sourcing,
- Development & testing,
- Deployment,
- Consumption & monitoring,
- Retirement.
Related topics
Data product, Basic data product, Advanced analytical data product, Analytical data product, Data product manager, Data product owner, Data product portfolio management
A data product manager is a data role that is accountable for the creation, implementation and maintenance of data products. The data product manager closely collaborates with the data team consisting of various other roles such as data owner, data analyst, data scientist, data engineer and data architect. The data product manager works closely with the data product owner to ensure that data products remain fully functional and relevant for the business throughout its lifecycle.
A data product owner is a data role that represents the business interests and is accountable for the specification of business requirements of data products. In many cases, the data product owner is also the sponsor of the data product and has a final say in its acceptance. The data product owner collaborates with the data product manager throughout the data product lifecycle to ensure that business requirements are addressed while creating the data products.
Data product portfolio management is the process of systematically selecting product ideas, continuously assessing and optimizing the portfolio to maximize its value over time through alignment with organizational goals. It mainly consists of three phases: selection, monitoring and optimization. Data product portfolio management allows organizations to create transparency of all their data products, efficiently allocate resources to the right products, manage intricate dependencies between data products and ensure fitness to strategic, technical and regulatory requirements.
Exclusively for CC CDQ members: Hasan, M.R. and Legner, C. (2024): Data Product Portfolio Management
Data quality is a multi-dimensional, context-dependent concept that cannot be described and measured by a single characteristic, but rather by various data quality dimensions. The desired level of data quality is oriented toward the requirements of the business processes and functions that use this data, such as Purchasing, Sales or Reporting. A low level of data quality reduces the value of the data assets in the company, because their usability is limited. Companies therefore strive to achieve the quality of data required by the business strategy using data quality management (DQM).
Academic source: Otto, B., & Oesterle, H. (2015). Corporate data quality: Prerequisite for successful business models
Related topics
Data quality dimensions, Data quality Key Performance Indicator, Data quality tool
A data quality dimension is a measurable feature or characteristic of data. The most important dimensions along which data quality can be assessed are:
- Correctness: Factual agreement of the data with the properties of the real-world object that it represents.
- Consistency: Agreement of several versions of the data related to the same real objects, which are stored in various information systems.
- Completeness: Existence of all necessary values or attributes of a record.
- Actuality: Agreement of the data at all times with the current status of the real object and timely adjustment of the data as soon as the real object has been changed.
- Availability: The ability of the data user to access the data at the desired point in time.
Academic source: Otto, B., & Oesterle, H. (2015). Corporate data quality: Prerequisite for successful business models
Related topics
Data quality, Data quality Key Performance Indicator, Data quality tool
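Several of the dimensions above can be operationalized as simple measurements. The following Python sketch, with hypothetical records, attribute names, and thresholds, illustrates checks for completeness and actuality:

```python
import datetime

# Hypothetical customer records; `updated_at` supports an actuality check.
records = [
    {"name": "Ada", "postal_code": "1015",
     "updated_at": datetime.date(2024, 5, 1)},
    {"name": "Grace", "postal_code": None,
     "updated_at": datetime.date(2020, 1, 1)},
]

def completeness(recs, attribute):
    """Share of records in which a necessary attribute has a value."""
    return sum(1 for r in recs if r.get(attribute)) / len(recs)

def actuality(recs, as_of, max_age_days=365):
    """Share of records adjusted recently enough to reflect the real object.

    The maximum age is a business-defined threshold (hypothetical here).
    """
    return sum(1 for r in recs
               if (as_of - r["updated_at"]).days <= max_age_days) / len(recs)

print(completeness(records, "postal_code"))           # 0.5
print(actuality(records, datetime.date(2024, 6, 1)))  # 0.5
```

Correctness and consistency checks require comparison against the real-world object or against other systems holding the same data, so they typically need reference data rather than a single dataset.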
A data quality key performance indicator (KPI) is a quantitative measure of data quality. A data quality measurement system measures values for the quality of data at defined measurement points at a certain frequency. Data quality KPIs operationalize data quality dimensions. One example is the validation of a data element based on business rules.
Academic source: Otto, B., & Oesterle, H. (2015). Corporate data quality: Prerequisite for successful business models
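The validation of data elements based on business rules can be sketched as a KPI that reports the share of records fulfilling all rules. The rule names, records, and value sets below are hypothetical:

```python
# Hypothetical business rules for a material record; the data quality KPI
# is the share of records that pass every rule.
rules = {
    "weight_is_positive": lambda rec: rec.get("weight", 0) > 0,
    "unit_is_known": lambda rec: rec.get("unit") in {"kg", "g", "t"},
}

def record_is_valid(record):
    """A record is valid only if it fulfills all business rules."""
    return all(check(record) for check in rules.values())

def data_quality_kpi(records):
    """Share of records fulfilling all business rules (0.0 to 1.0)."""
    return sum(record_is_valid(r) for r in records) / len(records)

materials = [
    {"weight": 12.5, "unit": "kg"},  # passes both rules
    {"weight": -3.0, "unit": "kg"},  # violates weight_is_positive
    {"weight": 7.0, "unit": "lb"},   # violates unit_is_known
    {"weight": 1.0, "unit": "g"},    # passes both rules
]

print(data_quality_kpi(materials))  # 0.5
```

Computed at a fixed frequency (e.g. monthly) and per measurement point, such a KPI yields the time series needed to monitor and improve data quality.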
The mandate of Data Quality Management (DQM) is to analyze, improve and ensure the quality of the data. DQM generally differentiates between preventive and reactive measures. Preventive DQM measures target the avoidance of defects in the data with negative effects on the quality of the data. In contrast, reactive DQM measures target the discovery of existing defects in the data and their correction.
Academic source: Otto, B., & Oesterle, H. (2015). Corporate data quality: Prerequisite for successful business models
Related topics
Data quality, Data quality Key Performance Indicator, Data quality dimensions, Data quality tool
Data quality measurement is the periodic examination of the data quality of central records as part of DQM. For example, the data quality of the most important attributes could be measured on a monthly basis using defined business rules. A record that does not fulfill all rules is considered defective.
Academic source: Otto, B., & Oesterle, H. (2015). Corporate data quality: Prerequisite for successful business models
Related topics
Data quality, Data quality Key Performance Indicator, Data quality dimensions, Data quality tool
Data quality tools are software solutions designed to ensure the trustworthiness and reliability of data by identifying and fixing quality issues. This may involve measuring various data quality dimensions, such as accuracy, completeness, consistency and validity of the data. Ensuring high data quality affects the performance of various other applications that use data to support decision making.
Exclusively for CC CDQ members: Martin, F., Walter, V., Hasan, M.R., Legner, C. (2021): CC CDQ Data Quality Handbook
Related topics
Data quality, Data quality Key Performance Indicator, Data quality dimensions, Data applications, Data catalog
Deep learning networks are neural networks with many layers. The layered network can process extensive amounts of data [and therefore] requires a great deal of computing power, which raises concerns about its economic and environmental sustainability.
Academic source: Machine learning, explained (2021). MIT Sloan
Related topics
Artificial intelligence, Machine learning, Neural network, Generative AI
E
Enterprise data describes all data that are created, maintained and used by enterprises. The enterprise data taxonomy developed in the Competence Center Corporate Data Quality distinguishes eight different categories of enterprise data and depicts their relationships: Master data, Transactional data, Observational data, Media data, Analytical data, Advanced analytical data, Metadata, and Reference data.
Exclusively for CC CDQ members: Martin, F., Walter, V., Hasan, M.R., Legner, C. (2021): CC CDQ Data Quality Handbook
External data refers to any type of data that is captured, processed, and provided from outside the company. The major external data types include open, paid, shared, and web data. External data can be used to complement internal data and help to improve advanced analytics, optimize business processes (e.g. with geolocation, weather, or traffic data), reduce internal data maintenance efforts (e.g. to enrich or validate internal data), and create new services. However, despite their increasing relevance, external data remain an untapped resource for most companies.
Learn more: CC CDQ Research Briefing - External Data
Academic source: Krasikov et al. (2022). Unleashing the Potential of External Data: A DSR-based Approach to Data Sourcing. In ECIS.
Related topics
External data sourcing involves the process of procuring, licensing, and accessing data from sources outside an organization. A systematic approach to sourcing and managing external data comprises the following phases:
- Request: Analyze requests for external data and understand their business context and requirements.
- Screen: Search for suitable datasets and identify relevant data sources (open, paid, shared, and web data).
- Assess: Assess candidate datasets against criteria (such as provenance, price, license, structure, data quality) and select dataset and provider.
- Integrate: Access and onboard the external dataset, and map it with internal data.
- Manage & use: Monitor dataset for updates, analyze its use in business processes or analytical products.
- Retire: Archive, delete and/or cancel the subscription.
Learn more: CC CDQ Research Briefing - External Data
Academic source: Krasikov et al. (2022). Unleashing the Potential of External Data: A DSR-based Approach to Data Sourcing. In ECIS.
Related topics
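The "Assess" phase of this lifecycle lends itself to a simple weighted-scoring sketch. The criteria weights, candidate names, and per-criterion scores below are made-up assumptions for illustration, not a CC CDQ prescription:

```python
# Illustrative weights over assessment criteria named in the "Assess" phase.
WEIGHTS = {"provenance": 0.3, "price": 0.2, "license": 0.2, "data_quality": 0.3}

def score(candidate):
    """Weighted sum of per-criterion scores, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[c] * candidate[c] for c in WEIGHTS)

candidates = {
    "provider_a": {"provenance": 0.9, "price": 0.4, "license": 1.0, "data_quality": 0.8},
    "provider_b": {"provenance": 0.6, "price": 0.9, "license": 0.5, "data_quality": 0.7},
}
best = max(candidates, key=lambda name: score(candidates[name]))
print(best)  # → provider_a
```

In practice the weights would reflect the business context captured in the "Request" phase.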
F
Federated data governance refers to an operating model that balances centralized and decentralized data responsibilities. It empowers local data roles to manage data autonomously within an enterprise-wide framework aligned with global standards and adapted to local requirements. This model typically follows a hub-and-spoke structure, comprising a global data office, multiple local data offices, and operational data roles.
Learn more: CC CDQ Research Briefing Federated Data Governance and the Role of the Chief Data Office
Related topics
First time right is a principle of preventive data quality management (DQM) according to which data should be acquired by an information system as correctly as possible in order to avoid retroactive correction (at generally higher levels of expenditure).
Academic source: Otto, B., & Oesterle, H. (2015). Corporate data quality: Prerequisite for successful business models
Related topics
Data quality management (DQM), Data lifecycle, Zero maintenance
G
Generative AI (Artificial intelligence) can be thought of as a machine learning model that is trained to create new data, rather than making a prediction about a specific dataset. A generative AI system is one that learns to generate more objects that look like the data it was trained on.
Academic source: Zewe, A. (2023). Explained: Generative AI. MIT News
Related topics
Artificial intelligence, Machine learning, Neural network, Deep learning
H
I
The "Internet of Things" refers to the idea of an extended Internet that, in addition to classic computers and mobile devices, also integrates any physical objects into its infrastructure by means of sensors and actuators, thus turning them into providers or consumers of a wide variety of digital services.
Academic source: Fleisch, E. & Thiesse, F. Enzyklopaedie der Wirtschaftsinformatik
J
K
L
Linked Open Data defines a vision of globally accessible and linked data on the internet based on the RDF standards of the semantic web. This structured web data is interlinked with other data and can be accessed through semantic queries. Linked open data is released under an open license that permits its free reuse.
Academic source: W3C, Tim Berners-Lee
M
Machine learning is a subfield of artificial intelligence and covers algorithms and techniques that allow machines to learn from data. Two main categories of machine learning techniques are supervised machine learning (SML) and unsupervised machine learning (USML).
Academic source: Kalota (2024). A Primer on Generative Artificial Intelligence. Education Sciences, 14(2), 172.
Related topics
Artificial intelligence, Generative AI, Neural network, Deep learning
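A minimal instance of supervised machine learning can be shown without any library: fitting the slope of a no-intercept linear model to labeled (x, y) pairs by least squares. The data and function below are illustrative assumptions:

```python
def fit_slope(xs, ys):
    """Least-squares slope for the no-intercept model y ≈ w * x:
    labeled training pairs go in, a learned parameter comes out."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Labeled training data that noisily follows y = 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]
w = fit_slope(xs, ys)
print(round(w, 2))  # → 1.99
```

Unsupervised techniques, by contrast, would receive only the xs (no labels) and look for structure such as clusters.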
Master data is the most fundamental enterprise data subtype. Master data represents core business objects (e.g., customers, suppliers, or products) which are agreed upon and shared across the enterprise. They remain largely unaltered and are often referenced and reused in business documents and data analysis. They must be unambiguously identifiable and interpretable across the entire organization (i.e., across organizational departments, divisions, and units).
Exclusively for CC CDQ members: Martin, F., Walter, V., Hasan, M.R., Legner, C. (2021): CC CDQ Data Quality Handbook
Related topics
Advanced analytical data, Media data, Metadata, Reference data, Analytical data, Observational data, Transactional data, Enterprise data
Master data management consists of all the activities, methods and IT tools for modelling, managing and providing master data as well as its data quality management (DQM). The goal is to provide and ensure a company-wide truth about the core business object (single source of truth) and thereby to support data users in various business processes throughout the company.
Related topics
Media data is a particular enterprise data subtype that represents documents, digital images, geospatial data, and multimedia (video/audio) files. Media data is mainly unstructured in nature.
Exclusively for CC CDQ members: Martin, F., Walter, V., Hasan, M.R., Legner, C. (2021): CC CDQ Data Quality Handbook
Related topics
Advanced analytical data, Master data, Metadata, Reference data, Analytical data, Observational data, Transactional data, Enterprise data
Metadata is "data about data". It is a particular enterprise data subtype that aims to facilitate access, management and sharing of large sets of structured and/or unstructured data. There are six categories of metadata:
- Structural metadata describes the general data model, such as types, attributes of objects, and relationships between objects.
- Administrative metadata provides information to help manage a resource, such as users (with rights) and dates (creation, last update).
- Terminological metadata provides an understanding of the data, such as definitions, abbreviations, cataloging records, and comments from creators and users.
- Governance metadata provides an overview of the data landscape from a management point of view, such as ownership, roles, responsibilities, and level of confidentiality.
- Context metadata provides information on the environment in which the data exists, such as business processes and business purposes (use cases).
- Use metadata provides information on how data are consumed, such as search logs, usage statistics, and processing systems.
Related topics
Advanced analytical data, Master data, Media data, Reference data, Analytical data, Observational data, Transactional data, Enterprise data
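The six categories could be grouped per dataset as in the following sketch; the field names and example contents are assumptions for illustration, not a prescribed schema:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    """Illustrative container holding the six metadata categories for one dataset."""
    structural: dict = field(default_factory=dict)      # data model: types, attributes, relationships
    administrative: dict = field(default_factory=dict)  # users with rights, creation/update dates
    terminological: dict = field(default_factory=dict)  # definitions, abbreviations, comments
    governance: dict = field(default_factory=dict)      # ownership, roles, confidentiality level
    context: dict = field(default_factory=dict)         # business processes and use cases
    use: dict = field(default_factory=dict)             # search logs, usage statistics

meta = DatasetMetadata(
    structural={"type": "table", "attributes": ["id", "name", "country"]},
    governance={"owner": "sales_ops", "confidentiality": "internal"},
)
print(meta.governance["owner"])  # → sales_ops
```

A data catalog would populate and search such records across many datasets.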
N
Neural networks are a commonly used, specific class of machine learning algorithms. Artificial neural networks are modeled on the human brain, in which thousands or millions of processing nodes are interconnected and organized into layers. In an artificial neural network, cells, or nodes, are connected, with each cell processing inputs and producing an output that is sent to other neurons.
Academic source: Machine learning, explained (2021). MIT Sloan.
Related topics
Machine learning, Artificial intelligence, Deep learning, Generative AI
O
Observational data is a particular enterprise data subtype that captures experiences and behavior at a very detailed and fine granular level. It is generated by humans or things. Observational data includes IoT/sensor data from connected devices (often in the form of data streams), web data generated by user activities on social media platforms or commercial websites, as well as survey data from questionnaires.
Exclusively for CC CDQ members: Martin, F., Walter, V., Hasan, M.R., Legner, C. (2021): CC CDQ Data Quality Handbook
Related topics
Advanced analytical data, Master data, Metadata, Media data, Reference data, Analytical data, Transactional data, Enterprise data
Open data can be defined as "data that is freely available, and can be used as well as republished by everyone without restrictions from copyright or patents". As a specific type of external data, open data holds great business potential and is expected to fuel advanced analytics, optimize business processes, enrich data management, or even enable new services.
Learn more: CC CDQ Research Briefing - External Data
Academic source: Krasikov et al. (2021). Sourcing the right open data: a design science research approach for the enterprise context. In International Conference on Design Science Research in Information Systems and Technology (pp. 313-327).
Related topics
External data, External data sourcing, Paid data, Web data, Shared data
P
Paid data, also known as commercially available data, refers to the datasets available directly from specialized data providers (or brokers) and data marketplaces, and offered at a certain cost. It is a specific type of external data and is typically coupled with specific services which facilitate its use, such as identification and classification of data by categories, description of the intended use, metadata documentation, and integration services.
Learn more: CC CDQ Research Briefing - External Data
Related topics
External data, External data sourcing, Open data, Web data, Shared data
From a regulatory perspective, personal data can be defined as “data enabling direct or indirect identification of a single physical person, data that is specific to a single physical person without enabling identification, data that can be linked to a physical person, data regarding which anonymization techniques cannot completely mitigate the risk of re-identification” (Debet et al. 2015). From a practical perspective, most companies collect personal data about their customers, employees, suppliers and vendors. A particular area of concern typically are customer data that can be defined as “a set of data that represents and is associated with the identity, activities and service offering associated with a unique individual” (Tapsell et al. 2018).
Source: Debet, A., Massot, J., Metallinos, N., Danis-Fantôme, A., Lesobre, O.: Informatique et libertés. La protection des données à caractère personnel en droit français et européen (2015).
Tapsell, J., Akram, R.N., Markantonakis, K: Consumer-Centric Data Control, Tracking and Transparency (2018).
Q
R
Reference data is a particular enterprise data subtype used to characterize, categorize, validate or constrain other data. The most basic reference data are codes or key value lists, but they can also be more complex and incorporate hierarchies or vocabularies. Reference data can be defined and created internally (e.g., customer classifications, product groups) or received from external sources (e.g., country or currency codes defined by ISO standards, product classifications defined by e-commerce standards).
Exclusively for CC CDQ members: Martin, F., Walter, V., Hasan, M.R., Legner, C. (2021): CC CDQ Data Quality Handbook
Related topics
Advanced analytical data, Master data, Metadata, Media data, Observational data, Analytical data, Transactional data, Enterprise data
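The validating role of reference data can be sketched with a key-value code list; the subset of ISO 3166-1 alpha-2 country codes and the record layout below are illustrative:

```python
# A minimal reference data list: ISO 3166-1 alpha-2 codes (subset) mapped to names.
COUNTRY_CODES = {"CH": "Switzerland", "DE": "Germany", "FR": "France", "US": "United States"}

def invalid_countries(records):
    """Return the records whose 'country' attribute is not in the reference list."""
    return [r for r in records if r.get("country") not in COUNTRY_CODES]

records = [{"id": 1, "country": "DE"}, {"id": 2, "country": "ZZ"}]
print(invalid_countries(records))  # → [{'id': 2, 'country': 'ZZ'}]
```

The same pattern extends to internally defined lists such as customer classifications or product groups.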
A regulation is a document written in natural language containing a set of guidelines specifying constraints and preferences pertaining to the desired structure and behavior of an enterprise. Examples of regulations are a law (e.g., the General Data Protection Regulation - GDPR), a standardization document, a contract, etc. A regulation specifies the domain elements it applies to and oftentimes has implications for data management.
Academic source: El Kharbili, M. (2012). Business process regulatory compliance management solution frameworks: A comparative evaluation.
Regulatory Compliance Management (RCM) is the problem of ensuring that enterprises (data, processes, organization, etc.) are structured and behave in accordance with the regulations that apply, i.e., with the guidelines specified in the regulations.
Academic source: El Kharbili, M. (2012). Business process regulatory compliance management solution frameworks: A comparative evaluation.
A regulatory guideline specifies the expected behavior and structure on enterprise domain elements. It additionally defines tolerated and non-tolerated deviations from the ideal behavior and structure, and also defines the possible exceptional cases. A regulation may also specify how the enterprise ought to or may react to deviations from ideal behavior and structure.
Academic source: El Kharbili, M. (2012). Business process regulatory compliance management solution frameworks: A comparative evaluation.
S
Shared data refers to external data which is shared between companies within dedicated business ecosystems. Examples of sharing and exchange environments include the Global Data Synchronization Network (GDSN) provided by GS1 and the CDQ Data Sharing Community.
Learn more: CC CDQ Research Briefing - External Data
Related topics
External data, External data sourcing, Open data, Paid data, Web data
T
Transactional data is a particular enterprise data subtype that is created by business processes and documents key business events or the results of business activities. Transactional data often references master data, but in contrast to master data, it naturally changes during its lifecycle (e.g., status changes). Furthermore, the volume of transactional data (e.g., the number of sales orders) increases with ongoing business activities. Examples are sales or purchase orders, invoices, delivery notes or incidents.
Exclusively for CC CDQ members: Martin, F., Walter, V., Hasan, M.R., Legner, C. (2021): CC CDQ Data Quality Handbook
Related topics
Advanced analytical data, Master data, Metadata, Media data, Observational data, Analytical data, Reference data, Enterprise data
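The relationship between transactional and master data can be sketched as follows; the record layouts and identifiers are made up for illustration:

```python
# Master data: a stable customer record, keyed for reuse across documents.
customers = {"C-100": {"name": "ACME GmbH", "country": "DE"}}

# Transactional data: a sales order that references the master record
# by key and carries a status that changes during its lifecycle.
order = {
    "order_id": "SO-2024-001",
    "customer_id": "C-100",   # reference to master data
    "amount": 1250.00,
    "status": "open",
}
order["status"] = "shipped"   # the transaction changes; the master record does not

print(customers[order["customer_id"]]["name"], order["status"])  # → ACME GmbH shipped
```

Many orders may reference the same customer, which is why order volume grows with business activity while the master record stays put.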
U
V
W
Web data refers to the data made available on the Web (e.g., online sources, websites) and also shared by users (e.g., user-generated content, reactions, comments) of social media platforms, including the metadata (e.g. location, time, language, biographical data). Web data is one of the subtypes of external data.
Learn more: CC CDQ Research Briefing - External Data
Related topics
External data, External data sourcing, Open data, Paid data, Shared data
X
Y
Z
Zero maintenance is a principle of preventive data quality management (DQM) where data maintenance tasks are automated to the extent that they require minimal to no manual intervention. This concept aims to ensure that data remains up-to-date and accurate without the need for continuous human oversight.
Related topics
Data quality management (DQM), Data lifecycle, First time right