Data and data governance duties for high-risk AI — 2026-06-22
TL;DR: Article 10 of the EU AI Act requires providers of high-risk AI systems to apply data governance and management practices covering data quality, documentation, bias examination and traceability. Training, validation and test datasets must be relevant, sufficiently representative and, to the best extent possible, free of errors and complete in view of the system's intended purpose. Providers must also examine these datasets for possible biases and take measures to detect, prevent and mitigate them. These practices must be documented, auditable and kept up to date throughout the system's lifecycle, and they must be in place by the application dates set in the regulation.
What are the core data governance obligations under the EU AI Act?
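High-risk AI providers must apply data governance and management practices to the datasets used to develop their systems. Article 10(2) lists what these practices must address. The list includes design choices, data collection processes and the origin of the data, and preparation operations such as annotation, labeling, cleaning and enrichment. It also covers an assessment of the availability, quantity and suitability of the data, an examination for possible biases, and the identification of data gaps and how they will be addressed. The resulting datasets must meet the quality criteria in Article 10(3). Organizations must document their data collection, processing and governance procedures and keep these records available for competent authorities.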
Who is responsible for implementing data governance?
Providers of high-risk AI systems bear primary responsibility for data governance. This includes establishing internal processes, assigning accountability and ensuring compliance across the AI system's lifecycle. Where providers rely on third parties to collect, label or process data, they must put written contractual arrangements in place. These arrangements should give the provider the information and assistance it needs to maintain the same level of data governance.
What documentation is required?
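In practice, providers should maintain the following records, which also support the description of datasets required in the technical documentation (Annex IV):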
- Data governance policy: Outlining how datasets are sourced, curated, and managed
- Data quality assessments: Demonstrating that training, validation, and test data meet representativeness and relevance standards
- Bias monitoring records: Showing ongoing checks for unlawful bias
- Data lineage documentation: Tracing the origin, transformation, and use of datasets
- Update logs: Recording changes to datasets or governance procedures over time
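Technical documentation must be kept for ten years after the system is placed on the market or put into service (Article 18) and made available to competent authorities on request.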
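One way to make lineage documentation and update logs auditable is to chain each log entry to the previous one with a cryptographic hash, so that any later edit to an earlier entry becomes detectable. The Act does not prescribe this technique. The sketch below is a minimal illustration, and the `LineageLog` class, its field names and the example dataset identifier are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from datetime import datetime, timezone

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first entry


@dataclass
class LineageLog:
    """Hypothetical append-only log of dataset changes (not a format required by the Act)."""

    entries: list = field(default_factory=list)

    @staticmethod
    def _hash(body: dict) -> str:
        # Canonical JSON (sorted keys) so identical content always yields the same digest.
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def append(self, dataset_id: str, action: str, details: dict) -> dict:
        body = {
            "dataset_id": dataset_id,
            "action": action,  # e.g. "ingested", "deduplicated", "relabeled"
            "details": details,
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "prev_hash": self.entries[-1]["entry_hash"] if self.entries else GENESIS_HASH,
        }
        entry = {**body, "entry_hash": self._hash(body)}
        self.entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; an edited or reordered entry breaks the chain."""
        prev_hash = GENESIS_HASH
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "entry_hash"}
            if body["prev_hash"] != prev_hash or self._hash(body) != entry["entry_hash"]:
                return False
            prev_hash = entry["entry_hash"]
        return True


log = LineageLog()
log.append("credit-scoring-v1", "ingested", {"rows": 120_000})  # hypothetical dataset ID
log.append("credit-scoring-v1", "deduplicated", {"rows_removed": 312})
print(log.verify())  # True
log.entries[0]["details"]["rows"] = 100_000  # simulate a silent edit to history
print(log.verify())  # False
```

Hash chaining only detects tampering and does not prevent it, so in practice such a log would also be stored with restricted write access.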
How should organizations address bias in datasets?
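Article 10(2)(f) and (g) require providers to examine training, validation and test data for possible biases that are likely to affect health and safety, negatively affect fundamental rights or lead to discrimination prohibited under Union law. Providers must also take appropriate measures to detect, prevent and mitigate such biases. In practice this includes: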
- Conducting pre-deployment bias audits
- Implementing ongoing monitoring during operation
- Documenting bias detection and remediation actions
- Adjusting datasets or model parameters when bias is identified
- Recording the rationale for any bias-mitigation decisions
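A pre-deployment bias audit often starts by comparing outcome rates across groups. The sketch below computes per-group selection rates on a labeled evaluation set. It flags any group whose rate falls below 80% of the highest group's rate. That 0.8 cut-off is borrowed from the US "four-fifths" rule of thumb used in employment-discrimination analysis. The EU AI Act sets no numeric threshold, so treat the cut-off and the function names as illustrative assumptions.

```python
from collections import defaultdict


def selection_rates(records):
    """Share of positive outcomes per group; records are (group, outcome) pairs with outcome 0 or 1."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, outcome in records:
        totals[group] += 1
        positives[group] += outcome
    return {g: positives[g] / totals[g] for g in totals}


def disparity_report(records, threshold=0.8):
    """Ratio of each group's selection rate to the highest group rate.

    threshold=0.8 mirrors the US "four-fifths" rule of thumb; it is an assumption,
    not a figure taken from the EU AI Act.
    """
    rates = selection_rates(records)
    best = max(rates.values())
    report = {}
    for group, rate in rates.items():
        # If no group has any positive outcome, the ratio is undefined; treat it as no disparity.
        ratio = rate / best if best > 0 else 1.0
        report[group] = {"rate": rate, "ratio": ratio, "flagged": ratio < threshold}
    return report


# Toy evaluation data: (group label, predicted approval); not real results.
sample = [("A", 1)] * 60 + [("A", 0)] * 40 + [("B", 1)] * 40 + [("B", 0)] * 60
for group, row in sorted(disparity_report(sample).items()):
    print(group, round(row["rate"], 2), round(row["ratio"], 2), row["flagged"])
# A 0.6 1.0 False
# B 0.4 0.67 True
```

A single ratio is not a full bias assessment. The appropriate metric (selection rates, error rates, calibration) depends on the system's intended purpose, and the chosen metric and its rationale belong in the audit record.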
What is meant by "representative and relevant" data?
Representative data reflects the diversity of real-world use cases and populations that the AI system will encounter. Relevant data is appropriate to the specific task and objectives. Organizations must assess whether their datasets:
- Cover demographic, geographic, contextual, behavioral and functional variations
- Match the intended deployment environment
- Exclude irrelevant or distorting information
- Are sufficient in volume and quality for reliable system performance
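Representativeness can be checked quantitatively by comparing each group's share of the dataset with its expected share in the deployment population. The minimal sketch below reports the total variation distance between the two distributions. It also lists groups whose observed share is below 80% of the expected share. The reference shares, the group labels and the `min_ratio` tolerance are assumptions. In practice the reference would come from census, market or operational data for the intended deployment setting.

```python
from collections import Counter


def representation_gaps(dataset_groups, reference_shares, min_ratio=0.8):
    """Compare group shares in a dataset with expected shares in the deployment population.

    dataset_groups: one group label per record.
    reference_shares: assumed expected share per group (summing to 1).
    Returns (total variation distance, {group: observed/expected ratio} for groups below min_ratio).
    """
    n = len(dataset_groups)
    if n == 0:
        raise ValueError("dataset is empty")
    counts = Counter(dataset_groups)
    # Include groups that appear only in the data, so unexpected categories also count toward the distance.
    groups = set(reference_shares) | set(counts)
    tvd = 0.5 * sum(abs(counts[g] / n - reference_shares.get(g, 0.0)) for g in groups)
    gaps = {}
    for g, expected in reference_shares.items():
        ratio = (counts[g] / n) / expected if expected > 0 else float("inf")
        if ratio < min_ratio:
            gaps[g] = ratio
    return tvd, gaps


# Hypothetical example: 1,000 records compared with assumed deployment shares.
data = ["urban"] * 700 + ["suburban"] * 250 + ["rural"] * 50
reference = {"urban": 0.55, "suburban": 0.30, "rural": 0.15}
tvd, gaps = representation_gaps(data, reference)
print(round(tvd, 2), {g: round(r, 2) for g, r in gaps.items()})
# 0.15 {'rural': 0.33}
```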
When do these requirements take effect?
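The EU AI Act entered into force on 1 August 2024 and applies in phases. Under the timeline in the adopted text, obligations for high-risk systems listed in Annex III apply from 2 August 2026, including the Article 10 data governance duties. Obligations for high-risk systems covered by the Union harmonization legislation listed in Annex I apply from 2 August 2027. Because the datasets behind a system are usually assembled long before launch, providers should begin implementing these measures now to avoid compliance gaps.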
Can data governance requirements be shared with third parties?
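Yes, but responsibility remains with the provider. If a third party collects, labels or otherwise handles data on the provider's behalf, the arrangement should be set out in a written agreement. Article 25(4) requires such an agreement with third parties that supply tools, services, components or processes used in a high-risk system. The agreement must specify the information, technical access and other assistance the provider needs to comply. The provider should also verify and audit the third party's compliance.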
Frequently Asked Questions
Q: Must we retain all training data indefinitely?
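A: No. The Act does not impose a general duty to keep all raw training data. Providers must keep the technical documentation for ten years after the system is placed on the market or put into service (Article 18). That documentation includes a description of the training, validation and test datasets, their provenance and how they were prepared. Market surveillance authorities can also request access to the datasets themselves, so retention decisions should account for that. Any personal data in the datasets remains subject to the GDPR's storage-limitation principle.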
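If the ten-year documentation period is tracked in a records system, the end date can be computed from the date the system was placed on the market or put into service. The function below is a hypothetical helper. Rolling a 29 February start date back to 28 February in a non-leap end year is an assumed convention that counsel should confirm.

```python
from datetime import date


def documentation_retention_end(placed_on_market: date, years: int = 10) -> date:
    """Hypothetical helper: same calendar date `years` later.

    Assumed convention: a 29 February start date ends on 28 February when the end year is not a leap year.
    """
    try:
        return placed_on_market.replace(year=placed_on_market.year + years)
    except ValueError:
        # Only 29 February can fail here, because the target year has no such day.
        return placed_on_market.replace(year=placed_on_market.year + years, day=28)


print(documentation_retention_end(date(2026, 9, 1)))   # 2036-09-01
print(documentation_retention_end(date(2028, 2, 29)))  # 2038-02-28
```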
Q: Does the EU AI Act apply to data used before the regulation enters into force?
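A: Data is not exempt because it was collected before the Act entered into force. What matters is when the system is placed on the market: a high-risk system placed on the market after the application date must meet Article 10 regardless of when its training data was gathered. Article 111(2) covers high-risk systems already on the market or in service before 2 August 2026. For these systems, the Act applies only if they undergo significant design changes after that date, although systems intended for use by public authorities must comply by 2 August 2030. Consult the regulation's text and competent authority guidance for your specific situation.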
Q: What happens if we discover bias after deployment?
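A: Document the discovery, assess its impact and implement corrective measures, such as retraining the model, adjusting datasets or restricting the system's use. Article 20 applies if the bias means the system no longer conforms to the Act's requirements. In that case, the provider must immediately take the necessary corrective action or withdraw, disable or recall the system. Failing to act on known bias can therefore amount to a compliance violation.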
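Post-deployment monitoring needs a concrete trigger for the documented response described above. One simple trigger compares per-group outcome rates in live operation against the rates measured before deployment. When the gap for a group exceeds a tolerance, the check opens an incident record. The `BiasIncident` record, the `check_drift` function and the 0.05 tolerance are hypothetical. A rate shift is a signal to investigate, not proof of unlawful bias.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class BiasIncident:
    """Hypothetical incident record; status is updated as impact assessment and remediation progress."""

    group: str
    baseline_rate: float
    live_rate: float
    detected_at: str
    status: str = "open"


def check_drift(baseline_rates, live_rates, tolerance=0.05):
    """Open an incident for each group whose live positive rate moved more than `tolerance` from baseline."""
    now = datetime.now(timezone.utc).isoformat()
    incidents = []
    for group, base in baseline_rates.items():
        live = live_rates.get(group)
        if live is None:
            continue  # no live traffic for this group in the window; track coverage separately
        if abs(live - base) > tolerance:
            incidents.append(BiasIncident(group, base, live, now))
    return incidents


baseline = {"A": 0.60, "B": 0.55}  # illustrative pre-deployment rates
live = {"A": 0.61, "B": 0.42}      # illustrative rates from a monitoring window
for incident in check_drift(baseline, live):
    print(incident.group, incident.baseline_rate, incident.live_rate, incident.status)
# B 0.55 0.42 open
```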
Q: Are synthetic datasets treated the same as real-world data?
A: Synthetic data must also meet representativeness and quality standards under the EU AI Act. You must document how synthetic data was generated and demonstrate that it is suitable for your use case.
Q: What if we use open-source datasets?
A: You remain responsible for assessing and documenting the quality, representativeness and bias profile of any dataset you use, regardless of its source. Obtaining a dataset from an external source does not shift compliance responsibility away from you.
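For an externally sourced dataset, one practical step is to record exactly which files were assessed, so that the quality and bias findings are tied to a specific snapshot. The sketch below builds a manifest of SHA-256 digests for every file under a directory. The `dataset_manifest` function, the placeholder URL and the license string are illustrative assumptions.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def dataset_manifest(root: str, source_url: str, license_id: str) -> dict:
    """Hypothetical helper: SHA-256 digest of every file under `root`, plus provenance fields."""
    root_path = Path(root)
    files = {}
    for path in sorted(p for p in root_path.rglob("*") if p.is_file()):
        digest = hashlib.sha256()
        with path.open("rb") as fh:
            # Read in 1 MiB chunks so large dataset files need not fit in memory.
            for chunk in iter(lambda: fh.read(1 << 20), b""):
                digest.update(chunk)
        files[path.relative_to(root_path).as_posix()] = digest.hexdigest()
    return {"source_url": source_url, "license": license_id, "files": files}


# Demonstration on a throwaway directory; the URL and license are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "train.csv").write_text("id,label\n1,0\n")
    manifest = dataset_manifest(tmp, "https://example.org/dataset", "CC-BY-4.0")
    print(json.dumps(manifest, indent=2))
```

Storing the manifest alongside the assessment report lets a later reviewer confirm that the files on disk are the ones that were assessed.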
This article is informational and does not constitute legal advice. Consult qualified counsel for your specific situation.