Starting in 2024, the ISO 8800 standard on road vehicle safety and artificial intelligence (AI) will take effect. The standard has significant implications for all AI-based automotive applications, and companies developing AI for such applications face the challenge of meeting its new requirements. As its project period draws to a close in December 2023, the publicly funded collaborative project KI Data Tooling is unveiling a reference architecture and accompanying methods for a data kit. The kit is designed to generate, process, and evaluate the sensor data used for perception in automated driving. It was shaped by the ISO 8800 specifications known to date and provides a basis for passing future safety audits that focus on data quality and Machine Learning Operations (ML-Ops) tooling.
More than three years of collaborative research and development went into creating a technology solution that is a crucial building block for deploying safe and reliable AI in highly automated driving functions: a resilient AI data kit for training AI functions. By combining a carefully selected dataset, approved methods, and reliable tools, the data kit is designed to ensure effective and efficient training of AI functions for highly automated driving.
Achieving safe, robust, and reliable AI functions requires training across diverse driving domains and complex traffic scenarios, with care taken throughout the entire process chain. That chain begins with data acquisition, continues through processing, enrichment, curation, and validation, and ends with the use of the refined data in the ML training loop. A dedicated framework architecture, together with the relevant tools and methods, defines this workflow.
As the project progressed, and through close collaboration with its sister project KI Absicherung, it became evident that ISO 8800 standardization activities were influencing the project's work packages and results. All 17 consortium partners from industry, technology, and scientific institutions therefore agreed unanimously to design and develop the AI data kit with the emerging standard's specifications in mind, particularly with regard to the data kit's ML framework architecture.
User story helps explain the project’s approach
To illustrate the project's approach, a user story has been developed. It takes the perspective of a team responsible for building the data and ML loops that support the development of data-driven perception functions for automated driving. Ultimately, this team must pass an ISO 8800-based safety audit that certifies its ML-Ops data loop.
To begin, the team established a clear understanding of what ISO 8800 requires of organizations developing AI-based functions. This understanding is a prerequisite for establishing confidence in the ML lifecycle framework the team uses for its AI-based functions. To justify that confidence, the team had to present a description of this framework.
KIDT framework architecture provides a solid, trustworthy basis
The proposed solution is the KIDT framework architecture, which describes the data and ML workflow in a structured manner. It covers data acquisition, processing, enrichment, curation, and validation, followed by ML training and model validation, and maps potential tool building blocks onto each step of this process. A structured description of this kind lays the foundation for confidence in the completeness, quality, and efficiency of the interacting elements.
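The idea of mapping tool building blocks onto workflow steps, and of checking the description for completeness, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the KIDT implementation: the stage list follows the text above (reading the two "validation" steps as data validation and model validation, which is an interpretation), while the `FrameworkMapping` class and the tool names are hypothetical.

```python
from dataclasses import dataclass, field

# Workflow stages in the order the text lists them. Splitting "validation"
# into data and model validation is an assumption made for this sketch.
STAGES = (
    "data acquisition",
    "processing",
    "enrichment",
    "curation",
    "data validation",
    "ML training",
    "model validation",
)


@dataclass
class FrameworkMapping:
    """Hypothetical mapping from workflow stages to tool building blocks."""

    tools: dict[str, list[str]] = field(default_factory=dict)

    def assign(self, stage: str, tool: str) -> None:
        # Reject stages that are not part of the described workflow.
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage!r}")
        self.tools.setdefault(stage, []).append(tool)

    def uncovered_stages(self) -> list[str]:
        # Stages that have no tool building block yet, in workflow order.
        return [stage for stage in STAGES if not self.tools.get(stage)]


if __name__ == "__main__":
    mapping = FrameworkMapping()
    mapping.assign("data acquisition", "fleet-recorder")  # hypothetical tool name
    mapping.assign("curation", "corner-case-search")      # hypothetical tool name
    print(mapping.uncovered_stages())
```

Listing the uncovered stages gives a simple, reviewable gap report, which is one way such a mapping could support the completeness argument an auditor would look for.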
The next step considers specific methods and tools for the individual building blocks of the framework architecture. The framework as a whole can be grouped into four main sections: (1) scenarios and the production of synthetic data, (2) real-world data recording, curation, and refinement, (3) data storage, analysis, and discoverability, and (4) data set usage for ML function development.
Within these four sections, the project is developing dedicated methods, including taxonomies for describing and detecting corner cases; metadata enrichment, data search, curation, and active learning; and synthetic data generation, style transfer, and the use of synthetic data in mixed training approaches.
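To make one of these methods more concrete, the sketch below shows a common, generic form of active learning: entropy-based uncertainty sampling, in which the samples the current model is least sure about are sent for labeling first. This is a textbook technique chosen for illustration, not the method developed in the project; the frame identifiers and probabilities are invented example values.

```python
import math
from collections.abc import Mapping, Sequence


def entropy(probs: Sequence[float]) -> float:
    # Shannon entropy of one class-probability vector; higher means less certain.
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_labeling(predictions: Mapping[str, Sequence[float]], budget: int) -> list[str]:
    """Return the ids of the `budget` most uncertain unlabeled samples.

    `predictions` maps a sample id to the model's class probabilities.
    """
    ranked = sorted(predictions, key=lambda sid: entropy(predictions[sid]), reverse=True)
    return ranked[:budget]


if __name__ == "__main__":
    # Invented example predictions for three unlabeled camera frames.
    preds = {
        "frame_001": [0.98, 0.01, 0.01],  # model is confident
        "frame_002": [0.40, 0.35, 0.25],  # model is uncertain
        "frame_003": [0.70, 0.20, 0.10],
    }
    print(select_for_labeling(preds, budget=2))
```

In a curation loop, the selected frames would be labeled, added to the training set, and the model retrained before the next selection round; the labeling budget caps the annotation cost per iteration.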
It is crucial that the theoretical framework architecture aligns with applicable methods so that together they meet the requirements for a comprehensive data and ML training loop. To address future ISO 8800 requirements, each of these methods and solutions must be evaluated in order to build trust and confidence. Presenting, and ideally demonstrating, suitable examples of these methods is an indispensable step toward real-world application and toward establishing the approach's credibility.