The Basis: Data

Data types

Real data is data collected and recorded in real road traffic. It has the advantage of accurately reflecting reality. Collecting it, however, is very time-consuming and costly. It can also depict only a limited section of reality, since it is impossible to capture every possible traffic situation and scenario. In particular, data on accidents and critical situations cannot be collected with test vehicles. In addition, real data must be labeled manually, which is very time-consuming, before an AI can be reliably trained with it.

Because of these limitations of real data, synthetically generated data is increasingly used to train AI functions. Such data makes it possible to systematically create a wide variety of traffic situations and to vary them as needed. Critical situations can likewise be modeled without danger. However, both the realism of this data and the transferability to reality of AI models trained on it must be examined and demonstrated.

If real data is subsequently extended with synthetic data, the result is called augmented data.

Real data in the project

In KI Data Tooling, real data is being collected by two test vehicles throughout Germany and additionally at two research intersections in Braunschweig and Aschaffenburg. The project thus creates an extensive database of camera, radar and lidar data.

Based on this real data, the project is developing various methods and tools intended to enable automated labeling and efficient preparation and refinement of real data in the future. To this end, different sensor and infrastructure data will be linked and context information made available so that such data can be analyzed and processed more effectively.

Synthetic data in the project

To take full advantage of synthetic data for training AI-based functions, KI Data Tooling generates camera, lidar and radar data synthetically. Based on digital twins of the two research intersections and a scene catalog developed in the project, a toolchain architecture for the systematic generation of synthetic sensor data will be extended and implemented. The main focus is on developing metrics for evaluating the quality of this synthetically generated data and concepts for validating it.

Augmented data

To augment collected real data with synthetically generated objects and road users, the project is developing suitable augmentation procedures for camera, lidar and radar data. Metrics for systematically evaluating the quality of such augmented data are also being developed.
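The idea of inserting a synthetic object into a recorded scene can be illustrated with a minimal Python sketch for lidar-like point data. This is not the project's augmentation procedure: the `LabeledPoint` type and the `augment_scan` function are hypothetical names chosen for illustration, and the sketch deliberately ignores occlusion, sensor noise and intensity values, which a realistic procedure would have to handle. It only shows that inserted points arrive with their label already known, while their origin stays traceable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabeledPoint:
    """One 3D point with a class label and a flag for its origin (hypothetical type)."""
    x: float
    y: float
    z: float
    label: str        # e.g. "background" or "pedestrian"
    synthetic: bool   # True if the point was generated rather than measured


def augment_scan(real_points, object_points, label, offset):
    """Return a new scan: the real points plus a synthetic object placed at `offset`.

    `object_points` is a list of (x, y, z) tuples in the object's local frame.
    Assumption: no occlusion handling, i.e. real points hidden behind the new
    object are NOT removed.
    """
    dx, dy, dz = offset
    placed = [
        LabeledPoint(x + dx, y + dy, z + dz, label, True)
        for (x, y, z) in object_points
    ]
    return list(real_points) + placed


# Usage: place a two-point "pedestrian" 10 m ahead and 2 m to the side.
real_scan = [LabeledPoint(0.0, 0.0, 0.0, "background", False)]
pedestrian = [(0.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
augmented_scan = augment_scan(real_scan, pedestrian, "pedestrian", (10.0, 2.0, 0.0))
```

Keeping the `synthetic` flag on every point is a design choice of this sketch: it would allow quality metrics to compare the inserted and the measured parts of a scan separately.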

Using efficiency potential

To make optimal use of the ever-increasing amounts of data, KI Data Tooling investigates various aspects of efficiency in data provision. In addition to new methods for more efficient compression and storage of data, the main focus is on developing new tools for the abstraction of sensor technology and for the automated detection and synthesis of corner cases.

Optimized AI training strategy

Based on the database created in this way, KI Data Tooling optimizes and further develops training procedures for AI-based functions. The potential offered by different combinations of real, synthetic and augmented data is identified and evaluated. In addition, the transferability to reality will be tested: the project will investigate training strategies that train AI functions on a mixture of synthetic, real and augmented data. This is intended to ensure that AI functions for highly automated and autonomous driving can be trained optimally and efficiently with the right data.
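How a training set might be composed from a mixture of the three data types can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's training strategy: the function name `mix_training_set`, the pool names and the 5:3:2 weighting are hypothetical, and samples are drawn with replacement so that a small pool (typically the real data) can still fill its share.

```python
import random


def mix_training_set(pools, weights, size, seed=0):
    """Draw `size` samples from several data pools in proportion to `weights`.

    pools:   dict mapping a source name to a list of samples
    weights: dict mapping the same source names to non-negative weights
    Returns a shuffled list of (source_name, sample) pairs.
    """
    if set(pools) != set(weights):
        raise ValueError("pools and weights must name the same sources")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    rng = random.Random(seed)
    sources = sorted(pools)

    # Integer share per source; leftover samples go to the highest-weighted sources.
    counts = {s: int(size * weights[s] / total) for s in sources}
    remainder = size - sum(counts.values())
    for s in sorted(sources, key=lambda s: weights[s], reverse=True)[:remainder]:
        counts[s] += 1

    mixed = []
    for s in sources:
        if counts[s] > 0 and not pools[s]:
            raise ValueError(f"pool {s!r} is empty but has a non-zero share")
        # Sampling with replacement: a small pool may contribute duplicates.
        mixed.extend((s, rng.choice(pools[s])) for _ in range(counts[s]))
    rng.shuffle(mixed)
    return mixed


# Usage with placeholder sample identifiers (hypothetical data).
pools = {
    "real": ["real_000", "real_001"],
    "synthetic": ["syn_000", "syn_001", "syn_002"],
    "augmented": ["aug_000"],
}
weights = {"real": 5, "synthetic": 3, "augmented": 2}
training_set = mix_training_set(pools, weights, size=10)
```

Varying `weights` and comparing the resulting models on held-out real data would be one simple way to compare data combinations; which mixture actually transfers best to reality is exactly the open question the project investigates.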