Architecture & Concept

We want to build a software platform that addresses the needs for annotation, recognition and fusion of video and vehicle data, enabling efficient and collaborative semi-automatic labelling and exploitation of large-scale video data.

By making both the ADAS and the dynamic mapping sectors capable of handling large amounts of data, we will be able to:

  • Create large datasets of visual samples for training the models used in vision-based detection.
  • Generate ground truth scene descriptions based on objects and events to evaluate the performance of detection algorithms and systems.

By automating or semi-automating the video annotation process we can remove the manual-labelling bottleneck and open up possibilities for video-analytics-driven innovation in the automotive domain.

Main Conceptual Modules

Mobile sensors in vehicles

These are the vehicles that gather sensor data through recording software in a specified format. Procedures will be defined for transmitting incremental streams to the cloud and for uploading not only raw sensor data but also the (pre)processed metadata from ADAS systems. This input corresponds to large volumes of data, on the order of terabytes recorded per day and per car.
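As an illustration of how such incremental transfer could be organised, the Python sketch below shows a possible segment record and a helper that slices a trip into fixed-length windows. All field names and the 60-second window length are assumptions for illustration, not the actual Cloud-LSVA recording format.

    from dataclasses import dataclass, field
    from typing import List, Dict

    @dataclass
    class RecordingSegment:
        """One incremental chunk of an in-vehicle recording (illustrative schema)."""
        vehicle_id: str
        sequence_no: int          # position of this chunk within the trip
        start_ts: float           # UTC timestamps delimiting the chunk
        end_ts: float
        raw_files: List[str]      # paths to raw sensor captures (camera, CAN, GPS, ...)
        adas_metadata: Dict[str, object] = field(default_factory=dict)  # pre-processed ADAS detections

    def split_into_segments(trip_duration_s: float, segment_len_s: float = 60.0):
        """Yield (start, end) windows so a multi-hour trip can be uploaded incrementally."""
        start = 0.0
        while start < trip_duration_s:
            end = min(start + segment_len_s, trip_duration_s)
            yield start, end
            start = end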

Data fusion

Multiple sources/vehicles may differ in the type, volume or nature of their sensor data. Third parties may also interact with the Cloud-LSVA platform by uploading content from other open datasets. A data fusion stage is therefore devised to act as a normalization interface layer to the Cloud-LSVA system.
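A minimal sketch of such a normalization layer is given below, assuming each source registers a small adapter that maps its native records into a common schema. The field names and the register_adapter/normalize helpers are illustrative, not part of the actual platform interface.

    from typing import Callable, Dict

    COMMON_FIELDS = ("timestamp", "sensor_type", "frame_ref", "values")

    _adapters: Dict[str, Callable[[dict], dict]] = {}

    def register_adapter(source: str, adapter: Callable[[dict], dict]) -> None:
        """Register the conversion function for one data source or open dataset."""
        _adapters[source] = adapter

    def normalize(source: str, record: dict) -> dict:
        """Convert a source-specific record into the common schema, checking required fields."""
        common = _adapters[source](record)
        missing = [f for f in COMMON_FIELDS if f not in common]
        if missing:
            raise ValueError(f"normalized record missing fields: {missing}")
        return common

    # Example adapter for a hypothetical open dataset whose records carry 'time' and 'camera_image'.
    register_adapter("open_dataset_x", lambda r: {
        "timestamp": r["time"],
        "sensor_type": "camera",
        "frame_ref": r["camera_image"],
        "values": {},
    })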

Large-scale cloud infrastructure

Computation and storage will leverage a cloud-based network that provides accurate and secure transfer of large data sets from the sensing location to the cloud. Particular attention will be paid to maximizing data throughput and reliability, and to the processing capabilities required to run the video and semantic analytics routines that semi-automate annotation. Cloud-LSVA will progress from TRL 5 to TRL 7.
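One way the transfer reliability could be handled is sketched below: each chunk is shipped together with a SHA-256 digest and retried with exponential back-off on transient failures. The send callable is a placeholder for the actual cloud transfer call, not an existing Cloud-LSVA API.

    import hashlib
    import time

    def upload_with_retry(chunk: bytes, send, max_attempts: int = 3) -> str:
        """Send one data chunk, verifying integrity via its SHA-256 digest and
        retrying on transient failures. 'send' is assumed to have the signature
        send(data, checksum) -> bool."""
        checksum = hashlib.sha256(chunk).hexdigest()
        for attempt in range(1, max_attempts + 1):
            try:
                if send(chunk, checksum):
                    return checksum
            except IOError:
                pass  # transient network error; fall through to retry
            time.sleep(2 ** attempt)  # exponential back-off between attempts
        raise RuntimeError("chunk upload failed after retries")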

Business logic

This is the main orchestration module, managing the interoperability and control of the separate modules. It also defines the external interface to third-party and public domains.
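The sketch below illustrates the idea of such an orchestration layer as a simple pipeline of stages; the Orchestrator class and the stage names in the usage comment are hypothetical and only show how the separate modules could be chained.

    class Orchestrator:
        """Runs a recording through the platform stages in order and exposes
        the result to external (third-party) consumers."""

        def __init__(self, stages):
            self.stages = stages  # list of callables: data -> data

        def process(self, recording):
            data = recording
            for stage in self.stages:
                data = stage(data)  # e.g. fuse -> analyse -> pre-annotate
            return data

    # Usage (stage functions are placeholders):
    # orchestrator = Orchestrator([fuse, analyse, pre_annotate])
    # result = orchestrator.process(new_recording)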

Video annotation

The main inputs for adaptive automatic analysis are the initial annotations of objects, events and scenes as understood by a human observer. From these, the complex understanding of the entire scene can be transferred to computerised recognition models. A semi-automated annotation task will be considered, supported by automatic large-scale video analysis in the cloud as well as by local, task-specific machine learning algorithms. Interaction mechanisms with the human annotators will be specifically designed to minimize the time and cost of the labelling task.
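A possible shape of this human-in-the-loop labelling step is sketched below, assuming a detector that proposes annotations and a review callable standing in for the annotator interaction; both are placeholders, not actual Cloud-LSVA APIs.

    def annotate_frame(frame, detector, review):
        """Automatic pre-annotation proposes labels; the human only confirms or
        corrects them, and the corrections are kept for later model updates."""
        proposals = detector(frame)          # automatic candidate annotations
        confirmed, corrected = [], []
        for proposal in proposals:
            decision = review(frame, proposal)   # human confirms, edits, or rejects
            if decision is None:
                continue                         # rejected proposal
            (confirmed if decision == proposal else corrected).append(decision)
        return confirmed + corrected, corrected  # final labels + corrections for learning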

Video analytics

This module will automatically extract traffic-related information, considering ADAS and dynamic cartography scenarios, for automated annotation, traffic-security-related event recognition and scene classification. It will employ distributed and scalable machine learning approaches to account for variable volumes of input data. This is the main process that will automatically generate high-value metadata as a concise representation of the inputs.
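As a rough sketch of the scalability aspect, the example below distributes frame batches across worker processes; extract_events is a stand-in for the actual detection and classification routines, and the batch/metadata structures are assumptions.

    from multiprocessing import Pool

    def extract_events(frame_batch):
        """Placeholder analytics routine: returns one metadata record per frame."""
        return [{"frame": i, "events": []} for i, _ in enumerate(frame_batch)]

    def analyse_video(frame_batches, workers: int = 8):
        """Map the analytics routine over batches in parallel and flatten the results."""
        with Pool(processes=workers) as pool:
            results = pool.map(extract_events, frame_batches)
        return [item for batch in results for item in batch]  # concise metadata stream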

Supervised learning

Human observations, in the form of annotations and corrections, will be continuously collected for online updating of the automatic data analytics modules.
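One way such continuous updates could be realised is incremental (online) learning. The sketch below assumes a scikit-learn style classifier with partial_fit, which may differ from the models actually used in the platform.

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    model = SGDClassifier()
    classes = np.array([0, 1])  # e.g. "not pedestrian" / "pedestrian" (illustrative labels)

    def apply_corrections(features: np.ndarray, labels: np.ndarray) -> None:
        """Fold newly collected human corrections into the model without retraining from scratch."""
        model.partial_fit(features, labels, classes=classes)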

Evaluation & benchmarking

To assess the readiness of the generated computer vision models both for automatic on-the-cloud analysis and for export as local processing models, a continuous evaluation of the models will be pursued. Task-specific ground-truth datasets will be synthesised and provided for public benchmarking of related automotive (ADAS, maps) systems and of novel machine learning algorithms.
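A small sketch of one continuous-evaluation step is given below, comparing detections against ground-truth boxes with intersection-over-union (IoU); the (x1, y1, x2, y2) box format and the 0.5 matching threshold are common conventions assumed here for illustration.

    def iou(box_a, box_b):
        """Boxes as (x1, y1, x2, y2); returns intersection-over-union in [0, 1]."""
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0, x2 - x1) * max(0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter) if inter else 0.0

    def detection_recall(ground_truth, detections, threshold=0.5):
        """Fraction of ground-truth boxes matched by at least one detection."""
        matched = sum(1 for gt in ground_truth
                      if any(iou(gt, det) >= threshold for det in detections))
        return matched / len(ground_truth) if ground_truth else 1.0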

The 3 Cycle Approach