Abstract: The acquisition of large, annotated image datasets,
required for the training of semantic segmentation models,
is often an arduous task, owing to the time-consuming, complicated, and error-prone nature of manual image labelling. This process also often
requires specialized software and domain knowledge. These
problems can be circumvented by utilizing a generative
model to create synthetic automatically labelled datasets. In
this paper, we propose a generative model in the form of a 3D scene representing an urban environment. A virtual
camera setup is used to acquire labelled images from the
virtual urban environment. Each image is stored as a multi-channel EXR file containing RGB data as well as an additional channel for each object class. These channels
contain binary values that indicate whether each pixel belongs to the corresponding class. These images form a dataset for training semantic segmentation models.
The viability of the generated dataset is evaluated by testing the trained semantic segmentation model on real-world, manually annotated images.
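
For illustration, the following is a minimal sketch (not the authors' implementation) of how such a multi-channel EXR file could be read into RGB data and per-class binary masks, using the Python OpenEXR bindings. The file name and class channel names are hypothetical and depend on how the dataset was exported.

    import numpy as np
    import OpenEXR
    import Imath

    def load_labelled_exr(path, class_names):
        """Read RGB data and one binary mask per object class from a multi-channel EXR."""
        exr = OpenEXR.InputFile(path)
        dw = exr.header()['dataWindow']
        width = dw.max.x - dw.min.x + 1
        height = dw.max.y - dw.min.y + 1
        pixel_type = Imath.PixelType(Imath.PixelType.FLOAT)

        def read_channel(name):
            raw = exr.channel(name, pixel_type)
            return np.frombuffer(raw, dtype=np.float32).reshape(height, width)

        # Standard RGB channels
        rgb = np.stack([read_channel(c) for c in ('R', 'G', 'B')], axis=-1)
        # One binary (0/1) channel per object class
        masks = {name: read_channel(name) > 0.5 for name in class_names}
        return rgb, masks

    # Hypothetical usage; channel names must match those written into the file
    rgb, masks = load_labelled_exr('frame_0001.exr', ['road', 'building', 'car'])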