We present the novel View-of-Delft (VoD) automotive dataset. It contains 10,000 frames of synchronized and calibrated 64-layer LiDAR, (stereo) camera, and 3+1D radar data acquired in complex, urban traffic. It consists of more than 76,000 3D bounding box annotations, including more than 15,000 pedestrian, 7,100 cyclist and 17,000 car labels.
This notebook introduces the classes that help load data from the dataset for a specific frame. Visualizations of the data are also included to aid understanding. The outline below provides an overview of the topics covered in this notebook.
The KittiLocations class stores the paths where the data is expected to be located. By default, the following scheme is used:
View-of-Delft-Dataset (root)
├── lidar
│   ├── ImageSets
│   ├── training
│   │   ├── calib & velodyne & image_2 & label_2 & pose
│   ├── testing
│       ├── calib & velodyne & image_2 & pose
│
├── radar
│   ├── ImageSets
│   ├── training
│   │   ├── calib & velodyne & image_2 & label_2 & pose
│   ├── testing
│       ├── calib & velodyne & image_2 & pose
│
├── radar_3_scans
│   ├── ImageSets
│   ├── training
│   │   ├── calib & velodyne & image_2 & label_2 & pose
│   ├── testing
│       ├── calib & velodyne & image_2 & pose
│
├── radar_5_scans
    ├── ImageSets
    ├── training
    │   ├── calib & velodyne & image_2 & label_2 & pose
    ├── testing
        ├── calib & velodyne & image_2 & pose
If the locations need to be altered, a class similar to KittiLocations can be created (a sketch of this follows the example below). The class requires the following arguments:

- root_dir: The root directory of the dataset.
- output_dir: Optional parameter for the location where output such as pictures should be generated.
- frame_set_path: Optional parameter pointing to a text file that lists the frames for which output should be generated.
- pred_dir: Optional parameter for the location of the prediction labels.

Based on these parameters, the locations of the sub-folders are automatically defined, as shown in the example below:
from vod.configuration import KittiLocations
kitti_locations = KittiLocations(root_dir="view_of_delft_PUBLIC",
                                 output_dir="example_output",
                                 frame_set_path="",
                                 pred_dir="",
                                 )
print(f"Lidar directory: {kitti_locations.lidar_dir}")
print(f"Radar directory: {kitti_locations.radar_dir}")
The FrameDataLoader class is responsible for loading any available data from the dataset for a single, specific frame. The constructor requires a KittiLocations object and a frame number as input, and creates properties which can load and store data from the dataset upon request. This means data is only loaded when required, and is then cached for further use.
The code snippet below shows how the class can be instantiated.
from vod.frame import FrameDataLoader
frame_data = FrameDataLoader(kitti_locations=kitti_locations,
                             frame_number="01201")
The camera provides colored, rectified images of 1936 × 1216 pixels at around 30 Hz. The horizontal field of view is ~64° (±32°) and the vertical field of view is ~44° (±22°). Images are stored in jpg files. They can be easily visualized as shown below. Identifiable features such as faces and license plates have been blurred in the example dataset located in the GitHub repository.
import matplotlib.pyplot as plt
imgplot = plt.imshow(frame_data.image)
plt.show()
The LiDAR sensor is a Velodyne 64 sensor mounted on the top of the vehicle, operating at 10 Hz. The provided LiDAR point clouds are ego-motion compensated both for ego-motion during the scan (i.e. one full rotation of the LiDAR sensor) and ego-motion between the capture of LiDAR and camera data (i.e. overlaying camera and LiDAR data should give a consistent image).
LiDAR point clouds are stored in bin files, where each bin file contains a 360° scan in the form of an Nx4 array, where N is the number of points and 4 is the number of features: [x, y, z, reflectance].
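If you want to inspect a scan without the devkit, the snippet below is a minimal sketch of reading one bin file directly with NumPy; the file path is only an assumption based on the default folder scheme above.

import numpy as np

# Minimal sketch: read one LiDAR .bin file directly (path is illustrative).
# Each record consists of four float32 values: x, y, z, reflectance.
scan = np.fromfile("view_of_delft_PUBLIC/lidar/training/velodyne/01201.bin",
                   dtype=np.float32).reshape(-1, 4)
print(scan.shape)  # (N, 4)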
Through FrameDataLoader, the example below shows how the raw data can be retrieved; the points are also plotted in a 3D plot using the vod.visualization package included in the repository.
print(frame_data.lidar_data)
# 3D Visualization of the point-cloud
from vod.visualization import Visualization3D
vis_3d = Visualization3D(frame_data=frame_data)
vis_3d.draw_plot(lidar_origin_plot=True, lidar_points_plot=True)
The radar sensor is a ZF FRGen21 3+1D radar (∼13 Hz) mounted behind the front bumper. The provided radar point clouds are ego-motion compensated for the ego-motion between the capture of radar and camera data (i.e. overlaying camera and radar data should give a consistent image). The radar point clouds are stored in bin files, where each bin file contains a set of points in the form of an Nx7 array, where N is the number of points and 7 is the number of features: [x, y, z, RCS, v_r, v_r_compensated, time].
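As with the LiDAR data, a radar scan can also be read directly. The snippet below is a minimal sketch that assumes the same float32 layout and an illustrative file path; it also demonstrates how the compensated radial velocity could be used to separate likely moving points.

import numpy as np

# Minimal sketch: read one radar .bin file directly (path is illustrative).
# Each record holds the seven float32 features listed above.
radar = np.fromfile("view_of_delft_PUBLIC/radar/training/velodyne/01201.bin",
                    dtype=np.float32).reshape(-1, 7)
x, y, z, rcs, v_r, v_r_comp, t = radar.T
# Points with a notable compensated radial velocity likely belong to moving objects.
moving = radar[np.abs(v_r_comp) > 0.5]
print(f"{len(moving)} of {len(radar)} radar points appear to be moving")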
Similar to LiDAR, the example below shows the retrieval and plotting of the data through FrameDataLoader. The 3D plot also shows one of the advantages of the radar point cloud compared to LiDAR: the radial velocity of each measurement is also captured.
print(frame_data.radar_data)
vis_3d.draw_plot(radar_origin_plot=True, radar_points_plot=True, radar_velocity_plot=True)
The labels contain the ground truth data for the frame in the KITTI format, including the object class, truncation, occlusion, observation angle, 2D bounding box, 3D dimensions, 3D location, and rotation of each annotated object.
In total, 13 object classes were annotated.
Note: This class only provides the raw data, which needs to be processed, as shown in upcoming notebooks.
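As a rough illustration of the raw KITTI-style label layout, the snippet below parses a single, made-up label line; the values are purely illustrative and not taken from the dataset.

# Parse one hypothetical KITTI-format label line into its fields.
line = "Pedestrian 0 0 -0.20 1045 250 1090 370 1.75 0.60 0.80 2.10 1.10 14.50 0.05"
fields = line.split()
obj_class = fields[0]                           # object class
dimensions = [float(v) for v in fields[8:11]]   # height, width, length in metres
location = [float(v) for v in fields[11:14]]    # x, y, z in camera coordinates
rotation_y = float(fields[14])                  # yaw around the camera's y-axis
print(obj_class, dimensions, location, rotation_y)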
The example shows the 3D plot of the annotated objects.
vis_3d.draw_plot(annotations_plot=True)
The following plot displays the information that a single frame contains in a combined view. The coordinate systems present are further explained in Notebook 2: Frame Transformations Example.
vis_3d.draw_plot(radar_origin_plot=True,
                 lidar_origin_plot=True,
                 camera_origin_plot=True,
                 lidar_points_plot=True,
                 radar_points_plot=True,
                 radar_velocity_plot=True,
                 annotations_plot=True)