Deep learning (DL) using convolutional neural network (CNN) architectures is now the standard approach to image classification. With 3D data the problem becomes more complex. First, 3D data can be represented using various structures, including:
- Voxel grid
- Point clouds
- Multi-view
- Depth map
For multi-view images and depth maps, the problem reduces to applying 2D CNNs to multiple images. A voxel grid allows a direct extension of the 2D CNN to 3D, obtained simply by defining 3D convolution kernels. For 3D point clouds, however, it is not clear how to apply DL tools, because a point cloud is an unordered set of points rather than a regular grid.
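To make the voxel-grid route concrete, here is a minimal NumPy sketch, assuming a binary occupancy grid and a single hand-set kernel. It rasterizes a point cloud into a voxel grid and slides one 3D kernel over it, which is the basic operation a volumetric CNN stacks and learns. The helper names `voxelize` and `conv3d_valid` and the `grid_size` parameter are hypothetical and chosen for illustration only; this is not the implementation used in any of the papers below.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def voxelize(points, grid_size=32):
    """Hypothetical helper: map an (N, 3) point cloud to a binary occupancy grid."""
    mins = points.min(axis=0)
    # Use the largest axis extent so the scaling is isotropic
    extent = (points.max(axis=0) - mins).max()
    extent = extent if extent > 0 else 1.0
    # Scale coordinates into [0, grid_size - 1] and take integer cell indices
    idx = np.floor((points - mins) / extent * (grid_size - 1)).astype(int)
    grid = np.zeros((grid_size,) * 3, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0  # mark occupied voxels
    return grid


def conv3d_valid(volume, kernel):
    """Hypothetical helper: 'valid' 3D cross-correlation (what DL libraries call convolution)."""
    # windows has shape (D-kd+1, H-kh+1, W-kw+1, kd, kh, kw)
    windows = sliding_window_view(volume, kernel.shape)
    # Multiply each window by the kernel and sum over the kernel axes
    return np.einsum("ijkabc,abc->ijk", windows, kernel)


rng = np.random.default_rng(0)
cloud = rng.random((2048, 3))
grid = voxelize(cloud, grid_size=16)
features = conv3d_valid(grid, np.ones((3, 3, 3), dtype=np.float32) / 27.0)
print(grid.shape, features.shape)  # (16, 16, 16) (14, 14, 14)
```

In a real volumetric CNN the kernel weights are learned and many kernels are applied per layer; the averaging kernel here only shows the mechanics. The cost of the grid grows cubically with `grid_size`, which is one reason volumetric models tend to work at low resolution.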
Second, far more training data is available for images than for 3D (although the number of 3D datasets has recently been growing; more on that in a later post). On the other hand, synthetic 3D data can be generated easily.
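As a small illustration of how cheaply synthetic 3D data can be produced, the sketch below builds a labeled toy point-cloud dataset by sampling the surfaces of two parametric shapes (a sphere and a cube) with random scale and jitter. It assumes that simple analytic shapes are an acceptable stand-in; practical pipelines more often sample or render CAD models. The functions `sample_sphere`, `sample_cube`, and `make_dataset` and their parameters are hypothetical.

```python
import numpy as np


def sample_sphere(n, rng):
    # Normalizing isotropic Gaussian vectors gives points uniform on the unit sphere
    v = rng.normal(size=(n, 3))
    return v / np.linalg.norm(v, axis=1, keepdims=True)


def sample_cube(n, rng):
    # Start from points in [-1, 1]^3, then push one random coordinate onto a face (±1);
    # all six faces have equal area, so this is uniform on the cube surface
    p = rng.uniform(-1.0, 1.0, size=(n, 3))
    axis = rng.integers(0, 3, size=n)
    p[np.arange(n), axis] = rng.choice([-1.0, 1.0], size=n)
    return p


def make_dataset(num_shapes, n_points=1024, noise=0.01, seed=0):
    """Hypothetical generator: returns clouds (num_shapes, n_points, 3) and labels (num_shapes,)."""
    rng = np.random.default_rng(seed)
    samplers = [sample_sphere, sample_cube]  # label 0 = sphere, label 1 = cube
    clouds, labels = [], []
    for _ in range(num_shapes):
        label = int(rng.integers(0, len(samplers)))
        pts = samplers[label](n_points, rng)
        pts *= rng.uniform(0.5, 2.0)                    # random global scale
        pts += rng.normal(scale=noise, size=pts.shape)  # sensor-like jitter
        clouds.append(pts)
        labels.append(label)
    return np.stack(clouds), np.array(labels)


X, y = make_dataset(100)
print(X.shape, y.shape)  # (100, 1024, 3) (100,)
```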
Below is a list of papers that apply DL tools to 3D data:
- Voxel Grid – Volumetric CNN:
  - VoxNet: A 3D convolutional neural network for real-time object recognition
  - Volumetric and multi-view CNNs for object classification on 3D data – compares volumetric CNNs with multi-view CNNs for object classification. The multi-view approach performed better; however, the resolution of the volumetric models was limited.
  - 3D ShapeNets: A deep representation for volumetric shapes
- Multi-View CNNs:
- Point clouds:
  - PointNet: Deep learning on point sets for 3D classification and segmentation – applies the same convolution kernel to each point separately (equivalently, an MLP whose weights are shared across points), creating a higher-dimensional representation of each point, and then max-pools over the entire point set. Max pooling is a symmetric function, so the resulting feature is invariant to permutations of the input cloud, which is necessary because the order of the points has no geometric significance.
- Hand-crafted features + DNN:
  - 3D deep shape descriptor – feeds heat kernel signature (HKS) descriptors into a neural network to obtain an Eigen-shape descriptor and a Fisher-shape descriptor.
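The permutation-invariance argument in the PointNet entry above can be checked numerically. The following is a minimal sketch, not the PointNet architecture: it uses random, untrained weights, omits the input and feature transform networks, and the layer widths (64, 128, 1024) are an assumption chosen for illustration. The hypothetical `shared_mlp` applies the same weights to every point, and `global_feature` max-pools over the point axis.

```python
import numpy as np


def shared_mlp(points, weights, biases):
    # The same weights act on every point independently (like a 1x1 convolution)
    h = points
    for W, b in zip(weights, biases):
        h = np.maximum(h @ W + b, 0.0)  # ReLU
    return h  # shape (N, C_out): one feature vector per point


def global_feature(points, weights, biases):
    # Max over the point axis is a symmetric function, so point order cannot matter
    return shared_mlp(points, weights, biases).max(axis=0)


rng = np.random.default_rng(0)
dims = [3, 64, 128, 1024]  # assumed layer widths
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(dims[:-1], dims[1:])]
biases = [np.zeros(b) for b in dims[1:]]

cloud = rng.normal(size=(500, 3))
f1 = global_feature(cloud, weights, biases)
f2 = global_feature(cloud[rng.permutation(len(cloud))], weights, biases)
print(f1.shape, np.allclose(f1, f2))  # (1024,) True
```

Replacing `max` with a non-symmetric operation, such as flattening the per-point features in their input order, would break this invariance, which is why the pooling choice matters.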
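For reference, the heat kernel signature used as input in the last entry above is defined at a point x and diffusion time t as HKS(x, t) = Σ_i exp(-λ_i t) φ_i(x)², where λ_i and φ_i are the eigenvalues and eigenfunctions of the Laplace–Beltrami operator of the shape. The sketch below evaluates that formula with NumPy, substituting a simple combinatorial graph Laplacian (degree minus adjacency) for the mesh Laplace–Beltrami operator; that substitution, and the helper names `graph_laplacian` and `heat_kernel_signature`, are assumptions for illustration only.

```python
import numpy as np


def graph_laplacian(n, edges):
    # Combinatorial Laplacian L = D - A of an undirected graph (stand-in for Laplace-Beltrami)
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A


def heat_kernel_signature(L, times, k=None):
    # eigh returns ascending eigenvalues and orthonormal eigenvectors (columns)
    evals, evecs = np.linalg.eigh(L)
    if k is not None:  # optionally keep only the k smallest eigenpairs
        evals, evecs = evals[:k], evecs[:, :k]
    # HKS(x, t) = sum_i exp(-lambda_i * t) * phi_i(x)^2, shape (n_vertices, n_times)
    return (evecs ** 2) @ np.exp(-np.outer(evals, times))


# Toy example: a 6-vertex cycle graph, where every vertex is equivalent by symmetry
cycle = [(i, (i + 1) % 6) for i in range(6)]
L = graph_laplacian(6, cycle)
hks = heat_kernel_signature(L, times=np.array([0.0, 1.0, 100.0]))
print(hks.shape)  # (6, 3)
```

Small t emphasizes local geometry and large t emphasizes global structure; for a connected graph the signature tends to 1/n as t grows, since only the constant eigenvector survives.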