Recently we published a paper on 3D point cloud classification (and segmentation) using our proposed 3D modified Fisher Vector (3DmFV) representation and convolutional neural networks (CNNs). The preprint is available on arXiv, and the final version appears in the IEEE Robotics and Automation Letters (RA-L) journal.
I believe in making research accessible to everyone, so here is a brief explanation of how we use the 3DmFV representation to classify 3D point clouds.
Before reading this post, it is important to have a solid understanding of Gaussian Mixture Models (GMMs), Fisher Vectors (FVs), and our 3D modified Fisher Vector representation for 3D point clouds. If you are a bit rusty on these subjects, I have three previous posts about them:
- Gaussian Mixture Models (GMM)
- Fisher Vectors (FV)
- 3D modified Fisher Vectors (3DmFV) representation for 3D point clouds
I recently posted a short literature overview of 3D point cloud classification methods. The most notable ones are PointNet, PointNet++ and K-d Network.
Recall that the 3DmFV representation converts the 3D point cloud (which is unstructured, unordered and may have a variable number of points) into a statistical representation of constant size, defined on a 3D grid of Gaussians.
The image below shows a 2D visualization of the 3DmFV representation for several 3D point clouds (the representation itself is 4D: a 3D grid with 20 channels). Each column in the image represents a single Gaussian, and each row represents a symmetric function, computed over the points, of the derivative with respect to one Gaussian parameter.
Input: A 3D point cloud (a matrix with n rows and 3 columns representing the XYZ coordinates).
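To make the shape of the representation concrete, here is a minimal NumPy sketch of how such a grid could be computed. It is not the code from our repository. The function names (`grid_centers`, `fisher_grid`), the uniform mixture weights, the shared standard deviation of 1/m, the [-1, 1] cube, and the normalization are all assumptions made for illustration. The 20 channels are split here as max and sum for the weight derivative (2) plus max, min and sum for each of the three mean and three standard deviation components (18).

```python
import numpy as np

def grid_centers(m):
    """Centers of an m x m x m grid of Gaussians inside the cube [-1, 1]^3."""
    step = 2.0 / m
    ticks = np.linspace(-1.0 + step / 2, 1.0 - step / 2, m)
    gx, gy, gz = np.meshgrid(ticks, ticks, ticks, indexing="ij")
    return np.stack([gx, gy, gz], axis=-1).reshape(-1, 3)  # (m^3, 3)

def fisher_grid(points, m=8, sigma=None):
    """Map an (n, 3) point cloud to an (m, m, m, 20) grid of Fisher-style statistics."""
    points = np.asarray(points, dtype=np.float64)
    n = points.shape[0]
    w = 1.0 / m ** 3                              # assumed: uniform mixture weights
    sigma = 1.0 / m if sigma is None else sigma   # assumed: shared isotropic std

    # Normalized offset of every point from every Gaussian center: (n, m^3, 3).
    diff = (points[:, None, :] - grid_centers(m)[None, :, :]) / sigma

    # Posterior of each Gaussian given each point (a softmax over Gaussians,
    # valid because all weights and standard deviations are equal here).
    log_p = -0.5 * np.sum(diff ** 2, axis=-1)
    log_p -= log_p.max(axis=1, keepdims=True)     # numerical stability
    gamma = np.exp(log_p)
    gamma /= gamma.sum(axis=1, keepdims=True)     # (n, m^3)

    # Per-point derivatives w.r.t. the weight, mean and std of each Gaussian.
    d_alpha = (gamma - w) / np.sqrt(w)                                  # (n, m^3)
    d_mu = gamma[..., None] * diff / np.sqrt(w)                         # (n, m^3, 3)
    d_sigma = gamma[..., None] * (diff ** 2 - 1.0) / np.sqrt(2.0 * w)   # (n, m^3, 3)

    # Symmetric functions over the point axis remove the dependence on
    # point order and on the number of points.
    channels = [d_alpha.max(axis=0), d_alpha.sum(axis=0) / n]           # 2 channels
    for d in (d_mu, d_sigma):                                           # 18 channels
        for reduced in (d.max(axis=0), d.min(axis=0), d.sum(axis=0) / n):
            channels.extend(reduced.T)                                  # x, y, z
    return np.stack(channels, axis=-1).reshape(m, m, m, 20)
```

With m = 8 the output always holds 8 x 8 x 8 x 20 = 10,240 values, whether the cloud has 500 points or 5,000, and shuffling the rows of the input leaves it unchanged.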
Output: Classification score
The method consists of two main modules:
- 3DmFV module – converting the 3D point cloud into the 3DmFV representation on a 3D grid.
- Network module – consisting of multiple 3D CNN layers (inspired by Inception) followed by several fully connected layers.
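For readers unfamiliar with Inception-style layers, the idea is to run several branches with different receptive fields in parallel and concatenate their outputs along the channel axis. Below is a rough forward-pass sketch in NumPy on a 3DmFV-shaped input. The branch choices (1x1x1 convolution, 3x3x3 convolution, average pooling followed by 1x1x1 convolution), the channel counts, and the helper names are my assumptions for illustration, not the exact layers of 3DmFV-Net. A real implementation would use a deep learning framework with trainable weights.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv3d_same(x, kernel):
    """Stride-1 3D convolution with zero padding.

    x: (D, H, W, C_in); kernel: (k, k, k, C_in, C_out) with odd k.
    """
    k = kernel.shape[0]
    pad = k // 2
    padded = np.pad(x, [(pad, pad)] * 3 + [(0, 0)])
    # windows: (D, H, W, C_in, k, k, k)
    windows = sliding_window_view(padded, (k, k, k), axis=(0, 1, 2))
    return np.einsum("dhwcijk,ijkco->dhwo", windows, kernel)

def avg_pool3d_same(x, k=3):
    """Stride-1 average pooling; zero padding is included in the average."""
    pad = k // 2
    padded = np.pad(x, [(pad, pad)] * 3 + [(0, 0)])
    windows = sliding_window_view(padded, (k, k, k), axis=(0, 1, 2))
    return windows.mean(axis=(-3, -2, -1))

def relu(x):
    return np.maximum(x, 0.0)

def inception_block(x, w_1x1, w_3x3, w_pool):
    """Three parallel branches concatenated along the channel axis."""
    branch_a = relu(conv3d_same(x, w_1x1))                     # pointwise features
    branch_b = relu(conv3d_same(x, w_3x3))                     # local 3x3x3 context
    branch_c = relu(conv3d_same(avg_pool3d_same(x), w_pool))   # smoothed context
    return np.concatenate([branch_a, branch_b, branch_c], axis=-1)
```

Because every branch preserves the spatial size, an 8 x 8 x 8 x 20 input with 16, 16 and 8 output channels in the three branches yields an 8 x 8 x 8 x 40 output.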
The image below summarizes the architecture details.
We train and test on the ModelNet40/ModelNet10 datasets from Princeton. They include 40/10 classes, with 9843/3991 point clouds for training and 2468/908 point clouds for testing.
I will not include the charts and tables here; if you are interested, make sure to read our 3D point cloud classification paper. In summary:
- 3DmFV-Net achieves good accuracy for point cloud classification.
- It runs in real time on a GPU (despite the time needed to compute the representation).
- 3DmFV-Net is robust to various types of data corruption (rotations, outliers, Gaussian noise, and occlusions).
- An ablation study reveals that a grid of 8 x 8 x 8 Gaussians is enough to achieve great performance.
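If you want to run a similar robustness check on your own data, the corruptions listed above can be simulated with a few lines of NumPy. This is a hedged sketch: the function names, the rotation axis, the outlier distribution, and the way occlusion is modeled (dropping the points closest to a random seed point) are my assumptions and may differ from the protocol used in the paper.

```python
import numpy as np

def rotate_z(points, angle_rad):
    """Rotate an (n, 3) cloud about the z axis."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T

def add_gaussian_noise(points, std, rng):
    """Perturb every coordinate with zero-mean Gaussian noise."""
    return points + rng.normal(scale=std, size=points.shape)

def add_outliers(points, ratio, rng):
    """Replace a fraction of the points with uniform samples in [-1, 1]^3."""
    out = points.copy()
    n_out = int(round(ratio * len(points)))
    idx = rng.choice(len(points), size=n_out, replace=False)
    out[idx] = rng.uniform(-1.0, 1.0, size=(n_out, 3))
    return out

def occlude(points, drop_ratio, rng):
    """Drop the points nearest to a randomly chosen seed point."""
    seed = points[rng.integers(len(points))]
    dist = np.linalg.norm(points - seed, axis=1)
    n_drop = int(round(drop_ratio * len(points)))
    return points[np.argsort(dist)[n_drop:]]
```

Note that `occlude` returns fewer points than it receives, which is fine for a representation that accepts a variable number of points.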
The code for training 3DmFV-Net on ModelNet datasets is available on my GitHub repository.