What is the 3D modified Fisher Vector (3DmFV) representation for 3D point clouds?

We recently published a paper on 3D point cloud classification (and segmentation) using our proposed 3D modified Fisher Vector (3DmFV) representation and convolutional neural networks (CNNs). The preprint is available on arXiv and the final version is published in the IEEE Robotics and Automation Letters (RA-L) journal.

I believe in making research accessible to everyone, so here is a brief explanation of the 3DmFV representation for point clouds.

Before reading this post, it is important to have a solid understanding of Gaussian Mixture Models (GMMs) and Fisher Vectors (FVs). If you are a bit rusty on these subjects, I posted two primers, linked below:

  1. Gaussian Mixture Models (GMMs)
  2. Fisher Vectors (FVs)

The 3D modified Fisher Vector (3DmFV) generalizes the Fisher Vector in two directions:

  1. GMM choice: 3DmFV uses a uniform Gaussian grid. The Gaussian parameters (means, standard deviations and weights) are not estimated from the data; they are predefined, with the means positioned on a 3D grid and with equal weights and standard deviations.
  2. Symmetric functions: in addition to the summation, the minimum and maximum are taken over all points, for each Gaussian.
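
To make the first point concrete, here is a minimal NumPy sketch of a predefined grid GMM. The function name `make_grid_gmm`, the assumption that the point cloud is normalized to the cube [-1, 1]^3, the cell-centered placement and the default standard deviation of 1/n are all illustrative choices, not necessarily what the 3DmFV-Net repository does.

```python
import numpy as np

def make_grid_gmm(n_per_axis, sigma=None):
    """Return (weights, means, sigmas) for a uniform n x n x n Gaussian grid.

    Assumptions (illustrative only): the point cloud lives in [-1, 1]^3 and
    each Gaussian sits at the center of one grid cell.
    """
    k = n_per_axis ** 3
    step = 2.0 / n_per_axis                          # cell width along each axis
    centers = -1.0 + step * (np.arange(n_per_axis) + 0.5)
    gx, gy, gz = np.meshgrid(centers, centers, centers, indexing="ij")
    means = np.stack([gx, gy, gz], axis=-1).reshape(k, 3)
    if sigma is None:
        sigma = 1.0 / n_per_axis                     # assumed default, tied to the cell size
    sigmas = np.full((k, 3), float(sigma))           # same std for every Gaussian and axis
    weights = np.full(k, 1.0 / k)                    # equal mixture weights
    return weights, means, sigmas

# Example: a 2x2x2 grid gives 8 Gaussians; nothing is fitted to data.
weights, means, sigmas = make_grid_gmm(2)
```

Because nothing is estimated, the same grid can be reused for every point cloud, and the position of each Gaussian in the output is known in advance.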

The Math

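For a point cloud X = {p_t}, t = 1, ..., T, and a GMM density u_λ with parameters λ (mixture-weight parameters α, means μ and standard deviations σ), with L_λ the Fisher Vector normalization term, the 3DmFV is formally defined by: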

3DmFV_{\lambda}^X = \left[ \begin{array}{c}\left. \sum_{t=1}^T L_\lambda\nabla_\lambda\log u_\lambda(p_t) \right|_{\lambda=\alpha,\mu,\sigma} \\\left. \max_t(L_\lambda\nabla_\lambda\log u_\lambda(p_t))\right|_{\lambda=\alpha,\mu,\sigma} \\\left. \min_t(L_\lambda\nabla_\lambda\log u_\lambda(p_t))\right|_{\lambda=\mu,\sigma} \end{array}\right]
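
The definition can be sketched in NumPy as follows. This is a minimal illustration that assumes the standard normalized Fisher Vector gradient terms for a diagonal-covariance GMM (as covered in the Fisher Vectors primer); the function name `fv_3dmfv` and the output ordering are my own choices here, and any extra normalization (for example by the number of points) is left out.

```python
import numpy as np

def fv_3dmfv(points, weights, means, sigmas):
    """Sketch of a 3DmFV. points: (T, 3); weights: (K,); means, sigmas: (K, 3).

    Returns a vector of length 20 * K (7K sum + 7K max + 6K min entries).
    """
    # Standardized offsets of every point from every Gaussian: (T, K, 3)
    diff = (points[:, None, :] - means[None, :, :]) / sigmas[None, :, :]

    # Soft assignment gamma_t(k) of each point to each Gaussian: (T, K)
    log_pdf = (-0.5 * np.sum(diff ** 2, axis=2)
               - np.sum(np.log(sigmas), axis=1)
               - 1.5 * np.log(2.0 * np.pi))
    log_w = np.log(weights) + log_pdf
    log_w -= log_w.max(axis=1, keepdims=True)        # for numerical stability
    gamma = np.exp(log_w)
    gamma /= gamma.sum(axis=1, keepdims=True)

    # Per-point normalized gradient terms (standard Fisher Vector form)
    d_alpha = (gamma - weights) / np.sqrt(weights)                              # (T, K)
    d_mu = gamma[:, :, None] * diff / np.sqrt(weights)[None, :, None]           # (T, K, 3)
    d_sigma = (gamma[:, :, None] * (diff ** 2 - 1.0)
               / np.sqrt(2.0 * weights)[None, :, None])                         # (T, K, 3)

    # Symmetric functions over the points: sum and max for alpha, mu, sigma;
    # min only for mu and sigma, as in the definition above.
    parts = [
        d_alpha.sum(0), d_mu.sum(0).ravel(), d_sigma.sum(0).ravel(),
        d_alpha.max(0), d_mu.max(0).ravel(), d_sigma.max(0).ravel(),
        d_mu.min(0).ravel(), d_sigma.min(0).ravel(),
    ]
    return np.concatenate(parts)
```

Since sum, max and min do not depend on the order of the points, the resulting vector is the same for any permutation of the point cloud.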

The Intuition

The 3DmFV is much easier to understand with a few visualizations.

Let’s take a single 3D point in a single Gaussian and show its 3DmFV (here the grid is basically 1x1x1):

Now let’s see what happens when we move the point (hint: the 3DmFV changes).
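
The single-point, single-Gaussian case is small enough to write out by hand. The sketch below assumes the standard Fisher Vector gradient terms: with one Gaussian of weight 1 the soft assignment is always 1, so the weight term vanishes and the rest depends only on the standardized offset of the point from the mean. The function name `single_gaussian_3dmfv` is hypothetical.

```python
import numpy as np

def single_gaussian_3dmfv(p, mu, sigma):
    """3DmFV of one point under one Gaussian (weight 1), as a 20-vector."""
    z = (np.asarray(p, dtype=float) - mu) / sigma    # standardized offset
    d_alpha = np.zeros(1)                            # gamma - w = 1 - 1 = 0
    d_mu = z
    d_sigma = (z ** 2 - 1.0) / np.sqrt(2.0)
    per_point = np.concatenate([d_alpha, d_mu, d_sigma])   # 7 values
    # With a single point, sum = max = min; the min block has no alpha entry.
    return np.concatenate([per_point, per_point, per_point[1:]])

mu, sigma = np.zeros(3), 1.0
at_mean = single_gaussian_3dmfv([0.0, 0.0, 0.0], mu, sigma)
moved = single_gaussian_3dmfv([0.5, 0.0, 0.0], mu, sigma)
changed = ~np.isclose(at_mean, moved)                # entries that react to the move
```

Moving the point along x only changes the x entries of the mean and standard deviation terms, which is the kind of change the visualization shows.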

Next, we can see what happens when we move the point all around (the .gif file might take a few seconds to load).

Finally, let’s see what the 3DmFV looks like when we take multiple points on a 2x2x2 GMM:

Reconstruction from 3DmFV

Some may argue that 3DmFV is simply another handcrafted feature. However, we argue that it is simply another way to represent the data and that the process is therefore reversible. For simple cases (single point, single Gaussian, points on a plane in a Gaussian) it can be shown analytically that the representation is reversible. The general case, with more points and more Gaussians, is more complex, so we trained a simple 3DmFV decoder that takes a 3DmFV representation as input and produces a 3D point cloud as output.
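
For the simplest case, one point and one Gaussian, the reversibility can be checked in a couple of lines. This sketch assumes the standard Fisher Vector mean term, which for a single Gaussian of weight 1 reduces to (p - μ)/σ; it is not the trained decoder, and the function names are hypothetical.

```python
import numpy as np

def encode_mu_term(p, mu, sigma):
    """Mean-gradient term for one point and one Gaussian with weight 1."""
    return (p - mu) / sigma

def decode_point(d_mu, mu, sigma):
    """Invert the mean-gradient term to recover the original point."""
    return mu + sigma * d_mu

mu, sigma = np.array([0.1, -0.2, 0.3]), 0.5
p = np.array([0.4, 0.0, -0.1])
p_rec = decode_point(encode_mu_term(p, mu, sigma), mu, sigma)   # round trip
```

With many points and many Gaussians the soft assignments couple the terms together, which is why a learned decoder is used for the general case.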

Here is an image of a reconstructed point cloud of an airplane:

The Code

To recreate the images above, you can use the repository for this 3DmFV tutorial on my GitHub.

For 3D point cloud classification using the 3DmFV representation, use the 3DmFV-Net repository, which uses TensorFlow to perform the computation on the GPU.