First, thank you for the great work! As the paper mentions, I noticed that the bandwidth is set to 30 when you generate the new MNIST dataset. For each I wonder how to decide the parameter Why you set Thank you! |
Ultimately it is a hyperparameter, similar to the spatial resolution in a planar CNN. In a planar CNN the resolution is determined purely by the stride of the convolutions and the pooling, but in spherical CNNs you have more flexibility: you can in principle choose the resolution freely in each layer. There are currently no good "best practices" for spherical CNN architecture design, and this includes the bandwidth/resolution, but there are a couple of considerations that would factor into the decision:
We chose bandwidth=30 because it's not too large, but still allows us to represent MNIST digits without losing too much detail. MNIST images are 28x28, but we project them only onto the top of the sphere, so a spherical grid with 2*b=60 samples per dimension can represent them fairly accurately. |
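As a rough illustration of the arithmetic in the reply above, here is a minimal sketch that builds the sample angles of a 2b x 2b grid and counts how many latitude rings lie on the northern hemisphere. The helper name `equiangular_grid` and the sampling convention (beta_j = pi(2j+1)/(4b), alpha_k = 2 pi k/(2b)) are assumptions made for illustration, not something stated in this thread.

```python
import math

def equiangular_grid(b):
    """Return (betas, alphas) for a 2b x 2b equiangular grid on the sphere.

    Assumed convention (hypothetical, for illustration only):
    beta_j = pi * (2j + 1) / (4b), alpha_k = 2 * pi * k / (2b).
    """
    n = 2 * b
    betas = [math.pi * (2 * j + 1) / (4 * b) for j in range(n)]
    alphas = [2 * math.pi * k / n for k in range(n)]
    return betas, alphas

b = 30
betas, alphas = equiangular_grid(b)
grid_shape = (len(betas), len(alphas))

# Latitude rings that fall on the northern hemisphere (beta < pi/2).
north_rings = sum(1 for beta in betas if beta < math.pi / 2)

print("grid shape:", grid_shape)
print("rings on the northern hemisphere:", north_rings)
```

Under these assumptions, the number of rings covering the hemisphere is comparable to the 28 pixel rows of an MNIST image, which is consistent with the remark that bandwidth 30 keeps most of the detail.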
Thank you for your quick reply! As the paper describes, Thank you! |
Also, VGG Net has some nn.MaxPool2d(kernel_size=2, stride=2) layers. Are there any implementations of a MaxPool2d() or MaxPool3d() operation for Spherical CNNs? Probably Or probably we don't have to think about pooling at all since there is no Thank you for your suggestion. |
We projected onto the northern hemisphere because that way the digit doesn't get stretched too much. It's just a toy experiment, so we didn't think about this too much. Projecting it onto the whole sphere would most likely work as well. Max pooling is a bit tricky. You could just apply nn.MaxPool2d or nn.MaxPool3d to the array that stores the feature map, but due to the inhomogeneous sampling grid, this would not be equivariant. It would probably still be approximately equivariant, and may work in practice. so3_integrate() does a global average pooling. If you want to do a local average pooling, you could use a convolution with a fixed Gaussian blur filter, and sample the result on a low-resolution (low-bandwidth) grid. |
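To make the "inhomogeneous sampling grid" point concrete, the following minimal sketch compares the approximate solid angle of a grid cell near the pole with one near the equator. It assumes a 2b x 2b equiangular grid with beta_j = pi(2j+1)/(4b); that convention and the helper name `ring_cell_area` are illustrative assumptions, not taken from the library.

```python
import math

def ring_cell_area(b, j):
    """Approximate solid angle of one grid cell in latitude ring j.

    Assumes a 2b x 2b equiangular grid with beta_j = pi * (2j + 1) / (4b);
    each cell spans d_beta = pi / (2b) and d_alpha = 2 * pi / (2b).
    """
    beta = math.pi * (2 * j + 1) / (4 * b)
    d_beta = math.pi / (2 * b)
    d_alpha = 2 * math.pi / (2 * b)
    return math.sin(beta) * d_beta * d_alpha

b = 30
pole_cell = ring_cell_area(b, 0)         # ring closest to the north pole
equator_cell = ring_cell_area(b, b - 1)  # ring just north of the equator
ratio = equator_cell / pole_cell

# Sanity check: 2b cells per ring, summed over all rings, should be close to 4*pi.
total = sum(ring_cell_area(b, j) for j in range(2 * b)) * (2 * b)

print(f"equator/pole cell area ratio: {ratio:.1f}")
print(f"sum of cell areas: {total:.4f}, sphere area: {4 * math.pi:.4f}")
```

Because a fixed 2x2 window in array indices covers a much smaller patch of the sphere near the pole than near the equator, pooling over array neighbours treats different regions differently, which is why it is only approximately equivariant.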
Many thanks for your quick reply! Your work is so fascinating! |