“Going Deeper With Convolutions,” by Szegedy et al. (2015), is a great read because the architecture it describes is so thoughtful. Reading and thinking about it gives you a new perspective on neural network architectures, because it offers a new way to combine layers and their outputs: run several convolutions side by side and concatenate the results. The main idea is that, in deeper layers of a neural network, it might be helpful to use filters bigger than 3x3, because the neurons in those layers cover more of the image in the abstractions they make.
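To make the "combining outputs" idea concrete, here is a minimal shape-only sketch in Python. It assumes stride-1 convolutions with same padding and uses the branch depths the paper's architecture table lists for its first inception module (3a). The function names are made up for illustration, and nothing here performs a real convolution; it only checks that the branches line up so they can be concatenated along the depth axis.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def same_padding(kernel):
    """Padding that keeps the size unchanged for an odd kernel at stride 1."""
    return (kernel - 1) // 2


def inception_output_shape(height, width, branch_channels):
    """Shape after concatenating parallel conv branches along the depth axis.

    branch_channels maps each branch's kernel size to its output depth.
    """
    for kernel in branch_channels:
        pad = same_padding(kernel)
        # Every branch must keep height and width, or concatenation fails.
        assert conv_output_size(height, kernel, padding=pad) == height
        assert conv_output_size(width, kernel, padding=pad) == width
    return height, width, sum(branch_channels.values())


# Module 3a in the paper's table: 1x1, 3x3, and 5x5 branch depths.
branches_3a = {1: 64, 3: 128, 5: 32}
# The fourth branch (3x3 max pool at stride 1, then a 1x1 projection)
# also preserves height and width, so its depth is simply added on.
pool_projection = 32

h, w, c = inception_output_shape(28, 28, branches_3a)
output_shape = (h, w, c + pool_projection)
print(output_shape)  # (28, 28, 256)
```

The output depth is just the sum of the branch depths, which is why the module can mix filter sizes freely as long as every branch preserves width and height.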
There were a couple of other notable observations in this paper for me. One was that the authors use 1x1 convolutions for dimensionality reduction (with added nonlinearity as a side effect). Here is a great blog post about 1x1 convolutions.
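A quick weight count shows why the 1x1 reduction matters. This is a back-of-the-envelope sketch, assuming the numbers the paper's table gives for the 5x5 branch of module 3a (input depth 192, reduced to 16, then expanded to 32) and ignoring biases. The helper name is hypothetical.

```python
def conv_weights(kernel, in_channels, out_channels):
    """Number of weights in a kernel x kernel convolution (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


in_channels = 192   # depth of the input to module 3a
reduced = 16        # depth after the 1x1 "reduce" layer
out_channels = 32   # depth produced by the 5x5 convolution

# 5x5 convolution applied directly to all 192 input channels.
direct = conv_weights(5, in_channels, out_channels)

# 1x1 reduction first, then the 5x5 convolution on the thinner input.
with_reduction = (conv_weights(1, in_channels, reduced)
                  + conv_weights(5, reduced, out_channels))

print(direct)          # 153600
print(with_reduction)  # 15872
```

Under these assumptions the reduced branch needs roughly a tenth of the weights, and the 1x1 layer brings its own activation function along with it.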
Another was that the researchers used a single 7x7 average pooling layer instead of a stack of fully connected layers before generating the output. To justify this replacement, they cite the “Network in Network” paper by M. Lin, Q. Chen, and S. Yan (2013), which I have only read the abstract of, but which looks interesting. Supposedly, average pooling makes the model more interpretable and less prone to overfitting.
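Here is a small sketch of what that pooling step does, written in plain Python on nested lists rather than with a deep learning library. The toy input and the function name are made up for illustration. The weight comparison at the end assumes a 1000-class linear classifier after the pooling, and the "flatten" variant is a hypothetical alternative included only for contrast.

```python
def global_average_pool(feature_maps):
    """Average each channel's 2D grid down to a single number.

    feature_maps is a list of channels; each channel is a list of rows.
    """
    pooled = []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Toy input: 2 channels of 2x2 (the real network pools 1024 channels of 7x7).
toy = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 8.0]],
]
print(global_average_pool(toy))  # [2.5, 2.0]

# Weights in a 1000-class linear layer placed after each option:
flatten_then_dense = 7 * 7 * 1024 * 1000  # 50,176,000
pool_then_dense = 1024 * 1000             # 1,024,000
```

Each channel collapses to one value, so the classifier sees 1024 inputs instead of 50,176, which is one plausible reason the approach is less prone to overfitting.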
While there are several other observations I could make, I want to note the one most relevant to my work on the heart attack detection from ECG project. Between groups of inception modules, the researchers use a stride-2 max-pooling layer, which cuts the width and height of the data in half. This is very important, because every convolution inside an inception module needs same padding so that the branch outputs can be concatenated. That means a module never shrinks its input, so you can end up with way too many activations to compute and store if you aren’t cutting the width of your data between stages. (1x1 convolutions let you modify the depth.)
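A rough count makes the cost of skipping the pooling visible. This sketch counts activations, the quantity that scales directly with width and height, using the depths the paper's table lists at the end of its 28x28, 14x14, and 7x7 stages. The helper names are hypothetical, and rounding up when halving is an assumption (it makes no difference for these even sizes).

```python
def activations(height, width, channels):
    """Number of values in a stack of feature maps with the given shape."""
    return height * width * channels


def pooled_size(n, stride=2):
    """Spatial size after stride-2 pooling, assumed to round up."""
    return -(-n // stride)


size = 28
channels_per_stage = [480, 832, 1024]  # depths at the end of each stage

with_pooling = []
without_pooling = []
for i, channels in enumerate(channels_per_stage):
    with_pooling.append(activations(size, size, channels))
    # Hypothetical variant: the grid stays at 28x28 the whole way through.
    without_pooling.append(activations(28, 28, channels))
    if i < len(channels_per_stage) - 1:
        size = pooled_size(size)  # 28 -> 14 -> 7

print(with_pooling)     # [376320, 163072, 50176]
print(without_pooling)  # [376320, 652288, 802816]
```

With pooling, the activation count falls even as the depth grows; without it, the last stage would carry 16 times as many values.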
My overall conclusion is that this paper flings open the door to novel ways of combining layers in parallel as well as in stacks. While I am experimenting with inception modules in my own work, it’s exciting to look forward to things like LSTMs and ResNets, which also combine layers and their outputs in novel ways, each built on its own hypothesis.