I recently read a paper in which the researchers achieved up to ~90% sensitivity and specificity using a convolutional neural network. I tried to replicate their results, and I did not do as well: despite using an architecture similar to the one presented in the paper, my models overfit badly. The one thing I know they did that I did not is downsample the data, largely because I could not parse the method they used to smooth and reduce the size of their signals. They cited another paper for directions on downsampling the EKG data, but I am not a member of an academic institution, so I could not get access to it.
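For reference, a generic smooth-and-downsample step (not necessarily the one their paper used) might look like the sketch below. The 360 Hz sampling rate and the factor of 4 are placeholders of mine, not values from either paper:

```python
import numpy as np
from scipy.signal import decimate

# A hypothetical 10-second EKG strip sampled at 360 Hz.
x = np.random.randn(3600)

# decimate() applies an anti-aliasing low-pass filter and then keeps
# every 4th sample, smoothing and shrinking the signal to 90 Hz in one step.
x_small = decimate(x, q=4)
```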
In the model I am working on, I use stacked inception modules like those in the GoogLeNet paper. I grid-searched through models with every combination of the following:
number of layers: 1, 2, 4
number of filters in each tower of each layer: 30, 15
I trained each of these 6 models using a batch size of 50 and a learning rate of 0.001.
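For concreteness, here is a minimal sketch of what one of these modules might look like in Keras. The kernel widths (1, 3, 5) and the pooling tower follow the GoogLeNet layout, but the exact widths are my placeholders, not values from the grid search; n_filters is the per-tower filter count (30 or 15) from above:

```python
from tensorflow.keras import layers

def inception_module_1d(x, n_filters):
    # Parallel towers with different kernel widths, concatenated along
    # the channel axis — the GoogLeNet layout adapted to 1-D signals.
    tower_1 = layers.Conv1D(n_filters, 1, padding="same", activation="relu")(x)

    tower_3 = layers.Conv1D(n_filters, 1, padding="same", activation="relu")(x)
    tower_3 = layers.Conv1D(n_filters, 3, padding="same", activation="relu")(tower_3)

    tower_5 = layers.Conv1D(n_filters, 1, padding="same", activation="relu")(x)
    tower_5 = layers.Conv1D(n_filters, 5, padding="same", activation="relu")(tower_5)

    pool = layers.MaxPooling1D(3, strides=1, padding="same")(x)
    pool = layers.Conv1D(n_filters, 1, padding="same", activation="relu")(pool)

    return layers.concatenate([tower_1, tower_3, tower_5, pool])
```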
The models with more than one layer all overfit to the training set. For example, the model with the following parameters reached around 90% accuracy on the training set but performed barely better than random on the validation set:
number of filters: 15
number of layers: 2
The single-layer models did not overfit as badly, though they still overfit. Here are the parameters and results for the best of these single-layer models:
number of filters: 30
number of layers: 1
To improve on this work, I need to get the overfitting under control. Here are some options for doing that:
Augment my training dataset. I could do that by slicing all of my samples in half again so that each is 5 seconds long, and I could probably do that twice if necessary. So long as I am only using a small portion (15%) of that data for validation, each round of augmentation would more than double the size of my training set. I could also horizontally translate my data. (Both ideas are sketched in the first code example after this list.)
Simplify the model. Right now I am using a several-layer-deep inception-module network because I am not sure what size filters to use or when to use pooling layers. I admit that is not the most advanced thinking about how to make model architecture decisions.
Add dropout regularization between the inception modules. According to the AlexNet paper, this reduces “complex co-adaptations” between neurons, “since a neuron cannot rely on the presence of particular other neurons.” (The second code example after this list sketches this.)
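Here is a rough sketch of the slicing and translation ideas from the first option, assuming the samples live in a NumPy array of shape (n_samples, n_timesteps). The function names and the wrap-around shift are my choices, not a standard recipe:

```python
import numpy as np

def slice_in_half(X, y):
    # Cut each sample in half along the time axis, doubling the number
    # of (shorter) samples; each half keeps its parent's label.
    half = X.shape[1] // 2
    X_aug = np.concatenate([X[:, :half], X[:, half:2 * half]], axis=0)
    y_aug = np.concatenate([y, y], axis=0)
    return X_aug, y_aug

def random_shift(X, max_shift):
    # Horizontally translate each sample by a random offset. np.roll
    # wraps the end of the strip around to the front, which introduces
    # one discontinuity per sample.
    shifts = np.random.randint(-max_shift, max_shift + 1, size=len(X))
    return np.stack([np.roll(x, s) for x, s in zip(X, shifts)])
```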
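And a sketch of the third option: dropping activations between stacked modules, reusing the hypothetical inception_module_1d from earlier. The 0.4 rate and the sigmoid head are placeholders to tune, assuming a binary label:

```python
from tensorflow.keras import layers, Model

def build_model(input_length, n_filters, n_layers, drop_rate=0.4):
    inputs = layers.Input(shape=(input_length, 1))
    x = inputs
    for _ in range(n_layers):
        x = inception_module_1d(x, n_filters)
        x = layers.Dropout(drop_rate)(x)  # regularize between modules
    x = layers.GlobalAveragePooling1D()(x)
    outputs = layers.Dense(1, activation="sigmoid")(x)  # assuming a binary label
    return Model(inputs, outputs)
```

To match the setup above, this would be compiled with something like tf.keras.optimizers.Adam(learning_rate=0.001) and trained with batch_size=50.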