An Experience from Kaggle Competition on Human Protein Classification

January 15, 2022

"In this blog post, I describe my experience in a Kaggle competition where I finished in the top 15%."

Competition synopsis:

This is a classification task: the goal is to classify images of human proteins by type. It is one of the harder image classification problems, and models typically score quite low on it. Performance is measured with the F1 score, and a baseline model usually reaches an F1 score of 21%. All input images are of size 512×512×3.
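To make the metric concrete, here is a minimal sketch of an F1 computation for predictions where each image can carry several labels. It assumes a macro average over classes and a 0.5 decision threshold; both are illustrative choices, and `macro_f1` is a hypothetical helper, not the competition's scoring code.

```python
def macro_f1(y_true, y_prob, threshold=0.5):
    """Macro-averaged F1 for multi-label targets.

    y_true: list of rows of 0/1 labels, one column per class.
    y_prob: list of rows of predicted probabilities, same shape.
    """
    num_classes = len(y_true[0])
    scores = []
    for c in range(num_classes):
        tp = fp = fn = 0
        for truth, prob in zip(y_true, y_prob):
            pred = 1 if prob[c] >= threshold else 0
            if pred == 1 and truth[c] == 1:
                tp += 1
            elif pred == 1 and truth[c] == 0:
                fp += 1
            elif pred == 0 and truth[c] == 1:
                fn += 1
        denom = 2 * tp + fp + fn
        # Assumed convention: a class with no positives and no predictions scores 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / num_classes


# Made-up labels and probabilities for three images and three classes.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_prob = [[0.9, 0.2, 0.4], [0.1, 0.8, 0.3], [0.7, 0.6, 0.1]]
print(round(macro_f1(y_true, y_prob), 4))
```

Because every class counts equally in a macro average, one rare class that the model never predicts pulls the whole score down, which is one reason scores on this kind of task can stay low.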

Solution Approach:

I decided to use a Convolutional Neural Network (CNN) because the model has to learn the visual patterns of individual proteins, and a single image can contain many proteins. That makes this a multi-label classification problem. I implemented the CNN in PyTorch. Below, I describe the different models I tried and what I learned from them.

A Basic CNN model:

This is a starter model. It has eight convolution layers, each followed by a ReLU activation and max-pooling. The convolutions feed into a few linear layers, and a final softmax produces the output. The results of this model are as follows. Before any training, evaluation on the validation dataset gave the following result: {'val_loss': 0.4375806450843811, 'val_score': 0.0}
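The architecture described above could look roughly like the following PyTorch sketch. The channel widths, the hidden layer size, and the number of classes (28) are assumptions for illustration; the post does not state them, and `BasicCNN` is a hypothetical name.

```python
import torch
import torch.nn as nn


class BasicCNN(nn.Module):
    def __init__(self, num_classes=28, channels=(8, 16, 32, 64, 64, 128, 128, 256)):
        super().__init__()
        blocks = []
        in_ch = 3
        for out_ch in channels:  # eight convolution layers
            blocks += [
                nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(2),  # halves height and width
            ]
            in_ch = out_ch
        self.features = nn.Sequential(*blocks)
        # 512 / 2**8 = 2, so the final feature map is in_ch x 2 x 2.
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(in_ch * 2 * 2, 256),
            nn.ReLU(inplace=True),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))  # raw scores (logits)


model = BasicCNN()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 512, 512))  # PyTorch expects N x C x H x W
    probs = torch.softmax(logits, dim=1)  # softmax output, as in the description
print(logits.shape, probs.sum().item())
```

Eight rounds of 2×2 max-pooling shrink a 512×512 input down to 2×2, which is why the first linear layer takes `in_ch * 2 * 2` features.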

I opted for ten epochs and a learning rate of 1e-4 for training.

Transfer Learning:

Since the basic CNN did not give good results, hyperparameter tuning alone was unlikely to bring it to a high score. So I switched to transfer learning: taking a model that has already been trained on a similar problem, together with its learned parameters, and adapting it to our task. This approach has given better results on many image-processing problems, which is why I chose it.


Kaggle provides 16 GB of RAM. Large models such as VGG16 consume so much memory that little room is left for loading the training data, so I could not use them.


In a residual network (ResNet), the outputs of both layer l-2 and layer l-1 are fed as inputs to layer l.

  • Because of this skip connection, each block only needs to learn the residual, that is, the difference between its input and the desired output.
  • The network has four main stages.
  • Each stage contains multiple convolution operations.
  • Batch normalization and Dropout are also included.
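As an illustration of the skip connection, here is a minimal sketch of a basic residual block in PyTorch. It follows the two-convolution block used in smaller ResNets; ResNet50 uses a three-convolution "bottleneck" variant, and `ResidualBlock` is a hypothetical name rather than the code used in the competition.

```python
import torch
import torch.nn as nn


class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Skip connection: the block input is added back to the output of
        # the two convolutions before the final activation.
        return self.relu(out + x)


block = ResidualBlock(16)
y = block(torch.randn(2, 16, 32, 32))
print(y.shape)
```

The addition `out + x` is what lets gradients flow directly to earlier layers, which is why deep residual networks remain trainable.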

The final part of the network has a few linear layers with ReLU activation and Dropout. The last layer is a sigmoid, which outputs a probability for each class.

The results are as follows. Before any training, evaluation on the validation dataset gave the following result: {'val_loss': 0.693974494934082, 'val_score': 0.2634097635746002}

For training, I selected ten epochs and a learning rate of 1e-4. After ten epochs, the validation loss was 0.2002 and the training loss was 0.0878. On the validation dataset, this model reached an F1 score of 75%, which is a good fit. This became my first submission to the competition. On the public leaderboard, the model scored an F1 of 68%, so there was clearly more work to be done.


Since ResNet50 did not deliver the results I was hoping for, I next tried ResNet34, followed by ResNet with hyperparameter tuning. ResNet34 did not change much, with an F1 score of around 75%.

ResNet with Hyperparameter tuning:

I tuned the hyperparameters using the Adam optimizer, and the F1 score came out at around 76%. Further tuning did not improve the F1 score, so I chose ResNet with hyperparameter tuning as the final model.
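For reference, a training loop with the Adam optimizer might be set up as in the sketch below. The tiny linear model and random data are stand-ins so the example runs on its own, and binary cross-entropy on the logits is an assumed loss that pairs naturally with a sigmoid output; the post does not list the tuned values, so only the learning rate of 1e-4 and the ten epochs come from the text.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins for the real network and data loader (hypothetical, for illustration).
model = nn.Linear(10, 4)
inputs = torch.randn(64, 10)
targets = (torch.rand(64, 4) > 0.5).float()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy in one step

losses = []
for epoch in range(10):
    optimizer.zero_grad()      # clear gradients from the previous step
    loss = loss_fn(model(inputs), targets)
    loss.backward()            # compute gradients
    optimizer.step()           # update the parameters
    losses.append(loss.item())

print(losses[0], losses[-1])
```

Tuning in this setting usually means trying different values for arguments such as `lr` and `weight_decay` and comparing the validation F1 score of each run.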

Problems to be addressed:


Throughout the model-building process, I saw a pattern of overfitting after 15 epochs. Using ResNet helped to reduce the overfitting, and I also tried Batch Normalization and Dropout. Even so, overfitting still set in after some number of epochs: the validation loss rose sharply while the training loss approached 0.
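One common way to deal with this pattern is early stopping: halt training once the validation loss stops improving. The sketch below is a minimal, framework-free version; `EarlyStopping`, the patience of 3, and the loss values are all hypothetical and only illustrate the idea.

```python
class EarlyStopping:
    """Signal a stop when validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best_loss = float("inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_loss):
        """Record one epoch; return True when training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation losses that improve and then climb, as in an overfitting run.
val_losses = [0.69, 0.45, 0.30, 0.22, 0.20, 0.21, 0.25, 0.33, 0.48]
stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, loss in enumerate(val_losses):
    if stopper.step(epoch, loss):
        stopped_at = epoch
        break
print(stopper.best_epoch, stopper.best_loss, stopped_at)
```

In a real run, the model weights would be saved whenever `best_epoch` is updated, so the best checkpoint can be restored after stopping.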


This project has increased my knowledge of transfer learning. Hyperparameter tuning, learning rate cycling, and early stopping were the most important lessons from this project on how to get the best possible results.
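Learning rate cycling can be sketched with PyTorch's `OneCycleLR` scheduler, which raises the learning rate to a peak and then anneals it. The peak of 1e-3, the 100 steps, and the stand-in model and data are assumptions for illustration, not the settings used in the competition.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

model = nn.Linear(10, 4)  # stand-in for the real network
inputs = torch.randn(32, 10)
targets = (torch.rand(32, 4) > 0.5).float()

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
total_steps = 100  # epochs * batches per epoch in a real run
scheduler = torch.optim.lr_scheduler.OneCycleLR(
    optimizer, max_lr=1e-3, total_steps=total_steps
)
loss_fn = nn.BCEWithLogitsLoss()

lrs = []
for step in range(total_steps):
    lrs.append(scheduler.get_last_lr()[0])  # learning rate used for this step
    optimizer.zero_grad()
    loss_fn(model(inputs), targets).backward()
    optimizer.step()
    scheduler.step()  # advance the schedule once per batch

peak = max(lrs)
print(f"start={lrs[0]:.2e} peak={peak:.2e} end={lrs[-1]:.2e}")
```

Note that the scheduler overrides the `lr` passed to the optimizer: the rate starts well below `max_lr`, climbs to it during the first part of training, and then decays to a very small value.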

Given the short time available, I could do only minimal tuning and try only a small number of models. In future competitions, I may try other models to improve performance.


Dr Santhoshkumar S PhD,
Researcher | Senior Technology Journalist
