January 13, 2022
Computer Progress is a new website launched by a team of researchers from MIT, Yonsei University, and the University of Brasilia that analyzes the computational burden of over 1,000 deep learning research papers. The site's data indicate that the computational load increases faster than theory predicts, implying that algorithms still have room for improvement.
The research team analyzed 1,058 deep learning research papers from arXiv to relate model performance to computational burden. In theory, the lower bound on computational burden is a fourth-order polynomial in performance. However, the researchers found that ImageNet image classification algorithms scale as a ninth-order polynomial, requiring roughly 500 times the computation to halve the error rate. According to the authors, these scaling trends indicate that researchers should develop more efficient algorithms.
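The gap between the two polynomial orders is easier to see with a little arithmetic. The sketch below assumes that computation grows as the inverse error rate raised to the polynomial degree; that functional form and the helper name `compute_multiplier` are illustrative assumptions, not taken from the researchers' code.

```python
def compute_multiplier(error_reduction_factor: float, degree: float) -> float:
    """Factor by which computation grows when the error rate shrinks by
    `error_reduction_factor`, ASSUMING computation ~ (1 / error) ** degree."""
    return error_reduction_factor ** degree


# Halving the error rate means an error reduction factor of 2.
theoretical = compute_multiplier(2, 4)  # fourth-order lower bound from theory
observed = compute_multiplier(2, 9)     # ninth-order scaling reported for ImageNet

print(f"Fourth-order scaling: {theoretical}x the computation")
print(f"Ninth-order scaling:  {observed}x the computation")
```

Under this assumption, halving the error costs 2^4 = 16 times the computation at the theoretical bound but 2^9 = 512 times at ninth order, which is consistent with the roughly 500-fold figure quoted above.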
Deep neural networks are frequently over-parameterized: they have more parameters than would be expected given the size of the training data. Empirically, over-parameterization has been shown to improve model performance and generalization, while training methods such as stochastic gradient descent (SGD) and regularization prevent models from overfitting. Additionally, researchers have found that increasing model performance or accuracy requires more training data, which in turn calls for a larger model.
To test this hypothesis, the researchers analyzed deep learning papers in various fields within computer vision (CV) and natural language processing (NLP), including image recognition, object detection, question answering, named-entity recognition, and machine translation. They extracted the accuracy metrics for the models discussed in the papers and the computational burden of training the models, defined as the number of processors multiplied by the computation rate multiplied by the time (essentially, the total number of floating-point operations). They then used linear regression to express each model's performance in terms of computation. The fitted equations show that computation scales significantly worse than the fourth-degree polynomial predicted by theory: from a degree of 7.7 for question answering to a polynomial of degree "around 50" for object detection, named-entity recognition, and machine translation.
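The two steps described above, computing the burden and fitting a polynomial degree, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names are hypothetical, the data points are synthetic (generated to follow a ninth-degree law exactly), and fitting a straight line to the logarithms is one common way to recover a polynomial degree, not necessarily the exact regression the researchers ran.

```python
import math


def computational_burden(num_processors, flops_per_processor, seconds):
    """Processors x computation rate x time: total floating-point operations."""
    return num_processors * flops_per_processor * seconds


def fit_scaling_degree(performance, computation):
    """Least-squares slope of log(computation) against log(performance).

    If computation ~ performance ** k, the slope recovers k.
    """
    xs = [math.log(p) for p in performance]
    ys = [math.log(c) for c in computation]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# SYNTHETIC data, for illustration only: performance is taken as 1 / error
# rate, and computation is constructed to follow a ninth-degree law.
error_rates = [0.40, 0.30, 0.20, 0.10]
performance = [1 / e for e in error_rates]
computation = [1e12 * p ** 9 for p in performance]

degree = fit_scaling_degree(performance, computation)
burden = computational_burden(num_processors=8, flops_per_processor=1e12, seconds=3600)

print(f"Fitted polynomial degree: {degree:.2f}")
print(f"Example burden: {burden:.3e} floating-point operations")
```

On real data the points would not fall exactly on a line, so the fitted degree would carry uncertainty that this sketch does not estimate.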
However, improved algorithms may resolve these scaling issues. According to the MIT team's research, "three years of algorithmic improvement equates to a tenfold increase in computing power." In 2020, OpenAI conducted a similar study of image recognition algorithms and found that, since 2012, the amount of computation needed to train a neural network to the same performance on ImageNet classification has fallen by a factor of two every 16 months. Thompson and a colleague recently surveyed 113 computer algorithm problem domains, including computer networking, signal processing, operating systems, and cryptography, to determine how much improved algorithms improved problem-solving performance. They discovered that while "around half" of problems or "algorithm families" did not improve, 14% achieved "transformative" improvements, and 30%–43% achieved improvements "comparable to or greater than those experienced by users due to Moore's Law."
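The two improvement rates quoted above are stated over different periods, so it can help to put them on a common per-year basis. The sketch below assumes each rate compounds smoothly over time; the helper names are hypothetical and the conversion is plain arithmetic, not a result from either study.

```python
import math


def annual_factor(factor: float, period_months: float) -> float:
    """Equivalent per-year efficiency gain, ASSUMING smooth compounding."""
    return factor ** (12 / period_months)


def years_to_reach(target_factor: float, factor: float, period_months: float) -> float:
    """Years of compounding needed to accumulate `target_factor`."""
    return math.log(target_factor) / math.log(annual_factor(factor, period_months))


mit_rate = annual_factor(10, 36)    # tenfold every three years
openai_rate = annual_factor(2, 16)  # twofold every 16 months

print(f"Tenfold per 3 years  = {mit_rate:.2f}x per year")
print(f"Twofold per 16 months = {openai_rate:.2f}x per year")
print(f"Years to a 500x gain at the tenfold rate: {years_to_reach(500, 10, 36):.1f}")
```

Expressed this way, the tenfold-per-three-years figure corresponds to a little over 2x per year, and the 16-month halving to roughly 1.7x per year.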
Additionally, the Computer Progress team suggested several complementary approaches for increasing the efficiency of deep learning. Quantization and pruning can help reduce the power consumption required by large deep learning models, and meta-learning can make model training more efficient. Computer Progress also hosts the compute-versus-performance scaling data, along with links to the underlying papers and a call for researchers to submit their performance results.
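For readers unfamiliar with pruning and quantization, the toy sketch below shows the basic idea of each on a plain list of weights. It assumes simple magnitude-based pruning and symmetric 8-bit quantization; the function names are hypothetical, and production frameworks use considerably more elaborate schemes.

```python
def prune_by_magnitude(weights, fraction):
    """Zero out the given fraction of weights with the smallest magnitude."""
    k = int(len(weights) * fraction)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:k])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    return [round(w / scale) for w in weights], scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


# Illustrative weights only; not taken from any real model.
weights = [0.5, -0.02, 0.9, 0.01, -0.7, 0.03]

pruned = prune_by_magnitude(weights, fraction=0.5)
quantized, scale = quantize_int8(pruned)
restored = dequantize(quantized, scale)

print("Pruned:   ", pruned)
print("Quantized:", quantized)
print("Restored: ", [round(w, 3) for w in restored])
```

Pruning leaves fewer non-zero weights to store and multiply, while quantization replaces each remaining float with a small integer at the cost of a bounded rounding error.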
Dr Nivash Jeevanandam PhD,
Researcher | Senior Technology Journalist