Imagine using Face ID to unlock your phone, but first having to take thousands of photos of yourself in different environments before your phone could recognize your face [1]. Without few-shot learning (FSL), this might be the case. A typical modern deep neural network for object detection, trained on the widely used ImageNet dataset, might require several hundred thousand images of oboes, trumpets, and flutes to detect them, and then another several hundred thousand to detect trombones [2]. Children, by contrast, can tell trumpets from trombones after seeing only a few examples. FSL aims to build models that make accurate predictions after seeing only a very small training dataset [1]. However, these models cannot make accurate predictions out of thin air: they need some additional source of useful information.

Researchers have identified three main approaches to FSL, depending on where that additional information comes from: the data, the model, and the algorithm [3]. The data approach enlarges the training set available to the FSL model. This can be done by translating, flipping, or otherwise modifying image samples to generate more data [4], by transforming samples from similar datasets [5], and by other means. By effectively enlarging the dataset, this approach lets FSL make use of more traditional deep neural networks, which excel at making accurate predictions from large datasets. For example, if you want to classify apples, oranges, and bananas but have only one picture of each, you can slightly rotate the training images, flip them horizontally, and crop them. A flipped or cropped banana is still a banana, so a traditional classifier can use these additional training examples to make better predictions.
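To make the augmentation idea concrete, here is a minimal sketch that turns one labeled image into several labeled variants. It assumes a tiny grayscale image stored as nested lists of pixel values, and it uses a 90-degree rotation as a simple stand-in for the small-angle rotations used in practice (those need pixel interpolation). The function names are illustrative; real pipelines would typically use a library such as torchvision for these transforms.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumption: a small grayscale image as rows of pixel values


def flip_horizontal(img: Image) -> Image:
    """Mirror each row left to right."""
    return [row[::-1] for row in img]


def rotate_90(img: Image) -> Image:
    """Rotate the image 90 degrees clockwise (stand-in for a small rotation)."""
    return [list(row) for row in zip(*img[::-1])]


def crop(img: Image, top: int, left: int, height: int, width: int) -> Image:
    """Cut out a height x width window whose top-left corner is (top, left)."""
    return [row[left:left + width] for row in img[top:top + height]]


def augment(img: Image, label: str) -> List[Tuple[Image, str]]:
    """Return the original plus flipped, rotated, and cropped copies.

    Every variant keeps the original label: a flipped banana is still a banana.
    """
    variants = [
        img,
        flip_horizontal(img),
        rotate_90(img),
        crop(img, 0, 0, len(img) - 1, len(img[0]) - 1),
    ]
    return [(v, label) for v in variants]


if __name__ == "__main__":
    banana = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    for pixels, label in augment(banana, "banana"):
        print(label, pixels)
```

One labeled picture becomes four labeled training examples here; a real pipeline would apply random transforms on the fly so the classifier sees a fresh variant every epoch.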

The second approach, the model approach, relies on prior knowledge built into the model, typically in the form of a learned representation. Embedding learning is an example: a learned embedding function maps samples into a lower-dimensional space where a simpler model, which needs much less data, can perform the classification [6]. GPT-3, one of the most popular deep learning models today and, with 175 billion parameters, one of the largest at its release, also falls into this category [7]. GPT-3 is fundamentally a language model: given a prompt, it predicts what text comes next. It is built on the Transformer architecture [8] and learns highly useful representations of text during training, which it leverages at test time to perform new tasks from just a few examples written into the prompt, without updating its weights.
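The embedding idea can be sketched as nearest-prototype classification, in the spirit of prototypical networks: embed the few labeled examples of each class, average them into a class "prototype", and assign a new sample to the nearest prototype. This is a minimal sketch under loud assumptions: `embed` is a hypothetical placeholder (it only normalizes raw features, where a real system would use a network trained on many other classes), and the two-number feature vectors are made up for illustration.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def embed(x: Sequence[float]) -> Vector:
    """Hypothetical stand-in for a learned embedding network.

    It just L2-normalizes the raw features; a real FSL system would use a
    network trained beforehand on plenty of data from other classes.
    """
    norm = math.sqrt(sum(v * v for v in x)) or 1.0
    return [v / norm for v in x]


def prototypes(
    support: Dict[str, List[Sequence[float]]],
    f: Callable[[Sequence[float]], Vector] = embed,
) -> Dict[str, Vector]:
    """Average the embeddings of the few labeled examples of each class."""
    protos: Dict[str, Vector] = {}
    for label, examples in support.items():
        embs = [f(x) for x in examples]
        protos[label] = [sum(col) / len(embs) for col in zip(*embs)]
    return protos


def classify(
    query: Sequence[float],
    protos: Dict[str, Vector],
    f: Callable[[Sequence[float]], Vector] = embed,
) -> str:
    """Pick the class whose prototype is closest in squared Euclidean distance."""
    q = f(query)
    return min(
        protos,
        key=lambda label: sum((a - b) ** 2 for a, b in zip(q, protos[label])),
    )


if __name__ == "__main__":
    # Two labeled examples per class (a "2-way, 2-shot" support set).
    support = {
        "trumpet": [[1.0, 0.1], [0.9, 0.2]],
        "trombone": [[0.1, 1.0], [0.2, 0.8]],
    }
    protos = prototypes(support)
    print(classify([0.8, 0.3], protos))
```

The classifier on top of the embedding has no trainable parameters at all, which is why it can work from two examples per class: all of the data-hungry learning is assumed to have happened earlier, inside the embedding function.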

The third approach, the algorithm approach, mainly involves learning how to learn, or meta-learning. In meta-learning, two learning algorithms are at play: a meta-learner and a base learner. The meta-learner is trained across many related tasks and learns how to set up the base learner so that it performs well on a new task [9]. In a way, it’s similar to how parents may learn to become better parents as they raise multiple children over the years, so that each successive child is hopefully more successful (it works better for meta-learning algorithms than for humans). This approach is more abstract, but it has produced some of the most exciting results in recent years, such as Model-Agnostic Meta-Learning (MAML), which learns an initialization of the model’s parameters that can be adapted to a new task with only a few gradient steps [10].
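The two-level structure of MAML can be shown on a deliberately tiny problem. This sketch assumes a hypothetical family of 1-D regression tasks y = a·x, each with its own slope a, and a one-parameter model y_hat = w·x. The base learner takes one gradient step on a task’s small support set; the meta-learner updates the shared starting point w using the loss on a separate query set after that step. Because the model is linear in a single parameter, the second derivative is available in closed form, so the meta-gradient below is the exact second-order MAML gradient. The slope range, learning rates, and step counts are illustrative choices, not values from [10].

```python
import random
from typing import List, Tuple


def loss_grad_hess(w: float, xs: List[float], ys: List[float]) -> Tuple[float, float, float]:
    """MSE of the scalar model y_hat = w * x, plus its first and second derivatives in w."""
    n = len(xs)
    loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / n
    grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / n
    hess = sum(2 * x * x for x in xs) / n
    return loss, grad, hess


def adapt(w: float, xs: List[float], ys: List[float], inner_lr: float = 0.1) -> float:
    """Base learner: one gradient step on a task's small support set."""
    _, g, _ = loss_grad_hess(w, xs, ys)
    return w - inner_lr * g


def sample_task(rng: random.Random, k: int = 5):
    """Hypothetical task family: y = a * x with a slope a drawn per task."""
    a = rng.uniform(2.0, 4.0)
    sx = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    qx = [rng.uniform(-1.0, 1.0) for _ in range(k)]
    return sx, [a * x for x in sx], qx, [a * x for x in qx]


def maml(meta_steps: int = 2000, tasks_per_step: int = 4,
         inner_lr: float = 0.1, outer_lr: float = 0.05, seed: int = 0) -> float:
    """Meta-learner: find an initialization w that works well after one inner step."""
    rng = random.Random(seed)
    w = 0.0
    for _ in range(meta_steps):
        meta_grad = 0.0
        for _ in range(tasks_per_step):
            sx, sy, qx, qy = sample_task(rng)
            _, _, h_s = loss_grad_hess(w, sx, sy)
            w_task = adapt(w, sx, sy, inner_lr)         # inner loop on the support set
            _, g_q, _ = loss_grad_hess(w_task, qx, qy)  # evaluate on the query set
            # Chain rule through the inner step: d(w_task)/dw = 1 - inner_lr * h_s.
            meta_grad += g_q * (1.0 - inner_lr * h_s)
        w -= outer_lr * meta_grad / tasks_per_step      # outer (meta) update
    return w


if __name__ == "__main__":
    w0 = maml()
    xs = [0.5, -0.3, 0.8]
    ys = [3.5 * x for x in xs]  # a new task with slope 3.5
    print("meta-learned init:", w0, "after one step:", adapt(w0, xs, ys))
```

With these settings the meta-learned initialization should settle close to the average slope of the task family (about 3), which is the point from which a single gradient step helps most on an unseen task. Real MAML applies the same inner/outer loop to neural networks and computes the second-order term with automatic differentiation.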

FSL models already have impressive real-world applications in low-data drug discovery [12], voice cloning from short audio snippets [13], and other domains with little labeled training data. AI pioneer Geoffrey Hinton has said, “I do believe deep learning is going to be able to do everything, but I do think there’s going to have to be quite a few conceptual breakthroughs” [11]. As FSL models get better at learning from small datasets, as humans do, they could lead us to one of these “conceptual breakthroughs”. Who knows where few-shot learning could take us in fifty years.



References:

  1. Ozsubasi, W. (2020, November 1). Few-Shot Learning (FSL): What it is & its Applications. Retrieved July 16, 2021, from https://research.aimultiple.com/few-shot-learning/
  2. ILSVRC2017. (n.d.). Retrieved July 16, 2021, from https://image-net.org/challenges/LSVRC/2017/browse-det-synsets.php
  3. Wang, Y., Yao, Q., Kwok, J., & Ni, L. M. (2020). Generalizing from a few examples: A survey on few-shot learning. ArXiv:1904.05046 [Cs]. http://arxiv.org/abs/1904.05046
  4. Shyam, P., Gupta, S., & Dukkipati, A. (2017). Attentive recurrent comparators. ArXiv:1703.00767 [Cs]. http://arxiv.org/abs/1703.00767
  5. Tsai, Y.-H. H., & Salakhutdinov, R. (2018). Improving one-shot learning through fusing side information. ArXiv:1710.08347 [Cs]. http://arxiv.org/abs/1710.08347
  6. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., & Darrell, T. (2014). Caffe: Convolutional architecture for fast feature embedding. ArXiv:1408.5093 [Cs]. http://arxiv.org/abs/1408.5093
  7. Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. ArXiv:2005.14165 [Cs]. http://arxiv.org/abs/2005.14165
  8. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. ArXiv:1706.03762 [Cs]. http://arxiv.org/abs/1706.03762
  9. Hospedales, T., Antoniou, A., Micaelli, P., & Storkey, A. (2020). Meta-learning in neural networks: A survey. ArXiv:2004.05439 [Cs, Stat]. http://arxiv.org/abs/2004.05439
  10. Finn, C., Abbeel, P., & Levine, S. (2017). Model-agnostic meta-learning for fast adaptation of deep networks. ArXiv:1703.03400 [Cs]. http://arxiv.org/abs/1703.03400
  11. Hao, K. (2020, November 3). AI pioneer Geoff Hinton: “Deep learning is going to be able to do everything.” Retrieved July 16, 2021, from https://www.technologyreview.com/2020/11/03/1011616/ai-godfather-geoffrey-hinton-deep-learning-will-do-everything/
  12. Altae-Tran, H., Ramsundar, B., Pappu, A. S., & Pande, V. (2017). Low data drug discovery with one-shot learning. ACS Central Science, 3(4), 283–293. https://doi.org/10.1021/acscentsci.6b00367
  13. Arik, S. O., Chen, J., Peng, K., Ping, W., & Zhou, Y. (2018). Neural voice cloning with a few samples. ArXiv:1802.06006 [Cs, Eess]. http://arxiv.org/abs/1802.06006