UIJRT » United International Journal for Research & Technology

Caption Recommendation System

Asawa, J., Deshpande, M., Gaikwad, S. and Toshniwal, R., 2021. Caption Recommendation System. United International Journal for Research & Technology (UIJRT), 2(7), pp.04-09.


Caption generation is the challenging neural-network problem of producing a human-readable textual description for a given photograph. It requires understanding from the domain of computer vision as well as from the field of natural language processing. Every day we encounter a large number of images on social media, which viewers would otherwise have to interpret for themselves. Image captioning is important for many reasons: platforms such as Facebook and Twitter, for example, could generate descriptions directly from images, covering what people are wearing, where they are (e.g., a beach or a cafe), and what they are doing there. Generating automatic captions requires image understanding in order to detect and recognize objects, and it also needs an understanding of object properties, the interactions between objects, and the scene type or location. Generating well-formed sentences additionally requires both semantic and syntactic understanding of the language. In deep-learning-based techniques, features are learned automatically from the training data, and such models can handle large sets of images and videos. In this work, deep-learning techniques such as a CNN are used for image classification, and RNN encoders and decoders are used for text generation, i.e., producing captions for the provided image. A language model such as an LSTM is also implemented for both sentiment analysis and caption generation.

Keywords: Computer Vision, Automatic captions, Semantic, Syntactic, Deep learning, CNN, RNN, LSTM, Sentiment analysis, Caption generation.
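The pipeline described in the abstract (CNN-derived image features conditioning an LSTM decoder that emits caption words) can be sketched minimally. The toy example below uses random weights and an invented vocabulary purely to show the shape of greedy LSTM decoding from an image feature vector; a real system would use weights trained on captioned images and a pretrained CNN (e.g., VGG or Inception) to produce the feature.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary; a real system would build this from the training captions.
vocab = ["<start>", "<end>", "a", "dog", "on", "the", "beach"]
V, E, H, F = len(vocab), 16, 32, 64  # vocab, embedding, hidden, feature sizes

# Randomly initialised parameters stand in for trained weights.
W_embed = rng.normal(0, 0.1, (V, E))
W_img   = rng.normal(0, 0.1, (F, H))      # projects the CNN feature to h0
W_x     = rng.normal(0, 0.1, (E, 4 * H))  # input-to-gates weights
W_h     = rng.normal(0, 0.1, (H, 4 * H))  # hidden-to-gates weights
b       = np.zeros(4 * H)
W_out   = rng.normal(0, 0.1, (H, V))      # hidden-to-vocabulary logits

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    """One standard LSTM cell update (input, forget, output, candidate gates)."""
    gates = x @ W_x + h @ W_h + b
    i, f, o, g = np.split(gates, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

def greedy_caption(img_feature, max_len=10):
    """Decode a caption word by word, conditioned on the image feature."""
    h = np.tanh(img_feature @ W_img)  # the image feature seeds the hidden state
    c = np.zeros(H)
    token = vocab.index("<start>")
    caption = []
    for _ in range(max_len):
        h, c = lstm_step(W_embed[token], h, c)
        token = int(np.argmax(h @ W_out))  # greedy: pick the most likely word
        if vocab[token] == "<end>":
            break
        caption.append(vocab[token])
    return caption

# Stand-in for a CNN feature vector (e.g., penultimate-layer activations).
feature = rng.normal(0, 1.0, F)
print(greedy_caption(feature))
```

With random weights the output is gibberish; the sketch only illustrates how the encoder output becomes the decoder's initial state and how decoding proceeds token by token until `<end>` or the length limit.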


