Providing Captions to the Detected Images: Deep Neural Networks and LSTM

Authors

  • Lokesh Tiwari
  • Deepak Singh
  • Sajal Gangwar
  • Arvind Kumar

Keywords:

CNN, RNN, LSTM, detection, captioning

Abstract

Automatically generating captions for images seems to be very interesting but at the same time the task is quite challenging as training a machine with proper comprehension of the image, it’s background, detecting the objects present in it and finally generating the captions is only the frontline projection; The difficult part happens behind the scenes: selecting the dataset, training the model, validating the model, developing pre-trained models to test the photos, utilizing CNN for image detection, and ultimately using RNN & LSTM for caption generation - all are equally critical in completing the task. Image Processing is one of the most interesting and focused fields of Artificial Intelligence, with numerous problems to overcome. The image processing has two steps: The primary step is to detect the image, or the graphics and the secondary step is to tell what the image is about, what is depicted by it!! Understanding images with the help of previously fed data would train our machine accordingly and as the amount of training data would be increased the ability of machine to interpret the images would also increase.

Published

2022-03-28