Real-Time Audio Video Summarization

Authors

  • N. S. Patil
  • Sheetal S. Patil
  • Avinash M. Pawar
  • Chitresh Goel
  • Navneet Rathore
  • Pratyush Upadhyay

Keywords:

Deep Learning, Audio-video, Summarization, ActivityNet.

Abstract

Video processing has grown in importance in today's rapidly changing technological environment. Videos of many kinds, such as surveillance, social media, and informational videos, are part of our surroundings and daily lives. With video recording, various objects can be detected, videos can be summarised and characterised, and useful data can be found. Video captioning can also benefit the blind by describing what is going on around them, and it can improve military operations and surveillance by spotting threats and assisting soldiers and weapon systems in eliminating them. The video caption generator uses a video encoder and a caption output frame. In this seminar report we cover two models: the Hierarchical Model and the Multi-stream Hierarchical Boundary Model, which combines a hierarchical model with steered captions. Hierarchical models capture temporal clip-level features from clips taken at predetermined points in the video. The Steered caption model is an attention model in which visual boundaries are used to direct viewing behaviour to the relevant parts of the video, while the Multi-stream Hierarchical Boundary model adopts a soft-attention scheme that uses feature-boundary cuts to define video clips. The report also discusses Gaussian parametric attention: Gaussian attention spans place a cap on the length of streaming video that soft attention techniques must process.
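As a rough illustration of the Gaussian attention idea described above, the following minimal sketch (not taken from the paper) weights per-frame features with a Gaussian window so that only a bounded span of the video stream contributes to the attended context. The function name gaussian_attention, the feature shapes, and the fixed center/width values are assumptions made purely for illustration.

    import numpy as np

    def gaussian_attention(frame_features, center, width):
        # frame_features: (T, D) array of per-frame features
        # center: relative position of the attention focus in [0, 1]
        # width:  standard deviation of the window as a fraction of the clip length
        T = frame_features.shape[0]
        positions = np.arange(T) / max(T - 1, 1)            # normalised frame positions in [0, 1]
        weights = np.exp(-0.5 * ((positions - center) / width) ** 2)
        weights /= weights.sum()                            # normalise to an attention distribution
        context = weights @ frame_features                  # weighted sum of frame features, shape (D,)
        return context, weights

    # toy usage: 120 frames with 256-d features, window centred on the middle of the clip
    feats = np.random.randn(120, 256)
    context, w = gaussian_attention(feats, center=0.5, width=0.1)
    print(context.shape, int(w.argmax()))

Because the weights decay quickly away from the centre, frames outside roughly three standard deviations contribute almost nothing, which is what bounds the effective length of video the soft attention has to consider.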

Published

2022-08-25