Oculight: AI Based System for Visual Assistance to Blind and Visually Impaired People
Keywords:
AI, assistive technologies, captioning model, image understanding, blind people, Android device, scene detection, multimodal interfaces, human-computer interaction
Abstract
This research introduces a captioning model that combines an image model and a language model to produce textual descriptions of images. The deep learning architecture pairs a convolutional neural network (CNN) with a long short-term memory (LSTM) recurrent neural network (RNN), offering a promising AI-based approach to improving the quality of life and independence of blind people. The CNN extracts meaningful features from input images, which the LSTM then uses to generate textual descriptions. Implementing the system on an Android device makes it highly accessible and user-friendly, while integrated text-to-speech output further enhances human-computer interaction. Future development could expand the system's capabilities to support a wider range of visual tasks and environments and improve the model's efficiency and accuracy. With further refinement, this technology could have a profound impact on the lives of blind people, improving their overall well-being and sense of independence.
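The CNN-encoder/LSTM-decoder pipeline described above can be sketched in PyTorch. This is a minimal illustrative sketch, not the paper's implementation: the small convolutional stack stands in for the pretrained feature extractor the system would use in practice, and all layer sizes (`embed_dim`, `hidden_dim`, `feat_dim`) are assumed values chosen for illustration.

```python
import torch
import torch.nn as nn

class CaptionModel(nn.Module):
    """Sketch of an encoder-decoder captioner: CNN features condition an LSTM."""

    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512, feat_dim=512):
        super().__init__()
        # Stand-in CNN encoder; a real system would use a pretrained backbone.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # Project image features into the LSTM's initial hidden and cell states.
        self.init_h = nn.Linear(feat_dim, hidden_dim)
        self.init_c = nn.Linear(feat_dim, hidden_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        feats = self.encoder(images)              # (B, feat_dim) image features
        h0 = self.init_h(feats).unsqueeze(0)      # (1, B, hidden_dim)
        c0 = self.init_c(feats).unsqueeze(0)      # (1, B, hidden_dim)
        emb = self.embed(captions)                # (B, T, embed_dim) token embeddings
        hidden, _ = self.lstm(emb, (h0, c0))      # decode conditioned on the image
        return self.out(hidden)                   # (B, T, vocab_size) token logits

# Shape check on random data (hypothetical vocabulary of 1000 tokens).
model = CaptionModel(vocab_size=1000)
images = torch.randn(2, 3, 64, 64)
captions = torch.randint(0, 1000, (2, 5))
logits = model(images, captions)
assert logits.shape == (2, 5, 1000)
```

At inference time the decoder would instead be run one token at a time, feeding each predicted word back in until an end-of-sentence token is produced; the resulting string is then handed to the device's text-to-speech engine.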