Image Recognition Using Text and Audio Translation for the Visually Challenged

Rishita Khurana et al.
DOI: https://doi.org/10.17762/ijritcc.v11i10.8904
2023-11-07
International Journal on Recent and Innovation Trends in Computing and Communication
Abstract: According to the WHO, 253 million people worldwide are visually impaired. Visually impaired people find it burdensome to carry out their everyday lives, so it is vital to use current technology to help them experience their surroundings with as little difficulty as possible. This project is proposed to support visually impaired people: it recognizes an image, translates a description of the image into text, and then produces audio from that text. It can help a person read any text, recognize an image, and receive the result in spoken form. Motivated by recent work in machine translation and object recognition, the project presents a CNN-RNN based attention model. In the proposed framework, an image is first converted into a text description; then, using a basic text-to-speech API, the extracted caption is converted into speech, which helps the visually impaired user understand the image in front of them. The main effort therefore goes into building the captioning model, while the second stage, converting text to speech, is comparatively simple thanks to the text-to-speech API. Once the model is built, it is deployed on a local machine using a Flask-based application to produce an audio caption for any image fed to the model.
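The two-stage flow described in the abstract (image to caption, then caption to speech) can be sketched roughly as follows. This is a minimal illustration rather than the authors' implementation: the names `describe_image`, `Captioner`, `Synthesizer`, and both dummy stages are hypothetical, and the stand-ins replace the trained CNN-RNN attention model and the text-to-speech API, neither of which the abstract specifies in detail.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures; the abstract does not define them.
Captioner = Callable[[bytes], str]    # image bytes -> caption text
Synthesizer = Callable[[str], bytes]  # caption text -> audio bytes


@dataclass
class Description:
    caption: str
    audio: bytes


def describe_image(image: bytes, captioner: Captioner,
                   synthesizer: Synthesizer) -> Description:
    """Run the two stages in order: caption the image, then voice the caption."""
    if not image:
        raise ValueError("empty image")
    caption = captioner(image).strip()
    audio = synthesizer(caption)
    return Description(caption=caption, audio=audio)


# Stand-in stages so the sketch runs without a trained model or a TTS service.
def dummy_captioner(image: bytes) -> str:
    # Assumption: a real captioner would run the CNN-RNN attention model here.
    return f"an image of {len(image)} bytes"


def dummy_synthesizer(text: str) -> bytes:
    # Assumption: a real synthesizer would call a text-to-speech API here.
    return text.encode("utf-8")


result = describe_image(b"\x89PNG", dummy_captioner, dummy_synthesizer)
print(result.caption)
```

Keeping the two stages behind plain callables mirrors the abstract's point that the captioning model is the hard part and the speech stage is easy to swap. A web endpoint, such as the Flask deployment the abstract mentions, would presumably wrap a function like `describe_image` and return the audio bytes.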