International Journal of Science and Research (IJSR)

Call for Papers | Fully Refereed | Open Access | Double Blind Peer Reviewed

ISSN: 2319-7064





Informative Article | Science and Technology | India | Volume 10 Issue 6, June 2021


Multi-Modal Fusion for Enhanced Image and Speech Recognition in AI Systems

Ankur Tak


Abstract: This research investigates how integrating multi-modal information, specifically images and speech, can enhance the recognition capabilities of artificial intelligence (AI) systems. Adopting an interpretive philosophy and a deductive approach, the study explores dynamic attention mechanisms, semi-supervised learning, and cross-domain adaptation techniques. It follows a descriptive research design based on secondary data collected from reputable academic sources. The research critically evaluates the feasibility and applicability of hardware optimization for efficient multi-modal processing, considering factors such as specialized processors and parallel computing. It analyzes dynamic attention mechanisms, emphasizing their role in allocating attention across modalities according to contextual relevance. It also examines semi-supervised learning techniques, which leverage both labeled and unlabeled data to improve recognition performance. Finally, cross-domain adaptation techniques are explored as a way to deploy multi-modal fusion models across diverse real-world scenarios.
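As an illustration of the dynamic attention idea described in the abstract, the following is a minimal sketch, not the article's own method. It assumes that each modality has already been encoded as a fixed-length embedding and that a context vector of the same length is available. The names `attention_fusion`, `context`, `image_vec`, and `speech_vec` are hypothetical, and the embedding values are made up for demonstration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_fusion(context, modality_embeddings):
    """Weight each modality by its relevance to the context, then fuse.

    Assumption: relevance is a scaled dot product between the context
    vector and each modality embedding (one common choice among many).
    """
    names = list(modality_embeddings)
    dim = len(context)
    scores = [
        dot(context, modality_embeddings[name]) / math.sqrt(dim)
        for name in names
    ]
    weights = softmax(scores)
    # Fused vector: attention-weighted sum of the modality embeddings.
    fused = [
        sum(w * modality_embeddings[name][i] for w, name in zip(weights, names))
        for i in range(dim)
    ]
    return dict(zip(names, weights)), fused


# Hypothetical embeddings; a real system would obtain these from encoders.
image_vec = [0.9, 0.1, 0.0, 0.2]
speech_vec = [0.1, 0.8, 0.3, 0.0]
context = [1.0, 0.0, 0.0, 0.0]  # context that aligns more with the image

weights, fused = attention_fusion(
    context, {"image": image_vec, "speech": speech_vec}
)
print(weights)
print(fused)
```

In this sketch the context aligns more closely with the image embedding, so the image modality receives the larger weight. A different context vector would shift the weighting toward speech, which is the sense in which the allocation is dynamic.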


Keywords: AI systems, knowledge, connecting, integrating, multi-modal classification, aural, visual information


Edition: Volume 10 Issue 6, June 2021


Pages: 1780 - 1788

