Vision LLMs: Bridging Language and Visual Understanding - Case study of IndoAI
International Journal of Science and Research (IJSR)

Fully Refereed | Open Access | Double Blind Peer Reviewed

ISSN: 2319-7064



Review Papers | Computer Engineering | India | Volume 14 Issue 4, April 2025


     


Vivek Gujar


Abstract: Vision LLMs are trained on vast datasets of paired image-text samples, which allows them to perform tasks such as image captioning, visual question answering (VQA) and multimodal reasoning. These models mark a transformative leap in artificial intelligence by merging visual and linguistic understanding, enabling seamless human-machine communication and powering groundbreaking applications, from automated diagnostic reporting in healthcare to real-time scene analysis in autonomous systems. Yet key challenges remain, including computational inefficiency, biases embedded in training data and limited interpretability, which currently restrict broader deployment. Current research is tackling these obstacles through optimized model architectures, fairness-aware dataset curation and advanced explainable AI methods. As these advancements progress, Vision LLMs are poised to revolutionize AI-driven solutions across industries such as healthcare, robotics and autonomous vehicles. Their continued evolution is redefining the landscape of interdisciplinary AI, fostering more intuitive, ethical and scalable intelligent systems. This article provides an overview of Vision LLM architectures, their applications and the challenges they face, and presents a case study of how building AI models with Vision LLMs may help the IndoAI AI camera system.


Keywords: vision LLM, LAION, NLP, LLM, GPT, IndoAI, AI Camera


Edition: Volume 14 Issue 4, April 2025


Pages: 158 - 164


DOI: https://www.doi.org/10.21275/SR25331222304


Vivek Gujar, "Vision LLMs: Bridging Language and Visual Understanding - Case study of IndoAI", International Journal of Science and Research (IJSR), Volume 14 Issue 4, April 2025, pp. 158-164, https://www.ijsr.net/getabstract.php?paperid=SR25331222304, DOI: https://www.doi.org/10.21275/SR25331222304
