
Informative Article | Data & Knowledge Engineering | India | Volume 11 Issue 6, June 2022

Multimodal Document Representation for Image-Text Fusion

Abstract: This survey paper discusses advancements in multimodal document representation, with a specific focus on the fusion of textual and visual information. The overview begins by providing a historical context of multimodal representation techniques, ranging from early hand-crafted feature-based approaches to recent advances in deep learning. The paper then explores strategies used to fuse multimodal information, such as concatenation, attention mechanisms, and shared layers. It also highlights applications including image captioning, document retrieval, visual question answering, and multimedia analysis to demonstrate the broad impact of multimodal representation across diverse domains. Despite progress in the research and development of advanced techniques, challenges such as data heterogeneity, scalability, and interpretability persist, opening avenues for future research. Finally, the paper offers insights into current state-of-the-art techniques and identifies opportunities for advancing the field of multimodal document representation.
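The three fusion strategies named in the abstract (concatenation, attention mechanisms, and shared layers) can be illustrated with a minimal sketch. It is not the paper's method. It assumes that a text encoder and an image encoder have already produced fixed-length feature vectors, which are stood in for here by small toy lists. The function names (`concat_fusion`, `attention_fusion`, `shared_layer_fusion`) and the weight matrices are hypothetical and chosen only for illustration. Plain Python is used so that the example runs without dependencies.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[Vector]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def matvec(w: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [dot(row, v) for row in w]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concat_fusion(text_vec: Vector, image_vec: Vector) -> Vector:
    """Concatenation: place the two modality vectors end to end."""
    return list(text_vec) + list(image_vec)


def attention_fusion(text_vec: Vector, region_vecs: List[Vector]) -> Vector:
    """Text-guided attention over image regions (hypothetical setup).

    Each region is scored by scaled dot-product similarity to the text
    vector. The softmax-weighted sum of regions is then concatenated with
    the text vector. Assumes regions share the text vector's dimension.
    """
    scale = math.sqrt(len(text_vec))
    weights = softmax([dot(text_vec, r) / scale for r in region_vecs])
    dim = len(region_vecs[0])
    attended = [sum(w * r[i] for w, r in zip(weights, region_vecs)) for i in range(dim)]
    return concat_fusion(text_vec, attended)


def shared_layer_fusion(text_vec: Vector, image_vec: Vector,
                        w_text: Matrix, w_image: Matrix, w_shared: Matrix) -> Vector:
    """Shared layers: project each modality into a common d-dim space.

    w_text maps the text dim to d and w_image maps the image dim to d.
    The projections are summed, passed through ReLU, and then through one
    layer (w_shared) that both modalities share.
    """
    joint = [t + i for t, i in zip(matvec(w_text, text_vec), matvec(w_image, image_vec))]
    hidden = [max(0.0, x) for x in joint]  # ReLU non-linearity
    return matvec(w_shared, hidden)


if __name__ == "__main__":
    text = [1.0, 0.0]
    regions = [[1.0, 0.0], [0.0, 1.0]]
    eye = [[1.0, 0.0], [0.0, 1.0]]
    print(concat_fusion(text, [0.5, 0.5]))
    print(attention_fusion(text, regions))
    print(shared_layer_fusion(text, [0.5, 0.5], eye, eye, [[1.0, 1.0]]))
```

In this toy setting, concatenation keeps all information but grows the output dimension with each modality. Attention lets the text decide which image regions matter: the region more similar to the text vector receives the larger weight. Shared layers force both modalities into one representation space, which makes them directly comparable.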

Keywords: Multimodal Representation, Document Fusion, Image-Text Integration, Deep Learning, Information Retrieval, Semantic Understanding

Edition: Volume 11 Issue 6, June 2022

Pages: 1998 - 2002



Similar Articles with Keyword 'Deep Learning'

Effective Communication of Data Science Results to Non-Technical Stakeholders

Electrical Power Quality Classification using Nested Ensemble Learning