Informative Article | Data & Knowledge Engineering | India | Volume 11 Issue 6, June 2022
Multimodal Document Representation for Image-Text Fusion
Akshata Upadhye
Abstract: This survey paper discusses advances in multimodal document representation, with a specific focus on the fusion of textual and visual information. The overview begins with the historical context of multimodal representation techniques, ranging from early hand-crafted feature-based approaches to recent deep learning methods. The paper then explores strategies used to fuse multimodal information, such as concatenation, attention mechanisms, and shared layers. It also highlights applications including image captioning, document retrieval, visual question answering, and multimedia analysis, demonstrating the broad impact of multimodal representation across diverse domains. Despite this progress, challenges such as data heterogeneity, scalability, and interpretability persist and open avenues for future research. Finally, the paper offers insights into current state-of-the-art techniques and identifies opportunities for advancing the field of multimodal document representation.
Keywords: Multimodal Representation, Document Fusion, Image-Text Integration, Deep Learning, Information Retrieval, Semantic Understanding
Edition: Volume 11 Issue 6, June 2022
Pages: 1998 - 2002
DOI: https://www.doi.org/10.21275/SR24430153718