Survey Paper | Computer Science & Engineering | India | Volume 3 Issue 12, December 2014
A Survey on Duplicate Detection in Hierarchical Data
Nikhil Gawande, S. R. Todamal
Abstract: Although much work has been done on identifying duplicates in relational data, only a few solutions focus on identifying duplicates in more complex hierarchical structures, such as XML data. In this paper, we present a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML nodes being duplicates, considering not only the information within the nodes but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy, capable of significant gains over the unoptimized version of the algorithm, is presented. Through experiments and comparisons, we show that the algorithm achieves high precision and recall scores on several datasets. The XMLDup method helps improve both the efficiency and the effectiveness of duplicate detection.
Keywords: duplicate detection, record linkage, entity resolution, XML, Bayesian networks, data cleaning, optimization
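The idea described in the abstract, scoring a pair of XML nodes from both their values and their structure and abandoning the comparison early when the pair can no longer qualify, can be illustrated with a minimal Python sketch. This is not the XMLDup Bayesian network itself: it is a simplified stand-in that multiplies per-field similarities recursively. The function names, the use of difflib.SequenceMatcher as the field similarity, the decision to compare only child tags present in both nodes, and the threshold parameter are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from difflib import SequenceMatcher


def text_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for any field-level measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def children_by_tag(node: ET.Element) -> dict:
    """Group the direct children of a node by their tag name."""
    groups = defaultdict(list)
    for child in node:
        groups[child.tag].append(child)
    return groups


def duplicate_score(x: ET.Element, y: ET.Element, threshold: float = 0.0) -> float:
    """Score in [0, 1] for how likely x and y describe the same object.

    Hypothetical simplification: the score is the product of the text
    similarity of the two nodes and the averaged best-match scores of their
    children, tag by tag. Every factor is at most 1, so the running product
    is an upper bound on the final score; once it drops below `threshold`
    the pair is pruned and 0.0 is returned.
    """
    if x.tag != y.tag:
        return 0.0

    score = 1.0
    tx, ty = (x.text or "").strip(), (y.text or "").strip()
    if tx or ty:
        score *= text_similarity(tx, ty)
    if score < threshold:
        return 0.0  # pruned: remaining factors cannot raise the score

    kids_x, kids_y = children_by_tag(x), children_by_tag(y)
    # Assumption: tags missing from one side are ignored rather than penalized.
    for tag in sorted(kids_x.keys() & kids_y.keys()):
        best = [max(duplicate_score(cx, cy) for cy in kids_y[tag])
                for cx in kids_x[tag]]
        score *= sum(best) / len(best)
        if score < threshold:
            return 0.0  # pruned before visiting the remaining children
    return score


if __name__ == "__main__":
    a = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
    b = ET.fromstring("<movie><title>Matrix, The</title><year>1999</year></movie>")
    print(duplicate_score(a, b))
```

With a threshold above zero, clearly different pairs are rejected after only a few fields have been compared, which is the kind of saving a pruning strategy aims for. A real implementation would replace the plain product with the probabilities of the paper's Bayesian network.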
Edition: Volume 3 Issue 12, December 2014
Pages: 751 - 754
Similar Articles
Downloads: 4 | Weekly Hits: ⮙1 | Monthly Hits: ⮙4
Research Paper, Computer Science & Engineering, United States of America, Volume 13 Issue 10, October 2024
Pages: 2042 - 2049
Intelligent Sentiment Prediction in Social Networks leveraging Big Data Analytics with Deep Learning
Maria Anurag Reddy Basani
Downloads: 95 | Weekly Hits: ⮙1 | Monthly Hits: ⮙1
Informative Article, Computer Science & Engineering, India, Volume 9 Issue 12, December 2020
Pages: 85 - 88
CBCD Methods in Video Copy Detection
Jan Mary Thomas
Downloads: 103
Research Paper, Computer Science & Engineering, India, Volume 4 Issue 6, June 2015
Pages: 2676 - 2680
Effective and Efficient XML Duplicate Detection Using Levenshtein Distance Algorithm
Shital Gaikwad, Nagaraju Bogiri
Downloads: 106 | Weekly Hits: ⮙1 | Monthly Hits: ⮙1
Survey Paper, Computer Science & Engineering, India, Volume 3 Issue 11, November 2014
Pages: 1850 - 1856
A Review on Detection of Outliers Over High Dimensional Streaming Data Using Cluster Based Hybrid Approach
Abhishek B. Mankar, Namrata Ghuse
Downloads: 107
M.Tech / M.E / PhD Thesis, Computer Science & Engineering, India, Volume 4 Issue 3, March 2015
Pages: 2296 - 2300
Clustering Tree based Implementation of Record Linkage on Many-to-Many Relation
V. Balvannanathan, R. Siva