A Survey on Duplicate Detection in Hierarchical Data
International Journal of Science and Research (IJSR)


ISSN: 2319-7064



Survey Paper | Computer Science & Engineering | India | Volume 3 Issue 12, December 2014 | Popularity: 6.7 / 10


     


Nikhil Gawande, S. R. Todamal


Abstract: Although much work has been done on identifying duplicates in relational data, only a few solutions focus on identifying duplicates in more complex hierarchical structures, such as XML data. In this paper, we demonstrate a novel method for XML duplicate detection, called XMLDup. XMLDup uses a Bayesian network to determine the probability of two XML nodes being duplicates, considering not only the information within the nodes but also the way that information is structured. In addition, to improve the efficiency of the network evaluation, a novel pruning strategy, capable of significant gains over the unoptimized version of the algorithm, is presented. Through experiments and comparisons, we show that our algorithm is able to achieve high precision and recall scores in several datasets. XMLDup thus helps improve both the efficiency and the effectiveness of duplicate detection.
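
The idea in the abstract can be illustrated with a minimal Python sketch. It is not the XMLDup algorithm itself: the function names, the string similarity measure, the 0.5 default for missing fields, and the way scores are combined (product across fields, average of best matches among same-tag children) are all assumptions made for illustration. It shows only the two ingredients the abstract names: a score that depends on both values and structure, and pruning that stops evaluation once a duplicate classification is no longer possible.

```python
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher


def text_similarity(a, b):
    """String similarity in [0, 1]; stands in for a per-value probability."""
    a, b = (a or "").strip().lower(), (b or "").strip().lower()
    if not a and not b:
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def duplicate_score(n1, n2, threshold=0.0):
    """Score in [0, 1] that n1 and n2 describe the same object (hypothetical model).

    Every factor is at most 1, so the running product is an upper bound
    on the final score. Once it drops below `threshold`, the pair cannot
    reach the threshold and evaluation stops early (the pruning step).
    Pruning is applied at the top level only; nested calls run in full.
    """
    if n1.tag != n2.tag:
        return 0.0
    # Two leaf nodes: compare their text values directly.
    if len(n1) == 0 and len(n2) == 0:
        return text_similarity(n1.text, n2.text)
    score = 1.0
    tags = sorted({c.tag for c in n1} | {c.tag for c in n2})
    for tag in tags:
        kids1, kids2 = n1.findall(tag), n2.findall(tag)
        if not kids1 or not kids2:
            factor = 0.5  # assumed default when a field is missing on one side
        else:
            # Average, over the children of n1, of the best match in n2.
            best = [max(duplicate_score(a, b) for b in kids2) for a in kids1]
            factor = sum(best) / len(best)
        score *= factor
        if score < threshold:
            return 0.0  # pruned: the pair can no longer reach the threshold
    return score


m1 = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
m2 = ET.fromstring("<movie><title>The Matrix</title><year>1999</year></movie>")
m3 = ET.fromstring("<movie><title>Casablanca</title><year>1942</year></movie>")
print(duplicate_score(m1, m2))
print(duplicate_score(m1, m3, threshold=0.9))
```

In this sketch the first pair is scored in full, while the second is rejected as soon as the title comparison pushes the upper bound below the threshold, so the year field is never compared.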


Keywords: duplicate detection, record linkage, entity resolution, XML, Bayesian networks, data cleaning, optimization


Edition: Volume 3 Issue 12, December 2014


Pages: 751 - 754



Nikhil Gawande, S. R. Todamal, "A Survey on Duplicate Detection in Hierarchical Data", International Journal of Science and Research (IJSR), Volume 3 Issue 12, December 2014, pp. 751-754, https://www.ijsr.net/getabstract.php?paperid=SUB14438, DOI: https://www.doi.org/10.21275/SUB14438
