Research Paper | Computational Linguistics | India | Volume 10 Issue 3, March 2021
Comparison of Various Models in the Context of Language Identification (Indo Aryan Languages)
Salman Alam
Abstract: Automatic language detection is a text classification task in which a machine identifies the language of a given multilingual text. This paper compares several machine learning models in the context of language identification. The corpus covers five major, closely related Indo-Aryan languages: Hindi, Bhojpuri, Awadhi, Magahi and Braj. The models compared are Random Forest classifier, SVC, SGD Classifier, multinomial logistic regression, Gaussian Naïve Bayes and Bernoulli Naïve Bayes. Among the models, multinomial Naïve Bayes attained the best accuracy, 74%.
Keywords: Hindi, Magahi, Bhojpuri, Braj, Awadhi, SVC, Multinomial NB, RNN, Linear SVC, SGD Classifier, Indo-Aryan
Edition: Volume 10 Issue 3, March 2021
Pages: 185 - 188
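To make the classification task in the abstract concrete, the following is a minimal sketch of a multinomial Naïve Bayes language identifier, written in plain Python. It is not the paper's implementation: the abstract does not state which features or library were used, so the character-bigram features, the Laplace smoothing value, the class name `MultinomialNB`, and the toy strings with the placeholder labels `lang_a` and `lang_b` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(text, n=2):
    """Return overlapping character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class MultinomialNB:
    """Multinomial Naive Bayes over character n-gram counts (illustrative sketch)."""

    def __init__(self, alpha=1.0, n=2):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.n = n          # n-gram length (assumed value)

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            grams = char_ngrams(text, self.n)
            self.feature_counts[label].update(grams)
            self.vocab.update(grams)
        self.totals = {
            c: sum(self.feature_counts[c].values()) for c in self.class_counts
        }
        return self

    def log_scores(self, text):
        """Return log prior + smoothed log likelihood for each class."""
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c, count in self.class_counts.items():
            score = math.log(count / n_docs)
            denom = self.totals[c] + self.alpha * vocab_size
            for gram in char_ngrams(text, self.n):
                if gram in self.vocab:  # n-grams never seen in training are skipped
                    score += math.log((self.feature_counts[c][gram] + self.alpha) / denom)
            scores[c] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy placeholder data: invented strings, not samples of any real language.
train_texts = ["aba aba bab", "abab baba", "xyx yxy xyx", "yxyx xyxy"]
train_labels = ["lang_a", "lang_a", "lang_b", "lang_b"]

model = MultinomialNB().fit(train_texts, train_labels)
print(model.predict("baba ab"))
print(model.predict("xy yx"))
```

Character n-grams are a common choice for separating closely related languages because they capture sub-word spelling patterns, but a real experiment on Hindi, Bhojpuri, Awadhi, Magahi and Braj would need a labelled corpus and a held-out test set to measure accuracy.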