Research Paper | Computational Linguistics | India | Volume 13 Issue 11, November 2024

Parts of Speech (POS) Tagging in Telugu Corpora Using CRF Algorithm

Rajula Valaraju

Abstract: Natural Language Processing (NLP), a branch of computer science and Artificial Intelligence (AI), enables machines to comprehend human language and assist with linguistic tasks. Part-of-speech (POS) tagging, which assigns each word a tag based on its meaning and context, is an initial step in many NLP tasks. The present paper discusses POS tagging in Telugu using Conditional Random Fields (CRF), a sequence-modelling algorithm that is particularly effective at identifying entities and text patterns, such as POS tags, in highly inflectional and agglutinative languages. Telugu is one such language. It is widely spoken in southern India (mainly Andhra Pradesh and Telangana), belongs to the Dravidian family, and follows Subject-Object-Verb (SOV) word order. Compared with other sequence-labelling approaches, notably maximum-entropy Markov models, CRF is effective at overcoming the label-bias problem because it normalizes scores over entire tag sequences rather than at each position. An annotated corpus of 62,996 words and a tag set of 18 tags are used to learn the features of the language and to tag the test corpus. The present study achieves an accuracy of 80.17%.
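The abstract describes CRF only at a high level. The following is a minimal sketch, not the author's implementation, of how a linear-chain CRF scores, normalizes, and decodes a Telugu sentence. Everything specific in it is hypothetical: the three BIS-style tags (the study uses 18), the suffix feature templates, the hand-set weights (a real tagger learns them from an annotated corpus), and the romanized example sentence. `log_partition` uses the forward algorithm to normalize over all tag sequences at once, which is the global normalization that lets CRFs avoid label bias; `viterbi` finds the best sequence; `token_accuracy` assumes accuracy is measured per token.

```python
import itertools
import math

# Toy subset of BIS-style tags (illustrative only; the study uses an 18-tag set).
TAGS = ["PR_PRP", "N_NN", "V_VM"]
START = "<S>"  # pseudo-tag for the position before the first word

# Hypothetical hand-set weights; a trained CRF would learn these from annotated data.
EMIT = {  # (feature, tag) -> weight
    ("suffix3=enu", "PR_PRP"): 2.0,  # e.g. "nenu" (I)
    ("suffix2=am", "N_NN"): 2.0,     # e.g. "pustakam" (book)
    ("suffix3=anu", "V_VM"): 2.0,    # e.g. "chaduvutunnanu" (am reading)
}
TRANS = {  # (previous tag, tag) -> weight
    (START, "PR_PRP"): 0.5,
    ("PR_PRP", "N_NN"): 1.0,
    ("N_NN", "V_VM"): 1.0,   # object before verb, matching SOV order
    ("V_VM", "V_VM"): -1.0,
}


def token_features(word):
    """Hypothetical suffix templates; inflectional suffixes often signal category."""
    return ["bias", "suffix3=" + word[-3:], "suffix2=" + word[-2:]]


def local_score(words, i, prev, tag):
    """Weighted feature sum for assigning `tag` at position i after `prev`."""
    score = TRANS.get((prev, tag), 0.0)
    for feat in token_features(words[i]):
        score += EMIT.get((feat, tag), 0.0)
    return score


def sequence_score(words, tags):
    """Unnormalized score of a complete tag sequence."""
    score, prev = 0.0, START
    for i, tag in enumerate(tags):
        score += local_score(words, i, prev, tag)
        prev = tag
    return score


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(words):
    """Forward algorithm: log of the sum over ALL tag sequences (global normalization)."""
    alpha = {t: local_score(words, 0, START, t) for t in TAGS}
    for i in range(1, len(words)):
        alpha = {
            t: logsumexp([alpha[p] + local_score(words, i, p, t) for p in TAGS])
            for t in TAGS
        }
    return logsumexp(list(alpha.values()))


def brute_force_log_partition(words):
    """Same quantity by enumerating every tag sequence; feasible only for toy inputs."""
    scores = [sequence_score(words, ys) for ys in itertools.product(TAGS, repeat=len(words))]
    return logsumexp(scores)


def sequence_probability(words, tags):
    """P(tags | words) under the linear-chain CRF."""
    return math.exp(sequence_score(words, tags) - log_partition(words))


def viterbi(words):
    """Highest-scoring tag sequence for the whole sentence."""
    best = [{t: (local_score(words, 0, START, t), None) for t in TAGS}]
    for i in range(1, len(words)):
        best.append({
            t: max((best[i - 1][p][0] + local_score(words, i, p, t), p) for p in TAGS)
            for t in TAGS
        })
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]  # follow back-pointers
        path.append(tag)
    return path[::-1]


def token_accuracy(gold, predicted):
    """Fraction of tokens whose predicted tag matches the gold tag."""
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


if __name__ == "__main__":
    sentence = ["nenu", "pustakam", "chaduvutunnanu"]  # romanized: "I am reading a book"
    predicted = viterbi(sentence)
    print(predicted)  # ['PR_PRP', 'N_NN', 'V_VM']
    print(sequence_probability(sentence, predicted))
    print(token_accuracy(["PR_PRP", "N_NN", "V_VM"], predicted))  # 1.0
```

The brute-force function is included only to show what the forward algorithm computes: it enumerates all |T|^n tag sequences, whereas the forward recursion reaches the same value in O(n·|T|²) time.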

Keywords: POS tagging, CRF Model, BIS Tag set, Telugu Language

Edition: Volume 13 Issue 11, November 2024

Pages: 188 - 190

DOI: https://www.doi.org/10.21275/SR241102123024


Rajula Valaraju, "Parts of Speech (POS) Tagging in Telugu Corpora Using CRF Algorithm", International Journal of Science and Research (IJSR), Volume 13 Issue 11, November 2024, pp. 188-190, URL: https://www.ijsr.net/getabstract.php?paperid=SR241102123024, DOI: https://www.doi.org/10.21275/SR241102123024

Computational Linguistics, India, Volume 1 Issue 3, December 2012

Pages: 163 - 167

Isolated Spoken Word Identification in Malayalam using Mel-frequency Cepstral Coefficients and K-means clustering

Sreejith C, Reghuraj P C

Computational Linguistics, India, Volume 9 Issue 10, October 2020

Pages: 1664 - 1669

Aspect Based Sentiment Analysis for Users Review Dataset Using Deep Learning and BERT

Karan Arora, Sarthak Arora

Computational Linguistics, India, Volume 10 Issue 3, March 2021

Pages: 185 - 188

Comparison of Various Models in the Context of Language Identification (Indo Aryan Languages)

Salman Alam

Computational Linguistics, India, Volume 13 Issue 11, November 2024

Pages: 367 - 371

A Comprehensive Review of Sentiment Analysis: From Rule-Based Methods to Deep Learning and Future Directions

N John Kuotsu