United International Journal for Research & Technology (UIJRT)

Approaches of POS Tagging Algorithm for Bangla Corpus

Cite as:

Sultana, M. and Balazon, F.G., 2020. Approaches of POS Tagging Algorithm for Bangla Corpus. United International Journal for Research & Technology (UIJRT), 1(7), pp.29-35.


Part-of-speech (POS) tagging is the process of classifying the words of a natural-language sentence into their parts of speech and labeling each word with its lexical category, so that every word can be identified as a noun, verb, adjective, and so on. POS tagging is an essential component for building lemmatizers, which reduce a word to its root form in natural language processing (NLP), and it is an initial stage in NLP applications such as text analysis, machine translation, information retrieval, and text-to-speech synthesis. Various approaches have been proposed for implementing POS taggers. In this paper, the Trigram and Hidden Markov Model (HMM) methods are used to develop a tagger within the general statistical approach, and a clear account of these algorithms is presented. The paper also presents a tag set, used with an Indian corpus, for tagging Bangla text, and attempts to measure the accuracy of the taggers' output. Finally, it reviews developments in POS taggers and POS tag sets for the Bangla language, an important computational linguistic tool for NLP applications.

Keywords: Tag-set, Ambiguity, Trigram, HMM, NLP, Token, Corpus, Bangla Language.
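The abstract describes a statistical HMM tagger without showing how a tag sequence is actually chosen for a sentence. The following is a minimal sketch, not the authors' implementation: a first-order (bigram) HMM with add-one smoothing and Viterbi decoding. It is trained on a hypothetical three-sentence toy corpus of romanized Bangla (`TRAIN`), with tags loosely modeled on an Indian-language tag set (PRP pronoun, NN noun, VM main verb). The names `TRAIN`, `train_hmm`, `log_prob`, and `viterbi` are illustrative assumptions. A trigram tagger follows the same scheme but conditions each tag on the two preceding tags, so its Viterbi states become tag pairs.

```python
import math
from collections import defaultdict

# Hypothetical toy corpus (assumption, not the paper's data): romanized Bangla
# tokens tagged with PRP (pronoun), NN (noun), VM (main verb).
TRAIN = [
    [("ami", "PRP"), ("bhat", "NN"), ("khai", "VM")],
    [("se", "PRP"), ("boi", "NN"), ("pore", "VM")],
    [("ami", "PRP"), ("boi", "NN"), ("kini", "VM")],
]


def train_hmm(sentences):
    """Count tag-to-tag transitions (from a start symbol) and tag-to-word emissions."""
    trans = defaultdict(lambda: defaultdict(int))
    emit = defaultdict(lambda: defaultdict(int))
    tags, vocab = set(), set()
    for sent in sentences:
        prev = "<s>"
        for word, tag in sent:
            trans[prev][tag] += 1
            emit[tag][word] += 1
            tags.add(tag)
            vocab.add(word)
            prev = tag
    return trans, emit, sorted(tags), vocab


def log_prob(counts, key, n_outcomes):
    """Add-one smoothed log probability of `key` given a table of counts."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + 1) / (total + n_outcomes))


def viterbi(words, trans, emit, tags, vocab):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    n_words = len(vocab) + 1  # one extra slot reserves mass for unseen words
    n_tags = len(tags)
    # best[i][t] = (log score of the best path ending in tag t at i, previous tag)
    best = [{}]
    for t in tags:
        score = log_prob(trans["<s>"], t, n_tags) + log_prob(emit[t], words[0], n_words)
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = log_prob(emit[t], words[i], n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
    # Trace the back-pointers from the highest-scoring final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return list(reversed(path))


if __name__ == "__main__":
    model = train_hmm(TRAIN)
    print(viterbi(["se", "bhat", "khai"], *model))  # ['PRP', 'NN', 'VM']
```

Scores are kept in log space to avoid numerical underflow on long sentences. The extra vocabulary slot gives unseen words a small nonzero emission probability; this is only a crude stand-in for the more careful unknown-word handling that practical taggers often use.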


  1. Antony P J, Amrita, Dr. K P Soman, “Parts Of Speech Tagging for Indian Languages: A Literature Survey”, IJCA (0975-8887) Volume 34-no. 8, November 2011. IJCATM
  2. Dinesh Kumar and Gurpreet Singh Josan, (2010) “Part of Speech Tagger for Morphologically rich Indian Language: A survey”. International Journal of Computer Application. Vol. 6(5). https://nlp.stanford.edu/software/tagger.shtml
  3. Brants, Thorsten (2000) “TnT- A Statistical Part-of-Speech Tagger”. In the Proceedings of the Sixth Applied Natural Language Processing Conference (ANLP-2000), Seattle, WA, USA,
  4. Brill, Eric (1992) “A simple rule-based part of speech tagger”. In the Proceedings of the Workshop on Speech and Natural Language (HLT-91), Morristown, NJ, USA:Association for Computational Linguistics. Pp. 112-116.
  5. Dhanalakshmi V, Anand Kumar1, Shivapratap G, Soman KP and Rajendran S, “Tamil POS Tagging using Linear Programming”, International Journal of Recent Trends in Engineering, Vol. 1, No. 2, May 2009.
  6. Gurleen Kaur Sidhu, Navjot Kaur, “Role of Machine Translation and Word Sense Disambiguation in Natural Language Processing”, IOSR Journal of Computer Engineering (IOSR-JCE), May. – Jun. 2013.
  7. Akshar Bharathi and Prashanth R. Mannem (2007), “Introduction to the Shallow Parsing Contest for South Asian Languages”, Language Technologies Research Center, International Institute of Information Technology, Hyderabad, India 500032.
  8. Dinesh Kumar and Gurpreet Singh Josan,(2010), “Part of Speech Taggers for Morphologically Rich Indian Languages: A Survey”, International Journal of Computer Applications (0975 – 8887) Volume6–No.5, September, 2010, www.ijcaonline.org/ volume6/number5 /pxc3871409 .pdf..
  9. Debasri Chakrabarti (2011), “Layered Parts of Speech Tagging for Bangla”, Language in India www.languageinindia.c o m, M a y 2 0 1 1, Special Volume:Problems of Parsing in Indian Languages.
  10. https://stlong0521.github.io/20160319%20-%20HMM%20and%20POS.html
  11. Nisheeth Joshi, Hemant Darbari, Iti Mathure, (2013) “HMM based Pos Tagger for Hindi”. In Processing of 2013 International Conference on Artificial Intelligence and Soft Computing.
  12. Akshar Bharti, Dipti Misra Sharma, Lakshmi bai, Rajeev Sangal. AnnCorra: Annotating Corpora Guidelines for POS and Chunk with Annotation For Indian Languages , Language Technologies Research Centre IIT, Hyderabad.
  13. https://medium.com/analytics-vidhya/bengali-pos-part-of-speech-tagging-using-indian-corpus-e85f47d3ad65
  14. Antony P J, Amrita, Dr. K P Soman, “Parts Of Speech Tagging for Indian Languages: A Literature Survey”, IJCA (0975-8887) Volume 34-no. 8, November 2018.
  15. Bhasa Bijnan o Prayukti: An International Journal on Linguistics and Language Technology Vol. 1, No. 1, Jan-Jun 2017, Pp. 53-96
  16. Sag, Ivan A., Timothy Baldwin, Francis Bond, Aann Copestake and Dan Flickinger (2001) “Multiword Expressions: A Pain in the Neck for NLP”. In, Gelbukh, Alexander (Ed.) Proceedings of CICLING2002. Verlag: Springer. Pp. 35-41.
  17. https://www.ijariit.com/manuscripts/v2i3/ V213-1157.pdf
  18. https://www.scribd.com/document/1401156 76/PART-OF-SPEECH-TAGGING-OF- MARARHITEXT-USING-TRIGRA METHOD