Approaches of POS Tagging Algorithm for Bangla Corpus

Maksuda Sultana and Francis G. Balazon
Keywords: Tag-set, Ambiguity, Trigram, HMM, NLP, Token, Corpus, Bangla Language.

Sultana, M. and Balazon, F.G., 2020. Approaches of POS Tagging Algorithm for Bangla Corpus. United International Journal for Research & Technology (UIJRT), 1(7), pp.29-35.


Part-of-speech (POS) tagging is the process of classifying the words of a natural-language sentence into lexical categories such as noun, verb, and adjective, and labeling each word accordingly. POS tagging is an essential component in building lemmatizers, which reduce a word to its root form in natural language processing (NLP). It is also an initial stage in NLP applications such as text analysis, machine translation, information retrieval, and text-to-speech synthesis. Various approaches to implementing POS taggers have been proposed. In this paper, trigram and hidden Markov model (HMM) methods are used to develop a tagger under a general statistical approach. The paper explains these algorithms and presents a tag set, together with an Indian corpus, for tagging Bangla text in order to assess the accuracy of the tagger's output. It also surveys developments in POS taggers and POS tag sets for the Bangla language, which are important computational language tools for NLP applications.


