Burmese is a low-resource language spoken by 50 million people. Its peculiarities include a nonstandard romanization, limited use of English and other Western loanwords, and discretionary use of spacing and punctuation. Moreover, the lack of high-quality linguistic data has been a challenge for natural language processing (NLP) and machine translation (MT) research. The recent switch from the traditional Zawgyi font to Unicode is also under way in practice but not yet complete. In these turbulent times for Myanmar, could a hybrid approach (NLP plus human-in-the-loop) address the issues that emerge when translating user-generated content, including fake news and hate speech?
We explored all this today in our joint presentation with TWB’s Ei Ei Saing at the LocWorldWide 44 conference.