Tigrinya Spelling Checker


LanguageTool is an open-source spelling and grammar checker, very similar to Grammarly. The tool already supports many languages, but not yet Tigrinya. This page is dedicated to Tigrinya support in LanguageTool.

LanguageTool has plug-ins for Chrome, Microsoft Word, websites and many more. Integrating Tigrinya into LanguageTool will allow us to use these plug-ins without modification.

[Figure: Tigrinya LanguageTool in Microsoft Word]

Alternatively, the spell checker can also work on a website, as shown below, or through a Chrome plug-in.

[Figure: Tigrinya LanguageTool in browser]

LanguageTool has two parts: 1) a spell checker based on Hunspell, the spell checker used by OpenOffice, and 2) a grammar checker based on grammatical rules. For example, replacing አስምዐ with ኣስምዐ is a spelling correction. Replacing ቤት ክርስትያን with ቤተ ክርስትያን, however, is a grammatical correction: the rule spans more than one word, and both ቤት and ክርስትያን are correct words by themselves. The phrase only becomes incorrect when one is used right after the other.

1. Spelling Checker

The Hunspell dictionary is based on words crawled from the internet by Biniam and Fitsum. The dictionary has over half a million unique words, each of which appeared on the internet more than once across different documents. The list does contain errors, but the incorrect words have low frequency, which means that if they are suggested, they will be ranked low in the suggestion list.
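The ranking idea can be illustrated with a minimal Python sketch. This is not how Hunspell works internally; it only shows why a low-frequency erroneous entry ends up at the bottom of the suggestion list. The word counts, the misspelled input ቢት, and the names `WORD_FREQ`, `edit_distance` and `suggest` are all hypothetical and chosen for illustration.

```python
from typing import Dict, List

# Hypothetical counts. The real word list has over half a million entries.
# "በት" stands in for an erroneous entry that slipped into the crawl.
WORD_FREQ: Dict[str, int] = {
    "ቤት": 5000,
    "ቤተ": 1200,
    "በት": 2,
}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed over Unicode code points."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def suggest(word: str, max_distance: int = 1) -> List[str]:
    """Return nearby dictionary words, most frequent first."""
    if word in WORD_FREQ:
        return []  # known word, nothing to suggest
    candidates = [w for w in WORD_FREQ
                  if edit_distance(word, w) <= max_distance]
    # Higher frequency first, so rare (likely wrong) entries sink to the end.
    return sorted(candidates, key=lambda w: (-WORD_FREQ[w], w))


print(suggest("ቢት"))
```

With these made-up counts, both ቤት and በት are one edit away from the input, but the frequent word is listed first and the rare, likely erroneous entry last.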

LanguageTool also supports POS taggers and word2vec embeddings, which can further improve the quality of the spelling checker.

2. Grammar Checker

Grammar rules can be specified in an XML file named ‘grammar.xml’. For example, the above rule for ‘ቤት ክርስትያን’ can be specified as follows:

            <rule>
                <pattern>
                    <marker>
                        <token regexp="yes">ቤት|በት</token>
                    </marker>
                    <token regexp="yes">መቕደስ|ክርስትያን</token>
                </pattern>
                <message>Did you mean <suggestion>ቤተ</suggestion>?</message>
            </rule>

In plain words, this means ‘if ቤት or በት is followed by መቕደስ or ክርስትያን, suggest changing the first word to ቤተ’. LanguageTool rule creation is defined here. For English, for example, there are about 500 rules defined this way. We are looking for help in adding similar rules for Tigrinya. Let us know if you can help.
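To make the matching logic concrete, here is a rough Python sketch of what the rule does. It is not LanguageTool code: it assumes simple whitespace tokenization and that each token regular expression must match the whole token. The names `FIRST`, `SECOND`, `SUGGESTION` and `check` are hypothetical.

```python
import re
from typing import List, Tuple

# The two token patterns from the rule, and the suggested replacement.
FIRST = re.compile(r"ቤት|በት")          # the token inside <marker>
SECOND = re.compile(r"መቕደስ|ክርስትያን")   # the token that must follow it
SUGGESTION = "ቤተ"


def check(text: str) -> List[Tuple[int, str, str]]:
    """Return (token index, matched word, suggestion) for every match."""
    tokens = text.split()  # assumption: tokens are separated by whitespace
    matches = []
    for i in range(len(tokens) - 1):
        # Both patterns must match their token completely.
        if FIRST.fullmatch(tokens[i]) and SECOND.fullmatch(tokens[i + 1]):
            matches.append((i, tokens[i], SUGGESTION))
    return matches


print(check("ቤት ክርስትያን"))  # the first word is flagged
print(check("ቤተ ክርስትያን"))  # already correct, nothing flagged
```

Only the word inside the marker is flagged, which mirrors the rule: the second token provides context but is left unchanged.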

Source Code

The LanguageTool GitHub repository is forked in the TigrinyaNLP repo, and all the required Tigrinya materials are committed there. Once we have added enough rules and styles, we will open a pull request to merge it into the main repository.

Things that need explanation (TBD)

  1. A separate language-detector.jar is built to support Tigrinya language detection
  2. latinScript=false
  3. ignoreUpperCase=false
  4. fsa.type = NONE