The NRC’s Inuktitut Morphological Analyser is a Java program developed in the Interactive Information Group of the Institute for Information Technology (IIT) of the National Research Council of Canada (NRC). It decomposes an Inuktitut word into its parts (also called morphemes): its root, its internal suffixes, and its grammatical ending. The lexical information about the morphemes is stored in a linguistic database that we have created. It comprises over 2000 roots, several hundred lexicalized words (fixed complex stems that combine a root with one or two suffixes), over 330 suffixes, over 300 noun endings, and 1200 verb endings. Most of this lexical information comes from the works of Ken Harper, Alex Spalding, Lucien Schneider, Mick Mallon, and Louis-Jacques Dorais. Please refer to this bibliography page for a detailed list of our linguistic references.
A number of phonological rules dealing with double consonants (kt > tt, for example) in different dialects have been incorporated. The dialects recognized by the morphological analyser are essentially those of Aivilik, Kivalliq, North and South Baffin, and Arctic Quebec. Schneider’s law has also been incorporated so that words from Nunavik (Arctic Quebec) and Nunatsiavut (Labrador) are analysed correctly.
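As an illustration of the double-consonant rules mentioned above, the following minimal Java sketch generates candidate spellings by rewriting an assimilated cluster (tt) back to its unassimilated form (kt) before a lexicon lookup. The class name `ClusterVariants` and the method `variants` are hypothetical and are not taken from the analyser's code. How the analyser actually applies its phonological rules is not described on this page. The example word relies on the fact that the name of the language is commonly written "Inuttitut" in Nunavik.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Toy sketch: undo a consonant assimilation (e.g. kt > tt) to get lookup candidates. */
class ClusterVariants {

    /**
     * Returns the surface form itself plus every form obtained by rewriting
     * one or more occurrences of the assimilated cluster to the original one.
     */
    static Set<String> variants(String surface, String assimilated, String original) {
        Set<String> result = new LinkedHashSet<>();
        expand(surface, 0, assimilated, original, result);
        return result;
    }

    private static void expand(String word, int from, String assimilated,
                               String original, Set<String> out) {
        int idx = word.indexOf(assimilated, from);
        if (idx < 0) {
            out.add(word); // no further occurrence: this candidate is complete
            return;
        }
        // Option 1: keep this occurrence as written.
        expand(word, idx + 1, assimilated, original, out);
        // Option 2: rewrite this occurrence and continue after it.
        String rewritten = word.substring(0, idx) + original
                + word.substring(idx + assimilated.length());
        expand(rewritten, idx + original.length(), assimilated, original, out);
    }

    public static void main(String[] args) {
        // Assumed rule for illustration only: tt may correspond to kt.
        System.out.println(variants("inuttitut", "tt", "kt"));
        // prints [inuttitut, inuktitut]
    }
}
```

Each candidate would then be looked up in the lexicon. A real implementation would need one such rule per cluster and per dialect, which this sketch does not attempt.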
This first version of the morphological analyser successfully decomposes over 95% of the most frequent words found in the Nunavut Hansard and has about the same success rate on words in Inuktitut Web pages. We are actively working to increase the coverage of the analyser by adding missing roots and suffixes, morpho-phonological behaviours of certain suffixes, and phonological rules.
The decompositions returned by the analyser for a given word typically include the right decomposition at or near the top of the list, along with a number of other decompositions that would not normally be considered. This is due to several factors, such as lexical ambiguity between different roots, suffixes, or endings with the same form and behaviour, and missing constraints on what can or must follow or precede a given root, suffix, or ending. Such constraints are not yet fully implemented. We plan to implement them soon, but the priority is to increase the success rate of the analyser.
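The effect of missing constraints can be sketched with a toy example. The Java code below is a minimal illustration under stated assumptions, not the analyser's implementation: the class `ToyAnalyser`, its five-entry lexicon, and the noun/verb category check are all hypothetical. The entry `tuq:noun?` is an invented homograph added only to mimic two endings with the same form. Surface forms are matched literally, with no phonological rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy sketch: enumerate root + suffix* + ending decompositions of a word. */
class ToyAnalyser {
    enum Kind { ROOT, SUFFIX, ENDING }

    /** One lexicon entry; categories are 'n' (noun), 'v' (verb) or '-' (none). */
    static final class Morpheme {
        final String id, form;
        final Kind kind;
        final char attachesTo, yields;
        Morpheme(String id, String form, Kind kind, char attachesTo, char yields) {
            this.id = id; this.form = form; this.kind = kind;
            this.attachesTo = attachesTo; this.yields = yields;
        }
    }

    // Hypothetical toy lexicon (approximate glosses in comments).
    static final List<Morpheme> LEXICON = Arrays.asList(
        new Morpheme("iglu", "iglu", Kind.ROOT, '-', 'n'),        // 'house'
        new Morpheme("liuq", "liuq", Kind.SUFFIX, 'n', 'v'),      // 'to make'
        new Morpheme("tuq:verb", "tuq", Kind.ENDING, 'v', '-'),   // 3rd person singular
        new Morpheme("tuq:noun?", "tuq", Kind.ENDING, 'n', '-'),  // invented homograph
        new Morpheme("mut", "mut", Kind.ENDING, 'n', '-'));       // 'to'

    /** Returns every decomposition; category constraints apply only if requested. */
    static List<List<String>> decompose(String word, boolean checkCategories) {
        List<List<String>> results = new ArrayList<>();
        for (Morpheme root : LEXICON) {
            if (root.kind == Kind.ROOT && word.startsWith(root.form)) {
                List<String> path = new ArrayList<>();
                path.add(root.id);
                extend(word, root.form.length(), root.yields, path, checkCategories, results);
            }
        }
        return results;
    }

    private static void extend(String word, int pos, char stemCategory, List<String> path,
                               boolean check, List<List<String>> results) {
        for (Morpheme m : LEXICON) {
            if (m.kind == Kind.ROOT || !word.startsWith(m.form, pos)) continue;
            if (check && m.attachesTo != stemCategory) continue; // constraint filter
            int next = pos + m.form.length();
            path.add(m.id);
            if (m.kind == Kind.ENDING) {
                // An ending must consume the rest of the word.
                if (next == word.length()) results.add(new ArrayList<>(path));
            } else {
                extend(word, next, m.yields, path, check, results);
            }
            path.remove(path.size() - 1); // backtrack
        }
    }

    public static void main(String[] args) {
        System.out.println(decompose("igluliuqtuq", false)); // 2 candidates
        System.out.println(decompose("igluliuqtuq", true));  // 1 candidate
    }
}
```

Without the category check, both endings spelled "tuq" are accepted and two decompositions are returned. With the check, only the verb ending can follow the verb stem built by "liuq", so a single decomposition remains.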
Go to the Web application of the Inuktitut Morphological Analyser
Download a PowerPoint presentation about the Inuktitut Morphological Analyser given in Iqaluit in February 2005.
New! The Inuktitut Morphological Analyser is now available as compiled code.
Inuktitut Word Definition is an application of the Inuktitut Morphological Analyser that gives you the decomposition of an Inuktitut word that you have selected in a Web page. This application is accessed through a link that has to be placed on the links bar of your browser.
To add the ‘Inuktitut Word Definition’ link to your browser, click here and follow the very easy installation steps.
To give you an idea of what this application can do, we have created a demo page with a selection of Inuktitut words that you can click directly to get their decompositions, without having to install the link in your browser.
To display Inuktitut syllabic characters correctly, you will need an Inuktitut syllabic Unicode font. If you do not already have one installed on your computer, you can get one at Inuktitut fonts.
Disclaimer:
This application uses a Java HTML parser to determine whether the selected word is displayed with a 7-bit font like Nunacom or Prosyl. Unfortunately, this parser is not very forgiving: when the page containing the selected word contains unorthodox HTML coding, the parser cannot determine the font properly, and the selected word cannot be decomposed.