This original search engine for the Inuktitut language retrieves Inuktitut text from Inuktitut web pages, whatever font and character set those pages use to display syllabic characters. For the user's convenience, the search text can be entered in the most common syllabic fonts (Nunacom, Prosyl and AiPaiNunavik), in Unicode syllabics, or in the roman alphabet. Wildcards are supported, as are the Boolean operators AND, OR and NOT.
For more information about how to use the search engine, please visit this page.
To access the NRC Inuktitut Search Engine, please go to this address.
Each so-called 'legacy' font has its own code-to-glyph association table, and although some fonts resemble one another, there are also big differences. A number of Inuktitut words do have the same code sequence in different fonts, but most words have different code sequences. For example, one and the same word is w6]vNw]/E/z5 in Nunacom, w6>vNw>/E/z5 in Prosyl, w6√Nw÷E/z5 in Naamajut, Žñ›¶ŽÎäÍö” in Aujaq2, w6Ïâ÷E/z5 in AiPaiNunavik, ... To search for that word with the existing search engines, one would have to search for all those code sequences and more, assuming one knows of them in the first place.
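To make the burden concrete, here is a minimal sketch of what a user would have to do by hand with a conventional engine: collect every known code sequence for the word and join them into one OR query. The five sequences are the ones quoted above; the function name `build_or_query` and the quoted-phrase query syntax are assumptions made for illustration, not the syntax of any particular search engine.

```python
# Code sequences for one and the same Inuktitut word, as quoted in the text.
# The list is incomplete: other legacy fonts would add further sequences.
ENCODINGS = {
    "Nunacom": "w6]vNw]/E/z5",
    "Prosyl": "w6>vNw>/E/z5",
    "Naamajut": "w6√Nw÷E/z5",
    "Aujaq2": "Žñ›¶ŽÎäÍö”",
    "AiPaiNunavik": "w6Ïâ÷E/z5",
}


def build_or_query(encodings):
    """Join every font-specific code sequence into one OR query (hypothetical syntax)."""
    return " OR ".join(f'"{sequence}"' for sequence in encodings.values())


query = build_or_query(ENCODINGS)
distinct_sequences = set(ENCODINGS.values())
print(len(distinct_sequences), "distinct code sequences for a single word")
print(query)
```

Even in this small sample every font yields a different sequence, so the query grows with each font the user happens to know about.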
For the same word, the NRC Inuktitut Search Engine returns 1 hit!
The current search engines were developed with languages in mind (primarily English) that use an alphabet with upper and lower case, where case conveys no meaning at the lexeme level. For example, 'sky', 'SKY' and 'Sky' are all the same word. The search is done without regard to case: searching for any of those forms returns pages containing any of them. So an input like the Nunacom word with the indexable code sequence wo8ix3F4 will return pages that contain not only that sequence but also WO8IX3F4, Wo8iX3f4, and every other sequence obtained by putting its 'letter' codes in lower or upper case, that is, 32 different sequences (five letter codes, each in two cases). The problem is that although those sequences may correspond to the same word in languages with case, they do not in Inuktitut written with legacy fonts. In our example, WO8IX3F4 and Wo8iX3f4 in Nunacom each represent a different string of syllabic characters that has nothing to do with the input query; they are not even Inuktitut words.
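The count of 32 can be checked with a short sketch that enumerates every sequence a case-insensitive engine would treat as equal to wo8ix3F4. The helper name `case_variants` is hypothetical; the code only illustrates the case-folding effect and is not how any particular engine is implemented.

```python
from itertools import product


def case_variants(code_sequence):
    """Return every upper/lower-case variant of a legacy-font code sequence."""
    choices = []
    for ch in code_sequence:
        if ch.isalpha():
            # A 'letter' code: a case-insensitive engine accepts both cases.
            choices.append((ch.lower(), ch.upper()))
        else:
            # Digits and punctuation have no case and stay fixed.
            choices.append((ch,))
    return ["".join(combo) for combo in product(*choices)]


variants = case_variants("wo8ix3F4")
print(len(variants))           # 32
print("WO8IX3F4" in variants)  # True
print("Wo8iX3f4" in variants)  # True
```

In a legacy font each of these 32 sequences maps to different glyphs, so only the exact, case-sensitive sequence corresponds to the word that was searched for.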