
A Corpus-Based Survey of Four Electronic Swahili–English Bilingual Dictionaries

Bureau of the WAT
  • Computer Science
  • Linguistics


In this article we survey four electronic bilingual dictionaries for the language pair Swahili–English. Aided by a data-driven morphological analyzer and part-of-speech tagger, we quantify the coverage of the dictionaries on large monolingual corpora of Swahili. In a second series of experiments, we investigate how useful the dictionaries are as a resource for developing a machine translation system, by evaluating their bilingual coverage on the parallel SAWA corpus. At the same time, we attempt to consolidate the dictionaries into a unified lexicographic database and compare its coverage to that of its component dictionaries.

Keywords: lexicography, evaluation, morphology, lemmatization, parallel corpora, machine learning, machine translation, Swahili (Kiswahili), English
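
As an illustration of the kind of monolingual coverage measurement the abstract describes, the following is a minimal sketch, not the authors' implementation. It assumes the corpus has already been lemmatized and that each dictionary can be reduced to a set of headwords. The function name `coverage`, the distinction between token and type coverage, and the toy word lists are assumptions made for this example; the union of two sets stands in for the unified database.

```python
from collections import Counter

def coverage(lemmas, dictionary):
    """Return (token_coverage, type_coverage) of `dictionary` over `lemmas`.

    Hypothetical helper: `lemmas` is a list of lemmatized corpus tokens,
    `dictionary` is a set of headwords.
    """
    counts = Counter(lemmas)
    if not counts:
        return 0.0, 0.0
    # Token coverage weights each lemma by its corpus frequency.
    covered_tokens = sum(n for lemma, n in counts.items() if lemma in dictionary)
    # Type coverage counts each distinct lemma once.
    covered_types = sum(1 for lemma in counts if lemma in dictionary)
    return covered_tokens / sum(counts.values()), covered_types / len(counts)

# Toy data, invented for illustration only.
corpus_lemmas = ["kitabu", "soma", "mtoto", "kitabu", "nyumba", "soma", "kitabu", "embe"]
dict_a = {"kitabu", "mtoto"}
dict_b = {"soma", "nyumba", "kitabu"}
unified = dict_a | dict_b  # stand-in for a consolidated database

for name, headwords in [("A", dict_a), ("B", dict_b), ("A+B", unified)]:
    token_cov, type_cov = coverage(corpus_lemmas, headwords)
    print(f"{name}: token coverage {token_cov:.3f}, type coverage {type_cov:.3f}")
```

On this toy data the merged headword set covers more of the corpus than either dictionary alone, which is the comparison the abstract proposes to make at scale.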
