
Automatic generation of parallel treebanks: an efficient unsupervised system

Dublin City University. National Centre for Language Technology (NCLT)
  • Machine Translating
  • Computational Linguistics
  • Parallel Treebanks
  • Machine Translation
  • Subtree Alignment
  • Computer Science
  • Linguistics


The need for syntactically annotated data for use in natural language processing has increased dramatically in recent years. This is especially true of parallel treebanks, of which very few exist. Those that do exist are mainly hand-crafted and too small for reliable use in data-oriented applications. In this work I introduce a novel open-source platform for the fast and robust automatic generation of parallel treebanks through subtree alignment, using a limited amount of external resources. The intrinsic and extrinsic evaluations that I undertook demonstrate that my system is a feasible alternative to the manual annotation of parallel treebanks. I therefore expect the presented platform to help boost research in the field of syntax-augmented machine translation and to lead to advances in other fields where parallel treebanks can be employed.
