Handling the Deviation from Isometry Between Domains and Languages in Word Embeddings: Applications to Biomedical Text Translation

Authors
  • Gaschi, Félix
  • Rastin, Parisa
  • Toussaint, Yannick
Publication Date
Dec 07, 2021
Identifiers
DOI: 10.1007/978-3-030-92270-2_19
OAI: oai:HAL:hal-03477901v2
Source
HAL
Language
English
License
Unknown

Abstract

Previous literature has shown that word embeddings from different languages can be aligned with unsupervised methods based on a distance-preserving mapping, under the assumption that the embeddings are isometric. However, these methods seem to work only when both embeddings are trained on the same domain. We nonetheless hypothesize that the deviation from isometry might be smaller between relevant subsets of embeddings from different domains, which would allow them to be partially aligned. To support this hypothesis, we leverage the bottleneck distance, a topological data analysis tool used to approximate the deviation from isometry. We also propose a cross-domain and cross-lingual unsupervised alignment method based on a proxy embedding, as a first step towards new cross-lingual alignment methods that generalize to different domains. Results of this method on translation tasks show that unsupervised alignment methods are not doomed to fail in a cross-domain setting. We obtain BLEU-1 scores ranging from 0.38 to 0.50 on translation tasks, where previous fully unsupervised alignment methods obtain near-zero scores in cross-domain settings.
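
To make the notion of a distance-preserving mapping concrete, the following is a minimal sketch of orthogonal Procrustes alignment on synthetic data. It is not the method proposed in the paper: it assumes the row-to-row correspondence between the two embedding matrices is already known (a supervised setting), whereas the abstract concerns unsupervised alignment. The names `procrustes_align` and `pairwise_distances`, the toy matrices, and their sizes are all illustrative assumptions.

```python
import numpy as np


def procrustes_align(X, Y):
    """Return the orthogonal matrix W minimizing ||X @ W - Y||_F.

    Rows of X and Y are assumed to be paired (row i of X corresponds
    to row i of Y); this pairing is an assumption of the sketch.
    """
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def pairwise_distances(A):
    """Euclidean distance between every pair of rows of A."""
    diff = A[:, None, :] - A[None, :, :]
    return np.linalg.norm(diff, axis=-1)


# Toy data: a "source" embedding and a "target" embedding that is an
# exact rotation/reflection of it, i.e. the two are perfectly isometric.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 8))
Q, _ = np.linalg.qr(rng.normal(size=(8, 8)))  # hidden orthogonal map
Y = X @ Q

W = procrustes_align(X, Y)

# Residual after alignment; it should be close to zero in this isometric toy case.
alignment_error = np.linalg.norm(X @ W - Y)

# An orthogonal map leaves all pairwise distances unchanged.
distance_gap = np.abs(pairwise_distances(X) - pairwise_distances(X @ W)).max()
```

In this idealized case the recovered map aligns the two spaces almost exactly. When the embeddings deviate from isometry, as the abstract suggests happens across domains, no orthogonal `W` can drive the residual to zero, which is the situation the deviation-from-isometry measure is meant to quantify.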
