Improving search over Electronic Health Records using UMLS-based query expansion through random walks

Journal of Biomedical Informatics
DOI: 10.1016/j.jbi.2014.04.013
  • Information Storage And Retrieval
  • Algorithms
  • Data Mining
  • Semantics
  • Natural Language Processing


Objective

Most of the information in Electronic Health Records (EHRs) is recorded as free text. Practitioners searching EHRs need to phrase their queries carefully, because a record may use synonyms or other related words rather than the query terms themselves. In this paper we show that an automatic query expansion method based on the Unified Medical Language System (UMLS) Metathesaurus improves on the results of a robust baseline when searching EHRs.

Materials and methods

The method uses a graph representation of the lexical units, concepts and relations in the UMLS Metathesaurus. It is based on random walks over the graph, which start at the query terms. Random walks have been well studied on both Web and knowledge base graphs.

Results

Our experiments on the TREC Medical Record track show improvements over a strong baseline on both the 2011 and 2012 datasets.

Discussion

Our analysis shows that the success of our method is due to the automatic expansion of the query with extra terms, even when they are not directly related to the query terms in the UMLS Metathesaurus. The terms added in the expansion go beyond simple synonyms and include other kinds of topically related terms.

Conclusions

Expanding queries with related terms from the UMLS Metathesaurus, beyond synonymy, is an effective way to bridge the gap between query and document vocabularies when searching for patient cohorts.
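
The abstract describes the method only at a high level: random walks over a graph of lexical units and concepts, started at the query terms. The following is a minimal sketch of that general idea, written as a random walk with restart (personalized PageRank) computed by power iteration. It is not the authors' implementation. The toy graph, the `concept:` node prefix, the function names and the parameter values (`damping`, `iterations`, `k`) are all assumptions made for illustration, and the edges are invented rather than taken from the UMLS Metathesaurus.

```python
from collections import defaultdict

# Hypothetical toy graph: lexical units linked to made-up concept nodes.
# These edges are illustrative only and do not come from UMLS.
TOY_EDGES = [
    ("heart attack", "concept:MI"),
    ("myocardial infarction", "concept:MI"),
    ("concept:MI", "concept:troponin"),
    ("troponin test", "concept:troponin"),
    ("bone fracture", "concept:fracture"),
]


def personalized_pagerank(edges, seeds, damping=0.85, iterations=50):
    """Random walk with restart over an undirected graph.

    At each step the walker follows a random edge with probability
    `damping`, otherwise it jumps back to one of the seed nodes.
    """
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    nodes = sorted(neighbors)

    seed_nodes = [s for s in seeds if s in neighbors]
    if not seed_nodes:
        raise ValueError("no seed term appears in the graph")

    # Restart distribution: uniform over the seed (query) nodes.
    restart = {n: 0.0 for n in nodes}
    for s in seed_nodes:
        restart[s] = 1.0 / len(seed_nodes)

    rank = dict(restart)
    for _ in range(iterations):
        nxt = {n: (1.0 - damping) * restart[n] for n in nodes}
        for n in nodes:
            share = damping * rank[n] / len(neighbors[n])
            for m in neighbors[n]:
                nxt[m] += share
        rank = nxt
    return rank


def expand_query(edges, query_terms, k=2, **kwargs):
    """Return up to k lexical units ranked highest by the walk.

    Concept nodes (assumed here to carry a "concept:" prefix), the
    original query terms and unreachable nodes are excluded.
    """
    rank = personalized_pagerank(edges, query_terms, **kwargs)
    candidates = [
        (node, score)
        for node, score in rank.items()
        if score > 0.0
        and node not in query_terms
        and not node.startswith("concept:")
    ]
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return [node for node, _ in candidates[:k]]


if __name__ == "__main__":
    print(expand_query(TOY_EDGES, ["heart attack"]))
```

On this toy graph, the query "heart attack" is expanded first with "myocardial infarction", which shares a concept node with it, and then with "troponin test", which is reachable only through a second concept. This mirrors the abstract's point that useful expansion terms need not be direct synonyms. The unconnected term "bone fracture" receives no probability mass and is never proposed.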
