Domain-specific language models and lexicons for tagging

Authors
Journal
Journal of Biomedical Informatics
1532-0464
Publisher
Elsevier
Publication Date
Volume
38
Issue
6
Identifiers
DOI: 10.1016/j.jbi.2005.02.009
Keywords
  • Clinical Report Analysis
  • Part-Of-Speech Tagging Accuracy
  • Domain Adaptation
  • Clinical Information Systems
  • Biomedical Domain
  • Corpus Linguistics
  • Statistical Part-Of-Speech Tagging
  • Hidden Markov Model
Disciplines
  • Medicine

Abstract

Accurate and reliable part-of-speech tagging is useful for many Natural Language Processing (NLP) tasks that form the foundation of NLP-based approaches to information retrieval and data mining. In general, large annotated corpora are necessary to achieve the desired part-of-speech tagger accuracy. We show that a large annotated general-English corpus is not sufficient for building a part-of-speech tagger model adequate for tagging documents from the medical domain. However, adding a fairly small domain-specific corpus to a large general-English one raised accuracy from 87% to over 92% in our studies. We also suggest a number of characteristics to quantify the similarities between a training corpus and the test data. These results give guidance for creating an appropriate corpus for building a part-of-speech tagger model that gives satisfactory accuracy on a new domain at a relatively small cost.
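
The abstract does not list the characteristics it proposes for comparing a training corpus with test data. As a purely illustrative sketch, one plausible measure of this kind is the unknown-word rate: the share of test tokens that never occur in the training corpus. The function name `unknown_word_rate` and the toy sentences below are assumptions made for this example and are not taken from the paper.

```python
def unknown_word_rate(train_tokens, test_tokens):
    """Return the fraction of test tokens whose lowercased form is absent from training.

    Hypothetical similarity measure for illustration only; the paper's own
    characteristics are not specified in the abstract.
    """
    vocab = {token.lower() for token in train_tokens}
    if not test_tokens:
        return 0.0  # define the rate as zero for empty test data
    unseen = sum(1 for token in test_tokens if token.lower() not in vocab)
    return unseen / len(test_tokens)


# Toy data (invented): a "general English" sample and a "clinical" sample.
general = "the patient was seen in the clinic today".split()
clinical = "patient denies dyspnea or orthopnea today".split()

# Rate when training on general text only.
print(unknown_word_rate(general, clinical))

# Rate after adding a small amount of in-domain vocabulary to the training data.
print(unknown_word_rate(general + ["denies", "dyspnea"], clinical))
```

A lower rate after adding in-domain tokens is in line with the abstract's observation that a small domain-specific corpus helps, although this toy measure says nothing about tagging accuracy itself.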
