Learning and extending lexical ontologies by using machine learning methods: Workshop at ICML 2005, Bonn, 7th-11th August 2005

Publisher
ICML, Bonn
Disciplines
  • Computer Science
  • Economics
  • Engineering
  • Linguistics
  • Mathematics

Abstract

This ICML-2005 workshop on ontology learning stands in the tradition of the ECML/PKDD 2004 workshop on Knowledge Discovery and Ontologies and precedes the ECML/PKDD 2005 workshop on the same topic. Unlike these workshops, which focus on the application of different ontology types in various domains, this workshop focuses on the acquisition of ontologies. Ontologies and taxonomies are widely used in the field of knowledge engineering, where domain knowledge is increasingly organized in formal, explicit specifications of shared conceptualizations in a specific domain.

The workshop is concerned with learning and extending lexical ontologies by means of machine learning methods. Lexical-semantic ontology learning is an emerging field that combines machine learning techniques with the most recent advances in natural language and ontology processing. After tremendous effort in the last decade to build and maintain lexical-semantic ontologies such as WordNet, there has been a considerable amount of research on how the use of these resources improves the performance of a variety of Natural Language Processing tasks, such as Information Retrieval, Word Sense Disambiguation, Document Clustering, and Document Summarization. It is widely recognized that lexical ontologies will significantly improve the processing of text collections, be it for clustering, classification, or information extraction tasks.

It has turned out that simple hierarchies are not sufficient to capture the semantic relations of words. Semantic fields, for instance, may be used to represent semantic connections among groups of lexemes drawn from a single domain, while semantic selectional restrictions allow lexemes to post constraints on the semantic properties of constituents that accompany them in sentences.
On the other hand, multiple data sources might be included: linguistically or statistically pre-processed documents, combinations of multiple and cross-lingual corpora, or the web, as well as existing dictionaries and ontologies. Machine learning now provides new techniques to learn rich structural properties. The aim of the workshop was to bring together researchers from machine learning as well as NLP and related fields. Progress in this area may foster an understanding of the semantics of language and may have considerable economic impact.

For the workshop, 16 papers were submitted, of which 6 were accepted as full papers and 4 as short papers, based on the comments of at least three reviewers per paper. Full papers were presented orally in a 30-minute time slot; short papers were introduced in 10 minutes each and discussed in a poster session.

The topics cover a variety of aspects of ontology learning. Preliminaries to an algebraic approach to axiomatizing corpus-based lexical associations are presented by A. Mehler. Other papers on using unstructured text corpora are presented by G. Heinrich et al., who use raw text to compare term clusters induced by two different latent concept models for document clustering, and by H.-F. Witschel, who uses an existing lexical net as a decision tree to insert new terms at appropriate positions on the basis of term co-occurrences. The effects of iterating the co-occurrence calculation for candidate extraction of semantic relations are discussed by M. Mahn and C. Biemann. Linguistic patterns are applied by P. Cimiano and S. Staab to syntactically preprocessed, unstructured text to extract candidate terms and relations, which are arranged by a hierarchical clustering algorithm. J. Nemvara uses web directories to extract verbs that typically occur within different product categories. The Web is also involved in the method proposed by D. Sánchez and A. Moreno, who arrange URLs and keywords into meaningful clusters in order to construct an OWL ontology and to group web pages by topic at the same time. The internal structure of documents is exploited by P. Makagonov et al. to learn a domain ontology for scientific papers by clustering words into concepts and concepts into topic clusters. I. P. Klapaftis and S. Manandhar address the problem of word ambiguity and present a method to perform word sense disambiguation using Google and WordNet. The extraction of semantic relations in multimedia environments of an architecture domain is presented by E. Andaroodi et al. The workshop also featured an excellent keynote by Soumen Chakrabarti.
