
On Clustering and Evaluation of Narrow Domain Short-Text Corpora

Authors
David Eduardo Pinto Avendaño
Publisher
Sociedad Española para el Procesamiento del Lenguaje Natural
Publication Date

Abstract

Spanish title: Agrupamiento y Evaluación de Corpora de Textos Cortos y de Dominios Restringidos

David Eduardo Pinto Avendaño
Natural Language Engineering Lab., DSIC, Universidad Politécnica de Valencia
Facultad de Ciencias de la Computación, BUAP
[email protected]

Abstract: PhD thesis in Computer Science written by David Eduardo Pinto Avendaño under the supervision of Paolo Rosso (Univ. Politécnica de Valencia) and Héctor Jiménez (Univ. Autónoma Metropolitana, México). The author was examined in July 2008 in Valencia by the following committee: Manuel Palomar Sanz (Univ. de Alicante), Alfonso Ureña López (Univ. de Jaén), Eneko Agirre (Univ. del País Vasco), Benno Stein (Weimar Univ., Germany) and Encarna Segarra Soriano (Univ. Politécnica de Valencia). The grade obtained was Sobresaliente Cum Laude.

Keywords: Clustering, Evaluation, Narrow Domain Short-text corpora

1. Introduction

In this Ph.D. thesis we investigate the problem of clustering a particular set of documents, namely narrow domain short texts. To achieve this goal, we have analysed datasets and clustering methods. Moreover, we have introduced some corpus evaluation measures, term selection techniques and clustering validity measures in order to
