Ortiz Suárez, Pedro Javier; Dupont, Yoann; Muller, Benjamin; Romary, Laurent; Sagot, Benoît
The French TreeBank developed at the University Paris 7 is the main source of morphosyntactic and syntactic annotations for French. However, it does not include explicit information about named entities, which are among the most useful pieces of information for several natural language processing tasks and applications. Moreover, no large-scale French c...
Heinzinger, Michael; Elnaggar, Ahmed; Wang, Yu; Dallago, Christian; Nechaev, Dmitrii; Matthes, Florian; Rost, Burkhard
Published in BMC Bioinformatics
Background: Predicting protein function and structure from sequence is an important challenge for computational biology. For 26 years, most state-of-the-art approaches combined machine learning and evolutionary information. However, for some applications retrieving related proteins is becoming too time-consuming. Additionally, evolutionary informati...
Magistry, Pierre
This work is part of a broader project that requires adapting information extraction (IE) methods to written materials (mostly press articles) published in China between the mid-19th and the mid-20th centuries. This calls for a better understanding and description of the language(s) we can observe in our sources. More importantly, it is an unprece...
Li, Daoyuan; Li, Li; Bissyande, Tegawendé François D Assise; Klein, Jacques; Le Traon, Yves
Time series data are abundant in various domains and are often characterized as large in size and high in dimensionality, leading to storage and processing challenges. Symbolic representation of time series – which transforms numeric time series data into texts – is a promising technique to address these challenges. However, these techniques are es...
Hatmi, Mohamed
Named entity recognition is a subtask of information extraction. It consists of identifying certain textual objects such as the names of persons, organizations, and places. The work of this thesis focuses on the task of named entity recognition for the spoken modality. This task raises a certain n...
Ebadat, Ali-Reza
During the last decade, huge amounts of multimedia documents have been generated. It is therefore important to find a way to manage this data. Every approach to facilitating this process requires a deep understanding of the content of the documents. Among two different approaches to get such insights, either by extracting information from the...
Fraga Da Silva, Thiago; Le, Viet Bac; Lamel, Lori; Gauvain, Jean-Luc
The combined use of multi-layer perceptron (MLP) and perceptual linear prediction (PLP) features has been reported to improve the performance of automatic speech recognition systems for many different languages and domains. However, MLP features have not yet been used for unsupervised acoustic model training. This approach is introduced in this pape...
Oger, Stanislas
The three pillars of an automatic speech recognition system are the lexicon, the language model, and the acoustic model. The lexicon provides the set of words that can be transcribed, together with their pronunciations. The acoustic model gives an indication of how the acoustic units are realized, and the...
Lai, C.L.; Xu, K.Q.; Lau, Raymond; Li, Yuefeng; Song, Dawei
Although many incidents of fake online consumer reviews have been reported, very few studies have been conducted to date to examine the trustworthiness of online consumer reviews. One of the reasons is the lack of an effective computational method to separate the untruthful reviews (i.e., spam) from the legitimate ones (i.e., ham) given the fact ...