DSCo: A Language Modeling Approach for Time Series Classification

Authors
  • Li, Daoyuan
  • Li, Li
  • Bissyandé, Tegawendé François D'Assise
  • Klein, Jacques
  • Le Traon, Yves
Publication Date
Jul 01, 2016
Source
ORBilu
Language
English
License
Green
Abstract

Time series data are abundant in many domains and are often large in size and high in dimensionality, which poses storage and processing challenges. Symbolic representation of time series – transforming numeric time series data into texts – is a promising technique for addressing these challenges. However, symbolic representations are essentially lossy compression functions, and information is partially lost during the transformation. To mitigate this loss, we propose a novel approach named Domain Series Corpus (DSCo), which builds per-class language models from the symbolized texts. To classify an unlabeled sample, we compute the fitness of its symbolized form against each per-class model and choose the class represented by the model with the best fitness score. Our work innovatively combines mature techniques from the time series mining and NLP communities. Through extensive experiments on an open dataset archive, we demonstrate that DSCo performs comparably to approaches that work on the original, uncompressed numeric data.
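The classification scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the time series have already been symbolized (e.g., via a SAX-like discretization) and uses simple bigram language models with add-one smoothing as the per-class models; the paper's actual model structure, n-gram order, and smoothing are not specified here.

```python
from collections import defaultdict
import math

# Hypothetical DSCo-style sketch: per-class bigram language models over
# symbolized time series strings, with add-one (Laplace) smoothing.

def train_models(labeled_texts):
    """Build per-class bigram and unigram counts from (label, text) pairs."""
    models = {}
    for label, text in labeled_texts:
        bigrams, unigrams = models.setdefault(
            label, (defaultdict(int), defaultdict(int)))
        for a, b in zip(text, text[1:]):
            bigrams[(a, b)] += 1
            unigrams[a] += 1
    return models

def fitness(text, model, alphabet_size):
    """Smoothed log-likelihood of a symbolized sample under one class model."""
    bigrams, unigrams = model
    score = 0.0
    for a, b in zip(text, text[1:]):
        score += math.log((bigrams[(a, b)] + 1) / (unigrams[a] + alphabet_size))
    return score

def classify(text, models, alphabet_size=4):
    """Pick the class whose language model best fits the sample."""
    return max(models, key=lambda lbl: fitness(text, models[lbl], alphabet_size))

# Toy symbolized samples (alphabet {a, b, c, d} assumed)
train = [("up", "aabbccdd"), ("up", "abbccdde"),
         ("down", "ddccbbaa"), ("down", "dccbbaae")]
models = train_models(train)
print(classify("aabbccd", models))  # → up
```

The key idea this illustrates is that no distance computation between raw numeric series is needed at classification time: each class is summarized by a compact language model, and an unlabeled sample is scored against each model in time linear in its length.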
