Affordable Access

Publisher Website

Frequency-based similarity for parameterized sequences: Formal framework, algorithms, and applications

Authors
Journal
Information Sciences
0020-0255
Publisher
Elsevier
Publication Date
Volume
237
Identifiers
DOI: 10.1016/j.ins.2013.03.016
Keywords
  • Sequence Similarity
  • Alignment Problem
  • Mining Technique
Disciplines
  • Computer Science
  • Mathematics

Abstract

Abstract Computing sequence similarity is a problem emerging in several areas of research. Current solution algorithms are often based on alignment methods under the assumption that matching symbols, or at least a substitution schema among them, are known in advance. This is natural for sequences defined over the same alphabet of symbols. However, for sequences defined over different alphabets and in absence of an appropriate background knowledge, sequence similarity can be conveniently reconsidered from a different perspective where determining the best substitution schema is also part of the computation problem. The basic idea is that any symbol of a sequence can be correlated with many symbols of another, provided each correlation frequently occurs over the various positions of the alignment. This novel setting is formalized and relevant application domains fitting its peculiarities are illustrated. Moreover, the computational complexity of the alignment problems arising therein is analyzed, and practical solution approaches are proposed and validated over synthetic and real datasets.

There are no comments yet on this publication. Be the first to share your thoughts.