Massively distributed concept drift handling in large networks




Type: Published Article
Journal: Advances in Complex Systems
Publisher: World Scientific
Publication Date: Sep 26, 2013
Volume: 16
Issue: 04n05
DOI: 10.1142/s0219525913500215
License: Green

Abstract

István Hegedűs, Róbert Ormándi
University of Szeged, Szeged, H-6720, Hungary
{ihegedus, ormandi}@inf.u-szeged.hu

Márk Jelasity
Univ. Szeged and Hungarian Acad. Sci., Szeged, H-6720, Hungary
[email protected]

Massively distributed data mining in large networks such as smart device platforms and peer-to-peer systems is a rapidly developing research area. One important problem here is concept drift, where global data patterns (movement, preferences, activities, etc.) change according to the actual set of participating users, the weather, the time of day, or as a result of events such as accidents or even natural catastrophes. In an important case, when the network is very large but only a few training samples can be obtained at each node locally, no distributed solution is known that can follow concept drift efficiently. This case is characteristic of smart device platforms where each device stores only one local observation or data record related to a learning problem.

Here we present two algorithms to handle concept drift. Neither algorithm collects data at a central location; instead, models of the data perform random walks in the network while being improved using an online learning algorithm. The first algorithm achieves adaptivity by maintaining young as well as old models in the network according to a fixed age distribution. The second one measures the performance of models locally and discards them if they are judged outdated. We demonstrate through a thorough experimental analysis that our algorithms outperform the known competing methods if the number of independent local samples is limited relative to the speed of drift: a typical scenario in our targeted application domains.
The two algorithms have different strengths: while the age distribution approach is very simple and efficient, explicit drift detection can be useful in monitoring applications to trigger control action
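
The two mechanisms described in the abstract can be illustrated with a small single-process simulation. This is only a rough sketch under stated assumptions, not the authors' implementation: the abstract does not name the online learner, the age distribution, or the drift test, so the perceptron update, the fixed restart probability (which gives geometrically distributed ages), the sliding-window error threshold, and all function and parameter names below are hypothetical choices. The sketch also follows a single walking model, whereas a real deployment would have many models circulating at once.

```python
import random
from collections import deque

DIM = 2  # hypothetical feature dimension


def new_model():
    """A fresh linear model: zero weights, age 0, empty error history."""
    return {"w": [0.0] * DIM, "age": 0, "errors": deque(maxlen=20)}


def predict(model, x):
    """Sign of the linear score; a score of exactly zero counts as +1."""
    score = sum(wi * xi for wi, xi in zip(model["w"], x))
    return 1 if score >= 0 else -1


def visit(model, sample):
    """One random-walk step: test on the node's local sample, then learn from it."""
    x, y = sample
    mistake = predict(model, x) != y
    model["errors"].append(mistake)  # test-then-train record of recent mistakes
    if mistake:  # perceptron update (an assumed online learner)
        model["w"] = [wi + y * xi for wi, xi in zip(model["w"], x)]
    model["age"] += 1


def age_based_restart(model, rng, restart_prob=0.02):
    """Variant 1 (assumed form): restart with a fixed probability per step."""
    return new_model() if rng.random() < restart_prob else model


def drift_based_restart(model, threshold=0.4):
    """Variant 2 (assumed form): restart when the recent error rate is too high."""
    errs = model["errors"]
    if len(errs) == errs.maxlen and sum(errs) / len(errs) > threshold:
        return new_model()
    return model


def simulate(restart, steps=400, n_nodes=200, seed=0):
    """Walk one model over the nodes; the concept flips halfway through."""
    rng = random.Random(seed)
    # Each node stores a single local observation (its feature vector).
    points = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(n_nodes)]
    model, late_mistakes = new_model(), 0
    for step in range(steps):
        sign = 1 if step < steps // 2 else -1      # sudden concept drift
        x = rng.choice(points)                     # move to a uniform random node
        y = sign * (1 if x[0] >= 0 else -1)        # label under the current concept
        if step >= steps - 50:                     # score only the last 50 visits
            late_mistakes += predict(model, x) != y
        visit(model, (x, y))
        model = restart(model)
    return late_mistakes / 50


if __name__ == "__main__":
    restart_rng = random.Random(1)
    print("age-based:", simulate(lambda m: age_based_restart(m, restart_rng)))
    print("drift-based:", simulate(drift_based_restart))
```

Each call to `simulate` returns the error rate over the last 50 visits, that is, after the simulated drift. The age-based variant needs no feedback at all, which matches the abstract's description of it as very simple, while the drift-based variant produces an explicit restart event that a monitoring application could hook into.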

