An Efficient Data Indexing Approach on Hadoop Using Java Persistence API

Authors
  • Lai, Yang
  • ZhongZhi, Shi
Publication Date
Oct 13, 2010
Identifiers
DOI: 10.1007/978-3-642-16327-2_27
OAI: oai:HAL:hal-01055056v1
Source
HAL-SHS
Language
English
License
Unknown
Abstract

Data indexing is common in data mining when working with high-dimensional, large-scale data sets. Hadoop, a cloud computing project that implements the MapReduce framework in Java, has attracted significant interest for distributed data mining. To resolve the problems of globalization, random writes, and duration in Hadoop, a data indexing approach on Hadoop using the Java Persistence API (JPA) is presented through the implementation of a KD-tree algorithm on Hadoop. An improved intersection algorithm for distributed data indexing on Hadoop is also proposed; it runs in O(M + log N) time and is suitable for cases that require multiple intersections. We compare the data indexing algorithms on an open dataset and a synthetic dataset in a modest cloud environment. The results show that the algorithms are feasible for large-scale data mining.
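For readers unfamiliar with the KD-tree mentioned above, here is a minimal, single-machine sketch of the general data structure: a tree built by splitting on the median along a cycling axis, plus an axis-aligned range query. This is a generic textbook KD-tree, not the authors' Hadoop/JPA implementation. It does not model distribution, persistence, or the paper's intersection algorithm. The class name `KdTree` and the method `rangeSearch` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory k-d tree over double[] points (generic sketch, not the paper's code). */
public class KdTree {
    private static final class Node {
        final double[] point;
        final int axis;
        Node left, right;
        Node(double[] point, int axis) { this.point = point; this.axis = axis; }
    }

    private final int k;
    private final Node root;

    public KdTree(List<double[]> points, int k) {
        this.k = k;
        this.root = build(new ArrayList<>(points), 0);
    }

    // Split on the median of the current axis; the axis cycles with depth.
    private Node build(List<double[]> pts, int depth) {
        if (pts.isEmpty()) return null;
        int axis = depth % k;
        pts.sort(Comparator.comparingDouble(p -> p[axis]));
        int mid = pts.size() / 2;
        Node node = new Node(pts.get(mid), axis);
        node.left = build(new ArrayList<>(pts.subList(0, mid)), depth + 1);
        node.right = build(new ArrayList<>(pts.subList(mid + 1, pts.size())), depth + 1);
        return node;
    }

    /** Returns every point p with lo[d] <= p[d] <= hi[d] in all k dimensions. */
    public List<double[]> rangeSearch(double[] lo, double[] hi) {
        List<double[]> out = new ArrayList<>();
        search(root, lo, hi, out);
        return out;
    }

    private void search(Node node, double[] lo, double[] hi, List<double[]> out) {
        if (node == null) return;
        if (inBox(node.point, lo, hi)) out.add(node.point);
        double v = node.point[node.axis];
        // Descend only into subtrees whose half-space can overlap the query box.
        if (lo[node.axis] <= v) search(node.left, lo, hi, out);
        if (hi[node.axis] >= v) search(node.right, lo, hi, out);
    }

    private boolean inBox(double[] p, double[] lo, double[] hi) {
        for (int d = 0; d < k; d++) {
            if (p[d] < lo[d] || p[d] > hi[d]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        List<double[]> pts = Arrays.asList(
                new double[]{2, 3}, new double[]{5, 4}, new double[]{9, 6},
                new double[]{4, 7}, new double[]{8, 1}, new double[]{7, 2});
        KdTree tree = new KdTree(pts, 2);
        // Query the box x in [3, 8], y in [1, 5].
        for (double[] p : tree.rangeSearch(new double[]{3, 1}, new double[]{8, 5})) {
            System.out.println(Arrays.toString(p));
        }
    }
}
```

On the six sample points, the query returns (7, 2), (5, 4), and (8, 1). The pruning test in `search` is what makes a KD-tree useful as an index: whole subtrees on the wrong side of a splitting plane are skipped rather than scanned.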
