Abstract

Biclustering is an important technique in neurocomputing and bioinformatics. The Geometric Biclustering (GBC) algorithm finds common patterns in microarray data for neural processing. Microarray experiments can produce massive amounts of data, and analyzing that data requires high computational power. With its intrinsically parallel architecture and an appropriate mapping technique, a Graphics Processing Unit (GPU) can process far more threads and data concurrently than a CPU. This paper analyzes the parallelism and data reuse of the GBC algorithm and presents three efficient GPU implementations, evaluated on five real-world benchmarks. The proposed GPU-based GBC program achieves significant speedup over a highly optimized CPU program. By comparing the implementation results, the paper studies how to design a scalable architecture for mapping GBC and similar microarray data-analysis algorithms onto GPUs. The paper also explores how the performance of GPU-based GBC is affected by input data size.