Consistency and coherence of gene expression data across multiple sites depend on several factors, such as the platform (oligonucleotide, cDNA, etc.), environmental conditions at each laboratory, and data quality. The Hepatotoxicity Working Group of the International Life Sciences Institute Health and Environmental Sciences Institute consortium on the application of genomics to mechanism-based risk assessment is investigating these factors. It compares high-density gene expression data sets generated at seven different sites on a single platform (Affymetrix Rat Genome U34A GeneChip), using two sets of RNA from methapyrilene (MP) experiments conducted at Abbott Laboratories and Boehringer-Ingelheim Pharmaceuticals, Inc. This article focuses on evaluating data quality and on statistical models that facilitate probe-level comparison of such data sets. We present methods for exploring and quantitatively assessing differences in the data, with the principal goal of generating lists of site-insensitive genes that respond to low and high doses of MP. A combination of numerical and graphical techniques reveals important patterns and partitions of variability in the data, including the magnitude of the site effects. Although the site effects are large and statistically significant, they appear to be primarily additive. They can therefore be adjusted for in the statistical calculations without biasing conclusions about treatment differences.
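The claim that additive site effects can be adjusted for without biasing treatment comparisons can be illustrated with a minimal sketch. It assumes log-scale expression values for a single gene and a simple fixed-effects two-way model (site + dose) fitted by ordinary least squares. This is not the Working Group's probe-level model. The functions `fit_additive_site_model` and `simulate`, the site offsets, and the dose effects are all hypothetical, chosen only to show that additive site offsets drop out of the estimated dose contrasts.

```python
import numpy as np


def fit_additive_site_model(y, site, dose):
    """Ordinary least-squares fit of y = mu + site[i] + dose[j] + error.

    Illustrative assumption: y holds log-scale expression values for one
    gene, and site/dose are integer labels starting at 0. Level 0 of each
    factor is the reference, so the returned values are dose effects
    relative to dose level 0 (e.g. control), adjusted for additive site offsets.
    """
    y = np.asarray(y, dtype=float)
    site = np.asarray(site)
    dose = np.asarray(dose)
    n_site = int(site.max()) + 1
    n_dose = int(dose.max()) + 1
    cols = [np.ones_like(y)]                                       # intercept
    cols += [(site == s).astype(float) for s in range(1, n_site)]  # site dummies
    cols += [(dose == d).astype(float) for d in range(1, n_dose)]  # dose dummies
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[n_site:]  # skip intercept and the n_site - 1 site terms


def simulate(site_offsets, dose_effects, reps=2, noise_sd=0.0, seed=0):
    """Hypothetical balanced design: every site runs every dose `reps` times."""
    rng = np.random.default_rng(seed)
    site, dose, y = [], [], []
    for s, s_off in enumerate(site_offsets):
        for d, d_eff in enumerate(dose_effects):
            for _ in range(reps):
                site.append(s)
                dose.append(d)
                # Baseline 8.0 plus an additive site shift plus the dose effect.
                y.append(8.0 + s_off + d_eff + rng.normal(0.0, noise_sd))
    return np.array(y), np.array(site), np.array(dose)


if __name__ == "__main__":
    # Seven sites with large, made-up additive offsets; control/low/high dose.
    offsets = [0.0, 1.5, -2.0, 0.7, 3.0, -1.1, 0.4]
    effects = [0.0, 0.5, 2.0]
    y, site, dose = simulate(offsets, effects, noise_sd=0.1, seed=1)
    print(fit_additive_site_model(y, site, dose))  # close to [0.5, 2.0]
```

Because each site contributes only a constant shift, the site dummies absorb it and the dose coefficients are unaffected. If site and dose interacted (for example, a site that compressed the high-dose response), a purely additive adjustment would no longer be sufficient.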