The Lactobacillus bulgaricus beta-galactosidase gene was cloned on a ca. 7-kilobase-pair HindIII fragment in the vector pKK223-3 and expressed in Escherichia coli from its own promoter. The nucleotide sequences of the gene and of approximately 400 bases of 5'- and 3'-flanking sequence were determined. The amino acid sequence of the beta-galactosidase, deduced from the nucleotide sequence of the gene, predicts a monomer molecular mass of ca. 114 kilodaltons, slightly smaller than the E. coli lacZ and Klebsiella pneumoniae lacZ enzymes but larger than the E. coli evolved (ebgA) beta-galactosidase. The cloned beta-galactosidase was indistinguishable from the native enzyme by several criteria. In amino acid sequence alignments, the L. bulgaricus beta-galactosidase is 30 to 34% similar to the E. coli lacZ, E. coli ebgA, and K. pneumoniae lacZ enzymes. Seven regions of high similarity are common to all four of these beta-galactosidases. In addition, the putative active-site residues (Glu-461 and Tyr-503 in the E. coli lacZ beta-galactosidase) are conserved in the L. bulgaricus enzyme as well as in the E. coli ebgA and K. pneumoniae lacZ enzymes. The conservation of active-site amino acids and the large regions of similarity suggest that all four of these beta-galactosidases evolved from a common ancestral gene. However, these enzymes are quite different from the thermophilic beta-galactosidase encoded by the Bacillus stearothermophilus bgaB gene.