We have characterized the U2 snRNA gene family in the higher plant Arabidopsis thaliana. It consists of 10-15 genes that do not appear to be closely clustered. Six of the U2 genes were sequenced, and the structure of the Arabidopsis U2 RNA termini was determined in order to define the coding regions. Each gene codes for a distinct RNA that differs from the others by 2-13 point mutations localized in the 3' portion of the 196-nt RNA. The upstream non-coding regions of all genes show strong sequence similarity at positions -81 to -1 and contain three highly conserved sequence elements: GTCCCACATCG (positions -78 to -68; 100% conserved), GTAGTATAAATA (-37 to -26) and CAANTC (-6 to -1). The coding regions are followed by the sequence CAN(7-9)AGTNNAA, a putative termination signal. The expression of three of the genes was studied in electroporated Orychophragmus violaceus and Nicotiana tabacum protoplasts. All three genes, including one that carries a T-to-C substitution in the Sm antigen-binding site, were actively transcribed and processed into U2 RNAs of the expected size bearing trimethylguanosine caps. Deletion analysis indicates that sequences upstream of the conserved -80 to -1 region are not important for transcription in protoplasts. The 5'-terminal regions of U2 RNAs from several monocot and dicot plants were also sequenced. This region, which contains the sequence implicated in base-pairing with the branch point in pre-mRNA introns, is identical in all U2 RNAs examined.