The direct repeat region in Mycobacterium tuberculosis complex strains is composed of multiple direct variant repeats (DVRs), each of which consists of a 36-bp direct repeat (DR) plus a nonrepetitive spacer sequence of similar size. Clinical isolates have previously been shown to be extensively polymorphic in the DR region through the variable presence of DVRs, and this polymorphism has been used in the epidemiology of tuberculosis. To better understand the evolutionary scenario leading to polymorphic DR loci and to improve strain differentiation by spoligotyping, we characterized and compared the DNA sequences of the complete DR region and its flanking DNA in M. tuberculosis complex strains. We identified 94 different spacer sequences among 26 M. tuberculosis complex strains. No sequence homology was found between any of these spacers and either M. tuberculosis DNA outside the DR region or any other known bacterial sequence. Although strains differed extensively in the presence or absence of DVRs, the order of the spacers in the DR locus was well conserved. The data strongly suggest that the polymorphism in clinical isolates results from successive deletions of single discrete DVRs or of multiple contiguous DVRs from a primordial DR region containing many more DVRs than are seen in present-day isolates, and that virtually no scrambling of DVRs took place during evolution. Because the majority of the novel spacer sequences identified in this study were confined to isolates of the rare Mycobacterium canettii taxon, including the novel spacers in spoligotyping improved strain differentiation only slightly.
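
The deletion-only scenario described above can be illustrated with a minimal sketch. It is not the analysis used in this study: the function name `deleted_blocks`, the integer spacer labels, and the ten-spacer "primordial" order are all hypothetical, and each spacer is assumed to occur only once in the ancestral region. The sketch tests whether an observed spacer order could arise from an ancestral order by deletions alone (no scrambling) and, if so, reports the runs of contiguous spacers that would have been lost.

```python
from typing import List, Optional, Tuple


def deleted_blocks(
    ancestral: List[int], observed: List[int]
) -> Optional[List[Tuple[int, ...]]]:
    """Return the runs of contiguous ancestral spacers absent from `observed`.

    Returns None when `observed` is not an order-preserving subset of
    `ancestral`, i.e. when deletions alone cannot explain it.
    Assumption: spacer labels are unique within `ancestral`.
    """
    blocks: List[Tuple[int, ...]] = []
    current: List[int] = []  # run of missing spacers being accumulated
    pos = 0  # index of the next observed spacer still to be matched

    for spacer in ancestral:
        if pos < len(observed) and observed[pos] == spacer:
            # Spacer retained: close any open run of deleted spacers.
            if current:
                blocks.append(tuple(current))
                current = []
            pos += 1
        else:
            # Spacer missing at this position: extend the current run.
            current.append(spacer)

    if current:
        blocks.append(tuple(current))

    # Unmatched observed spacers mean a reordering or an unknown spacer.
    if pos != len(observed):
        return None
    return blocks


# Hypothetical primordial region with spacers labelled 1..10.
primordial = list(range(1, 11))

# Order conserved, two contiguous blocks lost.
print(deleted_blocks(primordial, [1, 2, 5, 6, 7, 10]))  # [(3, 4), (8, 9)]

# Scrambled order: not explainable by deletions alone.
print(deleted_blocks(primordial, [1, 3, 2]))  # None
```

Each reported run corresponds to one or more deletion events; the sketch cannot distinguish a single deletion of several contiguous DVRs from successive deletions of adjacent single DVRs.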