Abstract

A previously published method for predicting how often a completely specified DNA oligomer occurs at random in a longer sequence dataset has been generalized to allow degeneracy in the oligomer sequence. With this enhancement, several datasets of human genomic sequence were searched for consensus binding sites of 13 transcription factors. Because these sequences are biologically significant, one might predict that they would occur more often than expected at random; however, many of the consensus oligomers were found at lower than expected frequencies. Several (G + C)-rich oligomers were moderately over-represented, but this could be accounted for, in part, by (G + C)-rich tracts in the human sequences. Regions very high in (G + C) were found much more often than expected in the human genome, which severely limits the usefulness of this approach for predicting the frequency of (G + C)-rich oligomers. Unexpectedly, more than 1% of the human genome consists of tracts at least 28 bp long with a (G + C) content greater than 85%.
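The abstract does not spell out the prediction method, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes the simplest random model: each position is drawn independently from the dataset's overall A/C/G/T composition. Under that assumption, a degenerate oligomer written in IUPAC codes matches at a given position with probability equal to the product, over its positions, of the summed frequencies of the allowed bases. The function names (`base_frequencies`, `match_probability`, `expected_count`, `observed_count`) are hypothetical, and only one strand is scanned.

```python
from collections import Counter

# IUPAC nucleotide codes mapped to the set of bases each code allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def base_frequencies(seq):
    """Fraction of A, C, G, T in seq (non-ACGT characters are ignored)."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    return {b: counts[b] / total for b in "ACGT"}


def match_probability(oligo, freqs):
    """Probability that a random window matches oligo, assuming
    independent positions drawn from freqs (an assumed model)."""
    p = 1.0
    for code in oligo.upper():
        p *= sum(freqs[b] for b in IUPAC[code])
    return p


def expected_count(oligo, seq_len, freqs):
    """Expected number of single-strand matches in a sequence of seq_len."""
    windows = max(seq_len - len(oligo) + 1, 0)
    return windows * match_probability(oligo, freqs)


def observed_count(oligo, seq):
    """Count overlapping single-strand matches of a degenerate oligo."""
    seq = seq.upper()
    allowed = [set(IUPAC[c]) for c in oligo.upper()]
    k = len(oligo)
    return sum(
        all(seq[i + j] in allowed[j] for j in range(k))
        for i in range(len(seq) - k + 1)
    )


if __name__ == "__main__":
    seq = "ACGTACGTAC"
    freqs = base_frequencies(seq)
    for oligo in ("AC", "RY", "SS"):
        print(oligo, observed_count(oligo, seq),
              round(expected_count(oligo, len(seq), freqs), 3))
```

Comparing `observed_count` with `expected_count` gives an observed/expected ratio for each consensus site. Because this model treats positions as independent, it cannot anticipate clustered features such as long (G + C)-rich tracts, which is consistent with the limitation the abstract reports for (G + C)-rich oligomers.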