Abstract

The doublet (nearest-neighbour) ratios of the nucleotides in various computer-generated DNA sequences have been counted to find out which sequences have the same ratios as those measured for guinea-pig DNA by Russell et al. (1976). Their data show that the ratio patterns for all nuclear DNA fractions except satellite, ribosomal and tRNA-coding DNA are similar irrespective of G+C content, and are characterised by the amount of the doublet CpG being less than 30% of that expected on a random basis. To construct and analyse such theoretical sequences, methods have been developed for counting the doublet frequencies that random DNA, and the DNA expected to code for any amino acid (AA) sequence, would show if analysed experimentally. These methods permit the G+C content to be altered and the frequency of the doublet CpG to be lowered without affecting the information stored in the DNA. The former is achieved by selecting codons for a given AA that are high or low in G and C, while the latter requires selecting against triplets that either contain CpG or cause a CpG to occur between codons. The results show that no DNA sequence that we have been able to construct using the unrestricted genetic code has doublet ratios similar to those observed. However, the DNA expected to code for a group of 27 vertebrate proteins (5237 AAs) of diverse functions has doublet ratios virtually identical to those measured experimentally for the 47% G+C fraction, provided that 77·5% of the CpG is eliminated. The data for the 34–43% G+C fractions are matched well by the protein-coding sequence provided that codons low in G and C are selected and, again, that 77·5% of CpG is eliminated. We have been unable to match the data for the satellite DNA.
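As an illustration of what a doublet ratio is, the following minimal sketch computes, for each of the 16 doublets, the observed frequency divided by the frequency expected on a random basis (the product of the two base frequencies). The function name `doublet_ratios` is hypothetical, and the sketch assumes a single-stranded sequence read in one direction; it is not the counting method used in the paper, whose details are not given here.

```python
from collections import Counter


def doublet_ratios(seq):
    """Return the observed/expected ratio for each of the 16 doublets.

    Assumption: 'expected' is the product of the single-base frequencies
    of the same sequence, i.e. the value for a random arrangement.
    """
    seq = seq.upper()
    n = len(seq)
    base_freq = {b: seq.count(b) / n for b in "ACGT"}
    # Count overlapping nearest-neighbour pairs along the strand.
    pairs = Counter(seq[i:i + 2] for i in range(n - 1))
    total = n - 1
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            expected = base_freq[x] * base_freq[y]
            observed = pairs[x + y] / total
            # A doublet whose bases are absent has no defined ratio.
            ratios[x + y] = observed / expected if expected > 0 else float("nan")
    return ratios


if __name__ == "__main__":
    # A ratio below 1 for "CG" indicates a CpG deficit relative to random.
    print(doublet_ratios("GGCCATATGGCC")["CG"])
```

In these terms, the CpG deficit described above corresponds to a `"CG"` ratio below 0.3.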
A perhaps surprising result was that DNA with a random sequence of nucleotides, once 80% of its CpG had been removed, had doublet ratios similar to the experimental data, although it matched them less well than protein-coding DNA did. This result probably does no more than emphasise that a significant part of the match to the pattern of doublet ratios of guinea-pig DNA derives from the elimination of CpG. The effect of evolution (random mutation) on the doublet ratios of protein-coding DNA has been investigated by assuming a mutation rate of 3 × 10⁻⁹ per base per generation (the mutation rate of haemoglobin) and following how the doublet ratios change over many generations of such mutation. The results show that it will take ~1·5 × 10⁶ generations for the number of termination codons to double and ~2 × 10⁷ generations for the doublet ratios to become indistinguishable from those of random DNA. This seems to imply either that selection acts over the whole of the DNA or that the mutation rate of haemoglobin DNA is unusually high. The results, as a whole, support the view that, whether or not all non-satellite DNA actually codes for protein, its sequences are similar to those that would code for proteins.
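The kind of mutation experiment described above can be sketched roughly as follows. This is only an illustrative sketch under simplifying assumptions that are not taken from the paper: every base is equally likely to be hit, a hit changes the base to any of the three others with equal probability, and the expected number of hits is applied in one step rather than generation by generation. The names `count_stop_codons` and `mutate` are hypothetical.

```python
import random

STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_stop_codons(seq):
    """Count in-frame termination codons, reading from position 0."""
    return sum(seq[i:i + 3] in STOP_CODONS for i in range(0, len(seq) - 2, 3))


def mutate(seq, rate, generations, rng):
    """Apply the expected number of random point substitutions.

    Assumption: uniform hits and uniform substitutions; this is a
    simplification, not the mutation model used in the paper.
    """
    bases = list(seq)
    n_hits = round(len(bases) * rate * generations)
    for _ in range(n_hits):
        i = rng.randrange(len(bases))
        # Always substitute a different base, so each hit is a real change.
        bases[i] = rng.choice([b for b in "ACGT" if b != bases[i]])
    return "".join(bases)


if __name__ == "__main__":
    rng = random.Random(0)
    coding = "ATGGCTGGTAAA" * 500  # toy sequence with no in-frame stops
    mutated = mutate(coding, 3e-9, 1_500_000, rng)
    print(count_stop_codons(coding), count_stop_codons(mutated))
```

Tracking `count_stop_codons` and the doublet ratios of the mutated sequence at increasing numbers of generations would give curves of the kind summarised in the results above.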