The modulation statistics of natural sound ensembles were analyzed by calculating the probability distributions of the amplitude envelope of the sounds and their time-frequency correlations given by the modulation spectra. These modulation spectra were obtained by calculating the two-dimensional Fourier transform of the autocorrelation matrix of the sound stimulus in its spectrographic representation. Since temporal bandwidth and spectral bandwidth are conjugate variables, it is shown that the joint modulation spectrum of sound occupies a restricted space: sounds cannot have rapid temporal and spectral modulations simultaneously. Within this restricted space, it is shown that natural sounds have a characteristic signature. Natural sounds, in general, are low-passed, showing most of their modulation energy for low temporal and spectral modulations. Animal vocalizations and human speech are further characterized by the fact that most of the spectral modulation power is found only for low temporal modulation. Similarly, the distribution of the amplitude envelopes also exhibits characteristic shapes for natural sounds, reflecting the high probability of epochs with no sound, systematic differences across frequencies, and a relatively uniform distribution for the log of the amplitudes for vocalizations. It is postulated that the auditory system as well as engineering applications may exploit these statistical properties to obtain an efficient representation of behaviorally relevant sounds. To test such a hypothesis we show how to create synthetic sounds with first and second order envelope statistics identical to those found in natural sounds.