Summary

Background
Essential tremor of the voice (ETV) is an involuntary intention tremor of the vocal folds that causes fluctuations in fundamental frequency (f0) and/or intensity, leading to an unsteady voice. There are limited data on how different acoustic variables affect the perceived severity of tremor.

Aim
The purpose of the study was to determine whether systematic changes in f0, modulation rate or frequency (ff0m), modulation extent or depth (df0m), and signal-to-noise ratio (SNR) affect the perceived severity of tremor.

Method
Vowel phonations of four speakers (two male and two female) with a clinical diagnosis of ETV were selected from the Kay Elemetrics Disordered Voice Database (Lincoln Park, NJ). A high-fidelity speech vocoder (STRAIGHT; Kawahara, 1997) was used to synthesize the f0 contour for each voice, with the contours varied in mean f0, ff0m, and df0m. The f0 contour was shifted to 30 Hz above and 30 Hz below each speaker's mean f0. ff0m ranged from 3 to 12 Hz in steps of 3 Hz, and df0m ranged from 2 to 32 Hz in steps of 6 Hz. Six listeners (three expert and three naïve) rated the "severity" of tremor on a seven-point rating scale.

Results
Significant main effects of, and interactions among, the study variables were found. Perceived severity of tremor increased with ff0m and df0m. There was no systematic effect of SNR on perceived tremor severity.

Conclusion
The perceived severity of steady-state tremor results from a complex interaction of multiple acoustic cues, with df0m acting as the primary acoustic cue.
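The parameter ranges given in the Method can be illustrated with a short sketch. It is not the study's STRAIGHT-based procedure. It assumes a purely sinusoidal f0 modulation, treats df0m as the peak deviation from the mean f0, and assumes the unshifted mean f0 was included alongside the two 30 Hz shifts. The names `f0_contour` and `stimulus_grid`, and the 200 Hz mean f0 in the usage note, are hypothetical.

```python
import itertools
import math

# Parameter values stated in the abstract.
FF0M_HZ = tuple(range(3, 13, 3))   # modulation rate: 3, 6, 9, 12 Hz
DF0M_HZ = tuple(range(2, 33, 6))   # modulation extent: 2, 8, 14, 20, 26, 32 Hz

# Assumption: the unshifted condition (0) was used alongside the +/-30 Hz shifts.
F0_SHIFTS_HZ = (-30, 0, 30)


def f0_contour(mean_f0, ff0m, df0m, duration=1.0, frame_rate=1000):
    """Return an f0 contour in Hz, one value per frame.

    Assumptions: sinusoidal modulation, with df0m as the peak deviation
    from the mean f0. The abstract does not specify either choice.
    """
    n_frames = int(duration * frame_rate)
    return [
        mean_f0 + df0m * math.sin(2 * math.pi * ff0m * i / frame_rate)
        for i in range(n_frames)
    ]


def stimulus_grid(speaker_mean_f0):
    """Enumerate (mean f0, ff0m, df0m) combinations for one speaker."""
    return [
        (speaker_mean_f0 + shift, rate, extent)
        for shift, rate, extent in itertools.product(
            F0_SHIFTS_HZ, FF0M_HZ, DF0M_HZ
        )
    ]
```

For a hypothetical speaker with a mean f0 of 200 Hz, `stimulus_grid(200)` lists every combination under these assumptions, and `f0_contour(200, 3, 8)` gives a one-second contour that oscillates between 192 and 208 Hz at 3 Hz. SNR is left out because the abstract does not say how it was manipulated.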