Using event-related fMRI in a sample of 42 healthy participants, we compared the cerebral activity maps obtained when classifying spoken sentences either according to the mental content of the main character (belief, deception or empathy; theory of mind [ToM] task) or according to the emotional tonality of the sentence (happiness, anger or sadness; emotion task). To control for the effects of different syntactic constructions (such as the embedded clauses in belief sentences), we subtracted from each map the BOLD activations obtained during plausibility judgments on structurally matched sentences devoid of emotional or ToM content. The resulting ToM and emotional speech comprehension networks overlapped in the bilateral temporo-parietal junction, posterior cingulate cortex, right anterior temporal lobe, dorsomedial prefrontal cortex and left inferior frontal sulcus. These regions form a ToM network that contributes to the emotional component of spoken sentence comprehension. Compared with the ToM task, in which the sentences were spoken in a neutral tone, the emotional sentence classification task, in which the sentences were play-acted, was associated with greater activity in the bilateral superior temporal sulcus, consistent with the presence of emotional prosody. In addition, the ventromedial prefrontal cortex was more active during emotional than during ToM sentence processing; this region may link mental state representations with verbal and prosodic emotional cues. Compared with emotional sentence classification, the ToM task was associated with greater activity in the caudate nucleus, paracingulate cortex, and superior frontal and parietal regions, in line with behavioral data showing that ToM sentence comprehension was the more demanding task.