Abstract
The goal of this study was to compare the performance of a chimpanzee and humans on auditory-visual intermodal matching of conspecifics and non-conspecifics. The task consisted of matching vocal samples to facial images of the corresponding vocalizers. We tested the chimpanzee and human subjects with both chimpanzee and human stimuli to assess the involvement of species-specificity in the recognition process. All subjects were highly familiar with the stimuli. The chimpanzee subject, named Pan, had extensive prior experience with auditory-visual intermodal matching tasks. We found clear evidence of a species-specific effect: the chimpanzee and human subjects both performed better at recognizing conspecifics than non-conspecifics. Our results suggest that Pan's early exposure to human caretakers did not confer a perceptual advantage in discriminating familiar humans over familiar conspecifics. The results also showed that Pan's recognition of non-conspecifics did not significantly improve over the course of the experiment. In contrast, human subjects learned to better discriminate non-conspecific stimuli, suggesting that recognition processes might differ across species. Nevertheless, this comparative study demonstrates that species-specificity significantly affects intermodal individual recognition of highly familiar individuals in both chimpanzee and human subjects.