Abstract
To investigate the cross-modal transfer of movement patterns necessary to perform melodies on the piano, 22 non-musicians learned to play short sequences on a piano keyboard by 1) merely listening and replaying (with vision of their own fingers occluded) or 2) merely observing silent finger movements and replaying (on a silent keyboard). After training, participants recognized with above-chance accuracy 1) audio-motor learned sequences upon visual presentation (89±17%), and 2) visuo-motor learned sequences upon auditory presentation (77±22%). The recognition rates for visual presentation significantly exceeded those for auditory presentation (p<.05). fMRI revealed that observing finger movements corresponding to audio-motor trained melodies was associated with stronger activation in the left rolandic operculum than observing untrained sequences. This region was also involved in silent execution of sequences, suggesting that a link to motor representations may play a role in cross-modal transfer from the audio-motor training condition to visual recognition. No significant differences in brain activity were found between listening to visuo-motor trained melodies and listening to untrained melodies. Cross-modal transfer was stronger from the audio-motor training condition to visual recognition; this is discussed in relation to the fact that non-musicians are familiar with how their finger movements look (motor-to-vision transformation), but not with how they sound on a piano (motor-to-sound transformation).