A content-based autotagging system automatically annotates multimedia data, such as music, images, and video, with tags (semantically meaningful text-based tokens) based solely on the multimedia content. When developing an autotagging system, three important design decisions are 1) selecting a vocabulary of tags, 2) choosing a feature-based representation of the multimedia content, and 3) picking a supervised learning framework. If a tag cannot be consistently applied based on the content alone (e.g., because human annotators disagree on its use), or if the feature representation does not encode the information needed to predict the tag, then the supervised learning framework is unlikely to successfully annotate novel multimedia content with that tag. This paper proposes an approach to selecting a vocabulary of tags based on sparse canonical component analysis (sparse CCA): sparse CCA is used to find a set of "acoustically meaningful" tags, i.e., tags that are correlated with a chosen feature-based representation of the multimedia content. We find that our supervised autotagging system is then better able to model the selected tags. In this paper we focus specifically on music, since our goal is to build a content-based music annotation system.
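To make the vocabulary-selection idea concrete, the sketch below shows one common way to compute a single pair of sparse canonical directions: alternating soft-thresholded power iteration on the cross-covariance between audio features and tag annotations (the penalized-matrix-decomposition style of sparse CCA). This is an illustrative assumption, not necessarily the formulation used in the paper; the function names, penalty parameters `lam_u`/`lam_v`, and the identity-covariance simplification are all ours. Tags receiving a nonzero weight in `v` are the "acoustically meaningful" candidates.

```python
import numpy as np

def soft_threshold(a, lam):
    # Elementwise soft-thresholding: shrinks entries toward zero and
    # sets small ones exactly to zero, which induces sparsity.
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)

def sparse_cca(X, Y, lam_u=0.05, lam_v=0.05, n_iter=100, seed=0):
    """Sketch of one pair of sparse canonical directions.

    X : (n, p) matrix of audio feature vectors, one row per song.
    Y : (n, q) matrix of tag annotations, one column per tag.
    Returns (u, v): unit-norm sparse weight vectors over the p feature
    dimensions and the q tags, respectively.
    """
    # Center both views and form the empirical cross-covariance (p, q).
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    C = Xc.T @ Yc / len(X)

    rng = np.random.default_rng(seed)
    v = rng.standard_normal(C.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(n_iter):
        # Alternate: update feature weights, then tag weights, each
        # followed by soft-thresholding and renormalization.
        u = soft_threshold(C @ v, lam_u)
        if np.linalg.norm(u) == 0:
            break  # penalty too aggressive; everything shrank to zero
        u /= np.linalg.norm(u)
        v = soft_threshold(C.T @ u, lam_v)
        if np.linalg.norm(v) == 0:
            break
        v /= np.linalg.norm(v)
    return u, v
```

On synthetic data where one tag column is driven by one feature dimension and the rest are noise, the recovered `v` concentrates its weight on the driven tag, which is exactly the selection behavior the abstract describes. Larger penalties yield a smaller, more conservative vocabulary.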