Accurately distinguishing aerial photographs of different categories is a promising technique in computer vision that can facilitate applications such as video surveillance and vehicle navigation. In this paper, we propose a new image kernel for effectively recognizing aerial photographs. The key idea is to encode high-level semantic cues into local image patches in a weakly supervised way, and to integrate multimodal visual features using a newly developed hashing algorithm. The pipeline is as follows. Given an aerial photo, we first extract a number of graphlets to describe its topological structure. For each graphlet, we use color and texture to capture its appearance, and a weakly supervised algorithm to capture its semantics. Aerial photo categorization can then be naturally formulated as graphlet-to-graphlet matching. Because the number of graphlets extracted from each aerial photo is huge, we accelerate matching with a hashing algorithm that seamlessly fuses the multiple visual features into binary codes. Finally, an image kernel is computed by fast matching of the binary codes corresponding to each graphlet, and a multi-class SVM is learned for aerial photo categorization. We demonstrate the advantage of the proposed model by comparing it with state-of-the-art image descriptors. Moreover, we present an in-depth study of the descriptiveness of the hash-based graphlets.
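To make the matching step concrete, the following is a minimal sketch of how binary codes can turn graphlet-to-graphlet matching into cheap bit operations. It is not the hashing algorithm proposed in the paper: as a stand-in, it assumes random-projection sign hashing, and it assumes each graphlet is already described by a single real-valued feature vector. The function names (`make_projection`, `hash_graphlet`, `set_kernel`) are hypothetical and chosen only for illustration.

```python
import random


def make_projection(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions of length dim (assumed stand-in hash)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_graphlet(feature, projection):
    """Pack the sign bit of each projection of the feature into one integer code."""
    code = 0
    for row in projection:
        dot = sum(w * x for w, x in zip(row, feature))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code


def set_kernel(codes_a, codes_b, n_bits):
    """Mean fraction of matching bits over all graphlet pairs of two photos."""
    total = 0.0
    for a in codes_a:
        for b in codes_b:
            hamming = bin(a ^ b).count("1")  # number of differing bits
            total += 1.0 - hamming / n_bits
    return total / (len(codes_a) * len(codes_b))


if __name__ == "__main__":
    # Hypothetical toy data: two photos, each with a few 4-dimensional graphlet features.
    n_bits = 8
    projection = make_projection(dim=4, n_bits=n_bits)
    photo_a = [[0.9, 0.1, -0.3, 0.5], [0.2, -0.7, 0.4, 0.1]]
    photo_b = [[0.8, 0.2, -0.2, 0.6], [-0.5, 0.3, 0.9, -0.1]]
    codes_a = [hash_graphlet(f, projection) for f in photo_a]
    codes_b = [hash_graphlet(f, projection) for f in photo_b]
    print(set_kernel(codes_a, codes_b, n_bits))
```

Under these assumptions, each pairwise comparison costs one XOR and a bit count, which is what makes matching a huge number of graphlets tractable. Evaluating `set_kernel` over all pairs of photos yields a Gram matrix that could then be passed to any SVM implementation that accepts a precomputed kernel.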