Any large library of information requires efficient ways to organise it and methods that allow people to access information efficiently and collections of digital images are no exception. Automatically creating high-level semantic tags based on image content is difficult, if not impossible to achieve accurately. In this thesis a framework is presented that allows for the automatic creation of rich and accurate tags for images with landmarks as the main object. This framework uses state of the art computer vision techniques fused with the wide range of contextual information that is available with community contributed imagery. Images are organised into clusters based on image content and spatial data associated with each image. Based on these clusters different types of classifiers are* trained to recognise landmarks contained within the images in each cluster. A novel hybrid approach is proposed combining these classifiers with an hierarchical matching approach to allow near real-time classification and captioning of images containing landmarks.