High spatial resolution information on urban air pollution levels is unavailable in many areas globally, partly because of the high input data needs of existing estimation approaches. Here we introduce a computer vision method to estimate annual mean air pollution levels from street-level images. We used annual mean estimates of NO2 and PM2.5 concentrations from locally calibrated models in London, New York, and Vancouver as labels, which allowed us to compile a sufficiently large dataset (~250k images for each city). Our experimental setup is designed to quantify the intra- and intercity transferability of image-based model estimates. Performance was high and comparable to traditional land-use regression (LUR) and dispersion models when training and testing on images from the same city (R2 values between 0.51 and 0.95 when validated on data from ground monitoring stations). As with LUR models, transferring models between cities in different geographies is more difficult. Specifically, transfer between the three cities (London, New York, and Vancouver), which have similar pollution source profiles, was moderately successful (R2 values between zero and 0.67). By comparison, performance was lower when models trained on these cities were transferred to cities with very different source profiles, namely Accra in Ghana and Hong Kong (R2 between zero and 0.21), suggesting the need for local calibration using additional measurement data from cities that share similar source profiles.