Abstract

Operational monitoring of land cover from satellite data will require automated procedures for analyzing large volumes of data. We propose multiple criteria for assessing algorithms for this task. In addition to standard classification accuracy measures, we propose criteria that account for the computational resources an algorithm requires, its stability, and its robustness to noise in the training data. We also propose that classification accuracy account, through estimated misclassification costs, for the unequal consequences to the user of confusing different cover types. In this article, we apply these criteria to three variants of decision tree classifiers: a standard decision tree implemented in C5.0 and two techniques recently proposed in the machine learning literature known as “bagging” and “boosting.” Each of these algorithms is applied to two data sets: a global land cover classification from 8 km AVHRR data and a Landsat Thematic Mapper scene in Peru. Results indicate comparable accuracy of the three decision tree variants on the two data sets, with boosting providing marginally higher accuracies. The bagging and boosting algorithms, however, are both substantially more stable and more robust to noise in the training data than the standard C5.0 decision tree. The bagging algorithm is the most costly in terms of computational resources, while the standard decision tree is the least costly. These results illustrate that choosing the most suitable algorithm requires consideration of a suite of criteria beyond the traditional accuracy measures, and that there are likely to be trade-offs between algorithm performance and required computational resources.
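To make the ensemble idea concrete, the sketch below implements bagging with single-split decision stumps in pure Python: each base classifier is trained on a bootstrap resample of the training data, and predictions are made by majority vote. The stump learner, the toy one-feature data set, and the vote-counting scheme are illustrative assumptions for this sketch only; they stand in for, and do not reproduce, the C5.0 trees or the satellite data used in the study.

```python
import random
from collections import Counter

def train_stump(X, y):
    """Fit a one-feature threshold classifier (decision stump) by
    exhaustively searching for the split with best training accuracy."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            for left, right in ((0, 1), (1, 0)):
                preds = [left if row[f] <= t else right for row in X]
                acc = sum(p == yi for p, yi in zip(preds, y)) / len(y)
                if best is None or acc > best[0]:
                    best = (acc, f, t, left, right)
    _, f, t, left, right = best
    return lambda row: left if row[f] <= t else right

def bagging(X, y, n_trees=25, seed=0):
    """Train n_trees stumps on bootstrap resamples of (X, y);
    return a predictor that takes the majority vote of the ensemble."""
    rng = random.Random(seed)
    n = len(X)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        stumps.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    def predict(row):
        votes = Counter(s(row) for s in stumps)
        return votes.most_common(1)[0][0]
    return predict

# Toy example: class 1 when the single feature exceeds 0.5.
X = [[0.1], [0.2], [0.3], [0.4], [0.6], [0.7], [0.8], [0.9]]
y = [0, 0, 0, 0, 1, 1, 1, 1]
model = bagging(X, y)
print([model([v]) for v in (0.2, 0.8)])  # prints [0, 1]
```

Because each stump sees a different resample, individual predictions vary, but the vote averages that variation away; this is the mechanism behind the stability and noise robustness reported for bagging in the abstract. Boosting differs in that it trains base classifiers sequentially, reweighting the training cases each round to emphasize those the previous classifiers misclassified.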