Despite its benefits to modern communities and businesses, Twitter has attracted many spammers who overwhelm legitimate users with unwanted and disruptive advertising and fake information. Detecting spammers is challenging because a huge volume of data must be analyzed while, at the same time, spammers keep learning and changing their behavior to evade anti-spammer systems. Several spam classification systems have been proposed that use various features extracted from the content of tweets and from users' information. Nevertheless, no comprehensive study has compared and evaluated the effectiveness and efficiency of these systems, so it is not known which anti-spammer system is best, or why. This paper proposes an evaluation framework that allows researchers, developers, and practitioners to access existing user-based and content-based features, implement their own features, and evaluate the performance of their systems against other systems. Our framework helps identify the most effective and efficient spammer detection features and evaluate the impact of using different numbers of recent tweets, and thereby helps obtain a faster and more accurate classifier model.