Abstract Several criteria and measures have been proposed and used to evaluate interactive IR performance. There is no agreement about what constitutes successful IR performance or which existing evaluation measure(s) are best. This study aims to identify the best evaluation measure(s) for interactive IR performance. Twenty measures of IR performance were selected for study in a natural IR environment involving 40 real end-users from an academic setting with 40 real information problems, interacting with six professional intermediaries searching large operational IR systems. These end-users were responsible for the costs of their own searches. The study showed that the value of search results as a whole is the best single measure of interactive IR performance among the measures selected. Precision, one of the most important traditional measures of effectiveness, is not significantly correlated with success. Users appear to be more concerned with absolute recall than with precision. The study also identified two more basic factors for future IR evaluation, which together account for a much higher proportion of the total variance than the value of search results as a whole does alone. Seventeen new success categories were also suggested for future investigation.