In the past decades, machine learning (ML) tools have been successfully used in several medical diagnostic problems. While they often significantly outperform expert physicians (in terms of diagnostic accuracy, sensitivity, and specificity), they are mostly not being used in practice. One reason for this is that it is difficult to obtain an unbiased estimation of diagnose's reliability. We discuss how reliability of diagnoses is assessed in medical decision-making and propose a general framework for reliability estimation in machine learning, based on transductive inference. We compare our approach with a usual (machine learning) probabilistic approach as well as with classical stepwise diagnostic process where reliability of diagnose is presented as its post-test probability. The proposed transductive approach is evaluated on several medical datasets from the University of California (UCI) repository as well as on a practical problem of clinical diagnosis of the coronary artery disease (CAD). In all cases, significant improvements over existing techniques are achieved.