Markov chain Monte Carlo (MCMC) methods have been used in many fields (physics, chemistry, biology, and computer science) for simulation, inference, and optimization. The essence of these methods is to simulate a Markov chain whose state X follows a target probability X ∼ π(X). In many applications, π(X) is defined on a graph G whose vertices represent elements in the system and whose edges represent the connectivity of the elements. X is a vector of variables on the vertices, which often take discrete values called labels or colors. Designing a rapidly mixing Markov chain is a challenging task when the variables in the graph are strongly coupled; methods like the single-site Gibbs sampler often experience long waiting times. A celebrated algorithm for sampling on graphs is the Swendsen-Wang (1987) (SW) method. The SW method finds a cluster of vertices as a connected component after turning off some edges probabilistically, and flips the color of the cluster as a whole. It has been shown to mix rapidly under certain conditions. Unfortunately, the SW method is only applicable to the Ising/Potts models and slows down critically in the presence of "external fields", i.e., the likelihood terms in Bayesian inference.

In this paper, we present a general cluster sampling method which achieves the following objectives. First, it extends the SW algorithm to general Bayesian inference on graphs. In particular, we focus on a number of image analysis problems where the graph sizes are on the order of O(10^3)–O(10^6) vertices with O(1) connectivity. Second, the edge probabilities used for clustering the vertices are discriminative probabilities computed from the data; empirically, such data-driven clustering leads to much improved efficiency. Third, we present a generalized Gibbs sampler which samples the color of a cluster according to a conditional probability (like the Gibbs sampler) weighted by a product of edge probabilities. Fourth, we design the algorithm to work on multi-grid and multi-level graphs.
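For concreteness, the classical SW step on the Ising model (no external field) can be sketched as follows; the lattice size, the inverse temperature β, and the union-find helper are illustrative details, not the paper's implementation:

```python
import math
import random

def sw_step(spins, n, beta, rng):
    """One Swendsen-Wang sweep on an n x n Ising lattice (free
    boundary, no external field).  Bonds between equal-spin
    neighbors are kept with probability p = 1 - exp(-2*beta);
    connected components of kept bonds are then recolored
    independently and uniformly at random."""
    p = 1.0 - math.exp(-2.0 * beta)

    # Union-find with path halving over the lattice sites.
    parent = {s: s for s in spins}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]
            s = parent[s]
        return s

    # "Turn off" each agreeing edge with probability 1 - p;
    # surviving edges merge their endpoints into one cluster.
    for i in range(n):
        for j in range(n):
            for ni, nj in ((i + 1, j), (i, j + 1)):
                if ni < n and nj < n and spins[i, j] == spins[ni, nj]:
                    if rng.random() < p:
                        ra, rb = find((i, j)), find((ni, nj))
                        if ra != rb:
                            parent[ra] = rb

    # Flip each cluster as a whole: one fresh spin per component.
    new_spin = {}
    for s in spins:
        r = find(s)
        if r not in new_spin:
            new_spin[r] = rng.choice((-1, 1))
        spins[s] = new_spin[r]
    return spins

rng = random.Random(0)
n = 8
spins = {(i, j): 1 for i in range(n) for j in range(n)}
spins = sw_step(spins, n, beta=0.4, rng=rng)
```

Because an entire connected component changes color in one move, strongly coupled neighboring variables are updated jointly rather than one site at a time, which is the source of the speedup over the single-site Gibbs sampler.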
The algorithm is tested on typical problems in image analysis, such as image segmentation and motion analysis. In our experiments, the algorithm is O(10^2) times faster than the single-site Gibbs sampler. In the literature, there are several ways of interpreting the SW method, which lead to various analyses and generalizations, including the random cluster model (RCM), data augmentation, slice sampling, and partial decoupling. We take a different route by interpreting SW as a Metropolis-Hastings step with auxiliary variables for proposing the moves; equivalently, it can be viewed as a generalized hit-and-run method.
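In the image-segmentation setting above, a data-driven edge probability makes similar-looking neighboring pixels likely to be clustered together. The Gaussian similarity form and the parameter sigma below are assumptions for illustration, not the paper's actual discriminative model:

```python
import math

def edge_probability(f_i, f_j, sigma=10.0):
    """Illustrative data-driven bond probability q_e for the edge
    between two neighboring pixels with features f_i, f_j (e.g.
    intensities): similar features give q_e near 1, dissimilar
    features give q_e near 0.  The Gaussian form and sigma are
    assumptions, not the paper's exact discriminative model."""
    return math.exp(-((f_i - f_j) ** 2) / (2.0 * sigma ** 2))
```

Edges inside a homogeneous region then survive the probabilistic turn-off step far more often than edges crossing an object boundary, so the proposed clusters tend to respect the image structure.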