Finding the preimage of a feature vector in kernel principal component analysis (KPCA) is crucial in applications such as image preprocessing. Since the exact preimage of a feature vector in the kernel feature space normally does not exist in the input data space, an approximate preimage is learned instead, and encouraging results have been reported in the last few years. However, it is still difficult to find a "good" estimate of the preimage. Because preimage estimation in kernel methods is ill-posed, how to guide preimage learning toward a better estimate is an important and still open problem. To address this problem, this paper develops a penalized strategy in which penalization terms guide the preimage learning process. To develop an efficient penalized technique, we first propose a two-step general framework in which a preimage is modeled directly as a weighted combination of the observed samples, and the weights are learned by optimizing an objective function subject to certain constraints. Compared with existing techniques, this framework has the advantage of turning preimage learning directly into the optimization of the combination weights. Under this framework, a penalized methodology is developed by integrating two types of penalization. First, to ensure that the learned preimage is well defined, that is, that no entry falls outside the data range, a convexity constraint is imposed on the combination weights. Further effects of the convexity constraint are also explored. Second, a penalty function is integrated into the objective function to guide the preimage learning process. In particular, a weakly supervised penalty is proposed, discussed, and extensively evaluated along with the Laplacian penalty and the ridge penalty. The learned preimage can further be interpreted as preserving a form of pointwise conditional mutual information.
Finally, KPCA with preimage learning is applied to face image data sets for facial expression normalization, face image denoising, recovery of missing parts from occlusion, and illumination normalization. Experimental results show that the proposed preimage learning algorithm obtains lower mean square error (MSE) and better visual quality of reconstructed images.
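
To make the idea of a convexly constrained, penalized preimage concrete, the following is a minimal sketch and not the algorithm of this paper. It assumes an RBF kernel, an uncentered feature map, and a target feature vector given as sum_i gamma_i phi(x_i); it then fits simplex-constrained weights w by minimizing (w - gamma)^T K (w - gamma) + lam * ||w||^2 with projected gradient descent and returns the preimage z = X^T w. The function names, the objective, the solver, and the parameters `sigma`, `lam`, and `n_iter` are all illustrative assumptions; only the ridge penalty and the convexity constraint are taken from the text above.

```python
import numpy as np


def rbf_kernel(X, Y, sigma):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # entries in descending order
    css = np.cumsum(u) - 1.0
    ks = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / ks > 0)[0][-1]  # last index meeting the condition
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def learn_preimage(X, gamma, sigma=1.0, lam=0.1, n_iter=500):
    """Hypothetical helper: ridge-penalized, convexly constrained preimage.

    X     : (n, d) observed samples
    gamma : (n,) expansion coefficients of the target feature vector
            (assumed to be supplied by a KPCA projection step)
    Returns the preimage z (a convex combination of rows of X) and weights w.
    """
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    w = np.full(n, 1.0 / n)                   # start at the simplex centre
    # Step size 1/L, where L bounds the Lipschitz constant of the gradient.
    L = 2.0 * (np.linalg.eigvalsh(K)[-1] + lam)
    for _ in range(n_iter):
        grad = 2.0 * (K @ (w - gamma)) + 2.0 * lam * w
        w = project_to_simplex(w - grad / L)
    return X.T @ w, w


# Usage on synthetic data; gamma here is a random stand-in, not a real
# KPCA projection.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(20, 5))
gamma_demo = rng.normal(size=20)
z_demo, w_demo = learn_preimage(X_demo, gamma_demo)
```

Because the weights are nonnegative and sum to one, every entry of the returned preimage lies between the smallest and largest value of that coordinate among the samples, which is the data-range property that the convexity constraint is meant to guarantee.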