It has been suggested that neurons can represent sensory input using probability distributions and neural circuits can perform probabilistic inference. Lateral connections between neurons have been shown to have non-random connectivity and modulate responses to stimuli within the classical receptive field. Large-scale efforts mapping local cortical connectivity describe cell type specific connections from inhibitory neurons and like-to-like connectivity between excitatory neurons. To relate the observed connectivity to computations, we propose a neuronal network model that approximates Bayesian inference of the probability of different features being present at different image locations. We show that the lateral connections between excitatory neurons in a circuit implementing contextual integration in this should depend on correlations between unit activities, minus a global inhibitory drive. The model naturally suggests the need for two types of inhibitory gates (normalization, surround inhibition). First, using natural scene statistics and classical receptive fields corresponding to simple cells parameterized with data from mouse primary visual cortex, we show that the predicted connectivity qualitatively matches with that measured in mouse cortex: neurons with similar orientation tuning have stronger connectivity, and both excitatory and inhibitory connectivity have a modest spatial extent, comparable to that observed in mouse visual cortex. We incorporate lateral connections learned using this model into convolutional neural networks. Features are defined by supervised learning on the task, and the lateral connections provide an unsupervised learning of feature context in multiple layers. Since the lateral connections provide contextual information when the feedforward input is locally corrupted, we show that incorporating such lateral connections into convolutional neural networks makes them more robust to noise and leads to better performance on noisy versions of the MNIST dataset. Decomposing the predicted lateral connectivity matrices into low-rank and sparse components introduces additional cell types into these networks. We explore effects of cell-type specific perturbations on network computation. Our framework can potentially be applied to networks trained on other tasks, with the learned lateral connections aiding computations implemented by feedforward connections when the input is unreliable and demonstrate the potential usefulness of combining supervised and unsupervised learning techniques in real-world vision tasks.