Although crashes between trains and road users at railway level crossings are rare, they are one of the major safety concerns for the Australian railway industry. Near-miss events at level crossings occur more frequently and can provide more information about the factors leading to level crossing incidents. In this paper we introduce a video analytics approach that automatically detects and localizes vehicles in footage from train-mounted cameras, with the aim of detecting near-miss events. To detect and localize vehicles at level crossings, we extract patches from an image and classify each patch to determine whether it contains a vehicle. We developed a region proposal algorithm to generate the patches, and we use a Convolutional Neural Network (CNN) to classify each patch. To localize vehicles in an image, we combine the patches classified as vehicles according to their CNN scores and positions. We compared our system with the Deformable Part Models (DPM) and Regions with CNN features (R-CNN) object detectors. Experimental results on a railway dataset show that the recall rate of our proposed system is 29% higher than what can be achieved with the DPM or R-CNN detectors.
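
The abstract does not specify how classified patches are combined into vehicle locations, so the following is only a minimal sketch of one plausible scheme, not the method used in the paper. It assumes each patch is an axis-aligned box with a CNN vehicle score in [0, 1], keeps patches above a score threshold, groups them by overlap with the highest-scoring patch in each group, and returns a score-weighted average box per group. The function names, the thresholds, and the use of intersection-over-union (IoU) for grouping are all illustrative assumptions.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_vehicle_patches(
    patches: List[Box],
    scores: List[float],
    score_thresh: float = 0.5,  # hypothetical value; must be positive
    iou_thresh: float = 0.3,    # hypothetical value
) -> List[Tuple[Box, float]]:
    """Combine patches classified as vehicles using scores and positions.

    Returns one (box, score) pair per group of overlapping patches.
    """
    # Keep only patches the classifier labels as vehicles, best first.
    kept = sorted(
        ((s, p) for p, s in zip(patches, scores) if s >= score_thresh),
        key=lambda t: t[0],
        reverse=True,
    )

    # Greedy grouping: a patch joins the first group whose seed
    # (highest-scoring member) it overlaps sufficiently.
    groups: List[List[Tuple[float, Box]]] = []
    for s, p in kept:
        for g in groups:
            if iou(p, g[0][1]) >= iou_thresh:
                g.append((s, p))
                break
        else:
            groups.append([(s, p)])

    # Each detection is the score-weighted mean of its group's boxes,
    # reported with the group's highest score.
    detections: List[Tuple[Box, float]] = []
    for g in groups:
        total = sum(s for s, _ in g)
        box = tuple(sum(s * p[i] for s, p in g) / total for i in range(4))
        detections.append((box, g[0][0]))
    return detections
```

In this sketch, higher-scoring patches pull the merged box toward their own position, which is one simple way to use both the CNN scores and the patch positions when localizing a vehicle.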