Here, we present IDNet, a novel user authentication framework based on smartphone-acquired motion signals. Its goal is to recognize a target user from his or her way of walking, using the accelerometer and gyroscope (inertial) signals provided by a commercial smartphone worn in the front pocket of the user's trousers. Our design features several innovations, including: a robust and smartphone-orientation-independent walking cycle extraction block, a novel feature extractor based on convolutional neural networks, a one-class support vector machine to classify walking cycles, and the coherent integration of these into a multi-stage authentication system. To the best of our knowledge, our system is the first to exploit convolutional neural networks as universal feature extractors for gait recognition, and the first to combine classification results from successive walking cycles in a multi-stage decision-making framework. Experimental results show the superiority of our approach over state-of-the-art techniques, leading to misclassification rates (either false negatives or false positives) smaller than 0.15% in fewer than five walking cycles. Design choices are discussed and motivated throughout, assessing their impact on the authentication performance.
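
To illustrate how classification results from successive walking cycles might be combined in a multi-stage decision, the following is a minimal sketch and not the decision rule used by IDNet. It assumes that each walking cycle yields one signed score (for instance, a one-class classifier's decision value, positive for the target user) and that scores are summed until an accept or reject threshold is crossed. The function name `multi_stage_decision`, the thresholds, and the score values are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Optional, Tuple


def multi_stage_decision(
    scores: Iterable[float],
    accept_threshold: float = 2.0,   # hypothetical value, not from the paper
    reject_threshold: float = -2.0,  # hypothetical value, not from the paper
    max_cycles: Optional[int] = None,
) -> Tuple[str, int]:
    """Sum per-cycle scores until one threshold is crossed.

    Returns the decision ("accept", "reject" or "undecided") together
    with the number of walking cycles consumed to reach it.
    """
    total = 0.0
    used = 0
    for score in scores:
        total += score
        used += 1
        if total >= accept_threshold:
            return "accept", used
        if total <= reject_threshold:
            return "reject", used
        if max_cycles is not None and used >= max_cycles:
            break
    # Evidence ran out (or the cycle budget was hit) before a threshold was crossed.
    return "undecided", used


if __name__ == "__main__":
    # Hypothetical per-cycle scores: positive values favour the target user.
    print(multi_stage_decision([0.9, 0.8, 0.7]))
    print(multi_stage_decision([-1.5, -1.0]))
```

In such a scheme, widening the gap between the two thresholds trades a longer observation window (more walking cycles) for fewer erroneous decisions.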