Smartphones and other consumer devices capable of capturing video and sharing it on social media in near real time are widely available at reasonable cost. There is therefore a growing need for no-reference video quality assessment (NR-VQA) of consumer-produced video content, which is typically characterized by capture impairments that are qualitatively different from those observed in professionally produced video. To date, most NR-VQA models in the prior art have been developed for assessing coding and transmission distortions rather than capture impairments. In addition, the most accurate known NR-VQA methods are often computationally complex, and therefore impractical for many real-life applications. In this paper, we propose a new approach to learning-based video quality assessment, based on the idea of computing features at two levels: low-complexity features are first computed for the full sequence, and high-complexity features are then extracted from a subset of representative video frames, selected by using the low-complexity features. We have compared the proposed method against several relevant benchmark methods on three recently published annotated public video quality databases, and our results show that the proposed method predicts subjective video quality more accurately than the benchmark methods. The best-performing prior method achieves comparable accuracy, but at a substantially higher computational cost.
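The two-level feature extraction idea outlined above can be sketched as follows. This is a minimal illustration only: the specific feature functions and the centroid-based frame-selection heuristic are hypothetical placeholders, not the features or selection rule actually proposed in the paper.

```python
import numpy as np

def low_complexity_features(frame):
    # Hypothetical cheap per-frame features: mean luma and a simple
    # sharpness proxy (variance of horizontal pixel differences).
    return np.array([frame.mean(), np.diff(frame, axis=1).var()])

def high_complexity_features(frame):
    # Placeholder for an expensive descriptor (e.g. deep activations);
    # here just 4x4 block means for illustration.
    h, w = frame.shape
    blocks = frame[: h - h % 4, : w - w % 4].reshape(4, h // 4, 4, w // 4)
    return blocks.mean(axis=(1, 3)).ravel()

def two_level_features(frames, n_representative=3):
    # Level 1: cheap features computed for every frame of the sequence.
    low = np.array([low_complexity_features(f) for f in frames])
    # Select representative frames via the low-complexity features:
    # here, the frames closest to the sequence-level feature centroid
    # (an assumed heuristic, not the paper's actual criterion).
    dists = np.linalg.norm(low - low.mean(axis=0), axis=1)
    idx = np.argsort(dists)[:n_representative]
    # Level 2: expensive features computed only for the selected frames.
    high = np.array([high_complexity_features(frames[i]) for i in idx])
    # Pool both levels into a single sequence-level feature vector
    # that a learned regressor could map to a quality score.
    return np.concatenate([low.mean(axis=0), high.mean(axis=0)])

# Example: 30 synthetic grayscale frames of size 64x64.
rng = np.random.default_rng(0)
frames = [rng.random((64, 64)) for _ in range(30)]
feat = two_level_features(frames)
print(feat.shape)  # (18,): 2 low-level + 16 high-level pooled features
```

The key computational saving is that the expensive descriptor runs on only `n_representative` frames rather than the whole sequence, while the cheap first pass still sees every frame.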