We present a global optimizer, based on a conditional generative neural network, which can output ensembles of highly efficient topology-optimized metasurfaces operating across a range of parameters. A key feature of the network is that it initially generates a distribution of devices that broadly samples the design space and then shifts and refines this distribution toward favorable design space regions over the course of optimization. Training is performed by calculating the forward and adjoint electromagnetic simulations of outputted devices and using the subsequent efficiency gradients for backpropagation. With metagratings operating across a range of wavelengths and angles as a model system, we show that devices produced from the trained generative network have efficiencies comparable to or better than the best devices produced by adjoint-based topology optimization, while requiring less computational cost. Our reframing of adjoint-based optimization to the training of a generative neural network applies generally to physical systems that can utilize gradients to improve performance.