Dirty paper coding (DPC) refers to methods for pre-subtraction of known interference at the transmitter of a multiuser communication system. There are numerous applications for DPC, including coding for broadcast channels. Recently, lattice-based coding techniques have provided several designs for DPC. In lattice-based DPC, there are two codes - a convolutional code that defines a lattice used for shaping and an error correction code used for channel coding. Several specific designs have been reported in the recent literature using convolutional and graph-based codes for capacity-approaching shaping and coding gains. In most of the reported designs, either the encoder works on a joint trellis of shaping and channel codes or the decoder requires iterations between the shaping and channel decoders. This results in high complexity of implementation. In this work, we present a lattice-based DPC scheme that provides good shaping and coding gains with moderate complexity at both the encoder and the decoder. We use a convolutional code for sign-bit shaping, and a low-density parity check (LDPC) code for channel coding. The crucial idea is the introduction of a one-codeword delay and careful parsing of the bits at the transmitter, which enable an LDPC decoder to be run first at the receiver. This provides gains without the need for iterations between the shaping and channel decoders. Simulation results confirm that at high rates the proposed DPC method performs close to capacity with moderate complexity. As an application of the proposed DPC method, we show a design for superposition coding that provides rates better than time-sharing over a Gaussian broadcast channel.