For the past four decades, cost and features have driven CMOS scaling. Severe lithography and material limitations below the 20 nm node, however, are challenging the fundamental premise of affordable CMOS scaling. Merely continuing to co-optimize leaf cell circuit and layout designs with process technology does not enable us to overcome the challenges of sub-20 nm CMOS. For affordable scaling, it is imperative to work past the impediments of sub-20 nm technology while exploiting its features. To this end, we propose to broaden the scope of design technology co-optimization (DTCO) to be more holistic by including micro-architecture design and CAD, along with circuits, layout and process technology. Applying such holistic DTCO to embedded memory, the most significant block in a system-on-chip (SoC), we can synthesize smarter and more efficient embedded memory blocks that are customized to application needs. To evaluate the efficacy of the proposed holistic DTCO process, we designed, fabricated and tested several design experiments in a state-of-the-art IBM 14SOI process. Leaf cells, standard cells and SRAM bitcells developed with DTCO were robust during testing, but failed to meet node-to-node area scaling requirements. Holistic DTCO, when applied to a widely used parallel-access SRAM sub-block, consumed 25% less area and delivered 50% better performance per watt than a traditional implementation using compiled SRAM blocks and standard cells. To extend the benefits of holistic DTCO to other embedded-memory-intensive sub-blocks in SoCs, we developed a readily customizable smart memory synthesis framework (SMSF). We believe that such an approach is important for establishing an affordable path for sub-20 nm scaling.