Affordable Access

Software register synchronization for super-scalar processors with partitioned register files

Publication Date
  • Communication
  • Computer Science


Increases in high-end microprocessor performance are becoming increasingly reliant on simultaneous issuing of instructions to multiple functional units on a single chip. As the number of functional units increases, the chip area, wire lengths, and delays required for a monolithic register file become unreasonable. Future microprocessors will have partitioned register files. The correctness of contemporary super-scalar processors relies on synchronized accesses to registers. This issue will be critical in systems with partitioned register files. Current techniques for managing register access ordering, such as register score boarding and register renaming, are inadequate for architectures with partitioned register files. This thesis demonstrates the difficulties of implementing these techniques with a partitioned register file, and introduces a novel compiler algorithm which addresses this issue. Whenever a processor using register scoreboarding or register renaming issues an instruction, either the scoreboard or the register name table must be accessed to check the instruction's sources and destination. If the register file is partitioned, checking the scoreboard or name table for a remote register is difficult. One functional unit cannot determine at runtime when it is safe to write to a register in another functional unit's register file. While these techniques can be supported through use of a global or partitioned scoreboard, such an implementation would be complex, and have latency problems similar to those of a monolithic register file. This work discusses the organization of multiple functional units into loosely-coupled groups of functional units that can communicate via direct register writes, but with purely local hardware interlocks to force synchronization. A novel compiler algorithm, Software Register Synchronization (SRS), is introduced. A comparison between SRS and existing hardware mechanisms is conducted using the Multiflow compiler modified to generate code for the MIT M-Machine, Experiments to evaluate the SRS algorithm are run on the M-Machine simulator being used for architectural verification. In order to support partitioned register file architectures, an alternative to traditional hardware methods for managing register synchronization needs to be developed. This thesis presents a novel compiler algorithm to address this need. The SRS algorithm is described, demonstrated to be correct, and evaluated. Details of the implementation of the SRS algorithm within the Multiflow compiler for the MIT M-Machine are provided.

There are no comments yet on this publication. Be the first to share your thoughts.