In computer engineering, register renaming refers to a technique used to avoid unnecessary serialized execution of program instructions because of the reuse of the same registers by those instructions.
Programs are composed of instructions which operate on values. The instructions must name these values in order to distinguish them from one another. A typical instruction might say, add X and Y and put the result in Z. In this instruction, X, Y, and Z are the names of storage locations.
In order to have a compact instruction encoding, most processor instruction sets have a small set of special locations which can be directly named. For example, the x86 instruction set architecture has 8 integer registers, x86-64 has 16, many RISCs have 32, and IA-64 has 128. In smaller processors, the names of these locations correspond directly to elements of a register file.
Different instructions may take different amounts of time (e.g., CISC architecture). For instance, a processor may be able to execute hundreds of instructions while a single load from main memory instruction is in progress. Shorter instructions executed while the load is outstanding will finish first, thus the instructions are finishing out of the original program order. Out-of-order execution has been used in most recent high-performance CPUs to achieve some of their speed gains.
Consider this sequance of instructions running on an out-of-order CPU:
1.LOAD R1, LOC MEM 2.R1 = R1 + 2 3.STORE R1, LOC MEM[1024+8] 4.LOAD R1, LOC MEM 5.R1 = R1 + 4 6.STORE R1, LOC MEM[2048+8]
Instructions 4, 5, and 6 are independent of instructions 1, 2, and 3, but the processor cannot finish 4 until 3 is done, because 3 would then write the wrong value.
We can remove this restriction by changing the names of some of the registers:
1.LOAD R1, LOC MEM 2.R1 = R1 + 2 3.STORE R1, LOC MEM[1024+8] 4.LOAD R2, LOC MEM 5.R2 = R2 + 4 6.STORE R2, LOC MEM[2048+8]
Now instructions 4, 5, and 6 can be executed in parallel with instructions 1, 2, and 3, so that the program can be executed faster.
When possible, compilers do this renaming. But compilers are limited by the number of register inside the CPU. Many high performance CPUs have more physical registers than may be named directly in the instruction set and they can rename registers in hardware to achieve better instruction level parallelism.
Anything that is read and written can be renamed. While the general-purpose and floating-point registers are the most, flag and status registers or even individual status bits are commonly renamed as well.
Memory locations can also be renamed, although it is not commonly done to the level used in register renaming.
When more than one instruction references a particular location for an operand, either reading it (as an input) or writing it (as an output), executing those instructions in an order different from the original program order can cause to three kinds of problems, also known as hazards:
- Read-after-write (RAW)
- A read from a register or memory location must return the value placed there by the last write in program order, not some other write. This is referred to as a true dependency or flow dependency, and requires the instructions to execute in program order.
- Write-after-write (WAW)
- Successive writes to a particular register or memory location must leave that location containing the result of the second write. This can be resolved by squashing (synonyms: cancelling, annulling, mooting) the first write if necessary. WAW dependencies are also known as output dependencies.
- Write-after-read (WAR)
- A read from a register or memory location must return the last prior value written to that location, and not one written programmatically after the read. This is the sort of false dependency that can be resolved by renaming. WAR dependencies are also known as anti-dependencies.
- Instead of delaying the write until all reads are completed, two copies of the location can be maintained, the old value and the new value. Reads that precede, in program order, the write of the new value can be provided with the old value, even while other reads that follow the write are provided with the new value. The false dependency is broken and additional opportunities for out-of-order execution are created. When all reads needing the old value have been satisfied, it can be discarded. This is the essential concept behind register renaming.
- "Implementing Precise Exceptions in Pipelined Processors" Archived 2007-04-11 at the Wayback Machine, J.E. Smith and A.R. Pleszkun, 1985