Cache coherence

special case of memory coherence

A cache can be used to improve the performance of accessing a given resource. When there are several such caches for the same resource, as shown in the picture, this can lead to problems. Cache coherence or Cache coherency refers to a number of ways to make sure all the caches of the resource have the same data, and that the data in the caches makes sense (called data integrity). Cache coherence is a special case of memory coherence.

Multiple Caches of Shared Resource

There may be problems if there are many caches of a common memory resource, as data in the cache may no longer make sense, or one cache may no longer have the same data as the others. A common case where the problem occurs is the cache of CPUs in a multiprocessing system. As can be seen in the figure, if the top client has a copy of a memory block from a previous read and the bottom client changes that memory block, the top client could be left with an invalid cache of memory, without knowing. Cache coherence is there to manage such conflicts and maintain consistency between cache and memory.

Definition change

Coherence defines the behavior of reads and writes to the same memory location. The caches are coherent, if all of the following conditions are met:

  1. When a processor P reads a location X, after writing to that location, P must get the value it wrote, if no other processor wrote another value to that location. This is also true for monoprocessor systems, it means that the memory is able to keep a value written.
  2. Suppose there are two processors, P1 and P2, and P1 wrote a value X1, and afterwards, P2 wrote a value X2, if P1 reads the value, it must get the value written by P2, X2, and not the value it wrote, X1, if there are no other writes between the two. This means that the view of memory is coherent. If processors can read the same old value after the write made by P2, the memory would not be coherent.
  3. There can only be one write to a certain location in memory at a time. If there are several writes, they must occur one after another. In other words, if location X received two different values A and B, in this order, by any two processors, the processors can never read location X as B and then read it as A. The location X must be seen with values A and B in that order.

These conditions are defined supposing that the read and write operations are made instantaneously. However, this does not happen in computer hardware because of memory latency and other aspects of the architecture. A write by processor X may not be seen by a read from processor Y if the read is made within a very small time after the write has been made. The memory consistency model defines when a written value must be seen by a following read instruction made by the other processors.

Cache coherence mechanisms change

  • Directory-based coherence mechanisms maintain a central directory of cached blocks.
  • Snooping is the process where each cache monitors address lines for accesses to memory locations that are in its cache. When a write operation is observed to a location that a cache has a copy of, the cache controller invalidates its own copy of the snooped memory location.
  • Snarfing is where a cache controller watches both address and data in an attempt to update its own copy of a memory location when a second master modifies a location in main memory.

Distributed shared memory systems mimic these mechanisms so that they can maintain consistency between blocks of memory in loosely coupled systems.

The two most common types of coherence that are typically studied are Snooping and Directory-based. Each has its own benefits and drawbacks. Snooping protocols tend to be faster, if enough bandwidth is available, since all transactions are a request/response seen by all processors. The drawback is that snooping is not scalable. Every request must be broadcast to all nodes in a system. As the system gets larger, the size of the (logical or physical) bus and the bandwidth it provides must grow. Directories, on the other hand, tend to have longer latencies (with a 3 hop request/forward/respond) but use much less bandwidth since messages are point to point and not broadcast. For this reason, many of the larger systems (>64 processors) use this type of cache coherence.