
Micro Architecture for Fault Tolerant NOCs

Source
Universität Stuttgart, Fakultät 5, Germany, Computer Science Archive
Keywords
  • Reliability, Testing, and Fault-Tolerance (CR B.8.1)
  • Multiple Data Stream Architectures (Multiprocessors) (CR C.1.2)
  • Network Operations (CR C.2.3)
  • Data Structures (CR E.1)

Abstract

Technology scaling makes it possible to implement new architectures, and more and more cores are being placed on a single chip. As the number of cores grows, so does the demand for communication. The alternative to bus-based communication in a system-on-chip is a network-on-chip. A network-on-chip-based system with hundreds or thousands of cores offers better performance and higher throughput than a comparable bus-based system-on-chip. The network is formed by switches, and a core is connected to each switch. As systems become more complex, their error rate increases, and the resulting defects can significantly affect system performance and availability. It must therefore be ensured that a faulty connection between a switch and a core, or a defective core, does not disrupt system operation; such faults must be detected and tolerated. To detect faulty connections between a switch and its core, the functionality of the corresponding port is checked whenever an error occurs, and information about the faulty port is stored locally in the switch. Redundant connections between a core and several switches keep the core connected if a switch fails or one of its connections breaks. Three configurations, with two, three, and four switches connected to each core, are examined by numerical reliability calculations. The fault-tolerant architecture also requires changes to the routing algorithm, since packets must be deliverable to each core through the alternative connections as well. These extensions can increase both availability and performance. To increase system reliability further, transient errors are distinguished from permanent errors; for this purpose, the connection check is extended. The architecture is also used to detect faulty cores: operations are executed on three identical cores connected to the same switch, and if the result of one core differs from those of the other two, that core is disconnected from the switch. This triple modular redundancy increases the reliability of the system.
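
The abstract does not state the reliability model behind the two-, three-, and four-switch configurations. As a rough illustration only, the following sketch assumes that each switch-plus-link path to a core fails independently with a constant failure rate, so a core stays reachable as long as at least one of its k paths works, giving R_k = 1 - (1 - p)^k. The function names and the numeric failure rate are hypothetical, and the model ignores failures elsewhere in the network.

```python
import math


def path_reliability(failure_rate: float, t: float) -> float:
    """Probability that a single switch-plus-link path to a core still works at
    time t, assuming a constant failure rate (exponential lifetime model)."""
    return math.exp(-failure_rate * t)


def attachment_reliability(k: int, p: float) -> float:
    """Probability that a core attached to k switches keeps at least one working
    connection, assuming the k paths fail independently with reliability p each."""
    if k < 1:
        raise ValueError("a core needs at least one connection")
    return 1.0 - (1.0 - p) ** k


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to make the trend visible.
    p = path_reliability(failure_rate=1e-5, t=10_000.0)
    for k in (1, 2, 3, 4):
        print(f"{k} switch(es): R = {attachment_reliability(k, p):.6f}")
```

Under this simple parallel model, each additional switch multiplies the probability of losing the core by (1 - p), so when p is already close to 1 the absolute gain from a fourth switch is much smaller than the gain from a second one.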

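The triple-modular-redundancy step can be sketched as a majority vote over the results of the three cores attached to one switch. This is a minimal sketch, assuming results can be compared for exact equality; the `Switch` class, `tmr_vote`, and `execute_tmr` are hypothetical stand-ins for the switch logic, not the design from the publication. When all three results differ, no single core can be identified, so the sketch leaves every core connected.

```python
from collections import Counter
from typing import Hashable, List, Optional, Sequence, Tuple


def tmr_vote(results: Sequence[Hashable]) -> Tuple[Optional[Hashable], Optional[int]]:
    """Majority vote over the results of three identical cores.

    Returns (voted_result, index_of_faulty_core). The faulty index is None when
    all three agree; both values are None when no two cores agree.
    """
    if len(results) != 3:
        raise ValueError("TMR needs exactly three results")
    value, count = Counter(results).most_common(1)[0]
    if count == 3:
        return value, None
    if count == 2:
        # The single core whose result differs from the majority is the faulty one.
        faulty = next(i for i, r in enumerate(results) if r != value)
        return value, faulty
    return None, None  # all three differ: the fault cannot be localized


class Switch:
    """Hypothetical switch model that tracks which attached cores are enabled."""

    def __init__(self, core_ids: List[str]):
        self.enabled = set(core_ids)

    def execute_tmr(self, core_ids: List[str], results: Sequence[Hashable]) -> Optional[Hashable]:
        voted, faulty = tmr_vote(results)
        if faulty is not None:
            # Disconnect the core that disagreed with the other two.
            self.enabled.discard(core_ids[faulty])
        return voted
```

For example, `Switch(["c0", "c1", "c2"]).execute_tmr(["c0", "c1", "c2"], [4, 4, 9])` returns `4` and removes `c2` from the switch's set of enabled cores.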