Design and implementation of highly reliable dual-computer systems

Authors
Journal
Computers & Security
ISSN: 0167-4048
Publisher
Elsevier
Publication Date
Volume
28
Issue
7
Identifiers
DOI: 10.1016/j.cose.2009.04.003
Keywords
  • Reliability
  • Performance
  • Real-Time
  • Dual-Computer Systems
  • Fault Tolerance
  • Recovery Point
  • Satisfiability Model
  • Recovery Device
Disciplines
  • Computer Science
  • Design
  • Medicine

Abstract

Reliability and performance are two of the main parameters of real-time computer systems, and this study aims to improve both. To this end, we propose an architecture for a dual-computer system that operates in real time, with fault tolerance implemented purely in hardware. The hardware, as designed and implemented, performs two key services: 1) determining the fault type (temporary or permanent) and 2) localizing the faulty computer without using self-testing techniques or diagnostic routines. The design has several benefits: 1) the hardware shortens the recovery point time period; 2) the proposed nontrivial sequence of fault-tolerant services reduces to two the number of logical segments that must be re-run to recover computational processes; and 3) determining the fault type means that only computers with permanent faults are eliminated. These contributions improve both the performance and the reliability of the system.