
Friday, September 30, 2011




Dependable computer systems are required in applications that involve human life or large economic stakes. In this course we study the theory and practice of designing such systems at both the hardware and software levels. We will cover the following topics.

Dependability concepts: dependable systems, techniques for achieving dependability, faults, errors, and failures, and the classification of faults and failures; dependability measures and reliability calculation.

Fault-tolerant strategies: Fault detection, masking, containment, location, reconfiguration, and recovery.

Fault-tolerant design techniques: Hardware redundancy, software redundancy, time redundancy, and information redundancy.

Fault tolerance in real-time systems: Time-space tradeoff, imprecise computation, (m,k)-firm deadline model, fault tolerant scheduling algorithms.

Dependable communication: Dependable channels, survivable networks, fault-tolerant routing.

Fault tolerance in distributed systems: Building blocks (consensus protocols, fault diagnosis, clock synchronization, stable storage and RAID architectures); checkpointing and recovery; atomic actions; data replication and resiliency.

Dependability evaluation techniques and tools: Fault trees, Markov chains, Petri nets; case studies.

Analysis of fault-tolerant hardware and software architectures.

Case studies of dependable systems.

Readings from state-of-the-art research literature.

Dependability Concepts  
Lecture 01 [PPT] [PDF]
Lecture 02 [PPT] [PDF]
Lecture 03 [PPT] [PDF]
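As a small illustration of the reliability calculations listed under dependability concepts, here is a minimal sketch assuming a constant failure rate (so R(t) = e^(-λt) and MTTF = 1/λ) and statistically independent component failures. The function names, failure rate, and mission time are hypothetical, not taken from the lectures.

```python
import math
from typing import Sequence


def reliability(failure_rate: float, t: float) -> float:
    """R(t) = exp(-lambda * t): constant failure rate, i.e. exponentially distributed lifetime."""
    return math.exp(-failure_rate * t)


def mttf(failure_rate: float) -> float:
    """MTTF = integral of R(t) from 0 to infinity = 1 / lambda in the exponential case."""
    return 1.0 / failure_rate


def series(rs: Sequence[float]) -> float:
    """A series system works only if every component works (independent failures assumed)."""
    return math.prod(rs)


def parallel(rs: Sequence[float]) -> float:
    """A parallel system fails only if every component fails (independent failures assumed)."""
    return 1.0 - math.prod(1.0 - r for r in rs)


if __name__ == "__main__":
    lam = 1e-4      # hypothetical failure rate, failures per hour
    mission = 1000  # hypothetical mission time, hours
    r = reliability(lam, mission)
    print(f"one component:   R = {r:.4f}, MTTF = {mttf(lam):.0f} h")
    print(f"two in series:   R = {series([r, r]):.4f}")
    print(f"two in parallel: R = {parallel([r, r]):.4f}")
```

Series composition always ends up below the weakest component, while parallel redundancy raises reliability; both formulas stop holding once failures are correlated (for example, through a shared power supply).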
Fault-Tolerant (FT) Design Techniques  
Lecture 04 [PPT] [PDF]
Lecture 05 [PPT] [PDF]
Information Redundancy - self reading
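Hardware redundancy is commonly illustrated with triple modular redundancy (TMR): three modules compute the same result and a voter masks a single faulty output. Below is a sketch of a generic majority voter and the standard TMR reliability expression. It assumes a perfect (fault-free) voter and independent module failures. The function names are illustrative only.

```python
from collections import Counter
from typing import Hashable, Optional, Sequence


def majority_vote(outputs: Sequence[Hashable]) -> Optional[Hashable]:
    """Return the value reported by a strict majority of modules, or None if there is none.

    With three modules (TMR) this masks any single faulty output.
    """
    if not outputs:
        return None
    value, count = Counter(outputs).most_common(1)[0]
    return value if count > len(outputs) // 2 else None


def tmr_reliability(r: float) -> float:
    """P(at least 2 of 3 modules work) = 3r^2(1 - r) + r^3 = 3r^2 - 2r^3 (perfect voter assumed)."""
    return 3 * r**2 - 2 * r**3


if __name__ == "__main__":
    print(majority_vote([42, 42, 17]))  # one faulty module is outvoted
    print(majority_vote([1, 2, 3]))     # no majority: the disagreement is detected, not masked
    for r in (0.4, 0.9):
        print(r, round(tmr_reliability(r), 4))
```

TMR beats a single module only when r > 0.5. For r = 0.9 the expression gives 0.972, but for r = 0.4 it gives 0.352, which is worse than the simplex module.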
Dependability Modeling  
Reliability, MTTF, etc. PPT
Fault Tree Analysis
Petri Nets  
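For the fault tree analysis topic, here is a minimal sketch of quantitative evaluation: the top-event probability of a tree built from AND and OR gates. It assumes independent basic events that each appear only once in the tree. Trees with repeated events need minimal cut sets or binary decision diagrams instead. The class, function, and example tree are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Gate:
    kind: str             # "AND" or "OR"
    inputs: List["Node"]  # child gates or basic-event names


Node = Union[str, Gate]   # a leaf is the name of a basic event


def top_event_probability(node: Node, p: Dict[str, float]) -> float:
    """Probability that `node` occurs, given occurrence probabilities p of the basic events."""
    if isinstance(node, str):
        return p[node]
    child_probs = [top_event_probability(child, p) for child in node.inputs]
    if node.kind == "AND":
        # AND gate: the output event occurs only if every input event occurs
        return math.prod(child_probs)
    if node.kind == "OR":
        # OR gate: the output event occurs unless none of the input events occurs
        return 1.0 - math.prod(1.0 - q for q in child_probs)
    raise ValueError(f"unknown gate type: {node.kind}")


if __name__ == "__main__":
    # Hypothetical system: fails if the power supply fails OR both CPUs fail.
    tree = Gate("OR", ["power", Gate("AND", ["cpu1", "cpu2"])])
    probs = {"power": 0.01, "cpu1": 0.05, "cpu2": 0.05}
    print(f"P(system failure) = {top_event_probability(tree, probs):.6f}")
```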
FT in Distributed Systems  
Stable storage -- RAID PPT
Stable storage - advanced RAID PPT
Consensus PPT
Clock Synchronization PPT
System-level diagnosis PPT
Checkpoint and Rollback recovery PPT
Atomic actions -- Lock & Commit Protocols PPT
Replica management protocols PPT
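Stable storage built on RAID levels 4 and 5 relies on XOR parity. The parity block is the bytewise XOR of the data blocks in a stripe, so any single lost block equals the XOR of all surviving blocks, parity included. The sketch below shows only that arithmetic for one stripe. Block placement, parity rotation (RAID 5), and the small-write read-modify-write path are omitted, and the function names are hypothetical.

```python
from functools import reduce
from typing import Sequence


def xor_blocks(blocks: Sequence[bytes]) -> bytes:
    """Bytewise XOR of equally sized blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must have the same size")
    # zip(*blocks) yields one tuple of byte values per byte offset
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def compute_parity(data_blocks: Sequence[bytes]) -> bytes:
    """Parity block for one stripe."""
    return xor_blocks(data_blocks)


def rebuild_lost_block(surviving_data: Sequence[bytes], parity: bytes) -> bytes:
    """Recover the single missing data block from the surviving blocks and the parity block."""
    return xor_blocks([*surviving_data, parity])


if __name__ == "__main__":
    stripe = [b"RAID", b"disk", b"fail"]  # hypothetical 4-byte blocks on three data disks
    parity = compute_parity(stripe)
    # Suppose the disk holding stripe[1] fails:
    recovered = rebuild_lost_block([stripe[0], stripe[2]], parity)
    print(recovered)  # b'disk'
```

Because XOR is its own inverse, the same routine both computes parity and reconstructs data. A second simultaneous failure in the same stripe cannot be recovered this way, which motivates double-parity schemes such as RAID 6.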
FT in Networks  
Dependable communication - 1 PPT
Primary-backup path PPT
Fault Localization PPT
Dependability-Security PDF
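One simple way to set up a primary-backup path pair is to route the primary on a shortest path and then route the backup only over links the primary does not use, so that no single link failure breaks both. The sketch below uses BFS hop counts on an undirected graph with this two-step heuristic. The heuristic can miss a link-disjoint pair in some "trap" topologies where one exists, a case that optimal methods such as Suurballe's algorithm handle. The graph and function names are hypothetical.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Optional, Set, Tuple

Graph = Dict[str, Set[str]]  # undirected graph as adjacency sets


def shortest_path(g: Graph, src: str, dst: str,
                  banned: FrozenSet[FrozenSet[str]] = frozenset()) -> Optional[List[str]]:
    """BFS (fewest hops) path from src to dst that avoids every link in `banned`."""
    prev: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            break
        for v in sorted(g.get(u, set())):  # sorted for deterministic tie-breaking
            if v not in prev and frozenset((u, v)) not in banned:
                prev[v] = u
                queue.append(v)
    if dst not in prev:
        return None
    path = [dst]
    while prev[path[-1]] is not None:  # walk predecessors back to src
        path.append(prev[path[-1]])
    return path[::-1]


def primary_backup(g: Graph, src: str, dst: str) -> Tuple[Optional[List[str]], Optional[List[str]]]:
    """Primary = shortest path; backup = shortest path sharing no link with the primary."""
    primary = shortest_path(g, src, dst)
    if primary is None:
        return None, None
    used = frozenset(frozenset(link) for link in zip(primary, primary[1:]))
    return primary, shortest_path(g, src, dst, used)


if __name__ == "__main__":
    # Hypothetical 4-node ring: A-B-D and A-C-D are the two link-disjoint routes.
    ring = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D"}, "D": {"B", "C"}}
    print(primary_backup(ring, "A", "D"))
```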
FT in Real-Time Systems  
Lecture 06 [PPT] [PDF]
Lecture 07 [PPT] [PDF]
Lecture 08 [PPT] [PDF]
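The (m,k)-firm deadline model listed in the syllabus requires that at least m out of any k consecutive jobs of a task meet their deadlines. A minimal checker over a recorded hit/miss trace might look like the sketch below. It assumes only complete windows of k jobs are checked (conventions differ on how the first k-1 jobs are treated), and the function name is hypothetical.

```python
from collections import deque
from typing import Deque, Iterable


def satisfies_mk_firm(outcomes: Iterable[bool], m: int, k: int) -> bool:
    """True if every window of k consecutive jobs contains at least m deadline hits.

    outcomes[i] is True when job i met its deadline. Only complete windows are checked.
    """
    if not 0 < m <= k:
        raise ValueError("require 0 < m <= k")
    window: Deque[bool] = deque(maxlen=k)
    hits = 0
    for met in outcomes:
        if len(window) == k:
            hits -= window[0]  # the oldest job is about to slide out of the window
        window.append(met)     # deque(maxlen=k) drops the oldest entry automatically
        hits += met
        if len(window) == k and hits < m:
            return False       # a "dynamic failure" in (m,k)-firm terminology
    return True


if __name__ == "__main__":
    trace = [True, True, False, True, True, False, False, True]  # hypothetical job outcomes
    print(satisfies_mk_firm(trace, m=2, k=3))
```

For this trace, jobs 4 through 6 (counting from 0) contain only one hit, so the (2,3)-firm constraint is violated and the call returns False. The same trace satisfies a (1,3)-firm constraint.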
Spring 2010 Student Presentations  
Recovery-Oriented Computing - Peter Scott PDF
ZFS - a RAID-based file system - Henri Bai
2-dimensional error coding - Long Chen PDF
Software based fault detection - Tim Prince PPT
Self Recovery of Server Programs - Chesta Dwivedi PPT
Dynamic Fault Trees - Ashok Aditya PPT
Device Failure Tolerance Using Software - Haribabu Narayanan PPT
FPGA Fault Tolerance - Matt Clausman PPT
Byzantine Storage - Debkanta Chakraborty PPT
Spring 2009 Student Presentations  
Fault-Tolerant Internet Services -- Indranil Roy PPT
Checkpoint Recovery in Petaflop systems -- Paul Jennings PPT
Highly Available Systems - Case Study -- Cory Kleinheksel PPT
Fault-Tolerant TCP Server -- Preethika K. PPT
Fault-Tolerant CORBA (NVP implementation) -- Indranil Roy PPT
Fault-Tolerant Multipath Routing - Ganesan Mani PPT
Petri Net modeling - Phased Mission - Siddharth Sridhar PPT
Spring 2007 Student Presentations  
Energy-aware scheduling for weakly-hard real-time systems (Julie Rursch) PPT
Fault Tolerance in Multiprocessor SoC (Premkumar) PPT
Fountain Codes (Long Long) PPT
Network Time Protocol (Lizandro) PPT
RAID architectures (Russell Graves) PDF
Spring 2006 Student Presentations  
Architecture fault-tolerance (Viswanathan) PPT
Advanced Quorum protocols (Kamna) PPT
Fault-tolerant objects (Bebek) PPT
Hierarchical system-level Diagnosis (Qin Wen) PPT
Checkpointing in mobile systems (Ben) PPT
Dependability and Security (Srdjan) PPT
Decidability and Schedulability -- Timed Automata () PPT
System-level diagnosis in ad hoc networks (Kavitha) PPT1, PPT2

NOTE: You can print handout versions of the slides from Microsoft PowerPoint.
