18-749: Fault-Tolerant Distributed Systems
Prof. Priya Narasimhan
Since the only real way to appreciate dependability issues is to experience them first-hand, a substantial portion of the course will involve a cooperative team project to implement a software system. The project requires the design, implementation, empirical evaluation, and end-to-end analysis of a real-time, fault-tolerant, high-performance distributed middleware application. The lectures, along with regular project meetings with the instructor, will allow students to design and implement realistic middleware applications, to develop working infrastructures that make these applications dependable, and to analyze the effectiveness of their techniques. From this course, students can expect to learn (i) the individual and combined aspects of performance and fault tolerance, (ii) the basics of middleware, (iii) tools and techniques for analyzing dependability, and (iv) the strengths and weaknesses of current distributed technologies from the respective viewpoints of real-time behavior, fault tolerance, and scalability.
12 units (3 hours lecture + 1 hour project meeting per week)
PREREQUISITES
(1) Solid knowledge of C++ and/or Java (if you know Java,
you will need to pick a Java-based project in this course; if you
know C++, you will need to pick a C++-based project in this course).
(2) Understanding of basic operating systems concepts
MEETING TIMES
WEDNESDAY and FRIDAY,
10.30am - 12.20pm, PH A18A
PLUS: project meeting times TBD
PREVIOUS OFFERINGS OF THIS COURSE
18-749 in Spring 2005
18-846/17-654 in Spring 2004
18-846/17-654 in Spring 2003
18-841/17-654 in Spring 2002
INSTRUCTOR
Prof. Priya Narasimhan, Assistant Professor of ECE and CS, has 10 years of experience, and over 50 publications, in the field of fault-tolerant distributed systems. Apart from her significant contributions to the Fault-Tolerant CORBA standard, she has real-world experience as the CTO and Vice-President of Engineering of a start-up company building embedded fault-tolerance products. Her current research focuses on fault-tolerant and survivable distributed middleware systems, both in the enterprise and embedded domains.