Predictably Dependable Computing Systems
Description
- Project Title:
- Predictably Dependable Computing Systems
- Acronym:
- PDCS 2
- Number:
- 6362
- Work Area:
- Distributed Systems, Reliability & Dependability
- Coordinator:
- University of Newcastle-upon-Tyne
Computing Laboratory
Claremont Tower, Claremont Road
UK - NEWCASTLE-UPON-TYNE NE1 7RU
- Coordinator Country:
- UK
- Partners:
- Technische Universität Wien A
CNRS - LAAS F
CNR - IMU I
Chalmers Tekniska Högskola S
University of York UK
City University UK
- Contact Point:
- Prof. B. Randell
- Telephone:
- +44/91 222 7923
- Fax:
- +44/91 222 8232
- E-Mail:
- Brian.randall@newcastle
- Keywords:
- dependability, safety, security, prediction, fault tolerance, real-time systems, distributed systems
- Start Date:
- 1 August 92
- Duration:
- 36 months
- Status:
- running
- Abstract:
- PDCS2 aims to build on, and take significantly further, the work of ESPRIT Basic Research Action 3092 (Predictably Dependable Computing Systems), on the problems of making the process of designing and constructing adequately dependable computing systems much more predictable and cost-effective than at present. In particular it will address the problems of producing dependable distributed real-time systems and especially those where the dependability requirements centre on issues of safety and/or security. The planned programme of research concerns a number of carefully selected topics in fault prevention, fault tolerance, fault removal and fault forecasting. The work to be done ranges in nature from theoretical to experimental; in a number of cases it involves the acquisition or implementation, in prototype form, of software tools, and their experimental interconnection.
AIMS
The PDCS2 Project aims to build on, and take significantly further, the work of ESPRIT Basic Research Action 3092 (Predictably Dependable Computing Systems), on the problems of making the process of designing and constructing adequately dependable computing systems much more predictable and cost-effective than at present. In particular it will address the problems of producing dependable distributed real-time systems and especially those where the dependability requirements centre on issues of safety and/or security.
APPROACH AND METHODS
The problems of predicting and achieving specific levels of dependability involve all aspects of systems and of their specification, design and construction. Even so, the project must of necessity be extremely selective about the problems on which it concentrates, and the planned programme of research concerns a small number of carefully selected topics in fault prevention, fault tolerance, fault removal and fault forecasting, as follows:
Fault Prevention: techniques for eliciting and stating dependability requirements, and means of timeliness analysis in order to enable the building of systems with known maximum execution times.
Fault Tolerance: (i) strategies for designing systems whose performance is maximised within given reliability constraints, and design notations for expressing fault tolerance provisions and timing issues, (ii) the principles of design environments for fault-tolerant systems, and (iii) further development of the fragmentation-redundancy-scattering technique.
Fault Removal: investigation of two complementary methods of generating test inputs, one deterministic, the other probabilistic.
Fault Forecasting: (i) reliability and availability modelling, (ii) means of evaluation for ultra-high dependability, (iii) analytical techniques and methods of reducing the state space storage requirements of Markov and semi-Markov modelling tools, (iv) improved methods of coverage evaluation using both physical and simulated fault injection, and (v) modelling the operational security of a system in its environment.
The sub-tasks that make up these four main tasks range in nature from theoretical to experimental. In several cases they involve the acquisition or implementation, in prototype form, of software tools. We aim to investigate the experimental interconnection of some of these tools, using the inter-tool messaging techniques developed in the MARS Design System (MARDS), as a first step towards the ultimate long-term objective: a design support environment that is well populated with tools and ready-made system components, and that fully supports the predictably dependable design of large distributed real-time computing systems.
PROGRESS AND RESULTS
With regard to fault prevention, we are developing (i) techniques for eliciting and stating dependability (and in particular security and safety) requirements in a form that is consistent with a subsequent validation procedure, and (ii) timeliness analysis in order to enable the building of systems with known maximum execution times, including implementation of suitable hardware and operating system bases, and study of time-critical applications running on these bases.
In the area of fault tolerance, work includes (i) the development of strategies for designing systems whose performance is maximised within given reliability constraints, design notations for expressing fault tolerance provisions and timing issues, and the formal description of fault-tolerant designs, (ii) the investigation of the principles of design environments for fault-tolerant systems, and (iii) further development of the fragmentation-redundancy-scattering technique for tolerating both accidental and intentional faults in two non-exclusive directions: generalisation via an object-oriented model, and application to high performance networks.
Research related to fault removal is investigating two complementary methods of generating functional test inputs: deterministic (based on formal specifications) and probabilistic.
The principal objectives of our work on fault forecasting have been (i) to extend further our work on reliability and availability modelling, (ii) to develop analytical techniques and methods of reducing the state space storage requirements of Markov and semi-Markov modelling tools aimed at extending the range of complexity of systems whose dependability can be accurately evaluated, (iii) to develop improved methods of coverage evaluation using both physical and simulated fault injection (at circuit and system level), and (iv) to develop further our approach to modelling the operational security of a system in its environment, and to conduct intrusion experiments aimed at providing relevant data for such modelling exercises.
POTENTIAL
The work on dependability requirements elicitation could lead to improvements in system specification techniques; that on fault tolerance is aimed at facilitating tradeoffs between system dependability and performance, and at achieving combined reliability and security. The work on reliability and availability modelling could already form the basis for industrial exploitation and is at a point where future development would benefit greatly from the provision of data from industry, whereas that on security modelling is very exploratory in nature, and is aimed at establishing whether such modelling can be made practicable. The work on timeliness analysis could be of use as a further data point for assessing the potential and limitations of predictable hardware and operating system behaviour, whilst that on fault injection promises improvements to the design and evaluation of a system's provisions for fault tolerance.
LATEST PUBLICATIONS
- Shrivastava S K, Mancini L V and Randell B. The Duality of Fault-Tolerant System Structures. Software Practice and Experience, to appear 1993.
- Brocklehurst S and Littlewood B. New Ways to get Accurate Reliability Measures. IEEE Software, Special Issue on Applications of Software Reliability Models, July 1992.
- Littlewood B and Strigini L. Validation of Ultra-High Dependability for Software-based Systems. Comm ACM, November 1993 (to appear).
- Laprie J C and Kanoun K. X-ware Reliability and Availability Modelling. IEEE Transactions on Software Engineering, 1992.
- Powell D, Martins E, Arlat J and Crouzet Y. Estimators for fault tolerance coverage evaluation. In: Proc. 23rd Int. Symp. on Fault-Tolerant Computing (FTCS-23), IEEE, Toulouse, France, June 1993.
INFORMATION DISSEMINATION ACTIVITIES
The first PDCS2 Open Workshop will take place in Toulouse in September 1993. Over 100 people are expected to attend, and external speakers from both industry and non-European universities will address the attendees.
The project's first-year Deliverable Report, comprising technical papers charting the progress of PDCS2, is available from: Louise Heery (PDCS2 Administrative Co-ordinator), Dept. Computing Science, University of Newcastle upon Tyne, Newcastle upon Tyne, NE1 7RU, UK.

Sven Müßig, last update 07-nov-1995.