Dictionary Definition
dependability n : the trait of being dependable
or reliable [syn: dependableness, reliability, reliableness] [ant:
undependability]
Extensive Definition
Dependability is a value showing how reliable a person is
to others because of his or her integrity, truthfulness,
and trustworthiness, traits that can encourage someone to depend on
him or her.
The wider use of this noun is in systems
engineering.
Dependability as applied to a computer system is
defined by the IFIP 10.4 Working Group on Dependable Computing and
Fault Tolerance as:
- "[..] the trustworthiness of a computing system which allows reliance to be justifiably placed on the service it delivers [..]"
- "dependability (is) the collective term used to describe the availability performance and its influencing factors : reliability performance, maintainability performance and maintenance support performance"
This concept can be further extended to encompass
mechanisms to increase and maintain the Dependability of a system.
Dependability can be thought of as being composed of three
elements:
- Attributes - A way to assess the Dependability of a system
- Threats - An understanding of the things that can affect the Dependability of a system
- Means - Ways to increase the Dependability of a system
History
The field of Dependability grew out of previous
related fields such as fault tolerance and system reliability in
the 1960s. As interest in these fields increased during the 1970s
and the early part of the 1980s, the term reliability became
overloaded: it was being used outside its originally
intended definition, as a measurement of failures in a system, to
encompass more diverse measures which would now come under other
classifications such as safety, integrity, etc. Jean-Claude Laprie
thus coined the term Dependability
to encompass these related disciplines in the early 1980s.
The field of Dependability has evolved from these
beginnings to be an internationally active field of research. This
research is fostered by a number of prominent international
conferences, notably the International Conference on
Dependable Systems and Networks, the International Symposium on
Reliable Distributed Systems and the International Symposium on
Fault-Tolerant Computing.
The original definition of dependability for a
computing system gathers the following attributes or non-functional
requirements:
- Availability: readiness for correct service
- Reliability: continuity of correct service
- Maintainability: ability to undergo modifications and repairs
and combines them with the concepts of Threats
and Failures to create Dependability.
Elements of dependability
Attributes
Attributes are qualities of a system. These can
be assessed to determine its overall dependability using qualitative or
quantitative measures. Avizienis et al. define the following Dependability
Attributes:
- Availability - readiness for correct service
- Reliability - continuity of correct service
- Safety - absence of catastrophic consequences on the user(s) and the environment
- Integrity - absence of improper system alteration
- Maintainability - ability to undergo modifications and repairs
As these definitions suggest, only Availability
and Reliability are quantifiable by direct measurements, whilst
the others are more subjective. For instance, Safety cannot be measured
directly via metrics but is a subjective assessment that requires
judgmental information to be applied to give a level of confidence,
whilst Reliability can be measured as failures over time.
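As a rough illustration of the two directly measurable attributes, the sketch below derives a failure rate, a mean time between failures and an availability figure from one observation period. The record layout and the numbers are invented for the example, and the availability formula (operating time divided by total time) is a common convention assumed here rather than something stated above.

```python
from dataclasses import dataclass


@dataclass
class ObservationWindow:
    """Hypothetical record of one observation period for a system."""
    operating_hours: float  # time the system delivered correct service
    repair_hours: float     # time spent restoring service after failures
    failures: int           # failures observed at the system boundary


def failure_rate(w: ObservationWindow) -> float:
    """Failures per operating hour (reliability as failures over time)."""
    return w.failures / w.operating_hours


def mtbf(w: ObservationWindow) -> float:
    """Mean time between failures; infinite if no failure was observed."""
    return float("inf") if w.failures == 0 else w.operating_hours / w.failures


def availability(w: ObservationWindow) -> float:
    """Fraction of the total period in which the system was in service."""
    return w.operating_hours / (w.operating_hours + w.repair_hours)


# Invented figures: 990 h in service, 10 h under repair, 5 failures.
window = ObservationWindow(operating_hours=990.0, repair_hours=10.0, failures=5)
print(f"failure rate: {failure_rate(window):.5f} per hour")
print(f"MTBF:         {mtbf(window):.1f} hours")
print(f"availability: {availability(window):.2%}")
```

Safety, Integrity and Maintainability have no comparably direct formula, which is why they are left out of the sketch.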
Confidentiality,
i.e. the absence of unauthorized disclosure of information, is also
used when addressing security. Security is a composite of Confidentiality,
Integrity, and Availability.
Security is sometimes classed as an attribute, but the current view
is to aggregate it with dependability and treat
the two as a composite term called Dependability and
Security.
Practically, applying security measures to the
appliances of a system generally improves the dependability by
limiting the number of externally-originated errors.
Threats
Threats are things that can affect a system and
cause a drop in Dependability. There are three main terms that must
be clearly understood:
- Fault: A fault (which is usually referred to as a bug for historic reasons) is a defect in a system. The presence of a fault in a system may or may not lead to a failure. For instance, although a system may contain a fault, its input and state conditions may never cause this fault to be activated, so no error occurs and the fault never manifests as a failure.
- Error: An error is a discrepancy between the intended behaviour of a system and its actual behaviour inside the system boundary. Errors occur at runtime when some part of the system enters an unexpected state due to the activation of a fault. Since errors are generated from invalid states, they are hard to observe without special mechanisms, such as debuggers or debug output to logs.
- Failure: A failure is an instance in time when a system displays behaviour that is contrary to its specification. An error may not necessarily cause a failure. For instance, an exception may be thrown by a system, but this may be caught and handled using fault tolerance techniques so that the overall operation of the system still conforms to the specification.
It is important to note that Failures are
recorded at the system boundary. They are basically Errors that
have propagated to the system boundary and have become observable.
Faults, Errors and Failures operate according to a mechanism. This
mechanism is sometimes known as a Fault-Error-Failure chain. As a
general rule a fault, when activated, can lead to an error (which
is an invalid state) and the invalid state generated by an error
may lead to another error or a failure (which is an observable
deviation from the specified behaviour at the system
boundary).
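The three terms can be illustrated with a deliberately faulty function. The sketch below is only an illustration: the function names and the specification in the docstring are hypothetical. The fault lies dormant for non-empty input, is activated by an empty list, and then either is handled inside the boundary or escapes as a failure.

```python
def average(values):
    """Specification: return the mean of values, or 0.0 for an empty list."""
    total = sum(values)
    # FAULT: the empty-list case required by the specification is missing.
    # The fault stays dormant for every non-empty input.
    return total / len(values)


def report_tolerant(values):
    """Service boundary with error handling: the error is contained."""
    try:
        return f"mean={average(values):.1f}"
    except ZeroDivisionError:
        # ERROR detected and handled inside the boundary, so the service
        # still conforms to the specification and no failure is observed.
        return "mean=0.0"


def report_naive(values):
    """Service boundary without handling: the error escapes as a FAILURE."""
    return f"mean={average(values):.1f}"


print(report_naive([2, 4]))   # fault present but never activated
print(report_tolerant([]))    # fault activated, error handled
# report_naive([]) would raise ZeroDivisionError: an observable failure.
```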
Once a fault is activated an error is created. An
error may act in the same way as a fault in that it can create
further error conditions, therefore an error may propagate multiple
times within a system boundary without causing an observable
failure. If an error propagates outside the system boundary a
failure is said to occur. A failure is basically the point at which
it can be said that a service is failing to meet its specification.
Since the output data from one service may be fed into another, a
failure in one service may propagate into another service as a
fault, so a chain can be formed of the form: Fault leading to Error
leading to Failure leading to Error, etc.
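Propagation between services can be sketched with two hypothetical services, where the wrong output of the first arrives at the second as a faulty input. The service names, the threshold and the `check_input` switch are assumptions made for the example only.

```python
def sensor_service(celsius):
    """Service A. Specification: return the temperature in kelvin."""
    # FAULT in A: the offset is subtracted instead of added.
    kelvin = celsius - 273.15   # ERROR: invalid internal state
    return kelvin               # FAILURE of A once it crosses A's boundary


def controller_service(kelvin, check_input=True):
    """Service B. Specification: request heating when below 280 K."""
    # A's failure arrives here as an externally caused FAULT (a bad input).
    if check_input and kelvin < 0:
        # Detection at B's boundary stops the chain before B fails.
        raise ValueError("temperature below absolute zero")
    return kelvin < 280.0


# The room is at 20 degrees C (293.15 K), so B should not request heating.
reading = sensor_service(20.0)
print(controller_service(reading, check_input=False))  # True: B fails as well
```

With the input check enabled, the same reading raises `ValueError` instead, so the error is detected at the second service's boundary rather than silently becoming a second failure.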
Means
Since the mechanism of a Fault-Error-Failure chain is
understood, it is possible to construct means to break these chains
and thereby increase the dependability of a system. Four means have
been identified so far:
- Prevention
- Removal
- Forecasting
- Tolerance
Fault Prevention deals with preventing faults
being incorporated into a system. This can be accomplished by use
of development methodologies and good implementation
techniques.
Fault Removal can be sub-divided into two
sub-categories:
- Removal During Development
- Removal During Use
Removal during development requires verification
so that faults can be detected and removed before a system is put
into production. Once a system has been put into production, a
mechanism is needed to record failures and remove the underlying
faults via a maintenance cycle.
Fault Forecasting predicts likely faults so that
they can be removed or their effects can be circumvented.
Fault Tolerance deals with putting mechanisms in place that will
allow a system to still deliver the required service in the
presence of faults, although that service may be at a degraded
level.
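One simple tolerance mechanism is a fallback that delivers a degraded service when the primary path fails. The following is a minimal sketch under invented assumptions: the pricing functions, the cached value and the simulated outage are all hypothetical.

```python
def with_fallback(primary, fallback):
    """Return a service that falls back to a degraded path on connection errors."""
    def service(request):
        try:
            return primary(request), "full"
        except ConnectionError:
            # The error is tolerated: a degraded answer is still delivered.
            return fallback(request), "degraded"
    return service


CACHED_PRICES = {"widget": 9.49}  # hypothetical, possibly stale data


def live_price(item):
    """Hypothetical primary service, simulated as unreachable for one item."""
    if item == "widget":
        raise ConnectionError("pricing backend unreachable")
    return 9.99


def cached_price(item):
    """Hypothetical degraded service: answer from a local cache."""
    return CACHED_PRICES[item]


price = with_fallback(live_price, cached_price)
print(price("gadget"))  # (9.99, 'full')
print(price("widget"))  # (9.49, 'degraded')
```

Returning the service level alongside the result lets the caller know that it received the degraded service.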
Dependability means are intended to reduce the
number of failures presented to the user of a system. Failures are
traditionally recorded over time and it is useful to understand how
their frequency is measured so that the effectiveness of means can
be assessed.
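As an illustration of recording failures over time, the sketch below groups failure timestamps into fixed periods so that the counts before and after applying a means can be compared. The timestamps, the weekly period and the function name are invented for the example.

```python
from collections import Counter


def failures_per_period(failure_times, period):
    """Count failures in consecutive periods of length `period`.

    failure_times: times of observed failures, measured from the start
    of observation, in the same unit as `period`.
    """
    counts = Counter(int(t // period) for t in failure_times)
    n_periods = max(counts) + 1 if counts else 0
    return [counts.get(i, 0) for i in range(n_periods)]


# Hypothetical failure log in hours, grouped by week (168 hours).
log = [5, 30, 60, 110, 150, 290, 620]
print(failures_per_period(log, 168))  # [5, 1, 0, 1]
```

A falling count from one period to the next is consistent with a means being effective, although a real assessment would also need to account for changes in usage.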
Dependability of information systems and survivability
Recent work on dependability takes
advantage of structured information
systems, e.g. with
SOA, to introduce a more efficient attribute, survivability, which takes
into account the degraded services that an Information System
sustains or resumes after a non-maskable failure.
The flexibility of current frameworks encourages
system architects to enable reconfiguration mechanisms that refocus
the available, safe resources to support the most critical services
rather than over-provisioning to build a failure-proof system.
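A reconfiguration policy of this kind can be sketched as a greedy assignment of the remaining capacity to services in order of criticality. The service list, the criticality scores and the capacity units are hypothetical, and a real reconfiguration mechanism would be considerably more involved.

```python
def reconfigure(services, capacity):
    """Keep the most critical services that fit in the remaining capacity.

    services: list of (name, criticality, demand) tuples, where a higher
    criticality means a more important service.
    Returns the names of the services kept running, most critical first.
    """
    running = []
    for name, _criticality, demand in sorted(services, key=lambda s: -s[1]):
        if demand <= capacity:
            running.append(name)
            capacity -= demand
    return running


# Hypothetical system: after a failure only 6 capacity units remain safe.
services = [
    ("reporting", 1, 4),
    ("billing", 3, 3),
    ("auth", 5, 2),
    ("search", 2, 2),
]
print(reconfigure(services, capacity=6))  # ['auth', 'billing']
```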
With the generalisation of networked information
systems, accessibility was
introduced to give greater importance to users' experience.
To take into account the level of performance,
the measurement of performability is defined
as "quantifying how well the object system performs in the presence
of faults over a specified period of time".
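One simple way to put a number on this idea, assumed here for illustration and not taken from the definition above, is a time-weighted average of the performance level delivered during the observation period. The intervals and levels below are invented.

```python
def performability(intervals):
    """Time-weighted mean performance level over the observed period.

    intervals: list of (duration, level) pairs, where level is the
    fraction of nominal performance delivered (1.0 = full, 0.0 = down).
    """
    total_time = sum(duration for duration, _ in intervals)
    weighted = sum(duration * level for duration, level in intervals)
    return weighted / total_time


# Hypothetical 100-hour period: 90 h at full performance, 8 h degraded
# to half performance while a fault was tolerated, 2 h of outage.
observed = [(90, 1.0), (8, 0.5), (2, 0.0)]
print(f"{performability(observed):.2f}")
```

Unlike availability, this measure gives partial credit for the hours of degraded service.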
Further Reading
Papers
- Wilfredo Torres-Pomales: Software Fault Tolerance: A Tutorial, 2002
- Stefano Porcarelli, Marco Castaldi, Felicita Di Giandomenico, Andrea Bondavalli, Paola Inverardi: An Approach to Manage Reconfiguration in Fault-Tolerant Distributed Systems
Books
- J.C. Laprie: Dependability: Basic Concepts and Terminology, Springer-Verlag, 1992. ISBN 0387822968
Research projects
- DESEREC, DEpendability and Security by Enhanced REConfigurability, FP6/IST integrated project 2006-2008
- ESFORS, European security Forum for Web Services, Software, and Systems, FP6/IST coordination action
- HIDENETS, HIghly DEpendable ip-based NETworks and Services, FP6/IST targeted project 2006-2008
- RESIST, FP6/IST Network of Excellence 2006-2007
- RODIN, Rigorous Open Development Environment for Complex Systems, FP6/IST targeted project 2004-2007
- SERENITY System Engineering for Security and Dependability, FP6/IST integrated project 2006-2008
- Willow Survivability Architecture, and STILT, System for Terrorism Intervention and Large-scale Teamwork 2002-2004