Redundancy (engineering)

In engineering and systems theory, redundancy is the intentional duplication of critical components or functions of a system with the goal of increasing reliability of the system, usually in the form of a backup or fail-safe, or to improve actual system performance, such as in the case of GNSS receivers, or multi-threaded computer processing.

In many safety-critical systems, such as fly-by-wire and hydraulic systems in aircraft, some parts of the control system may be triplicated,[1] which is formally termed triple modular redundancy (TMR). An error in one component may then be out-voted by the other two. In a triply redundant system, the system has three subcomponents, all three of which must fail before the system fails. Since each one rarely fails, and the subcomponents are designed to preclude common failure modes (so that their failures can be modelled as independent), the probability of all three failing is calculated to be extraordinarily small; it is often outweighed by other risk factors, such as human error. Electrical surges arising from lightning strikes are an example of a failure mode which is difficult to fully isolate, unless the components are powered from independent power buses and have no direct electrical pathway in their interconnect (communication by some means is required for voting). Redundancy may also be known by the terms "majority voting systems"[2] or "voting logic".[3]

A suspension bridge's numerous cables are a form of redundancy.

Redundancy sometimes produces less, rather than greater, reliability: it creates a more complex system which is prone to various issues, it may lead to human neglect of duty, and it may lead to higher production demands which, by overstressing the system, may make it less safe.[4]

Redundancy is one form of robustness as practiced in computer science.

Geographic redundancy has become important in the data center industry, to safeguard data against natural disasters and political instability (see below).

Forms of redundancy

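In computer science, there are four major forms of redundancy:[5]

  • Hardware redundancy, such as dual modular redundancy and triple modular redundancy
  • Information redundancy, such as error detection and correction methods
  • Time redundancy, performing the same operation multiple times, such as multiple executions of a program or multiple copies of data transmitted
  • Software redundancy, such as N-version programming

A modified form of software redundancy, applied to hardware, may be:

  • Distinct functional redundancy, such as both mechanical and hydraulic braking in a car. Applied to software, this means code that is written independently and is distinctly different, but produces the same results for the same inputs.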

Structures are usually designed with redundant parts as well, ensuring that if one part fails, the entire structure will not collapse. A structure without redundancy is called fracture-critical, meaning that a single broken component can cause the collapse of the entire structure. Bridges that failed due to lack of redundancy include the Silver Bridge and the Interstate 5 bridge over the Skagit River.

Parallel and combined systems demonstrate different levels of redundancy. These models are a subject of study in reliability and safety engineering.[6]

Dissimilar redundancy

Unlike traditional redundancy, which uses more than one of the same component, dissimilar redundancy uses different components. The idea is that different designs are unlikely to contain identical flaws. The voting method may involve additional complexity if the dissimilar components take different amounts of time to produce a result. Dissimilar redundancy is often used with software, because identical copies of software contain identical flaws.

The chance of failure is reduced by using at least two different types of each of the following:

  • processors,
  • operating systems,
  • software,
  • sensors,
  • actuators (electric, hydraulic, pneumatic, manual mechanical, etc.),
  • communications protocols,
  • communications hardware,
  • communications networks,
  • communications paths.[7][8][9]
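The idea can be sketched in software as two independently written routines whose results are cross-checked. The following is a minimal sketch, not a reference design: the function names, the choice of square root as the computation, and the tolerance are all assumptions made for illustration.

```python
import math


def sqrt_library(x: float) -> float:
    """Channel A (hypothetical): delegates to the standard library."""
    return math.sqrt(x)


def sqrt_newton(x: float, iterations: int = 60) -> float:
    """Channel B (hypothetical): an independently written Newton iteration."""
    if x < 0:
        raise ValueError("negative input")
    if x == 0:
        return 0.0
    guess = x if x >= 1 else 1.0
    for _ in range(iterations):
        guess = 0.5 * (guess + x / guess)
    return guess


def dissimilar_sqrt(x: float, tolerance: float = 1e-9) -> float:
    """Run both channels and accept the result only if they agree."""
    a = sqrt_library(x)
    b = sqrt_newton(x)
    if not math.isclose(a, b, rel_tol=tolerance, abs_tol=tolerance):
        raise RuntimeError(f"channels disagree: {a!r} vs {b!r}")
    return a
```

With only two channels a disagreement can be detected but not out-voted; a third dissimilar channel would be needed to pick a majority.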

Geographic redundancy

Geographic redundancy addresses the vulnerabilities of redundant devices deployed at a single site by geographically separating backup devices. It reduces the likelihood that events such as power outages, floods, HVAC failures, lightning strikes, tornadoes, building fires, wildfires, and mass shootings will disable the system.

Geographic redundancy locations can be:

  • more than 621 miles (999 km) apart (continental),[10]
  • more than 62 miles (100 km) apart and less than 93 miles (150 km) apart,[10]
  • less than 62 miles (100 km) apart, but not on the same campus, or
  • in different buildings that are more than 300 feet (91 m) apart on the same campus.
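Whether two sites meet one of these separation thresholds can be checked from their coordinates. The sketch below uses the haversine great-circle formula, which is a swapped-in approximation rather than anything the cited source prescribes; the function names, the Earth radius constant, and the example coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # approximate mean Earth radius (assumption)


def great_circle_miles(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Haversine distance in miles between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def meets_separation(site_a, site_b, minimum_miles: float) -> bool:
    """True if the two (lat, lon) sites are more than minimum_miles apart."""
    return great_circle_miles(*site_a, *site_b) > minimum_miles


# Hypothetical primary and backup sites, roughly 700 miles apart.
primary_site = (40.71, -74.01)
backup_site = (41.88, -87.63)
print(meets_separation(primary_site, backup_site, minimum_miles=621))
```

Straight-line distance is only a first filter; two distant sites can still share a failure mode such as a common power grid or network carrier.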

The following separations can reduce the risk of damage from a conflagration:

  • large buildings at least 80 feet (24 m) to 110 feet (34 m) apart, but sometimes a minimum of 210 feet (64 m) apart[11][12]: 9 
  • high-rise buildings at least 82 feet (25 m) apart[12]: 12 [13]
  • open spaces clear of flammable vegetation within 200 feet (61 m) on each side of objects[14]
  • different wings of the same building, in rooms that are separated by more than 300 feet (91 m)
  • different floors of the same wing of a building, in rooms that are horizontally offset by a minimum of 70 feet (21 m), with fire walls between the rooms on different floors
  • two rooms separated by another room, leaving a gap of at least 70 feet (21 m) between the two rooms
  • a minimum of two separate fire walls, on opposite sides of a corridor[10]

Geographic redundancy is used by Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure, Netflix, Dropbox, Salesforce, LinkedIn, PayPal, Twitter, Facebook, Apple iCloud, Cisco Meraki, and many others to provide high availability and fault tolerance for their cloud services.[15]

As another example, to minimize risk of damage from severe windstorms or water damage, buildings can be located at least 2 miles (3.2 km) away from the shore, with an elevation of at least 5 feet (1.5 m) above sea level. For additional protection, they can be located at least 100 feet (30 m) away from flood plain areas.[16][17]

Functions of redundancy

The two functions of redundancy are passive redundancy and active redundancy. Both use extra capacity to prevent performance decline from exceeding specification limits without human intervention.

Passive redundancy uses excess capacity to reduce the impact of component failures. One common form of passive redundancy is the extra strength of cabling and struts used in bridges. This extra strength allows some structural components to fail without bridge collapse. The extra strength used in the design is called the margin of safety.

Eyes and ears provide working examples of passive redundancy. Vision loss in one eye does not cause blindness but depth perception is impaired. Hearing loss in one ear does not cause deafness but directionality is lost. Performance decline is commonly associated with passive redundancy when a limited number of failures occur.

Active redundancy eliminates performance declines by monitoring the performance of individual devices, and this monitoring is used in voting logic. The voting logic is linked to switching that automatically reconfigures the components. Error detection and correction and the Global Positioning System (GPS) are two examples of active redundancy.

Electrical power distribution provides an example of active redundancy. Several power lines connect each generation facility with customers. Each power line includes monitors that detect overload. Each power line also includes circuit breakers. The combination of power lines provides excess capacity. Circuit breakers disconnect a power line when monitors detect an overload. Power is redistributed across the remaining lines.[citation needed] At the Toronto Airport, there are 4 redundant electrical lines. Each of the 4 lines supplies enough power for the entire airport. A spot network substation uses reverse current relays to open breakers on lines that fail, while letting power continue to flow to the airport.
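The redistribution step can be illustrated with a toy model. This is a simplified sketch under stated assumptions: the function name `share_load`, the unitless capacities, and the proportional sharing rule are all hypothetical and do not describe how a real network or protection relay behaves.

```python
def share_load(demand, capacities, failed=frozenset()):
    """Split demand across the lines still in service.

    Load is shared in proportion to each surviving line's capacity, so no
    line is pushed past its rating while total capacity covers the demand.
    Raises RuntimeError when the surviving lines cannot carry the demand.
    """
    alive = [i for i in range(len(capacities)) if i not in failed]
    total = sum(capacities[i] for i in alive)
    if total < demand:
        raise RuntimeError("demand exceeds remaining capacity")
    loads = [0.0] * len(capacities)
    for i in alive:
        loads[i] = demand * capacities[i] / total
    return loads


# Four lines, each able to carry the whole (hypothetical) demand of 100 units.
lines = [100.0, 100.0, 100.0, 100.0]
print(share_load(100.0, lines))                    # all lines in service
print(share_load(100.0, lines, failed={0, 1, 2}))  # one surviving line
```

In this model the system keeps running until the last line is lost, which mirrors the case where each line alone can supply the full demand.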

Electrical power systems use power scheduling to reconfigure active redundancy. Computing systems adjust the production output of each generating facility when other generating facilities are suddenly lost. This prevents blackout conditions during major events such as an earthquake.

Disadvantages

Charles Perrow, author of Normal Accidents, has said that sometimes redundancies backfire and produce less, not more, reliability. This may happen in three ways. First, redundant safety devices result in a more complex system, more prone to errors and accidents. Second, redundancy may lead to shirking of responsibility among workers. Third, redundancy may lead to increased production pressures, resulting in a system that operates at higher speeds, but less safely.[4]

Voting logic

Voting logic uses performance monitoring to determine how to reconfigure individual components so that operation continues without violating specification limitations of the overall system. Voting logic often involves computers, but systems composed of items other than computers may be reconfigured using voting logic. Circuit breakers are an example of a form of non-computer voting logic.

The simplest voting logic in computing systems involves two components: primary and alternate. They both run similar software, but the output from the alternate remains inactive during normal operation. The primary monitors itself and periodically sends an activity message to the alternate as long as everything is OK. All outputs from the primary stop, including the activity message, when the primary detects a fault. The alternate activates its output and takes over from the primary after a brief delay when the activity message ceases. Errors in voting logic can cause both outputs to be active or inactive at the same time, or cause outputs to flutter on and off.
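The primary/alternate arrangement can be sketched as a small simulation. The class names, the `timeout` parameter, and the use of simulated time steps instead of real clocks and messages are assumptions made for illustration only.

```python
class Alternate:
    """Standby unit: keeps its output inactive while heartbeats keep arriving."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout        # the "brief delay" before takeover
        self.last_heartbeat = 0.0
        self.active = False

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def poll(self, now: float) -> bool:
        # Take over once the activity message has been silent for too long.
        if now - self.last_heartbeat > self.timeout:
            self.active = True
        return self.active


class Primary:
    """Active unit: sends the activity message only while it is healthy."""

    def __init__(self) -> None:
        self.healthy = True

    def tick(self, now: float, alternate: Alternate) -> bool:
        if self.healthy:
            alternate.on_heartbeat(now)
        return self.healthy  # output is active only while healthy


def simulate(fail_at: float, steps: int, timeout: float = 2.5):
    """Return (primary_active, alternate_active) for each time step."""
    primary, alternate = Primary(), Alternate(timeout)
    history = []
    for t in range(steps):
        now = float(t)
        if now >= fail_at:
            primary.healthy = False
        history.append((primary.tick(now, alternate), alternate.poll(now)))
    return history


for step, state in enumerate(simulate(fail_at=3, steps=7)):
    print(step, state)
```

The sketch latches the alternate on and does not model the primary recovering; handling that case incorrectly is one way both outputs could end up active at once.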

A more reliable form of voting logic involves an odd number of devices, three or more. All perform identical functions, and their outputs are compared by the voting logic. When there is a disagreement, the voting logic establishes a majority, and the majority acts to deactivate the output from any device that disagrees. A single fault will not interrupt normal operation. This technique is used in avionics systems, such as those responsible for operation of the Space Shuttle.
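A majority voter of this kind can be sketched in a few lines. The function name `majority_vote` and the choice to return the indices of dissenting devices are assumptions for illustration; real voters often operate on analog or timed signals rather than exact values.

```python
from collections import Counter


def majority_vote(outputs):
    """Return (majority_value, indices_of_dissenting_devices).

    Expects an odd number of outputs, three or more. Raises RuntimeError
    when no value is produced by more than half of the devices.
    """
    if len(outputs) < 3 or len(outputs) % 2 == 0:
        raise ValueError("need an odd number of devices, three or more")
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


value, dissenters = majority_vote([7, 7, 9])
print(value, dissenters)  # the third device (index 2) is out-voted
```

The list of dissenters is what a reconfiguration step would use to deactivate the disagreeing outputs.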

Calculating the probability of system failure

Each duplicate component added to the system decreases the probability of system failure according to the formula:

$$p = \prod_{i=1}^{n} p_i$$

where:

  • $n$ – number of components
  • $p_i$ – probability of component i failing
  • $p$ – the probability of all components failing (system failure)

This formula assumes independence of failure events. That means that the probability of a component B failing given that a component A has already failed is the same as that of B failing when A has not failed. There are situations where this is unreasonable, such as using two power supplies connected to the same socket in such a way that if one power supply failed, the other would too.

It also assumes that only one component is needed to keep the system running.
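As a worked illustration of the product formula, the sketch below multiplies per-component failure probabilities. The function name and the example value of 0.01 per component are assumptions; the result is only meaningful under the two conditions stated above (independent failures, and one surviving component being sufficient).

```python
import math


def system_failure_probability(failure_probs):
    """Probability that every component fails, assuming independence."""
    probs = list(failure_probs)
    if not probs:
        raise ValueError("at least one component is required")
    if any(not 0.0 <= p <= 1.0 for p in probs):
        raise ValueError("probabilities must lie in [0, 1]")
    return math.prod(probs)


# Hypothetical components that each fail with probability 0.01.
for n in (1, 2, 3):
    print(n, system_failure_probability([0.01] * n))
```

With these illustrative numbers each added component lowers the failure probability by a factor of 100. If the components share a common cause, such as two power supplies on one socket, the product understates the real risk.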

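See also

  • Air gap (networking)
  • Common cause and special cause (statistics)
  • Data redundancy
  • Double switching
  • Fault tolerance
  • Radiation hardening
  • Factor of safety
  • Reliability engineering
  • Reliability theory of aging and longevity
  • Safety engineering
  • Reliability (computer networking)
  • MTBF
  • N+1 redundancy
  • Fault-tolerant computer system
  • ZFS
  • Byzantine fault
  • Byzantine Paxos
  • Quantum Byzantine agreement
  • Two Generals' Problem
  • Degeneracy (biology)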

References

  1. Redundancy Management Technique for Space Shuttle Computers (PDF), IBM Research
  2. R. Jayapal (2003-12-04). "Analog Voting Circuit Is More Flexible Than Its Digital Version". elecdesign.com. Archived from the original on 2007-03-03. Retrieved 2014-06-01.
  3. "The Aerospace Corporation | Assuring Space Mission Success". Aero.org. 2014-05-20. Retrieved 2014-06-01.
  4. Scott D. Sagan (March 2004). "Learning from Normal Accidents" (PDF). Organization & Environment. Archived from the original (PDF) on 2004-07-14.
  5. Koren, Israel; Krishna, C. Mani (2007). Fault-Tolerant Systems. San Francisco, CA: Morgan Kaufmann. p. 3. ISBN 978-0-12-088525-1.
  6. Smithsonian Institution | Office of Safety, Health, and Environmental Management | Fire Protection and Life Safety Design Manual | Independent Sources | Facilities with a maximum possible fire loss exceeding $50 million must have two independent sources of fire protection water.
  7. Why Dissimilar Redundant Architectures Are a Necessity for DAL A | Curtis Wright Defense Systems
  8. Fire Alarm Circuits | A Class X circuit will continue to work with a single open or a single short-circuit by use of a redundant path.
  9. Protecting against the power of lightning | to protect against induced surges rather than direct lightning strikes. Feb 1st, 2005 Twisted pair
  10. Data Center Site Redundancy | H. M. Brotherton and J. Eric Dietz | Computer Information Technology, Purdue University
  11. Factory Mutual Insurance Company | 1-20 Protection Against Exterior Fire Exposure
  12. National Research Council | Canada | Division Of Building Research | Spatial Separation Of Buildings | November 1959
  13. Tall Building Design Guidelines | City of Toronto | March 2013 | Page 52 | the separation distance between towers on the same site of 25 meters or more
  14. Protecting Residences From Wildfires | by Howard E. Moore (General Technical Report PSW-50) | page 30, item 10.
  15. On-Premises Cloud Is a Failure. Google Has the Fix | Elias Khnaser | 05/17/2023
  16. Facility Standards for Records Storage Facilities | https://www.archives.gov/files/records-mgmt/storage-standards-toolkit/file3.pdf
  17. Standards for Permanent Records Storage and Presidential Libraries | https://www.archives.gov/preservation/storage/presidential-library-standards.html

External links

  • Secure Propulsion using Advanced Redundant Control
  • Using powerline as a redundant communication channel
  • Flammini, Francesco; Marrone, Stefano; Mazzocca, Nicola; Vittorini, Valeria (2009). "A new modeling approach to the safety evaluation of N-modular redundant computer systems in presence of imperfect maintenance". Reliability Engineering & System Safety. 94 (9): 1422–1432. arXiv:1304.6656. doi:10.1016/j.ress.2009.02.014. S2CID 6932645.
