
Byzantine fault

A Byzantine fault (also Byzantine generals problem, interactive consistency, source congruency, error avalanche, Byzantine agreement problem, and Byzantine failure[1]) is a condition of a computer system, particularly distributed computing systems, where components may fail and there is imperfect information on whether a component has failed. The term takes its name from an allegory, the "Byzantine generals problem",[2] developed to describe a situation in which, in order to avoid catastrophic failure of the system, the system's actors must agree on a concerted strategy, but some of these actors are unreliable.

In a Byzantine fault, a component such as a server can inconsistently appear both failed and functioning to failure-detection systems, presenting different symptoms to different observers. It is difficult for the other components to declare it failed and shut it out of the network, because they need to first reach a consensus regarding which component has failed in the first place. Byzantine fault tolerance (BFT) is the resiliency of a fault-tolerant computer system to such conditions.

Analogy

 
If all generals attack in coordination, the battle is won (left). If two generals falsely declare that they intend to attack, but instead retreat, the battle is lost (right).

As an analogy of the fault's simplest form, consider a number of generals who are attacking a fortress. The generals must decide as a group whether to attack or retreat; some may prefer to attack, while others prefer to retreat. The important thing is that all generals agree on a common decision, for a halfhearted attack by a few generals would become a rout, and would be worse than either a coordinated attack or a coordinated retreat.

The problem is complicated by the presence of treacherous generals who may not only cast a vote for a suboptimal strategy but may do so selectively. For instance, if nine generals are voting, four of whom support attacking while four others are in favor of retreat, the ninth general may send a vote of retreat to those generals in favor of retreat, and a vote of attack to the rest. Those who received a retreat vote from the ninth general will retreat, while the rest will attack (which may not go well for the attackers). The problem is complicated further by the generals being physically separated and having to send their votes via messengers who may fail to deliver votes or may forge false votes.
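As an illustration of how such selective voting splits the loyal generals, the following sketch simulates the nine-general scenario above. The general names and the simple plurality rule are illustrative assumptions, not part of any published protocol.

```python
# Sketch: a traitor (G9) sends different votes to different recipients, so the
# loyal generals tally different totals and split between attack and retreat.
from collections import Counter

loyal_votes = {f"G{i}": "attack" for i in range(1, 5)}          # G1-G4 favor attack
loyal_votes.update({f"G{i}": "retreat" for i in range(5, 9)})   # G5-G8 favor retreat

def traitor_vote(recipient: str) -> str:
    """G9 tells the retreat camp 'retreat' and the attack camp 'attack'."""
    return loyal_votes[recipient]

for recipient in loyal_votes:
    # Each loyal general sees the eight loyal votes plus G9's tailored vote.
    seen = list(loyal_votes.values()) + [traitor_vote(recipient)]
    print(recipient, "decides to", Counter(seen).most_common(1)[0][0])
# The attack camp counts 5 attack vs 4 retreat and attacks; the retreat camp
# counts the reverse and retreats -- the uncoordinated outcome to be avoided.
```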

Resolution

Byzantine fault tolerance can be achieved if the loyal (non-faulty) generals have a majority agreement on their strategy. There can be a default vote value given to missing messages. For example, missing messages can be given a "null" value. Further, if the agreement is that the null votes are in the majority, a pre-assigned default strategy can be used (e.g. retreat).[3]
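A minimal sketch of this voting rule follows; the message format and the choice of "retreat" as the default are assumptions made for illustration.

```python
# Majority vote in which missing messages count as "null" votes; if null wins,
# fall back to a pre-assigned default strategy, as described above.
from collections import Counter

DEFAULT_STRATEGY = "retreat"   # assumed pre-agreed default

def decide(received: dict, expected_senders: list) -> str:
    votes = {sender: received.get(sender) for sender in expected_senders}  # None = null vote
    winner, _ = Counter(votes.values()).most_common(1)[0]
    return DEFAULT_STRATEGY if winner is None else winner

# Two of five messengers never arrive; "attack" still holds a majority.
print(decide({"G1": "attack", "G2": "attack", "G3": "attack"},
             ["G1", "G2", "G3", "G4", "G5"]))   # -> attack
```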

The typical mapping of this story onto computer systems is that the computers are the generals and their digital communication system links are the messengers. Although the problem is formulated in the analogy as a decision-making and security problem, in electronics, it cannot be solved by cryptographic digital signatures alone, because failures such as incorrect voltages can propagate through the encryption process. Thus, a component may appear functioning to one component and faulty to another, which prevents forming a consensus as to whether the component is faulty or not.

Characteristics

A Byzantine fault is any fault presenting different symptoms to different observers.[4] A Byzantine failure is the loss of a system service due to a Byzantine fault in systems that require consensus among distributed nodes.[5]

The objective of Byzantine fault tolerance is to be able to defend against failures of system components with or without symptoms that prevent other components of the system from reaching an agreement among themselves, where such an agreement is needed for the correct operation of the system.

The remaining operationally correct components of a Byzantine fault tolerant system will be able to continue providing the system's service as originally intended, assuming there are a sufficient number of accurately operating components to maintain the service.

Byzantine failures are considered the most general and most difficult class of failures among the failure modes. The so-called fail-stop failure mode occupies the simplest end of the spectrum. Whereas fail-stop failure mode simply means that the only way to fail is a node crash, detected by other nodes, Byzantine failures imply no restrictions, which means that the failed node can generate arbitrary data, including data that makes it appear like a functioning node. Thus, Byzantine failures can confuse failure detection systems, which makes fault tolerance difficult. Despite the analogy, a Byzantine failure is not necessarily a security problem involving hostile human interference: it can arise purely from electrical or software faults.

The terms fault and failure are used here according to the standard definitions[6] originally created by a joint committee on "Fundamental Concepts and Terminology" formed by the IEEE Computer Society's Technical Committee on Dependable Computing and Fault-Tolerance and IFIP Working Group 10.4 on Dependable Computing and Fault Tolerance.[7] See also dependability.

Caveat

Byzantine fault tolerance is only concerned with broadcast consistency, that is, the property that when one component broadcasts a single consistent value to other components (i.e., sends the same value to the other components), they all receive exactly the same value, or in the case that the broadcaster is not consistent, the other components agree on a common value. This kind of fault tolerance does not encompass the correctness of the value itself; for example, an adversarial component that deliberately sends an incorrect value, but sends that same value consistently to all components, will not be caught in the Byzantine fault tolerance scheme.
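A toy example of this distinction, using assumed component names, is shown below: a consistently delivered value passes the agreement check even though it is wrong.

```python
# The consistency check below only asks whether all receivers hold the same
# value; it says nothing about whether that value is correct.
def broadcast_is_consistent(received: dict) -> bool:
    return len(set(received.values())) == 1

true_sensor_reading = 42
received = {"B": 7, "C": 7, "D": 7}   # adversarial sender lies, but lies uniformly

print(broadcast_is_consistent(received))        # True: agreement holds
print(received["B"] == true_sensor_reading)     # False: the agreed value is wrong
```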

Formal definition

Setting:[8] Given a system of n components, t of which are dishonest, and assuming only point-to-point channels between all the components.

Whenever a component A tries to broadcast a value x, the other components are allowed to discuss with each other and verify the consistency of A's broadcast, and eventually settle on a common value y.

Property: The system is said to resist Byzantine faults if a component A can broadcast a value x, and then:

  1. If A is honest, then all honest components agree on the value x.
  2. In any case, all honest components agree on the same value y.
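The two conditions can be written as simple predicates over the outcome of one broadcast; the sketch below merely restates them for illustration, with made-up component names.

```python
# outputs maps each honest component to the value y it settled on; x is the
# value the broadcaster A attempted to send.
def agreement(outputs: dict) -> bool:
    """Condition 2: all honest components settle on the same value."""
    return len(set(outputs.values())) <= 1

def validity(outputs: dict, x, broadcaster_honest: bool) -> bool:
    """Condition 1: if A is honest, the common value must be A's input x."""
    return not broadcaster_honest or all(y == x for y in outputs.values())

outputs = {"B": "attack", "C": "attack", "D": "attack"}
print(agreement(outputs), validity(outputs, "attack", broadcaster_honest=True))  # True True
```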

Variants: The problem has been studied in the case of both synchronous and asynchronous communications.

The communication graph above is assumed to be the complete graph (i.e. each component can discuss with every other), but the communication graph can be restricted.

The problem can also be relaxed into a more "realistic" variant in which the faulty components do not collude in an attempt to lure the others into error. It is in this setting that practical algorithms have been devised.

History

The problem of obtaining Byzantine consensus was conceived and formalized by Robert Shostak, who dubbed it the interactive consistency problem. This work was done in 1978 in the context of the NASA-sponsored SIFT[9] project in the Computer Science Lab at SRI International. SIFT (for Software Implemented Fault Tolerance) was the brainchild of John Wensley, and was based on the idea of using multiple general-purpose computers that would communicate through pairwise messaging in order to reach a consensus, even if some of the computers were faulty.

At the beginning of the project, it was not clear how many computers in total were needed to guarantee that a conspiracy of n faulty computers could not "thwart" the efforts of the correctly operating ones to reach consensus. Shostak showed that a minimum of 3n+1 computers is needed, and devised a two-round 3n+1 messaging protocol that would work for n=1. His colleague Marshall Pease generalized the algorithm for any n > 0, proving that 3n+1 is both necessary and sufficient. These results, together with a later proof by Leslie Lamport of the sufficiency of 3n using digital signatures, were published in the seminal paper, Reaching Agreement in the Presence of Faults.[10] The authors were awarded the 2005 Edsger W. Dijkstra Prize for this paper.
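The arithmetic behind these bounds can be restated as follows; this is a sketch of the counting only, not of the agreement protocols themselves.

```python
# Tolerating n Byzantine computers requires at least 3n + 1 computers in total
# (without signatures); equivalently, a system of N computers tolerates at most
# floor((N - 1) / 3) Byzantine faults.
def min_total(faulty: int) -> int:
    return 3 * faulty + 1

def max_faulty(total: int) -> int:
    return (total - 1) // 3

print(min_total(1))    # 4 computers to tolerate a single Byzantine fault
print(max_faulty(4))   # 1
print(max_faulty(3))   # 0 -- three computers cannot tolerate even one traitor
```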

To make the interactive consistency problem easier to understand, Lamport devised a colorful allegory in which a group of army generals formulate a plan for attacking a city. In its original version, the story cast the generals as commanders of the Albanian army. The name was changed, eventually settling on "Byzantine", at the suggestion of Jack Goldberg, to avoid any potential offense in the future.[11] This formulation of the problem, together with some additional results, was presented by the same authors in their 1982 paper, "The Byzantine Generals Problem".[3]

Examples

Several examples of Byzantine failures that have occurred are given in two equivalent journal papers.[4][5] These and other examples are described on the NASA DASHlink web pages.[12]

Byzantine errors were observed infrequently and at irregular points during endurance testing for the newly constructed Virginia class submarines, at least through 2005 (when the issues were publicly reported).[13]

Implementations

One example of BFT in use is Bitcoin, a peer-to-peer digital cash system.[14] The Bitcoin network works in parallel to generate a blockchain with proof-of-work allowing the system to overcome Byzantine failures and reach a coherent global view of the system's state. Some proof of stake blockchains also use BFT algorithms.[15]
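The proof-of-work idea can be illustrated with a toy hash puzzle; this is not Bitcoin's actual block format or difficulty rule, only a sketch of the principle that extending the chain requires verifiable computational effort.

```python
# Toy hash puzzle in the spirit of proof-of-work: a block "counts" only if its
# SHA-256 digest starts with a number of zero hex digits, so producing it takes
# work that can be verified cheaply by everyone else.
import hashlib
from itertools import count

def mine(block_data: bytes, difficulty: int = 4) -> int:
    target = "0" * difficulty
    for nonce in count():
        digest = hashlib.sha256(block_data + str(nonce).encode()).hexdigest()
        if digest.startswith(target):
            return nonce

nonce = mine(b"prev_hash|txs")
print(nonce, hashlib.sha256(b"prev_hash|txs" + str(nonce).encode()).hexdigest()[:8])
```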

Some aircraft systems, such as the Boeing 777 Aircraft Information Management System (via its ARINC 659 SAFEbus network), the Boeing 777 flight control system, and the Boeing 787 flight control systems use Byzantine fault tolerance; because these are real-time systems, their Byzantine fault tolerance solutions must have very low latency. For example, SAFEbus can achieve Byzantine fault tolerance within the order of a microsecond of added latency.[16][17][18] The SpaceX Dragon considers Byzantine fault tolerance in its design.[19]

Byzantine fault tolerance mechanisms use components that repeat an incoming message (or just its signature) to other recipients of that incoming message. All these mechanisms make the assumption that the act of repeating a message blocks the propagation of Byzantine symptoms. For systems that have a high degree of safety or security criticality, these assumptions must be proven to be true to an acceptable level of fault coverage. When providing proof through testing, one difficulty is creating a sufficiently wide range of signals with Byzantine symptoms.[20] Such testing likely will require specialized fault injectors.[21][22]
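A minimal sketch of the repeating idea follows; it shows one relay-and-vote round with honest relays and is illustrative only, not a description of any certified mechanism.

```python
# Each receiver relays what it heard from the source to every other receiver,
# then takes a majority over all copies. This is in the spirit of an
# oral-message exchange round; the real algorithms are recursive and handle
# faulty relays as well.
from collections import Counter

def one_relay_round(values_from_source: dict) -> dict:
    receivers = list(values_from_source)
    decisions = {}
    for r in receivers:
        # r's own copy plus the copies relayed by every other (honest) receiver.
        copies = [values_from_source[r]] + [values_from_source[o] for o in receivers if o != r]
        decisions[r] = Counter(copies).most_common(1)[0][0]
    return decisions

# A Byzantine source tells B something different from C and D; after relaying,
# the honest receivers converge on the majority copy ("retreat").
print(one_relay_round({"B": "attack", "C": "retreat", "D": "retreat"}))
```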

Solutions

Several early solutions were described by Lamport, Shostak, and Pease in 1982.[3] They began by noting that the Generals' Problem can be reduced to solving a "Commander and Lieutenants" problem where loyal Lieutenants must all act in unison and that their action must correspond to what the Commander ordered in the case that the Commander is loyal:

  • One solution considers scenarios in which messages may be forged, but which will be Byzantine-fault-tolerant as long as the number of disloyal generals is less than one third of the generals. The impossibility of dealing with one-third or more traitors ultimately reduces to proving that the one Commander and two Lieutenants problem cannot be solved, if the Commander is traitorous. To see this, suppose we have a traitorous Commander A, and two Lieutenants, B and C: when A tells B to attack and C to retreat, and B and C send messages to each other, forwarding A's message, neither B nor C can figure out who is the traitor, since it is not necessarily A—the other Lieutenant could have forged the message purportedly from A. It can be shown that if n is the number of generals in total, and t is the number of traitors in that n, then there are solutions to the problem only when n > 3t and the communication is synchronous (bounded delay).[23]
  • A second solution requires unforgeable message signatures (a minimal signed-order sketch follows this list). For security-critical systems, digital signatures (in modern computer systems, this may be achieved in practice using public-key cryptography) can provide Byzantine fault tolerance in the presence of an arbitrary number of traitorous generals. However, for safety-critical systems (where "security" addresses intelligent threats while "safety" addresses the inherent dangers of an activity or mission), simple error-detecting codes, such as CRCs, provide weaker but often sufficient coverage at a much lower cost. This is true for both Byzantine and non-Byzantine faults. Furthermore, security measures can sometimes weaken safety and vice versa. Thus, cryptographic digital signature methods are not a good choice for safety-critical systems, unless there is also a specific security threat.[24] While error-detecting codes, such as CRCs, are better than cryptographic techniques, neither provides adequate coverage for active electronics in safety-critical systems. This is illustrated by the Schrödinger CRC scenario, where a CRC-protected message with a single Byzantine faulty bit presents different data to different observers and each observer sees a valid CRC.[4][5]
  • Also presented is a variation on the first two solutions allowing Byzantine-fault-tolerant behavior in some situations where not all generals can communicate directly with each other.
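As referenced in the second solution above, here is a minimal signed-order sketch. It assumes the third-party Python `cryptography` package and a generic Ed25519 signature; it is not the specific scheme of the 1982 paper.

```python
# Illustrative signed-order check: a relaying lieutenant cannot alter the
# commander's order without invalidating the signature.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

commander_key = Ed25519PrivateKey.generate()
commander_pub = commander_key.public_key()

order = b"attack"
signature = commander_key.sign(order)

forged_order = b"retreat"
try:
    commander_pub.verify(signature, forged_order)   # tampered relay
except InvalidSignature:
    print("forged order rejected")

commander_pub.verify(signature, order)              # genuine order verifies silently
print("genuine order accepted")
```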

Several system architectures were designed c. 1980 that implemented Byzantine fault tolerance. These include: Draper's FTMP,[25] Honeywell's MMFCS,[26] and SRI's SIFT.[9]

Advanced solutions

In 1999, Miguel Castro and Barbara Liskov introduced the "Practical Byzantine Fault Tolerance" (PBFT) algorithm,[27] which provides high-performance Byzantine state machine replication, processing thousands of requests per second with sub-millisecond increases in latency.
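PBFT's standard sizing can be summarized with a little arithmetic; the sketch below only restates the usual replica and quorum counts and is not an implementation of the protocol.

```python
# With f Byzantine replicas tolerated, PBFT uses n = 3f + 1 replicas, quorums
# of 2f + 1 matching messages, and a client accepts after f + 1 matching replies.
def pbft_sizes(f: int) -> dict:
    return {"replicas": 3 * f + 1, "quorum": 2 * f + 1, "matching_replies": f + 1}

print(pbft_sizes(1))   # {'replicas': 4, 'quorum': 3, 'matching_replies': 2}
```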

After PBFT, several BFT protocols were introduced to improve its robustness and performance. For instance, Q/U,[28] HQ,[29] Zyzzyva,[30] and ABsTRACTs[31] addressed performance and cost issues, whereas other protocols, like Aardvark[32] and RBFT,[33] addressed its robustness issues. Adapt[34] tried to make use of existing BFT protocols, switching between them in an adaptive way to improve system robustness and performance as the underlying conditions change. In addition, BFT protocols were introduced that leverage trusted components to reduce the number of replicas, e.g., A2M-PBFT-EA[35] and MinBFT.[36]

See also

  • Atomic commit
  • Brooks–Iyengar algorithm
  • List of terms relating to algorithms and data structures
  • Byzantine Paxos
  • Quantum Byzantine agreement
  • Two Generals' Problem
  • Conflict-free replicated data type

References

  1. ^ Kirrmann, Hubert (n.d.). Fault Tolerant Computing in Industrial Automation (PDF). Switzerland: ABB Research Center. p. 94. Archived from the original (PDF) on 2014-03-26. Retrieved 2015-03-02.
  2. ^ Lamport, L.; Shostak, R.; Pease, M. (1982). "The Byzantine Generals Problem" (PDF). ACM Transactions on Programming Languages and Systems. 4 (3): 382–401. CiteSeerX 10.1.1.64.2312. doi:10.1145/357172.357176. S2CID 55899582. Archived (PDF) from the original on 13 June 2018.
  3. ^ a b c Lamport, L.; Shostak, R.; Pease, M. (1982). "The Byzantine Generals Problem" (PDF). ACM Transactions on Programming Languages and Systems. 4 (3): 387–389. CiteSeerX 10.1.1.64.2312. doi:10.1145/357172.357176. S2CID 55899582. Archived from the original (PDF) on 7 February 2017.
  4. ^ a b c Driscoll, K.; Hall, B.; Paulitsch, M.; Zumsteg, P.; Sivencrona, H. (2004). "The Real Byzantine Generals". The 23rd Digital Avionics Systems Conference (IEEE Cat. No.04CH37576). pp. 6.D.4–61–11. doi:10.1109/DASC.2004.1390734. ISBN 978-0-7803-8539-9. S2CID 15549497.
  5. ^ a b c Driscoll, Kevin; Hall, Brendan; Sivencrona, Håkan; Zumsteg, Phil (2003). "Byzantine Fault Tolerance, from Theory to Reality". Computer Safety, Reliability, and Security. Lecture Notes in Computer Science. Vol. 2788. pp. 235–248. doi:10.1007/978-3-540-39878-3_19. ISBN 978-3-540-20126-7. ISSN 0302-9743. S2CID 12690337.
  6. ^ Avizienis, A.; Laprie, J.-C.; Randell, Brian; Landwehr, C. (2004). "Basic concepts and taxonomy of dependable and secure computing". IEEE Transactions on Dependable and Secure Computing. 1 (1): 11–33. doi:10.1109/TDSC.2004.2. hdl:1903/6459. ISSN 1545-5971. S2CID 215753451.
  7. ^ "Dependable Computing and Fault Tolerance". from the original on 2015-04-02. Retrieved 2015-03-02.
  8. ^ Matthias Fitzi (2002). "Generalized Communication and Security Models in Byzantine Agreement" (PDF). ETH Zurich.
  9. ^ a b "SIFT: design and analysis of a fault-tolerant computer for aircraft control". Microelectronics Reliability. 19 (3): 190. 1979. doi:10.1016/0026-2714(79)90211-7. ISSN 0026-2714.
  10. ^ Pease, Marshall; Shostak, Robert; Lamport, Leslie (April 1980). "Reaching Agreement in the Presence of Faults". Journal of the Association for Computing Machinery. 27 (2): 228–234. CiteSeerX 10.1.1.68.4044. doi:10.1145/322186.322188. S2CID 6429068.
  11. ^ Lamport, Leslie (2016-12-19). "The Byzantine Generals Problem". ACM Transactions on Programming Languages and Systems. SRI International. Retrieved 18 March 2019.
  12. ^ Driscoll, Kevin (2012-12-11). "Real System Failures". DASHlink. NASA. Archived from the original on 2015-04-02. Retrieved 2015-03-02.
  13. ^ Walter, C.; Ellis, P.; LaValley, B. (2005). "The Reliable Platform Service: A Property-Based Fault Tolerant Service Architecture". Ninth IEEE International Symposium on High-Assurance Systems Engineering (HASE'05). pp. 34–43. doi:10.1109/HASE.2005.23. ISBN 978-0-7695-2377-4. S2CID 21468069.
  14. ^ "Bitcoin - Open source P2P money". bitcoin.org. Retrieved 2019-08-18.
  15. ^ Deirmentzoglou, Papakyriakopoulos & Patsakis 2019, p. 28716.
  16. ^ M., Paulitsch; Driscoll, K. (9 January 2015). "Chapter 48:SAFEbus". In Zurawski, Richard (ed.). Industrial Communication Technology Handbook, Second Edition. CRC Press. pp. 48–1–48–26. ISBN 978-1-4822-0733-0.
  17. ^ Thomas A. Henzinger; Christoph M. Kirsch (26 September 2001). Embedded Software: First International Workshop, EMSOFT 2001, Tahoe City, CA, USA, October 8-10, 2001. Proceedings (PDF). Springer Science & Business Media. pp. 307–. ISBN 978-3-540-42673-8. Archived (PDF) from the original on 2015-09-22. Retrieved 2015-03-05.
  18. ^ Yeh, Y.C. (2001). "Safety critical avionics for the 777 primary flight controls system". 20th DASC. 20th Digital Avionics Systems Conference (Cat. No.01CH37219). Vol. 1. pp. 1C2/1–1C2/11. doi:10.1109/DASC.2001.963311. ISBN 978-0-7803-7034-0. S2CID 61489128.
  19. ^ "ELC: SpaceX lessons learned [LWN.net]". from the original on 2016-08-05. Retrieved 2016-07-21.
  20. ^ Nanya, T.; Goosen, H.A. (1989). "The Byzantine hardware fault model". IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems. 8 (11): 1226–1231. doi:10.1109/43.41508. ISSN 0278-0070.
  21. ^ Martins, Rolando; Gandhi, Rajeev; Narasimhan, Priya; Pertet, Soila; Casimiro, António; Kreutz, Diego; Veríssimo, Paulo (2013). "Experiences with Fault-Injection in a Byzantine Fault-Tolerant Protocol". Middleware 2013. Lecture Notes in Computer Science. Vol. 8275. pp. 41–61. doi:10.1007/978-3-642-45065-5_3. ISBN 978-3-642-45064-8. ISSN 0302-9743. S2CID 31337539.
  22. ^ US patent 7475318, Kevin R. Driscoll, "Method for testing the sensitive input range of Byzantine filters", issued 2009-01-06, assigned to Honeywell International Inc. 
  23. ^ Feldman, P.; Micali, S. (1997). "An optimal probabilistic protocol for synchronous Byzantine agreement" (PDF). SIAM J. Comput. 26 (4): 873–933. doi:10.1137/s0097539790187084. Archived (PDF) from the original on 2016-03-05. Retrieved 2012-06-14.
  24. ^ Paulitsch, M.; Morris, J.; Hall, B.; Driscoll, K.; Latronico, E.; Koopman, P. (2005). "Coverage and the Use of Cyclic Redundancy Codes in Ultra-Dependable Systems". 2005 International Conference on Dependable Systems and Networks (DSN'05). pp. 346–355. doi:10.1109/DSN.2005.31. ISBN 978-0-7695-2282-1. S2CID 14096385.
  25. ^ Hopkins, Albert L.; Lala, Jaynarayan H.; Smith, T. Basil (1987). "The Evolution of Fault Tolerant Computing at the Charles Stark Draper Laboratory, 1955–85". The Evolution of Fault-Tolerant Computing. Dependable Computing and Fault-Tolerant Systems. Vol. 1. pp. 121–140. doi:10.1007/978-3-7091-8871-2_6. ISBN 978-3-7091-8873-6. ISSN 0932-5581.
  26. ^ Driscoll, Kevin; Papadopoulos, Gregory; Nelson, Scott; Hartmann, Gary; Ramohalli, Gautham (1984), Multi-Microprocessor Flight Control System (Technical Report), Wright-Patterson Air Force Base, OH: AFWAL/FIGL U.S. Air Force Systems Command, AFWAL-TR-84-3076
  27. ^ Castro, M.; Liskov, B. (2002). "Practical Byzantine Fault Tolerance and Proactive Recovery". ACM Transactions on Computer Systems. Association for Computing Machinery. 20 (4): 398–461. CiteSeerX 10.1.1.127.6130. doi:10.1145/571637.571640. S2CID 18793794.
  28. ^ Abd-El-Malek, M.; Ganger, G.; Goodson, G.; Reiter, M.; Wylie, J. (2005). "Fault-scalable Byzantine Fault-Tolerant Services". ACM SIGOPS Operating Systems Review. Association for Computing Machinery. 39 (5): 59. doi:10.1145/1095809.1095817.
  29. ^ Cowling, James; Myers, Daniel; Liskov, Barbara; Rodrigues, Rodrigo; Shrira, Liuba (2006). HQ Replication: A Hybrid Quorum Protocol for Byzantine Fault Tolerance. Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation. pp. 177–190. ISBN 1-931971-47-1.
  30. ^ Kotla, Ramakrishna; Alvisi, Lorenzo; Dahlin, Mike; Clement, Allen; Wong, Edmund (December 2009). "Zyzzyva: Speculative Byzantine Fault Tolerance". ACM Transactions on Computer Systems. Association for Computing Machinery. 27 (4): 1–39. doi:10.1145/1658357.1658358.
  31. ^ Guerraoui, Rachid; Kneževic, Nikola; Vukolic, Marko; Quéma, Vivien (2010). The Next 700 BFT Protocols. Proceedings of the 5th European conference on Computer systems. EuroSys. Archived from the original on 2011-10-02. Retrieved 2011-10-04.
  32. ^ Clement, A.; Wong, E.; Alvisi, L.; Dahlin, M.; Marchetti, M. (April 22–24, 2009). Making Byzantine Fault Tolerant Systems Tolerate Byzantine Faults (PDF). Symposium on Networked Systems Design and Implementation. USENIX. Archived (PDF) from the original on 2010-12-25. Retrieved 2010-02-17.
  33. ^ Aublin, P.-L.; Ben Mokhtar, S.; Quéma, V. (July 8–11, 2013). RBFT: Redundant Byzantine Fault Tolerance. 33rd IEEE International Conference on Distributed Computing Systems. International Conference on Distributed Computing Systems. Archived from the original on August 5, 2013.
  34. ^ Bahsoun, J. P.; Guerraoui, R.; Shoker, A. (2015-05-01). "Making BFT Protocols Really Adaptive". Parallel and Distributed Processing Symposium (IPDPS), 2015 IEEE International: 904–913. doi:10.1109/IPDPS.2015.21. ISBN 978-1-4799-8649-1. S2CID 16310807.
  35. ^ Chun, Byung-Gon; Maniatis, Petros; Shenker, Scott; Kubiatowicz, John (2007-01-01). "Attested Append-only Memory: Making Adversaries Stick to Their Word". Proceedings of Twenty-First ACM SIGOPS Symposium on Operating Systems Principles. SOSP '07. New York, NY, USA: ACM: 189–204. doi:10.1145/1294261.1294280. ISBN 9781595935915. S2CID 6685352.
  36. ^ Veronese, G. S.; Correia, M.; Bessani, A. N.; Lung, L. C.; Verissimo, P. (2013-01-01). "Efficient Byzantine Fault-Tolerance". IEEE Transactions on Computers. 62 (1): 16–30. CiteSeerX 10.1.1.408.9972. doi:10.1109/TC.2011.221. ISSN 0018-9340. S2CID 8157723.

Sources

  • Deirmentzoglou, Evangelos; Papakyriakopoulos, Georgios; Patsakis, Constantinos (2019). "A Survey on Long-Range Attacks for Proof of Stake Protocols". IEEE Access. 7: 28712–28725. doi:10.1109/ACCESS.2019.2901858. eISSN 2169-3536. S2CID 84185792.

External links

  • Byzantine Fault Tolerance in the RKBExplorer
