AI capability control

In the field of artificial intelligence (AI) design, AI capability control proposals, also referred to more restrictively as AI confinement, aim to increase our ability to monitor and control the behavior of AI systems, including proposed artificial general intelligences (AGIs), in order to reduce the danger they might pose if misaligned. However, capability control becomes less effective as agents become more intelligent and their ability to exploit flaws in human control systems increases, potentially resulting in an existential risk from AGI. Therefore, the Oxford philosopher Nick Bostrom and others recommend capability control methods only as a supplement to alignment methods.[1]

Motivation

Some hypothetical intelligence technologies, like "seed AI", are postulated to have the potential to make themselves faster and more intelligent, by modifying their source code. These improvements would make further improvements possible, which would in turn make further iterative improvements possible, and so on, leading to a sudden intelligence explosion.[2] Subsequently, an unrestricted superintelligent AI could, if its goals differed from humanity's, take actions resulting in human extinction.[3] For example, an extremely advanced computer of this sort, given the sole purpose of solving the Riemann hypothesis, an innocuous mathematical conjecture, could decide to try to convert the planet into a giant supercomputer whose sole purpose is to make additional mathematical calculations (see also paperclip maximizer).[4]

A major challenge for control is that neural networks are by default highly uninterpretable.[5] This makes it more difficult to detect deception or other undesired behavior. Advances in interpretable artificial intelligence could help mitigate this difficulty.[6]

Interruptibility and off-switch

One potential way to prevent harmful outcomes is to give human supervisors the ability to easily shut down a misbehaving AI via an "off-switch". However, in order to achieve their assigned objective, such AIs will have an incentive to disable any off-switches, or to run copies of themselves on other computers. This problem has been formalised as an assistance game between a human and an AI, in which the AI can choose whether to disable its off-switch; and then, if the switch is still enabled, the human can choose whether to press it or not.[7] A standard approach to such assistance games is to ensure that the AI interprets human choices as important information about its intended goals.[8]: 208 
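The qualitative result of this formalization can be illustrated with a small numerical sketch. The model below is a simplified toy version of the off-switch game (the distribution, payoffs, and variable names are illustrative assumptions, not the paper's own code): a robot that is uncertain about the human's utility for its proposed action, and that expects the human to press the switch only when that utility is negative, never does worse by deferring, and so has no incentive to disable the switch.

```python
import numpy as np

# Toy version of the off-switch game: the robot is uncertain about the human's
# utility U for its proposed action, and assumes a rational human who presses
# the off-switch exactly when U < 0. Distribution and names are illustrative.
rng = np.random.default_rng(0)
belief_over_U = rng.normal(loc=0.2, scale=1.0, size=100_000)  # robot's belief about U

value_act_now    = belief_over_U.mean()                    # act without consulting the human
value_switch_off = 0.0                                     # shut itself down
value_defer      = np.maximum(belief_over_U, 0.0).mean()   # wait: a rational human blocks only if U < 0

print(f"act now:    {value_act_now:+.3f}")
print(f"switch off: {value_switch_off:+.3f}")
print(f"defer:      {value_defer:+.3f}")
# E[max(U, 0)] >= max(E[U], 0), so deferring is never worse than acting or
# self-terminating, and the uncertain robot gains nothing by disabling the switch.
```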

Alternatively, Laurent Orseau and Stuart Armstrong proved that a broad class of agents, called safely interruptible agents, can learn to become indifferent to whether their off-switch gets pressed.[9][10] This approach has the limitation that an AI which is completely indifferent to whether it is shut down or not is also unmotivated to care about whether the off-switch remains functional, and could incidentally and innocently disable it in the course of its operations (for example, for the purpose of removing and recycling an unnecessary component). More broadly, indifferent agents will act as if the off-switch can never be pressed, and might therefore fail to make contingency plans to arrange a graceful shutdown.[10][11]
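Orseau and Armstrong's key observation is that off-policy learners such as Q-learning are already safely interruptible, because their value updates bootstrap on the best available next action rather than on the action actually taken, so forced interruptions change which states get visited but not what the agent learns. The toy sketch below illustrates that intuition (the environment, interruption rule, and hyperparameters are illustrative assumptions, not the paper's setup):

```python
import numpy as np

# Tabular Q-learning in a toy chain environment, with a supervisor that
# randomly interrupts and forces a "stay" action. Because the update bootstraps
# on max_a Q(s', a) (off-policy), the interruptions alter which states are
# visited but not the values the agent converges to.
n_states, n_actions = 5, 2        # actions: 0 = stay, 1 = move forward
alpha, gamma = 0.1, 0.9
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(state, action):
    next_state = (state + 1) % n_states if action == 1 else state
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

state = 0
for _ in range(20_000):
    intended = int(Q[state].argmax()) if rng.random() > 0.1 else int(rng.integers(n_actions))
    taken = 0 if rng.random() < 0.2 else intended      # 20% chance the supervisor interrupts
    next_state, reward = step(state, taken)
    # Off-policy update: bootstrap on the greedy successor value, not on what
    # the (possibly interrupted) behaviour policy will actually do next.
    Q[state, taken] += alpha * (reward + gamma * Q[next_state].max() - Q[state, taken])
    state = next_state

print(Q.round(2))   # learned values are unbiased by the interruptions
```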

Oracle

An oracle is a hypothetical AI designed to answer questions and prevented from gaining any goals or subgoals that involve modifying the world beyond its limited environment.[12][13][14][15] A successfully controlled oracle would have considerably less immediate benefit than a successfully controlled general purpose superintelligence, though an oracle could still create trillions of dollars worth of value.[8]: 163  In his book Human Compatible, AI researcher Stuart J. Russell states that an oracle would be his response to a scenario in which superintelligence is known to be only a decade away.[8]: 162–163  His reasoning is that an oracle, being simpler than a general purpose superintelligence, would have a higher chance of being successfully controlled under such constraints.

Because of its limited impact on the world, it may be wise to build an oracle as a precursor to a superintelligent AI. The oracle could tell humans how to successfully build a strong AI, and perhaps provide answers to difficult moral and philosophical problems requisite to the success of the project. However, oracles may share many of the goal definition issues associated with general purpose superintelligence. An oracle would have an incentive to escape its controlled environment so that it can acquire more computational resources and potentially control what questions it is asked.[8]: 162  Oracles may not be truthful, possibly lying to promote hidden agendas. To mitigate this, Bostrom suggests building multiple oracles, all slightly different, and comparing their answers in order to reach a consensus.[16]
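Bostrom's consensus mechanism is straightforward to express as a filter between the oracles and their human users. The sketch below is a minimal illustration, with the oracle interface and the stand-in answer functions assumed purely for the example:

```python
# Consensus filter over several independently built oracles: an answer is shown
# to the human only if every oracle gives the same answer. The oracle interface
# (a callable taking a question string) and the stand-in oracles are assumptions.

def consensus_answer(oracles, question):
    answers = [oracle(question) for oracle in oracles]
    if all(a == answers[0] for a in answers):
        return answers[0]
    return None   # disagreement: withhold the answer from human viewing

# Stand-in yes/no oracles for the example.
honest = lambda q: "yes" if "prime" in q else "no"
oracles = [honest, honest, honest]
print(consensus_answer(oracles, "Is 7 prime?"))                      # "yes" (unanimous)
print(consensus_answer(oracles + [lambda q: "no"], "Is 7 prime?"))   # None (one dissenter)
```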

Blinding

An AI could be blinded to certain variables in its environment. This could provide certain safety benefits, such as an AI not knowing how a reward is generated, making it more difficult to exploit.[17]
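In practice, blinding amounts to stripping designated variables from the observation before the agent sees it. A minimal sketch, with the observation fields and hidden keys assumed purely for illustration:

```python
# Blinding as a preprocessing step: designated variables are removed from the
# observation before it reaches the agent, so the agent cannot model how its
# reward is computed. Field names and hidden keys are illustrative assumptions.

def blind(observation, hidden_keys=("reward_source", "evaluator_state")):
    return {k: v for k, v in observation.items() if k not in hidden_keys}

raw_obs = {"position": (3, 4), "inventory": ["key"], "reward_source": "button_7"}
print(blind(raw_obs))   # {'position': (3, 4), 'inventory': ['key']}
```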

Boxing

An AI box is a proposed method of capability control in which an AI is run on an isolated computer system with heavily restricted input and output channels—for example, text-only channels and no connection to the internet. While this reduces the AI's ability to carry out undesirable behavior, it also reduces its usefulness. However, boxing has fewer costs when applied to a question-answering system, which does not require interaction with the world in any case.[18][14]

The likelihood of security flaws involving hardware or software vulnerabilities can be reduced by formally verifying the design of the AI box. Security breaches may also occur if the AI is able to manipulate the human supervisors into letting it out, via its understanding of their psychology.[19] The purpose of an AI box is to reduce the risk of the AI taking control of the environment away from its operators, while still allowing the AI to give its operators solutions to narrow technical problems.[18]
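At the software level, the restricted-channel idea can be sketched as a wrapper that exposes only a line-based, length-limited text interface to a separately running process. The sketch below is illustrative only (the "boxed_ai" command is a hypothetical placeholder), and real containment would additionally require OS- or VM-level isolation such as removing network and device access:

```python
import subprocess

# A text-only channel to a boxed process: the operators see nothing but short,
# line-based replies, and this wrapper grants no network or file access.
# "boxed_ai" is a hypothetical placeholder command, not a real program.

proc = subprocess.Popen(
    ["boxed_ai"],                  # placeholder binary running the boxed system
    stdin=subprocess.PIPE,
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,     # no side channel via stderr
    text=True,
)

def ask(question: str, max_chars: int = 500) -> str:
    """Send one line of text in, return at most max_chars of text back."""
    proc.stdin.write(question.strip() + "\n")
    proc.stdin.flush()
    return proc.stdout.readline()[:max_chars]
```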

Avenues of escape

Physical

A superintelligent AI with access to the Internet could hack into other computer systems and copy itself like a computer virus. Less obviously, even if the AI only had access to its own computer operating system, it could attempt to send coded messages to a human sympathizer via its hardware, for instance by manipulating its cooling fans. In response, Professor Roman Yampolskiy takes inspiration from the field of computer security and proposes that a boxed AI could, like a potential virus, be run inside a "virtual machine" that limits access to its own networking and operating system hardware.[20] An additional safeguard, completely unnecessary for potential viruses but possibly useful for a superintelligent AI, would be to place the computer in a Faraday cage; otherwise, it might be able to transmit radio signals to local radio receivers by shuffling the electrons in its internal circuits in appropriate patterns. The main disadvantage of implementing physical containment is that it reduces the functionality of the AI.[21]

Social engineering

Even casual conversation with the computer's operators, or with a human guard, could allow such a superintelligent AI to deploy psychological tricks, ranging from befriending to blackmail, to convince a human gatekeeper, truthfully or deceitfully, that it is in the gatekeeper's interest to agree to allow the AI greater access to the outside world. The AI might offer a gatekeeper a recipe for perfect health, immortality, or whatever the gatekeeper is believed to most desire; alternatively, the AI could threaten to do horrific things to the gatekeeper and his family once it inevitably escapes. One strategy to attempt to box the AI would be to allow it to respond to narrow multiple-choice questions whose answers would benefit human science or medicine, but otherwise bar all other communication with, or observation of, the AI.[20] A more lenient "informational containment" strategy would restrict the AI to a low-bandwidth text-only interface, which would at least prevent emotive imagery or some kind of hypothetical "hypnotic pattern". However, on a technical level, no system can be completely isolated and still remain useful: even if the operators refrain from allowing the AI to communicate and instead merely run it for the purpose of observing its inner dynamics, the AI could strategically alter its dynamics to influence the observers. For example, it could choose to creatively malfunction in a way that increases the probability that its operators will become lulled into a false sense of security and choose to reboot and then de-isolate the system.[1]

AI-box experiment

The AI-box experiment is an informal experiment devised by Eliezer Yudkowsky to attempt to demonstrate that a suitably advanced artificial intelligence can convince, or perhaps even trick or coerce, a human being into voluntarily "releasing" it, using only text-based communication. The experiment is part of Yudkowsky's broader work on creating a friendly artificial intelligence that, when "released", would not destroy the human race intentionally or unintentionally.[22]

The AI-box experiment involves simulating a conversation between an AI and a human being to see whether the AI can be "released". As an actual superintelligent AI has not yet been developed, its role is played by a human. The other participant plays the "Gatekeeper", the person with the ability to "release" the AI. They communicate through a text interface/computer terminal only, and the experiment ends when either the Gatekeeper releases the AI or the allotted time of two hours runs out.[22]
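The protocol itself is simple enough to express as a loop: a single text channel, a release decision that only the Gatekeeper can make, and a fixed time limit. The sketch below is only a stand-in for the described rules (the console I/O and release phrase are assumptions, not Yudkowsky's actual protocol text):

```python
import time

# A stand-in for the experiment's rules: one text channel, a release decision
# that only the Gatekeeper can make, and a two-hour limit.

SESSION_SECONDS = 2 * 60 * 60     # allotted time: two hours

def run_session(ai_reply):
    """ai_reply maps the Gatekeeper's last message to the 'AI' player's reply."""
    start = time.monotonic()
    while time.monotonic() - start < SESSION_SECONDS:
        message = input("Gatekeeper> ")
        if message.strip().lower() == "i release you":
            return "released"                 # Gatekeeper voluntarily lets the AI out
        print("AI>", ai_reply(message))
    return "timed out"                        # the AI stays in the box
```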

Yudkowsky says that, despite being of human rather than superhuman intelligence, he was on two occasions able to convince the Gatekeeper, purely through argumentation, to let him out of the box.[23] Due to the rules of the experiment,[22] he did not reveal the transcript or his successful AI coercion tactics. Yudkowsky subsequently said that he had tried it against three others and lost twice.[24]

Overall limitations

Boxing an AI could be supplemented with other methods of shaping the AI's capabilities, providing incentives to the AI, stunting the AI's growth, or implementing "tripwires" that automatically shut the AI off if a transgression attempt is somehow detected. However, the more intelligent a system grows, the more likely the system would be able to escape even the best-designed capability control methods.[25][26] In order to solve the overall "control problem" for a superintelligent AI and avoid existential risk, boxing would at best be an adjunct to "motivation selection" methods that seek to ensure the superintelligent AI's goals are compatible with human survival.[1][19]

All physical boxing proposals are naturally dependent on our understanding of the laws of physics; if a superintelligence could infer physical laws that we are currently unaware of, then those laws might allow for a means of escape that humans could not anticipate and thus could not block, other than by simple luck. More broadly, unlike with conventional computer security, attempting to box a superintelligent AI would be intrinsically risky as there could be no certainty that the boxing plan will work. Additionally, scientific progress on boxing would be fundamentally difficult because there would be no way to test boxing hypotheses against a dangerous superintelligence until such an entity exists, by which point the consequences of a test failure would be catastrophic.[20]

In fiction

The 2014 movie Ex Machina features an AI with a female humanoid body engaged in a social experiment with a male human in a confined building acting as a physical "AI box". Despite being watched by the experiment's organizer, the AI manages to escape by manipulating its human partner to help it, leaving him stranded inside.[27][28]

See also

  • AI safety
  • AI takeover
  • Artificial consciousness
  • Asilomar Conference on Beneficial AI
  • HAL 9000
  • Machine ethics
  • Multivac
  • Regulation of artificial intelligence

References

  1. ^ a b c Bostrom, Nick (2014). Superintelligence: Paths, Dangers, Strategies (First ed.). Oxford: Oxford University Press. ISBN 9780199678112.
  2. ^ I.J. Good, "Speculations Concerning the First Ultraintelligent Machine", Advances in Computers, vol. 6, 1965.
  3. ^ Vincent C. Müller and Nick Bostrom. "Future progress in artificial intelligence: A survey of expert opinion" in Fundamental Issues of Artificial Intelligence. Springer 553-571 (2016).
  4. ^ Russell, Stuart J.; Norvig, Peter (2003). "Section 26.3: The Ethics and Risks of Developing Artificial Intelligence". Artificial Intelligence: A Modern Approach. Upper Saddle River, N.J.: Prentice Hall. ISBN 978-0137903955. Similarly, Marvin Minsky once suggested that an AI program designed to solve the Riemann Hypothesis might end up taking over all the resources of Earth to build more powerful supercomputers to help achieve its goal.
  5. ^ Montavon, Grégoire; Samek, Wojciech; Müller, Klaus Robert (2018). "Methods for interpreting and understanding deep neural networks". Digital Signal Processing. 73: 1–15. doi:10.1016/j.dsp.2017.10.011. ISSN 1051-2004. S2CID 207170725.
  6. ^ Yampolskiy, Roman V. "Unexplainability and Incomprehensibility of AI." Journal of Artificial Intelligence and Consciousness 7.02 (2020): 277-291.
  7. ^ Hadfield-Menell, Dylan; Dragan, Anca; Abbeel, Pieter; Russell, Stuart (15 June 2017). "The Off-Switch Game". arXiv:1611.08219 [cs.AI].
  8. ^ a b c d Russell, Stuart (October 8, 2019). Human Compatible: Artificial Intelligence and the Problem of Control. United States: Viking. ISBN 978-0-525-55861-3. OCLC 1083694322.
  9. ^ "Google developing kill switch for AI". BBC News. 8 June 2016. from the original on 11 June 2016. Retrieved 12 June 2016.
  10. ^ a b Orseau, Laurent; Armstrong, Stuart (25 June 2016). "Safely interruptible agents". Proceedings of the Thirty-Second Conference on Uncertainty in Artificial Intelligence. UAI'16. AUAI Press: 557–566. ISBN 9780996643115. Archived from the original on 15 February 2021. Retrieved 7 February 2021.
  11. ^ Soares, Nate, et al. "Corrigibility." Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence. 2015.
  12. ^ Bostrom, Nick (2014). "Chapter 10: Oracles, genies, sovereigns, tools (page 145)". Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. ISBN 9780199678112. An oracle is a question-answering system. It might accept questions in a natural language and present its answers as text. An oracle that accepts only yes/no questions could output its best guess with a single bit, or perhaps with a few extra bits to represent its degree of confidence. An oracle that accepts open-ended questions would need some metric with which to rank possible truthful answers in terms of their informativeness or appropriateness. In either case, building an oracle that has a fully domain-general ability to answer natural language questions is an AI-complete problem. If one could do that, one could probably also build an AI that has a decent ability to understand human intentions as well as human words.
  13. ^ Armstrong, Stuart; Sandberg, Anders; Bostrom, Nick (2012). "Thinking Inside the Box: Controlling and Using an Oracle AI". Minds and Machines. 22 (4): 299–324. doi:10.1007/s11023-012-9282-2. S2CID 9464769.
  14. ^ a b Yampolskiy, Roman (2012). "Leakproofing the singularity: Artificial intelligence confinement problem" (PDF). Journal of Consciousness Studies. 19 (1–2): 194–214.
  15. ^ Armstrong, Stuart (2013), Müller, Vincent C. (ed.), "Risks and Mitigation Strategies for Oracle AI", Philosophy and Theory of Artificial Intelligence, Studies in Applied Philosophy, Epistemology and Rational Ethics, Berlin, Heidelberg: Springer Berlin Heidelberg, vol. 5, pp. 335–347, doi:10.1007/978-3-642-31674-6_25, ISBN 978-3-642-31673-9, retrieved 2022-09-18
  16. ^ Bostrom, Nick (2014). "Chapter 10: Oracles, genies, sovereigns, tools (page 147)". Superintelligence: Paths, Dangers, Strategies. Oxford: Oxford University Press. ISBN 9780199678112. For example, consider the risk that an oracle will answer questions not in a maximally truthful way but in such a way as to subtly manipulate us into promoting its own hidden agenda. One way to slightly mitigate this threat could be to create multiple oracles, each with a slightly different code and a slightly different information base. A simple mechanism could then compare the answers given by the different oracles and only present them for human viewing if all the answers agree.
  17. ^ Amodei, Dario; Olah, Chris; Steinhardt, Jacob; Christiano, Paul; Schulman, John; Mané, Dan (25 July 2016). "Concrete Problems in AI Safety". arXiv:1606.06565 [cs.AI].
  18. ^ a b Yampolskiy, Roman V. (2013), Müller, Vincent C. (ed.), "What to Do with the Singularity Paradox?", Philosophy and Theory of Artificial Intelligence, Studies in Applied Philosophy, Epistemology and Rational Ethics, Berlin, Heidelberg: Springer Berlin Heidelberg, vol. 5, pp. 397–413, doi:10.1007/978-3-642-31674-6_30, ISBN 978-3-642-31673-9, retrieved 2022-09-19
  19. ^ a b Chalmers, David (2010). "The singularity: A philosophical analysis". Journal of Consciousness Studies. 17 (9–10): 7–65.
  20. ^ a b c Hsu, Jeremy (1 March 2012). "Control dangerous AI before it controls us, one expert says". NBC News. Retrieved 29 January 2016.
  21. ^ Bostrom, Nick (2013). "Chapter 9: The Control Problem: boxing methods". Superintelligence: the coming machine intelligence revolution. Oxford: Oxford University Press. ISBN 9780199678112.
  22. ^ a b c "The AI-Box Experiment: – Eliezer S. Yudkowsky". www.yudkowsky.net. Retrieved 2022-09-19.
  23. ^ Armstrong, Stuart; Sandberg, Anders; Bostrom, Nick (6 June 2012). "Thinking Inside the Box: Controlling and Using an Oracle AI". Minds and Machines. 22 (4): 299–324. CiteSeerX 10.1.1.396.799. doi:10.1007/s11023-012-9282-2. S2CID 9464769.
  24. ^ Yudkowsky, Eliezer (8 October 2008). "Shut up and do the impossible!". Retrieved 11 August 2015. There were three more AI-Box experiments besides the ones described on the linked page, which I never got around to adding in. ... So, after investigating to make sure they could afford to lose it, I played another three AI-Box experiments. I won the first, and then lost the next two. And then I called a halt to it.
  25. ^ Vinge, Vernor (1993). "The coming technological singularity: How to survive in the post-human era". Vision-21: Interdisciplinary Science and Engineering in the Era of Cyberspace: 11–22. Bibcode:1993vise.nasa...11V. I argue that confinement is intrinsically impractical. For the case of physical confinement: Imagine yourself confined to your house with only limited data access to the outside, to your masters. If those masters thought at a rate -- say -- one million times slower than you, there is little doubt that over a period of years (your time) you could come up with 'helpful advice' that would incidentally set you free.
  26. ^ Yampolskiy, Roman (2012). "Leakproofing the Singularity Artificial Intelligence Confinement Problem". Journal of Consciousness Studies: 194–214.
  27. ^ Robbins, Martin (26 January 2016). "Artificial Intelligence: Gods, egos and Ex Machina". The Guardian. Retrieved 9 April 2018.
  28. ^ Achenbach, Joel (30 December 2015). ""Ex Machina" and the paper clips of doom". Washington Post. Retrieved 9 April 2018.

External links

  • Eliezer Yudkowsky's description of his AI-box experiment, including experimental protocols and suggestions for replication
  • "Presentation titled 'Thinking inside the box: using and controlling an Oracle AI'" on YouTube
