Wikipedia

Self-play

Self-play is a technique for improving the performance of reinforcement learning agents. Intuitively, agents learn to improve their performance by playing "against themselves".

Definition and motivation

In multi-agent reinforcement learning experiments, researchers try to optimize the performance of a learning agent on a given task, in cooperation or competition with one or more other agents. These agents learn by trial and error, and researchers may choose to have the same learning algorithm play the role of two or more of the agents. When successfully executed, this technique has two advantages:

  1. It provides a straightforward way to determine the actions of the other agents, resulting in a meaningful challenge.
  2. It increases the amount of experience that can be used to improve the policy, by a factor of two or more, since the viewpoints of each of the different agents can be used for learning.
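The second advantage can be illustrated with a minimal sketch, assuming a toy game of Nim (take one to three stones, whoever takes the last stone wins) and a placeholder random policy. The names `random_policy` and `self_play_episode` are hypothetical and chosen only for this illustration. A single policy fills both seats, and every move from either seat becomes a labelled training example.

```python
import random

def random_policy(stones, rng):
    """Hypothetical placeholder policy: pick a legal move uniformly at random."""
    return rng.randint(1, min(3, stones))

def self_play_episode(policy, start_stones=10, seed=0):
    """Play one game of Nim with the same policy in both seats.

    Returns a list of (player, stones_before_move, action, reward) tuples.
    The reward is +1 for every move made by the eventual winner and -1 for
    every move made by the loser, so both viewpoints yield experience.
    """
    rng = random.Random(seed)
    stones, player = start_stones, 0
    moves = []
    winner = None
    while stones > 0:
        action = policy(stones, rng)
        moves.append((player, stones, action))
        stones -= action
        winner = player        # the last player to move takes the last stone
        player = 1 - player    # switch seats; the policy stays the same
    return [(p, s, a, 1 if p == winner else -1) for p, s, a in moves]

episode = self_play_episode(random_policy)
```

Every tuple in `episode` can be used to update the one shared policy, which is where the "factor of two or more" in collected experience comes from.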

Czarnecki et al.[1] argue that most of the games that people play for fun are "Games of Skill", meaning games whose space of all possible strategies looks like a spinning top. In more detail, we can partition the space of strategies into sets $L_1, L_2, \dots, L_n$ such that for any $i < j$, $\pi_i \in L_i$, $\pi_j \in L_j$, the strategy $\pi_j$ beats the strategy $\pi_i$. Then, in population-based self-play, if the population is larger than $\max_i |L_i|$, the algorithm would converge to the best possible strategy.
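As a rough illustration of this layered structure, the sketch below builds a hypothetical toy game in which a strategy is a (skill, style) pair: higher skill always wins, and strategies of equal skill are ordered only by a rock-paper-scissors cycle. The game, the `beats` rule, and the helper names are assumptions made for this example and are not taken from the cited paper. The code checks that a given partition into layers has the property that every strategy in a later layer beats every strategy in an earlier one, and reports the size of the largest layer.

```python
from itertools import combinations

# Hypothetical toy game: a strategy is a (skill, style) pair.
CYCLE = {("rock", "scissors"), ("scissors", "paper"), ("paper", "rock")}

def beats(a, b):
    """Higher skill always wins; equal skill falls back to a rock-paper-scissors cycle."""
    if a[0] != b[0]:
        return a[0] > b[0]
    return (a[1], b[1]) in CYCLE

def is_layered(layers, beats_fn):
    """True if every strategy in a later layer beats every strategy in an earlier layer."""
    for i, j in combinations(range(len(layers)), 2):  # all pairs with i < j
        for low in layers[i]:
            for high in layers[j]:
                if not beats_fn(high, low):
                    return False
    return True

def largest_layer(layers):
    """Size of the widest layer; the population must be larger than this."""
    return max(len(layer) for layer in layers)

# Narrow at the bottom and top, wide in the middle, like a spinning top.
layers = [
    [(0, "rock")],
    [(1, "rock"), (1, "paper"), (1, "scissors")],
    [(2, "rock")],
]
```

Here `is_layered(layers, beats)` holds and `largest_layer(layers)` is 3, so under the statement above a population of more than three strategies would be required in this toy game.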

Usage

Self-play is used by the AlphaZero program to improve its performance in the games of chess, shogi and Go.[2]

Self-play is also used to train the Cicero AI system to outperform humans at the game of Diplomacy. The technique is also used in training the DeepNash system to play the game Stratego.[3][4]

Comparison of different self-play techniques

  • Self-Play (SP):
    • Trains an agent against the current version of itself.
    • Yields an open-ended curriculum in which the opponent's strength matches the agent's.
    • Susceptible to cycles in strategy space: the agent can forget how to play against its prior versions.
  • Fictitious Self-Play (FSP):
    • Trains an agent against a uniform distribution over all of its previous policies.
    • Wastes a large number of interactions on weaker opponents.
  • Prioritized Fictitious Self-Play (PFSP):
    • Yields a curriculum over opponents that provide a good learning signal.
    • Matches agent A with a frozen opponent B from the set of candidates C with a probability that depends on A's win rate against B.
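The three schemes differ mainly in how the opponent is drawn from the pool of past policies, which the following sketch makes concrete. All function names are hypothetical, and the PFSP weighting `(1 - win_rate) ** p` is an assumption for illustration: the list above does not specify the probability, and this is only one weighting that favours opponents the agent still loses to.

```python
import random

def sample_sp(pool, rng):
    """Self-play: always face the most recent policy (last entry of the pool)."""
    return pool[-1]

def sample_fsp(pool, rng):
    """Fictitious self-play: draw uniformly from all previous policies."""
    return rng.choice(pool)

def pfsp_weights(win_rates, p=2.0):
    """Assumed PFSP weighting: proportional to (1 - win_rate) ** p.

    win_rates[i] is the agent's estimated win rate against opponent i.
    Opponents the agent already beats get little weight.
    """
    raw = [(1.0 - w) ** p for w in win_rates]
    total = sum(raw)
    if total == 0:  # the agent beats everyone: fall back to uniform
        return [1.0 / len(raw)] * len(raw)
    return [r / total for r in raw]

def sample_pfsp(pool, win_rates, rng, p=2.0):
    """Prioritized FSP: draw a frozen opponent with the weights above."""
    return rng.choices(pool, weights=pfsp_weights(win_rates, p), k=1)[0]

rng = random.Random(0)
pool = ["policy_v1", "policy_v2", "policy_v3"]  # placeholder policy handles
win_rates = [0.9, 0.5, 0.1]                      # made-up estimates
opponent = sample_pfsp(pool, win_rates, rng)
```

With these made-up win rates, `policy_v3` receives the largest weight because the agent wins against it least often, whereas `sample_fsp` would spend a third of its games on `policy_v1`, which the agent already beats 90% of the time.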

Connections to other disciplines

Self-play has been compared to the epistemological concept of tabula rasa that describes the way that humans acquire knowledge from a "blank slate".[5]

Further reading

  • DiGiovanni, Anthony; Zell, Ethan; et al. (2021). "Survey of Self-Play in Reinforcement Learning". arXiv:2107.02850 [cs.GT].

References

  1. ^ Czarnecki, Wojciech M.; Gidel, Gauthier; Tracey, Brendan; Tuyls, Karl; Omidshafiei, Shayegan; Balduzzi, David; Jaderberg, Max (2020). "Real World Games Look Like Spinning Tops". Advances in Neural Information Processing Systems. Curran Associates, Inc. 33: 17443–17454.
  2. ^ Silver, David; Hubert, Thomas; Schrittwieser, Julian; Antonoglou, Ioannis; Lai, Matthew; Guez, Arthur; Lanctot, Marc; Sifre, Laurent; Kumaran, Dharshan; Graepel, Thore; Lillicrap, Timothy; Simonyan, Karen; Hassabis, Demis (5 December 2017). "Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm". arXiv:1712.01815 [cs.AI].
  3. ^ Snyder, Alison (2022-12-01). "Two new AI systems beat humans at complex games". Axios. Retrieved 2022-12-29.
  4. ^ Grunewald, Erich. "Notes on Meta's Diplomacy-Playing AI".
  5. ^ Laterre, Alexandre (2018). "Ranked Reward: Enabling Self-Play Reinforcement Learning for Combinatorial Optimization". arXiv:1807.01672 [cs.LG].

