
AIXI

AIXI ['ai̯k͡siː] is a theoretical mathematical formalism for artificial general intelligence. It combines Solomonoff induction with sequential decision theory. AIXI was first proposed by Marcus Hutter in 2000[1] and several results regarding AIXI are proved in Hutter's 2005 book Universal Artificial Intelligence.[2]

AIXI is a reinforcement learning (RL) agent. It maximizes the expected total rewards received from the environment. Intuitively, it simultaneously considers every computable hypothesis (or environment). In each time step, it looks at every possible program and evaluates how many rewards that program generates depending on the next action taken. The promised rewards are then weighted by the subjective belief that this program constitutes the true environment. This belief is computed from the length of the program: longer programs are considered less likely, in line with Occam's razor. AIXI then selects the action that has the highest expected total reward in the weighted sum of all these programs.
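
To make this weighting idea concrete, here is a minimal Python sketch (an illustration only, since the real AIXI is incomputable): it enumerates a tiny hand-picked set of hypotheses in place of all programs, and the `hypotheses` list and `reward` interface are assumptions invented for this example.

```python
# Toy illustration of AIXI's core idea, not the actual (incomputable) AIXI:
# weight each candidate environment hypothesis by 2^(-program length) and
# pick the action with the highest weighted predicted reward.

def aixi_like_action(actions, hypotheses, reward):
    """Pick the action maximizing the complexity-weighted reward.

    hypotheses: list of (program_length_in_bits, hypothesis) pairs
    reward(h, a): reward that hypothesis h predicts for action a
    """
    def weighted_value(action):
        # Occam prior: a program of length L gets weight 2^(-L),
        # so shorter (simpler) hypotheses count more.
        return sum(2.0 ** -length * reward(h, action)
                   for length, h in hypotheses)
    return max(actions, key=weighted_value)

# Usage: two made-up hypotheses; the simpler one dominates the choice.
hypotheses = [(2, "simple world"), (10, "complex world")]
reward = lambda h, a: 1.0 if (h == "simple world") == (a == "left") else 0.0
print(aixi_like_action(["left", "right"], hypotheses, reward))  # left
```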

Definition

AIXI is a reinforcement learning agent that interacts with some stochastic and unknown but computable environment $\mu$. The interaction proceeds in time steps, from $t = 1$ to $t = m$, where $m \in \mathbb{N}$ is the lifespan of the AIXI agent. At time step $t$, the agent chooses an action $a_t \in \mathcal{A}$ (e.g. a limb movement) and executes it in the environment, and the environment responds with a "percept" $e_t \in \mathcal{E} = \mathcal{O} \times \mathbb{R}$, which consists of an "observation" $o_t \in \mathcal{O}$ (e.g., a camera image) and a reward $r_t \in \mathbb{R}$, distributed according to the conditional probability $\mu(o_t r_t \mid a_1 o_1 r_1 \ldots a_{t-1} o_{t-1} r_{t-1} a_t)$, where $a_1 o_1 r_1 \ldots a_{t-1} o_{t-1} r_{t-1} a_t$ is the "history" of actions, observations and rewards. The environment $\mu$ is thus mathematically represented as a probability distribution over "percepts" (observations and rewards) that depends on the full history, so there is no Markov assumption (as opposed to other RL algorithms). Note again that this probability distribution is unknown to the AIXI agent. Furthermore, note again that $\mu$ is computable: the observations and rewards received by the agent from the environment $\mu$ can be computed by some program (which runs on a Turing machine), given the past actions of the AIXI agent.[3]

The only goal of the AIXI agent is to maximise $\sum_{t=1}^{m} r_t$, that is, the sum of rewards from time step 1 to $m$.

The AIXI agent is associated with a stochastic policy $\pi \colon (\mathcal{A} \times \mathcal{E})^* \to \mathcal{A}$, which is the function it uses to choose actions at every time step, where $\mathcal{A}$ is the space of all possible actions that AIXI can take and $\mathcal{E}$ is the space of all possible "percepts" that can be produced by the environment. The environment (or probability distribution) $\mu$ can also be thought of as a stochastic policy (which is a function) $\mu \colon (\mathcal{A} \times \mathcal{E})^* \times \mathcal{A} \to \mathcal{E}$, where ${}^*$ denotes the Kleene star operation.
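
The interaction protocol just described can be summarised in a short sketch. The `policy` and `environment` callables below are placeholder assumptions (neither is computable for the real AIXI); the point is only the shape of the loop and the history-dependence of the environment.

```python
# Sketch of the agent-environment protocol described above. `policy` and
# `environment` are illustrative placeholders: the environment receives the
# *full* history, reflecting the absence of a Markov assumption.

def interact(policy, environment, lifespan_m):
    """Run the interaction loop for time steps t = 1..m."""
    history = []            # (a_1, e_1), ..., (a_{t-1}, e_{t-1})
    total_reward = 0.0
    for t in range(1, lifespan_m + 1):
        action = policy(history)                             # choose a_t
        observation, reward = environment(history, action)   # percept e_t
        history.append((action, (observation, reward)))
        total_reward += reward     # AIXI's goal: maximise r_1 + ... + r_m
    return total_reward
```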

In general, at time step $t$ (which ranges from 1 to $m$), AIXI, having previously executed the actions $a_1 \ldots a_{t-1}$ (often abbreviated in the literature as $a_{<t}$) and having observed the history of percepts $o_1 r_1 \ldots o_{t-1} r_{t-1}$ (which can be abbreviated as $e_{<t}$), chooses and executes in the environment the action $a_t$ defined as follows:[4]

$$a_t = \arg\max_{a_t} \sum_{o_t r_t} \ldots \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \sum_{q:\; U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\mathrm{length}(q)}$$

or, with parentheses to disambiguate precedence:

$$a_t = \arg\max_{a_t} \left( \sum_{o_t r_t} \ldots \left( \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \left( \sum_{q:\; U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\mathrm{length}(q)} \right) \right) \right)$$

Intuitively, in the definition above, AIXI considers the sum of the total reward over all possible "futures" up to $m - t$ time steps ahead (that is, from $t$ to $m$), weighs each of them by the complexity of the programs $q$ (that is, by $2^{-\mathrm{length}(q)}$) consistent with the agent's past (that is, the previously executed actions $a_{<t}$ and the received percepts $e_{<t}$) that can generate that future, and then picks the action that maximises expected future rewards.[3]

Let us break this definition down.

  is the "percept" (which consists of the observation   and reward  ) received by the AIXI agent at time step   from the environment (which is unknown and stochastic). Similarly,   is the percept received by AIXI at time step   (the last time step where AIXI is active).

$r_t + \ldots + r_m$ is the sum of rewards from time step $t$ to time step $m$, so AIXI needs to look into the future to choose its action at time step $t$.

$U$ denotes a monotone universal Turing machine, and $q$ ranges over all (deterministic) programs on the universal machine $U$, which receives as input the program $q$ and the sequence of actions $a_1 \ldots a_m$ (that is, all actions) and produces the sequence of percepts $o_1 r_1 \ldots o_m r_m$. The universal Turing machine $U$ is thus used to "simulate" or compute the environment responses or percepts, given the program $q$ (which "models" the environment) and all actions of the AIXI agent: in this sense, the environment is "computable" (as stated above). Note that, in general, the program which "models" the current and actual environment (where AIXI needs to act) is unknown, because the current environment is also unknown.

$\mathrm{length}(q)$ is the length of the program $q$ (which is encoded as a string of bits). Note that $2^{-\mathrm{length}(q)} = \frac{1}{2^{\mathrm{length}(q)}}$. Hence, in the definition above, $\sum_{q:\; U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\mathrm{length}(q)}$ should be interpreted as a mixture (in this case, a sum) over all computable environments (which are consistent with the agent's past), each weighted by its complexity $2^{-\mathrm{length}(q)}$. Note that $a_1 \ldots a_m$ can also be written as $a_1 \ldots a_{t-1} a_t \ldots a_m$, where $a_1 \ldots a_{t-1} = a_{<t}$ is the sequence of actions already executed in the environment by the AIXI agent. Similarly, $o_1 r_1 \ldots o_m r_m = o_1 r_1 \ldots o_{t-1} r_{t-1} o_t r_t \ldots o_m r_m$, where $o_1 r_1 \ldots o_{t-1} r_{t-1} = e_{<t}$ is the sequence of percepts produced by the environment so far.
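
As a quick numeric illustration of this weighting (the program lengths below are made up for the example), shorter programs dominate the mixture:

```python
# Occam weighting 2^(-length(q)) for three hypothetical program lengths.
lengths = [3, 5, 8]
weights = [2.0 ** -l for l in lengths]
print(weights)  # [0.125, 0.03125, 0.00390625]: the 3-bit program dominates
```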

Let us now put all these components together in order to understand this definition.

At time step $t$, AIXI chooses the action $a_t$ at which the function

$$\sum_{o_t r_t} \ldots \max_{a_m} \sum_{o_m r_m} [r_t + \ldots + r_m] \sum_{q:\; U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\mathrm{length}(q)}$$

attains its maximum.
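
The alternation of sums over percepts and maxima over actions is an expectimax recursion. Below is a sketch for a toy finite setting; `percept_weight` stands in for the conditional mixture weight contributed by the programs consistent with the extended history, and every name here is an assumption for illustration, not part of the formal definition.

```python
# Expectimax sketch of the definition above, for a toy finite setting.
# percept_weight(history) plays the role of the mixture over programs q
# consistent with the history (an illustrative stand-in).

def expectimax_value(history, t, m, actions, percepts, percept_weight):
    """Alternate max over actions and weighted sum over percepts, t..m."""
    if t > m:
        return 0.0

    def action_value(a):
        total = 0.0
        for (o, r) in percepts:              # sum over percepts o_t r_t
            h = history + [(a, (o, r))]      # extend history with a_t, e_t
            w = percept_weight(h)            # weight of programs consistent with h
            total += w * (r + expectimax_value(h, t + 1, m, actions,
                                               percepts, percept_weight))
        return total

    return max(action_value(a) for a in actions)  # max over actions a_t

# At the root, the action AIXI executes is the argmax of action_value.
```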

Parameters

AIXI has two parameters that must be chosen: the universal Turing machine U and the agent's lifetime m. The latter can be removed by the use of discounting.
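
The article does not spell the discounted variant out; one standard construction (an assumption here, following common practice in the reinforcement learning literature) replaces the finite-horizon sum $r_t + \ldots + r_m$ with a discounted infinite sum,

$$\sum_{k=t}^{\infty} \gamma_k \, r_k, \qquad \text{where } \sum_{k=1}^{\infty} \gamma_k < \infty,$$

for example the geometric discount $\gamma_k = \gamma^k$ with $0 < \gamma < 1$, so that the objective is well-defined without fixing a lifetime $m$.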

The meaning of the word AIXI

According to Hutter, the word "AIXI" can have several interpretations. AIXI can stand for AI based on Solomonoff's distribution, denoted by $\xi$ (the Greek letter xi), or it can stand for AI "crossed" (X) with induction (I). There are other interpretations.

Optimality

AIXI's performance is measured by the expected total reward it receives. AIXI has been proven to be optimal in the following ways.[2]

  • Pareto optimality: there is no other agent that performs at least as well as AIXI in all environments while performing strictly better in at least one environment.[citation needed]
  • Balanced Pareto optimality: like Pareto optimality, but considering a weighted sum of environments.
  • Self-optimizing: a policy $p$ is called self-optimizing for an environment $\mu$ if the performance of $p$ approaches the theoretical maximum for $\mu$ when the length of the agent's lifetime (not the time step) goes to infinity. For environment classes where self-optimizing policies exist, AIXI is self-optimizing (a formal sketch follows this list).
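
As a formal sketch of the self-optimizing property, write $V_\mu^p(m)$ for the expected total reward that policy $p$ collects in environment $\mu$ over lifetime $m$ (this value notation is an illustrative assumption, not defined in this article). The condition can then be stated as

$$\lim_{m \to \infty} \frac{1}{m} \left( V_\mu^*(m) - V_\mu^p(m) \right) = 0,$$

where $V_\mu^*(m)$ is the theoretical maximum attainable in $\mu$.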

It was later shown by Hutter and Jan Leike that balanced Pareto optimality is subjective and that any policy can be considered Pareto optimal, which they describe as undermining all previous optimality claims for AIXI.[5]

However, AIXI does have limitations. It is restricted to maximizing rewards based on percepts as opposed to external states. It also assumes it interacts with the environment solely through action and percept channels, preventing it from considering the possibility of being damaged or modified. Colloquially, this means that it doesn't consider itself to be contained by the environment it interacts with. It also assumes the environment is computable.[6]

Computational aspects

Like Solomonoff induction, AIXI is incomputable. However, there are computable approximations of it. One such approximation is AIXItl, which performs at least as well as the provably best time t and space l limited agent.[2] Another approximation to AIXI with a restricted environment class is MC-AIXI (FAC-CTW) (which stands for Monte Carlo AIXI FAC-Context-Tree Weighting), which has had some success playing simple games such as partially observable Pac-Man.[3][7]
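
The following fragment sketches only the bare Monte Carlo idea behind such approximations: estimate an action's value by averaging rollouts sampled from a learned environment model. The `sample_percept` callable is an invented stand-in for the agent's model (in MC-AIXI, a context-tree-weighting predictor); this is not the actual MC-AIXI algorithm, which additionally uses a tree search.

```python
import random

# Bare Monte Carlo value estimate: average the reward of random rollouts
# sampled from a learned environment model. Illustrative sketch only;
# `sample_percept(history)` is an invented stand-in for the model.

def mc_action_value(sample_percept, history, action, actions,
                    horizon, n_rollouts=100):
    """Estimate the expected future reward after taking `action`."""
    total = 0.0
    for _ in range(n_rollouts):
        h = list(history) + [action]
        for _ in range(horizon):
            observation, reward = sample_percept(h)  # sample e_t from model
            total += reward
            h += [(observation, reward), random.choice(actions)]  # playout
    return total / n_rollouts
```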

See also

  • Gödel machine

References

  1. Hutter, Marcus (2000). A Theory of Universal Artificial Intelligence based on Algorithmic Complexity. arXiv:cs.AI/0004001. Bibcode:2000cs........4001H.
  2. Hutter, Marcus (2005). Universal Artificial Intelligence: Sequential Decisions Based on Algorithmic Probability. Texts in Theoretical Computer Science. An EATCS Series. Springer. doi:10.1007/b138233. ISBN 978-3-540-22139-5. S2CID 33352850.
  3. Veness, Joel; Ng, Kee Siong; Hutter, Marcus; Uther, William; Silver, David (2009). "A Monte Carlo AIXI Approximation". arXiv:0909.0801 [cs.AI].
  4. Universal Artificial Intelligence
  5. Leike, Jan; Hutter, Marcus (2015). "Bad Universal Priors and Notions of Optimality" (PDF). Proceedings of the 28th Conference on Learning Theory.
  6. Soares, Nate. "Formalizing Two Problems of Realistic World-Models" (PDF). Intelligence.org. Retrieved 2015-07-19.
  7. Playing Pacman using AIXI Approximation – YouTube.
  • Hutter, Marcus. "Universal Algorithmic Intelligence: A mathematical top->down approach". arXiv:cs/0701125. Also in Artificial General Intelligence, eds. B. Goertzel and C. Pennachin, Springer, 2007, ISBN 9783540237334, pp. 227–290, doi:10.1007/978-3-540-68677-4_8.
