Goal of Artificial General Intelligence (AGI):
Build general-purpose Super-Intelligences

HOW???
–> Build systems by trial & error (Artificial Approach)
–> Mimic human behavior (Natural Approach) (Not covered in this course)

So we need THEORIES to guide us.

For Artificial Approach:

  • Approaches to building AI systems include:
    • Logic/language-based
    • Economics-inspired
    • Cybernetics
    • Machine Learning
    • Information processing


Intelligence does not have a formal definition yet.

Thinking Humanly:	Cognitive Science
Acting Humanly:		Turing test, Behaviorism
Thinking Rationally:	Laws of Thought
Acting Rationally:	Doing the Right Thing (the topic we discuss)

Decision Theory = Probability + Utility Theory
For acting in an uncertain world whose probability distribution is known.
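A minimal sketch of this (toy weather example; all probabilities and utilities are made-up numbers):

    # Toy expected-utility maximization: the environment distribution is known.
    P = {"rain": 0.3, "sun": 0.7}               # known environmental distribution
    U = {                                        # utility of (action, state) pairs
        ("take_umbrella", "rain"): 1.0, ("take_umbrella", "sun"): 0.6,
        ("leave_umbrella", "rain"): 0.0, ("leave_umbrella", "sun"): 1.0,
    }

    def expected_utility(action):
        return sum(P[s] * U[(action, s)] for s in P)

    best = max(("take_umbrella", "leave_umbrella"), key=expected_utility)
    print(best, expected_utility(best))          # -> take_umbrella 0.72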

Universal Induction = Occam + Bayes + Turing
Sequence prediction when the prior/environment distribution is unknown.

AI = Decision Theory + Universal Induction



UAI covers all Reinforcement Learning (RL) problem types:
–> Statistical Machine Learning:
Mostly independent and identically distributed (i.i.d.) data: classification, regression, clustering
–> RL Problems & Algorithms:
Stochastic, unknown, non-i.i.d. environments
–> Artificial Intelligence:
Traditionally deterministic, known world / planning problems



Informal Definition of Intelligence:
Intelligence measures an agent’s ability to achieve goals in a wide range of environments.

Induce a model of the environment –> Make predictions –> Make a decision –> Take the next action
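A toy, self-contained version of this loop (the bandit-style environment and the frequency-count "model" are made up for illustration):

    import random

    ACTIONS = ["a", "b"]
    TRUE_REWARD_PROB = {"a": 0.3, "b": 0.8}          # hidden from the agent

    def environment_step(action):
        """Stochastic 0/1 reward: the agent's next observation."""
        return 1 if random.random() < TRUE_REWARD_PROB[action] else 0

    def induce_model(history):
        """Induction: estimate each action's reward probability from past data."""
        est = {}
        for a in ACTIONS:
            rewards = [r for (act, r) in history if act == a]
            est[a] = sum(rewards) / len(rewards) if rewards else 0.5  # prior guess
        return est

    history = []
    for t in range(100):
        model = induce_model(history)                 # induce a model of the environment
        if random.random() < 0.1:                     # occasional exploration
            action = random.choice(ACTIONS)
        else:                                         # predict & decide: best expected reward
            action = max(ACTIONS, key=lambda a: model[a])
        reward = environment_step(action)             # do the next action, observe outcome
        history.append((action, reward))

    print(induce_model(history))                      # learned reward estimates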

Induction:

Occam’s razor: Take simplest hypothesis consistent with data.

Then how to quantify “simplicity”?
Description Length!

–> The shortest description of an object is its best explanation.

Shortest description = shortest program producing the string on a Turing machine T –> Best extrapolation = prediction

\[K_T(x) = \min_{p}\{\ell(p) : T(p) = x\}\]

Kolmogorov complexity of $x$ (with respect to a universal Turing machine $U$):
\(K(x) = K_U(x) \leq K_T(x) + c_T\)
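For intuition (a standard illustration, not specific to these notes): a very regular string admits a short program, while most strings do not.

\[K(\underbrace{00\ldots0}_{n}) \leq \log_2 n + c \qquad\text{whereas}\qquad K(x) \geq n - c \ \text{ for most } x \in \{0,1\}^n\]

The program "print $n$ zeros" only has to encode $n$ (about $\log_2 n$ bits), while a counting argument shows that fewer than $2^{n-c}$ strings can have programs shorter than $n-c$ bits.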

However, $K(x)$ is uncomputable, and the minimizing program $p$ cannot be found effectively.

So we need a prior over the possible hypotheses/programs $p$. That is why we introduce Bayes' rule here.

Bayes’ rule:
\(P(H_i | D) = \cfrac{P(D|H_i) \cdot P(H_i)}{\sum_{j}P(D|H_j) \cdot P(H_j)}\)

$P(H_i)$ is prior probability.
$P(H_i|D)$ is posterior probability.

$P(D | H_i)$ is the likelihood and is usually easy to specify.
But we do not know how to choose $P(H_i)$.
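For concreteness, a tiny numerical instance of Bayes' rule (the prior and likelihood values are made up just to show the mechanics):

    # Bayes' rule on two hypotheses with made-up numbers.
    prior      = {"H1": 0.5, "H2": 0.5}        # P(H_i)
    likelihood = {"H1": 0.8, "H2": 0.2}        # P(D | H_i) for the observed data D

    evidence  = sum(prior[h] * likelihood[h] for h in prior)          # P(D)
    posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}
    print(posterior)   # {'H1': 0.8, 'H2': 0.2} -- posterior P(H_i | D)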

Epicurus: If more than one theory is consistent with the observations, keep all theories.
So we refine Occam’s razor with Kolmogorov complexity: \(P(H_i) := 2^{-K_{T/U}(H_i)}\)

But if we use T, we do not know which T to choose.
If we use U, the prior is uncomputable.

Solomonoff combined Occam, Epicurus, Bayes and Turing. –> Theory of sequential prediction

$M(x)$ = probability that the UTM outputs a string starting with $x$ when fed random input bits
$M(y|x) = M(xy)/M(x)$
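Written out (this is the standard Solomonoff prior; $U$ is a monotone universal Turing machine, $U(p) = x*$ means the output of program $p$ starts with $x$, and the sum runs over minimal such $p$):

\[M(x) = \sum_{p\,:\,U(p) = x*} 2^{-\ell(p)}\]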

The Minimum Description Length (MDL) Principle:
the $y$ with the highest $M(y|x)$ ≈ the $y$ with the smallest $K_T(xy)$.
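A computable stand-in for this idea (using zlib's compressed length as a crude proxy for description length; the sequence and candidate continuations are made up):

    import zlib

    def description_length(s: bytes) -> int:
        """Compressed length in bytes: a crude, computable proxy for K."""
        return len(zlib.compress(s, 9))

    x = b"0" * 200                                           # observed, highly regular sequence
    continuations = [b"0" * 50, b"k7q2vz9mw4xp1jr8s3" * 3]   # made-up candidate continuations y
    # MDL-style prediction: prefer the y for which the description of xy stays shortest.
    best = min(continuations, key=lambda y: description_length(x + y))
    print(best[:10])                                         # the regular continuation should win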

How similar is $x$ to $y$? Conditional complexity: \(K(x|y) := \min \{\ell(p) : U(p,y) = x\}\)
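For example, $K(x|x) = O(1)$: a fixed program that just copies its side information $y = x$ to the output works for every $x$. And $K(x|\epsilon) = K(x)$ up to an additive constant, recovering the unconditional case.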