Advanced AI Week 1 notes
Goal of Artificial General Intelligence (AGI):
Build general-purpose Super-Intelligences
HOW???
–> Build systems by trial & error (Artificial Approach)
–> Mimic human behavior (Natural Approach) (Not covered in this course)
So we need THEORIES to guide us.
For Artificial Approach:
- AI systems include:
- Logic/language based
- Economics inspired
- Cybernetics
- Machine Learning
- Information processing
Intelligence does not have a formal definition yet.
- Humanly Thinking: Cognitive Science
- Humanly Acting: Turing test, Behaviorism
- Rationally Thinking: Laws of Thought
- Rationally Acting: Doing the right thing (the topic we discuss)
Decision Theory = Probability + Utility Theory
For acting in an uncertain world where the environment's probability distribution is known.
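A minimal sketch of decision-theoretic choice under a known environment distribution; all states, actions, probabilities, and utilities below are made-up illustrative values:

```python
# Toy decision theory: pick the action maximizing expected utility
# under a known probability distribution over world states.
# Every number and name here is invented for illustration.

P = {"rain": 0.3, "sun": 0.7}                  # known P(state)
U = {                                          # utility of (action, state)
    ("take_umbrella", "rain"): 1.0, ("take_umbrella", "sun"): 0.6,
    ("leave_umbrella", "rain"): 0.0, ("leave_umbrella", "sun"): 1.0,
}

def expected_utility(action):
    return sum(P[s] * U[(action, s)] for s in P)

actions = {a for a, _ in U}
best = max(actions, key=expected_utility)
print(best, expected_utility(best))            # take_umbrella, EU = 0.72
```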
Universal Induction = Ockham + Bayes + Turing
Sequence prediction when the prior distribution is unknown.
AI = Decision Theory + Universal Induction
UAI covers all Reinforcement Learning (RL) problem types:
–> Statistical Machine Learning:
Mostly independent and identically distributed (i.i.d.) data; classification, regression, clustering
–> RL Problems & Algorithms:
Stochastic, unknown, non-i.i.d. environments
–> Artificial Intelligence:
Traditionally deterministic, known world/planning problem
Informal Definition of Intelligence:
Intelligence measures an agent’s ability to achieve goals in a wide range of environments.
Induce a model of the environment –> Make predictions –> Make a decision –> Do the next action
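A minimal concrete instance of this loop, under the assumption (made up for illustration) that the environment is just a biased coin: the induced model is the observed frequency, the decision is a guess of the next outcome, and utility is 1 for a correct guess:

```python
import random

# Induce a model -> predict -> decide -> act, on a toy biased-coin environment.

def environment_step():
    return 1 if random.random() < 0.7 else 0   # bias unknown to the agent

history, reward = [], 0
for t in range(100):
    p1 = (sum(history) + 1) / (len(history) + 2)   # induce a model (Laplace rule)
    action = 1 if p1 >= 0.5 else 0                 # predict & decide
    observation = environment_step()               # act, observe
    reward += int(action == observation)
    history.append(observation)

print("accuracy:", reward / 100)
```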
Induction:
Occam’s razor: Take simplest hypothesis consistent with data.
Then how to quantify “simplicity”?
Description Length!
–> Shortest description of object is best explanation.
Shortest description = shortest program for a string on a Turing machine T –> Best extrapolation = Prediction
\[K_T(x) = \min_{p}\{\ell(p) : T(p) = x\}\]
Kolmogorov complexity (with a universal Turing machine U):
\(K(x) = K_U(x) \leq K_T(x) + c_T\)
However, K(x) is incomputable and the shortest program p cannot be found in general.
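Since K itself cannot be computed, a crude but computable stand-in for "description length" is the output size of an off-the-shelf compressor; the snippet below only illustrates the idea (it is not K(x)):

```python
import os, zlib

# zlib-compressed length as a rough proxy for description length:
# a structured string has a short description, a random one does not.

structured = b"ab" * 500           # highly regular, 1000 bytes
random_str = os.urandom(1000)      # incompressible with high probability

print(len(zlib.compress(structured, 9)))   # small: a short description suffices
print(len(zlib.compress(random_str, 9)))   # ~1000+: no shorter description found
```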
So we need a prior for each possible hypothesis. That is why we introduce Bayes' rule here.
Bayes’ rule:
\(P(H_i | D) = \cfrac{P(D|H_i) \cdot P(H_i)}{\sum_{j}P(D|H_j) \cdot P(H_j)}\)
$P(H_i)$ is the prior probability.
$P(H_i|D)$ is the posterior probability.
$P(D | H_i)$ is the likelihood, which is usually easy to describe.
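For any given choice of prior, the posterior update is mechanical; a toy numeric example with made-up priors and likelihoods:

```python
# Numerical Bayes' rule over a finite hypothesis set.
# The priors and likelihoods below are illustrative values only.

priors      = {"H1": 0.5, "H2": 0.3, "H3": 0.2}   # P(H_i)
likelihoods = {"H1": 0.1, "H2": 0.6, "H3": 0.3}   # P(D | H_i)

evidence  = sum(likelihoods[h] * priors[h] for h in priors)            # P(D)
posterior = {h: likelihoods[h] * priors[h] / evidence for h in priors} # P(H_i | D)
print(posterior)   # H2 becomes the most probable hypothesis after seeing D
```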
But we do not know how to choose $P(H_i)$.
Epicurus: If more than one theory is consistent with the observations, keep all theories.
So we refine Occam’s razor with Kolmogorov complexity:
\(P(H_i) := 2^{-K_{T/U}(H_i)}\)
But if we use T, we do not know how to choose T.
If we use U, it is incomputable.
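A hedged sketch of this Occam prior: since K is incomputable, the description length of each hypothesis string stands in for it, so the numbers only illustrate the shape \(P(H_i) \propto 2^{-\text{length}}\) (the hypotheses themselves are invented):

```python
# Occam-style prior P(H_i) = 2^(-complexity), with Kolmogorov complexity
# replaced by a naive proxy: the length of each hypothesis' description.

hypotheses = {
    "H1": "x_t = 0 for all t",
    "H2": "x_t alternates 0, 1, 0, 1, ...",
    "H3": "x_t follows a degree-7 polynomial with hand-tuned coefficients",
}

raw = {h: 2.0 ** (-len(desc)) for h, desc in hypotheses.items()}
Z = sum(raw.values())
prior = {h: w / Z for h, w in raw.items()}   # normalize so the priors sum to 1
print(prior)                                 # shorter description -> higher prior
```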
Solomonoff combined Occam, Epicurus, Bayes and Turing. –> Theory of sequential prediction
$M(x)$ = probability that the universal Turing machine outputs (a string starting with) x when its input is random
$M(y|x) = M(xy)/M(x)$
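Computing the true M(x) would require enumerating all programs of a UTM, which is infeasible; the sketch below keeps only the Bayes-mixture structure of M(y|x), over a tiny hand-coded hypothesis class chosen purely for illustration:

```python
# Bayes-mixture predictor in the spirit of M(y|x), over a toy hypothesis
# class instead of all UTM programs. Each hypothesis gives P(next bit = 1 | history).

hypotheses = {
    "always_0":    lambda hist: 0.01,
    "always_1":    lambda hist: 0.99,
    "alternating": lambda hist: 0.5 if not hist else (0.99 if hist[-1] == 0 else 0.01),
}
weights = {h: 1 / len(hypotheses) for h in hypotheses}   # prior (could be 2^-K)

def predict_and_update(history, next_bit):
    # mixture prediction of the observed bit, then Bayesian reweighting
    p1 = sum(weights[h] * hypotheses[h](history) for h in hypotheses)
    pred = p1 if next_bit == 1 else 1 - p1
    for h in hypotheses:
        ph1 = hypotheses[h](history)
        weights[h] *= ph1 if next_bit == 1 else 1 - ph1
    total = sum(weights.values())
    for h in weights:
        weights[h] /= total
    return pred

data = [0, 1, 0, 1, 0, 1, 0, 1]
for t, bit in enumerate(data):
    print(t, round(predict_and_update(data[:t], bit), 3))
print(weights)   # posterior mass concentrates on "alternating"
```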
The Minimum Description Length Principle
The $y$ of highest $M(y|x)$ is (approximately) the $y$ of smallest $K_T(xy)$.
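This MDL reading can be made concrete with the same compression proxy as before: among candidate continuations y, prefer the one that makes the compressed length of xy smallest (compressed length only approximates $K_T(xy)$; the sequence and candidates are made up):

```python
import zlib

# MDL-style prediction with a computable proxy for K_T(xy).

x = b"01" * 500                        # observed sequence with obvious structure
candidates = [b"0101", b"1111", b"0011"]

def description_length(s):
    return len(zlib.compress(s, 9))

for y in candidates:
    print(y, description_length(x + y))
# the pattern-preserving continuation should compress no worse than the others
```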
$x$ similar to $y$ –> small conditional complexity $K(x|y) := \min \{\ell(p) : U(p,y) = x\}$
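Conditional complexity underlies compression-based similarity measures; a common computable approximation is the normalized compression distance (NCD), sketched here with zlib as a rough stand-in for K (the example strings are arbitrary):

```python
import zlib

# Normalized compression distance: similarity via (approximate) description length.

def C(s):
    return len(zlib.compress(s, 9))

def ncd(x, y):
    return (C(x + y) - min(C(x), C(y))) / max(C(x), C(y))

a = b"the quick brown fox jumps over the lazy dog " * 10
b = b"the quick brown fox jumps over the lazy cat " * 10
c = bytes(range(256)) * 2

print(ncd(a, b))   # small: a and b share almost all of their structure
print(ncd(a, c))   # larger: little shared structure
```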