Classification of Intelligent Agents

COMP111 Lectures 2020-10-13

Intelligent agents are classified depending on how they map their precepts to their actions. They can be classed under:

Simple reflex agents
- Select actions to execute based upon the current percept
- Don’t take the percept history into account.
- Very simple IFTTT actions.
- Very simple to implement but have limited intelligence.
- They have no memory.

graph TB
    A(Environment)--> |Sensors| B
    B[What the world is like now] --> C
    C[What action I should do now]--> |Actuators| A
    D[Condition-action rules]-->C

Model-based reflex agents
- Maintain an internal state that depends upon the percept history - memory.
- Helps deal with partial observability.
- Knows how the world evolves and how its actions affect the world.

graph TB
    A(Environment)--> |Sensors| B
    B[What the world is like now] --> C
    C[What action I should do now]--> |Actuators| A
    D[Condition-action rules]-->C
    E-->B
    B.->E(State)
    F(How the world evolves)-->B
    G(What my actions do)-->B

Goal-based agents
- Select appropriate actions to achieve desirable state of the environment.
- Knowledge of the current state does not automatically mean that the agent knows what to do.
- It will have to plan what to do over a long sequence of actions.

graph TB
    A(Environment)  --> |Sensors| B
    B[What the world is like now] --> H
    H[What it will be like if I do action A] --> C
    C[What action I should do now]--> |Actuators| A
    D[Condition-action rules]-->C
    B.->E(State)
    E-->B
    F(How the world evolves)-->B
    F --> H
    G(What my actions do)-->B
    G --> H

Utility-based agents
- Make use of a utility function to compare the desirability of different states that result from actions.
- Many actions may satisfy the goal but which one is the most desirable?
- Utility function maps a state, or sequence of states, onto a number to give the degree of usefulness of the state to the agents.
- Agent maximises the value of its utility function.

graph TB
    A(Environment)--> |Sensors| B
    B[What the world is like now] --> H
    H[What it will be like if I do action A] --> I
    I[How happy I will be in such a state] --> C
    J(Utility)-->I
    C[What action I should do now]--> |Actuators| A
    D[Condition-action rules]-->C
    B.->E(State)
    E-->B
    F(How the world evolves)-->B
    F --> H
    G(What my actions do)-->B
    G --> H

Learning agents
- Learning agents find ways of maximising their utility function based on previous experience
- The performance element is the same choice mechanism from the previous agents. This is now informed by the learning element.
- The performance standard informs the learning element how well it’s actions have worked.

graph TB
	A(Environment) --> |Sensors| B[Performance element]
	B --> |Actuators| A
	D[Performance standard] --> C[Critic]
	C --> |Feedback| L[Learning element]
	L --> |Changes| B
	B --> |Knowledge| L
	L --> |Learning Goals| P[Problem Generator]
	P --> B