- The paper presents some general characteristics that intelligent machines should possess and a roadmap to develop such intelligent machines in small, realistic steps.
- Link to the paper
- The intelligent agents should be able to communicate with humans, preferably using language as the medium.
- Such systems can be programmed through natural language and can access much of the human knowledge which is encoded using natural language.
- The learning environment should facilitate interactive communication and the machine should have a minimalistic bit interface for IO to keep the interface simple.
- Further, the machine should be free to use any internal representation for learning tasks.
- Learning allows the machine to adapt to the external environment and correct its mistakes.
- Users should be able to control the motivation of the machine via a communication channel. This is similar to the notion of rewards in reinforcement learning.
- Simulated environment to teach basic linguistic interactions and know-how to operate in the world.
- Though the environment should be challenging enough to force the machine to "learn how to learn", its complexity should be manageable.
- Unlike classic AI block worlds, the simulated environment is not intended to teach an exhaustive set of functionality to the agent. The aim is to teach the machine how to learn efficiently by combining already acquired skills.
- Learner or actor
- Teacher
- Assigns tasks and rewards to the learner and provides helpful information.
- The aim is to kick-start the learner's efficient learning capabilities rather than to supply all the information directly.
- Environment
- Learner explores the environment by giving orders, asking questions and receiving feedback.
- Environment uses a controlled language which is more explicit and restricted.
Think of the learner as a high-level programming language, the teacher as the programmer and the environment as the compiler.
- Generic input and output channels.
- Teacher and environment write to the input channel.
- Reward is written to input channel.
- Learner writes to the output channel and learns to use unambiguous prefixes to address the agents and services it needs to interact with.
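
The channel setup above can be sketched as two bit streams plus a prefix convention. This is only an illustrative sketch, not the paper's implementation: the `Channel` class, the 8-bit UTF-8 encoding, and the prefixes `T:` (teacher), `E:` (environment) and `R:` (reward) are all assumptions made for the example.

```python
from collections import deque


def to_bits(text):
    """Encode text as a flat list of bits, 8 per byte, most significant bit first."""
    return [(byte >> shift) & 1
            for byte in text.encode("utf-8")
            for shift in range(7, -1, -1)]


def from_bits(bits):
    """Decode a list of bits (length a multiple of 8) back into text."""
    data = bytes(
        sum(bit << (7 - i) for i, bit in enumerate(bits[k:k + 8]))
        for k in range(0, len(bits), 8)
    )
    return data.decode("utf-8")


class Channel:
    """Hypothetical first-in, first-out stream of bits."""

    def __init__(self):
        self._bits = deque()

    def write(self, text):
        self._bits.extend(to_bits(text))

    def read_all(self):
        bits = list(self._bits)
        self._bits.clear()
        return from_bits(bits)


def route(message):
    """Split a message such as 'E: look' into (prefix, body)."""
    prefix, _, body = message.partition(": ")
    return prefix, body


input_channel = Channel()   # teacher, environment and reward all write here
output_channel = Channel()  # only the learner writes here

input_channel.write("T: repeat ab\n")  # assumed teacher prefix
input_channel.write("R: +1\n")         # assumed reward prefix
output_channel.write("E: look\n")      # learner addresses the environment

incoming = input_channel.read_all().splitlines()
outgoing = [route(line) for line in output_channel.read_all().splitlines()]
```

Keeping everything on one input stream means the learner has to work out from the prefixes alone who is speaking, which is why the prefixes need to be unambiguous.
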
- Way to provide feedback to the learner.
- Rewards should become sparse as the learner's intelligence grows and "curiosity" should be a learnt strategy.
- Learner should maximise average reward over time so that faster strategies are preferred in case of equal rewards.
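
A small worked example of why averaging over time favours faster strategies. The two reward sequences are made up for illustration, and it is assumed that the learner receives one (possibly zero) reward per time step.

```python
def average_reward(rewards):
    """Mean reward per time step; `rewards` holds one entry per step."""
    return sum(rewards) / len(rewards)


# Two hypothetical strategies that both end with the same total reward of 1.
fast = [0, 0, 1]            # task solved in 3 steps
slow = [0, 0, 0, 0, 0, 1]   # task solved in 6 steps

fast_avg = average_reward(fast)
slow_avg = average_reward(slow)
```

Both strategies collect the same total reward, but the faster one has the higher average per step, so a learner maximising average reward would prefer it.
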
- Think of learner progressing through different levels where skills from earlier levels can be used in later levels.
- Tasks need not be ordered within a level.
- Learner starts by performing basic tasks like repeating characters then learns to associate linguistic strings to action sequences. Further, the learner learns to ask questions and "read" natural text.
- Learner is given time to either explore the environment or to interact with the Teacher or to update its internal structure by replaying the previous experience.
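
The level structure can be sketched as a list of levels, where levels are visited in order but tasks inside a level are shuffled. The task names paraphrase the progression described above; the `task_schedule` function and the use of a seeded shuffle are assumptions for illustration only.

```python
import random

# Hypothetical curriculum: outer list is ordered, inner lists are not.
curriculum = [
    ["repeat a character", "repeat a string"],
    ["map a command to an action sequence"],
    ["ask a question", "read a short text"],
]


def task_schedule(levels, seed=0):
    """Flatten the levels in order, shuffling the tasks within each level."""
    rng = random.Random(seed)
    schedule = []
    for level in levels:
        tasks = list(level)   # copy so the curriculum itself is not modified
        rng.shuffle(tasks)    # order within a level is free
        schedule.extend(tasks)
    return schedule


schedule = task_schedule(curriculum)
```
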
- Evaluating the learning agent on its final behaviour alone is not sufficient, as it overlooks the number of attempts needed to reach the optimal behaviour.
- A better approach would be to conduct a public competition where developers have access to a preprogrammed environment for a fixed amount of time and learners are evaluated on tasks that are considerably different from the tasks encountered during training.
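
To illustrate why final behaviour alone is not enough, the sketch below compares two made-up learners that end up equally good but need different numbers of attempts. The two metrics (attempts until a streak of successes, and mean success over all attempts) are assumptions chosen for the example, not metrics prescribed by the paper.

```python
def attempts_to_criterion(outcomes, streak=3):
    """Number of attempts used when `streak` consecutive successes first occur, else None."""
    run = 0
    for attempt, success in enumerate(outcomes, start=1):
        run = run + 1 if success else 0
        if run == streak:
            return attempt
    return None


def mean_success(outcomes):
    """Fraction of successful attempts over the whole history."""
    return sum(outcomes) / len(outcomes)


# Hypothetical per-attempt outcomes (1 = task solved, 0 = failed).
learner_a = [0, 1, 1, 1, 1, 1, 1, 1]
learner_b = [0, 0, 0, 0, 0, 1, 1, 1]

a_attempts = attempts_to_criterion(learner_a)
b_attempts = attempts_to_criterion(learner_b)
a_mean = mean_success(learner_a)
b_mean = mean_success(learner_b)
```

Both learners succeed on the final attempt, yet the first reaches the criterion in half as many attempts, a difference that a final-behaviour-only evaluation would not see.
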
A brief overview of the type of tasks is provided here
- Concept of positive and negative rewards.
- Discovery of algorithms.
- Remember facts, skills, and learning strategies.
- Needed to store facts, algorithms and even the ability to learn.
- Producing new structures by combining together known facts and skills.
- Understanding new concepts should not always require training examples.
- Computational model should be able to represent any pattern in data (alternatively, represent any algorithm in fixed length).
- Among the various Turing-complete computational systems available, the most natural choice would be a compositional system that can perform computations in parallel.
- Alternatively, a non-growing model with immensely large capacity could be used.
- In a growing model, new cells are connected to ones that spawned them leading to topological structures that can contribute to learning.
- But it is not clear if such topological structures can arise in a large-capacity unstructured model.