[RL Series 1/n] Defining Artificial Intelligence and Reinforcement Learning
intro
I’m learning about Reinforcement Learning; it’s an area that holds a lot of intrigue for me. The first I recall hearing of it was when ChatGPT was released and Reinforcement Learning from Human Feedback was said to be the key to making its responses so fluent.
Since then I’ve been studying AI and Data Science for a Masters, so with that I’m stepping back to understand the domain in greater detail. There’s no better way to understand something than to write it down!
So, below is my abbreviated version of the key points from two papers, one each from John McCarthy and Marvin Minsky. Following on from this post will be posts on the influence of Psychology, Control Theory, Markov Chains, Dynamic Programming, and then Temporal Difference learning. Maybe more, depending on where my learning goes (I hope into Deep RL, as I’ve developed an agent with a Q-Table, and one with a Neural Net would be a lot better!). With that, I’ve also got some code later on for training and evaluating a maze-solving agent.
the background to Artificial Intelligence
John McCarthy is credited with coining the term Artificial Intelligence. He, along with Marvin Minsky, Nathaniel Rochester, and Claude Shannon, submitted A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence in August 1955 for the Dartmouth College conference. The proposal opens:
The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.
Six years later, in 1961, the term reinforcement learning was coined by Minsky in his paper titled Steps Toward Artificial Intelligence.
The problems of heuristic programming (of making computers solve really difficult problems) are divided into five main areas: Search, Pattern-Recognition, Learning, Planning, and Induction.
A computer can do, in a sense, only what it is told to do. But even when we do not know how to solve a certain problem, we may program a machine (computer) to Search through some large space of solution attempts. Unfortunately, this usually leads to an enormously inefficient process.
With Pattern-Recognition techniques, efficiency can often be improved, by restricting the application of the machine’s methods to appropriate problems. Pattern-Recognition, together with Learning, can be used to exploit generalizations based on accumulated experience, further reducing search. By analyzing the situation, using Planning methods, we may obtain a fundamental improvement by replacing the given search with a much smaller, more appropriate exploration. To manage broad classes of problems, machines will need to construct models of their environments, using some scheme for Induction.
Wherever appropriate, the discussion is supported by extensive citation of the literature and by descriptions of a few of the most successful heuristic (problem-solving) programs constructed to date.
In this paper he goes into what a Learning System is; here is the summary:
In order to solve a new problem, one should first try using methods similar to those that have worked on similar problems. To implement this “basic learning heuristic” one must generalize on past experience, and one way to do this is to use success-reinforced decision models. These learning systems are shown to be averaging devices. Using devices which learn also which events are associated with reinforcement, i.e., reward, we can build more autonomous “secondary reinforcement” systems. In applying such methods to complex problems, one encounters a serious difficulty, in distributing credit for success of a complex strategy among the many decisions that were involved. This problem can be managed by arranging for local reinforcement of partial goals within a hierarchy, and by grading the training sequence of problems to parallel a process of maturation of the machine’s resources.
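Minsky’s point that these learning systems are “averaging devices” can be made concrete with a toy example. Below is my own illustrative sketch (not code from the paper): each action keeps a running average of past reward, updated by a success-reinforced rule. The two-action setup, the hidden success probabilities, and the learning rate `c` are all my assumptions for the sake of the demo.

```python
import random

def train(episodes=5000, c=0.1, seed=0):
    """Toy 'averaging device': one estimate per action, updated as
    estimate <- estimate + c * (reward - estimate), i.e. an exponential
    moving average of the reinforcement (reward) each action received."""
    rng = random.Random(seed)
    estimates = {"left": 0.0, "right": 0.0}     # learned value of each action
    true_success = {"left": 0.2, "right": 0.8}  # hidden reward probabilities
    for _ in range(episodes):
        action = rng.choice(list(estimates))    # explore actions uniformly
        reward = 1.0 if rng.random() < true_success[action] else 0.0
        estimates[action] += c * (reward - estimates[action])
    return estimates

est = train()
print(est)  # each estimate drifts toward that action's true success rate
```

The update rule is exactly an average that forgets old experience geometrically, which is why Minsky calls such systems averaging devices: over time the estimate for “right” settles near its true success rate and ends up higher than the estimate for “left”.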
With that there is a reference to B. F. Skinner’s work Science and Human Behavior (a version from 2014 is available from the B. F. Skinner Foundation) and its importance in the history of reinforcement learning.
The analogy is with “reward” or “extinction” (not punishment) in animal behavior. The important thing about this kind of process is that it is “operant” (a term of Skinner [44]); the reinforcement operator does not initiate behavior, but merely selects that which the Trainer likes from that which has occurred.
Here is the schema from the paper, the original Learning System!
Next is From Animals to Agents: Linking Psychology, Behaviour, Mathematics, and Decision Making.