[RL Series 2/n] From Animals to Agents: Linking Psychology, Behaviour, Mathematics, and Decision Making
intro
Maths, computation, the mind, and related fields are a fascination for me.
I had thought I was quite well informed, and to a large degree I did know most of the science in more traditional Computer Science (it was my undergraduate degree…). What had passed me by was reinforcement learning, both its mathematical grounding and its practical value. If you’ve come from the previous post ([RL Series 1/n] Defining Artificial Intelligence and Reinforcement Learning) you know I’ve said something like that already. I’ll get over it and stop repeating myself soon, I’m sure 😉.
I’ve been told that an image at the top of a blog post is more engaging and I’d like you to feel welcome and engage, so here’s an image, generated by ChatGPT/DALLE, of the timeline in this post:

more preamble
So the whole Agent-Environment setup was new to me, introduced as part of a course I’m doing on AI Applications. We covered some theory on the Basics of Reinforcement Learning and the following methods:
- Markov Decision Process/Property
- Dynamic Programming
- Monte Carlo
- Temporal Difference
- SARSA (State-Action-Reward-State-Action)
- Q-Learning
To close it out I developed an environment to train a Maze Solving agent (Q-Learning Maze Solving Agent project).
It’s been a brilliant ride. Check it out, I think it’ll make sense to an engineer as the algorithm is quite straightforward (there’s a rough code sketch after this list):
- Roll a dice (or similar)
- If it’s a 1, randomly choose a direction, otherwise take the best direction stored in your list
- Calculate the reward for the new location and update your list with that new reward
- Repeat steps 1 to 3
- Let me know what you think (Optional in the Q-Learning algorithm)
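
Here’s a minimal sketch of those steps as tabular Q-Learning in Python. The corridor environment, the rewards, and the hyper-parameters are all invented for illustration; the real maze project is linked above.

```python
import random
from collections import defaultdict

# Toy tabular Q-Learning on an invented 1D corridor: states 0..4,
# actions -1 (left) / +1 (right), reward for reaching state 4.
ACTIONS = [-1, +1]
GOAL = 4

def step(state, action):
    """Move along the corridor; return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else -0.01   # small step penalty
    return next_state, reward, next_state == GOAL

q = defaultdict(float)                   # the "list": Q-values keyed by (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 1 / 6  # learning rate, discount, "rolled a 1 on the die"

for episode in range(200):
    state, done = 0, False
    while not done:
        # Steps 1-2: roll the die; explore with probability epsilon, else exploit.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        # Step 3: take the action, observe the reward, update the table.
        next_state, reward, done = step(state, action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

# The learned (greedy) direction for each non-goal state: expect mostly +1 (right).
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)})
```

The epsilon here plays the role of the die: roughly one step in six goes exploring instead of taking the best-known direction.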
In doing this project I researched further, and this post is the output of that effort: an overview of the key advancements in the related fields, structured as a timeline of the major publications in the fields related to reinforcement learning and Neural Networks, up until the turn of the millennium.
After 2000, deep RL becomes a thing (Q-Learning and Neural Networks), plus some humans start to train other humans to continually scroll a website. I’m compartmentalising that until I understand (fully/better/enough of) what happened in getting humans (us) to that point.
So fill your boots. I hope to add more to this series: the Value and Policy Iteration algorithms, for example; definitely the Q-Learning algorithm and code; maybe Monte Carlo examples (I’m curious, but it’s not yet on my critical path).
Early Foundations (1894-1913)
1894
- Researcher: C. Lloyd Morgan
- Work: Trial and error
Reportedly coined the term “Trial and error”.
1898
- Researcher: Edward Thorndike
- Work: Animal Intelligence: An Experimental Study of the Associative Processes in Animals
Edward Thorndike was quite a character: he lived in a small apartment with the animals he studied, and his work bridged the gap between psychology and behaviourism (later developed by Skinner and others).
During this period he conducted puzzle box experiments, providing the first empirical evidence of learning through consequences.
1903
- Researcher: Edward Thorndike
- Work: Educational Psychology: The Psychology of Learning Volume I
From what I have read, he received a lot of feedback on his experiments over the intervening ~5 years, and this volume was a marked improvement.
1906
- Researcher: Andrey Markov
- Work: Introduction of Markov chains (note: I do not read Russian, so I hope this is the correct paper).
I cannot find the original in English; however, it is reproduced in Appendix B of Howard’s 2012 book Dynamic Probabilistic Systems, Volume I: Markov Models.
Developed the mathematical foundation for describing state transitions in probability theory. The key property (known as the Markov Property) is crucial for RL frameworks: the next state depends only on the current state, not on the sequence of states that came before it.
Not a paper but a very cool web util to understand and tweak Markov Chains: https://setosa.io/blog/2014/07/26/markov-chains/index.html
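
To make the property concrete, here’s a toy two-state chain in Python (states and probabilities are invented, not from Markov’s paper): the next state is sampled from the current state alone, with no reference to the history.

```python
import random

# A toy two-state Markov chain. Only the current state is consulted when
# sampling the next one -- that independence from history is the Markov Property.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(current):
    states, probs = zip(*TRANSITIONS[current].items())
    return random.choices(states, weights=probs)[0]

state, history = "sunny", []
for _ in range(10):
    state = next_state(state)
    history.append(state)
print(history)
```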
1913
- Researcher: Edward Thorndike
- Work: “Educational Psychology: The Psychology of Learning Volume II”
Formalized three fundamental laws of learning; they still seem quite appropriate today.
- Law of Effect
- Law of Exercise
- Law of Readiness
Early Behavioral Sciences and Mathematics (1938-1952)
1938
- Researcher: B.F. Skinner
- Work: The Behavior of Organisms: An Experimental Analysis
Introduced operant conditioning and the systematic study of reinforcement. Minsky referenced his work. I need to read up some more on what he did.
1943
- Researchers: Warren McCulloch & Walter Pitts
- Work: A Logical Calculus of the Ideas Immanent in Nervous Activity
Proposed a logical model of the neuron, laying the groundwork for the perceptron.
1943
- Researcher: Clark Hull
- Work: Principles of Behavior
A researcher/scientist I have not read anything about yet. His work appears to be important: he is reported to have significantly advanced the mathematical representation of behaviour.
1951-1952
- Researcher: Clark Hull
- Work: A Behavior System
Refined mathematical framework for behavior, expanded behavioral equations.
Simulation: the Monte Carlo Method (1946-1949)
1946-1949
- Researchers: Stanislaw Ulam, John von Neumann, Nicholas Metropolis
- Work: The Monte Carlo Method
- History: Monte Carlo Method
Developed in secret at Los Alamos as an improvement to the simulations performed when developing nuclear weapons. Later became crucial for RL as a method for estimating value functions from experience.
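
As a flavour of how the idea later shows up in RL (my own toy illustration, not the Los Alamos method): estimate a value by sampling lots of complete episodes and averaging the returns you actually observed.

```python
import random

# Toy Monte Carlo value estimation: run many invented "episodes" from the
# same starting state and average the observed (discounted) returns.
def sample_return():
    total, discount = 0.0, 1.0
    while True:
        total += discount * random.uniform(0, 1)   # a made-up reward signal
        if random.random() < 0.5:                  # 50% chance the episode ends here
            return total
        discount *= 0.9

samples = [sample_return() for _ in range(100_000)]
print(sum(samples) / len(samples))   # the Monte Carlo estimate of the value
```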
Cybernetics, Artificial Intelligence, and the Perceptron (1948-1961)
1948
- Researcher: Norbert Wiener
- Work: Cybernetics
A book that brought together ideas and theories on computation, time, feedback, nervous systems, mental capability, and information and language. It was updated in 1961 to include a chapter on learning and self-reproduction, and another on brain waves and self-organising systems. Seems well worth a read.
1955
- Researchers: John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon
- Work: A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence
“The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”
1958
- Researcher: Frank Rosenblatt
- Work: The Perceptron
Developed the first trainable neural network, capable of basic pattern recognition.
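
As a rough sketch of what “trainable” means here, the textbook perceptron learning rule on a toy AND problem (my example, not Rosenblatt’s original setup): nudge the weights whenever the prediction is wrong.

```python
# A minimal perceptron learning AND on two binary inputs (illustrative only).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b, lr = [0.0, 0.0], 0.0, 0.1

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

for _ in range(20):                      # a few passes over the data
    for x, target in data:
        error = target - predict(x)      # 0 if correct, +/-1 if wrong
        w[0] += lr * error * x[0]        # nudge weights toward the target
        w[1] += lr * error * x[1]
        b += lr * error

print([predict(x) for x, _ in data])     # expect [0, 0, 0, 1]
```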
1961
- Researcher: Marvin Minsky
- Work: Steps Toward Artificial Intelligence
First use of the term “reinforcement learning”, connected behavioral and mathematical approaches.
Control Theory and Dynamic Programming Era (1957-1962)
1957
- Researcher: Richard Bellman
- Work: Dynamic Programming
Introduced dynamic programming, the important Bellman Equation (a state’s value can be written as the immediate reward plus the discounted value of what follows, so in a known system all the values can be computed recursively), and the concept of value functions; developed the Value Iteration algorithm.
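
A hedged sketch of that idea: a toy Value Iteration loop in Python on an invented three-state chain, where each state’s value is repeatedly backed up from its successors until nothing changes.

```python
# Toy value iteration on a made-up deterministic 3-state chain.
# Each entry maps an action to (next_state, reward); state 2 is terminal.
MDP = {
    0: {"right": (1, 0.0), "stay": (0, 0.0)},
    1: {"right": (2, 1.0), "stay": (1, 0.0)},
    2: {},  # terminal state
}
gamma = 0.9
V = {s: 0.0 for s in MDP}

for _ in range(50):  # sweep until the values settle (50 sweeps is plenty here)
    for s, actions in MDP.items():
        if actions:
            # Bellman backup: best action's reward plus discounted value of what follows
            V[s] = max(r + gamma * V[s2] for s2, r in actions.values())

print(V)  # roughly {0: 0.9, 1: 1.0, 2: 0.0}
```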
1960/1962
- Researcher: Ronald Howard
- Work:
Paper (1960): Dynamic Programming and Markov Processes
Book (1962): Dynamic Programming and Markov Processes
Developed Policy Iteration algorithm, formalized MDPs in decision-making.
Neural Network Development and First Winter (1969-1974)
1969
- Researchers: Marvin Minsky & Seymour Papert
- Work: Perceptrons
Demonstrated the limitations of single-layer networks; it appears to have been a strong influence on the start of the first AI winter.
1974
- Researcher: Paul Werbos
- Work: Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences
Developed the backpropagation algorithm; it didn’t really flourish in the (AI) winter climate.
Modern Neural Network Renaissance (1982-1986)
1982
- Researcher: John Hopfield
- Work: Neural networks and physical systems with emergent collective computational abilities
Introduced what became known as the Hopfield network: a brain-inspired algorithm for (associative) memory.
1986
- Researchers: David Rumelhart, Geoffrey Hinton, & Ronald Williams
- Work: Learning representations by back-propagating errors
A paper from the (now) Nobel Prize winner. Popularized backpropagation and demonstrated practical training of multilayer networks. Hinton went on to be involved with further famous and significant advances in the field.
1986
- Researchers: David Rumelhart & James McClelland
- Work: Parallel Distributed Processing Chapter 1 | Parallel Distributed Processing Chapter 2
I understand that this showed how multilayer perceptrons (MLPs) could overcome the earlier single-layer limitations. I need to look into this.
Modern RL Development (1987-1998)
1987
- Researchers: Richard Sutton & Andrew Barto
- Work: A temporal-difference model of classical conditioning
Introduced Temporal Difference learning, bridging Monte Carlo and Dynamic Programming approaches.
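
The core trick is updating a value estimate from a single step of real experience plus the current estimate of the next state, rather than waiting for a whole episode (Monte Carlo) or needing a model of the environment (Dynamic Programming). A tiny sketch with invented numbers:

```python
# TD(0) sketch: nudge V(state) toward reward + gamma * V(next_state)
# after every single step of experience.
alpha, gamma = 0.1, 0.9
V = {"A": 0.0, "B": 0.0}

# One observed transition (all numbers invented): A --reward 1.0--> B
state, reward, next_state = "A", 1.0, "B"
td_target = reward + gamma * V[next_state]
V[state] += alpha * (td_target - V[state])
print(V)   # {'A': 0.1, 'B': 0.0}
```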
1989
- Researcher: Christopher Watkins
- Work: Learning from Delayed Rewards
This is a very important advancement in reinforcement learning: the Q-Learning algorithm enables off-policy learning, i.e. learning about the best (greedy) policy while behaving according to a more exploratory one. This is worth digging into a lot (including concepts like Actualism and Possibilism).
1998
- Researchers: Richard Sutton & Andrew Barto
- Work: Reinforcement Learning: An Introduction (this is the second edition from 2018)
The Royale with Cheese. The Big Mac. Updated with a second edition in 2018.
Next up?
To be decided!