[RL Series 2/n] From Animals to Agents: Linking Psychology, Behaviour, Mathematics, and Decision Making
intro
Maths, computation, the mind, and related fields are a fascination for me.
I had thought I was quite well informed, and to a large degree I did know most of the science in more traditional Computer Science (it was my undergraduate degree…). What had passed me by was reinforcement learning, both its mathematical grounding and its practical value. If you’ve come from the previous post ([RL Series 1/n] Defining Artificial Intelligence and Reinforcement Learning) you know I’ve said something like that already. I’ll get over it and stop repeating myself soon, I’m sure 😉.
I’ve been told that an image at the top of a blog post is more engaging and I’d like you to feel welcome and engage, so here’s an image, generated by ChatGPT/DALLE, of the timeline in this post:

more preamble
So the whole Agent-Environment setup was new to me, introduced as part of a course I’m doing on AI Applications. We covered some theory on the Basics of Reinforcement Learning and the following methods:
- Markov Decision Process/Property
- Dynamic Programming
- Monte Carlo
- Temporal Difference
- SARSA (State-Action-Reward-State-Action)
- Q-Learning
To close it out I developed an environment to train a Maze Solving agent (Q-Learning Maze Solving Agent project).
It’s been a brilliant ride. Check it out, I think it’ll make sense to an engineer as the algorithm is quite straightforward (there’s a rough code sketch after this list):
- Roll a dice (or similar)
- If it’s a 1, randomly choose a direction, otherwise take the best direction stored in your list
- Calculate the reward for the new location and update your list with that new reward
- Repeat steps 1 to 3
- Let me know what you think (Optional in the Q-Learning algorithm)
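
Here’s a minimal sketch of those steps as tabular Q-Learning in Python. The corridor environment, the rewards, and the hyper-parameters are all invented for illustration; the real maze project is linked above.

```python
import random
from collections import defaultdict

# Toy tabular Q-Learning on an invented 1D corridor: states 0..4,
# actions -1 (left) / +1 (right), reward for reaching state 4.
ACTIONS = [-1, +1]
GOAL = 4

def step(state, action):
    """Move along the corridor; return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else -0.01   # small step penalty
    return next_state, reward, next_state == GOAL

q = defaultdict(float)                   # the "list": Q-values keyed by (state, action)
alpha, gamma, epsilon = 0.5, 0.9, 1 / 6  # learning rate, discount, "rolled a 1 on the die"

for episode in range(200):
    state, done = 0, False
    while not done:
        # Steps 1-2: roll the die; explore with probability epsilon, else exploit.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        # Step 3: take the action, observe the reward, update the table.
        next_state, reward, done = step(state, action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

# The learned (greedy) direction for each non-goal state: expect mostly +1 (right).
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)})
```

The epsilon here plays the role of the die: roughly one step in six goes exploring instead of taking the best-known direction.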
In doing this project I researched further, and this post is the output of that effort: an overview of the key advancements in the related fields, structured as a timeline of the major publications in the fields related to reinforcement learning and Neural Networks, up until the turn of the millennium.
After 2000, deep RL becomes a thing (Q-Learning and Neural Networks), plus some humans start to train other humans to continually scroll a website. I’m compartmentalising that until I understand (fully/better/enough of) what happened in getting humans (us) to that point.
So fill your boots. I hope to add more to this series: the Value and Policy Iteration algorithms, for example; definitely the Q-Learning algorithm and code; maybe Monte Carlo examples (I’m curious, but it’s not yet on my critical path).
Early Foundations (1894-1913)
1894
- Researcher: C. Lloyd Morgan
- Work: Trial and error
Reportedly coined the term “Trial and error”.
1898
- Researcher: Edward Thorndike
- Work: Animal Intelligence: An Experimental Study of the Associative Processes in Animals
Edward Thorndike was quite a character: he lived in a small apartment with the animals he studied, and his work bridged the gap between psychology and behaviourism (later developed by Skinner and others).
During this period he conducted puzzle box experiments, providing the first empirical evidence of learning through consequences.
1903
- Researcher: Edward Thorndike
- Work: Educational Psychology: The Psychology of Learning Volume I
From what I have read, he received a lot of feedback on his experiments over the intervening ~5 years, and this volume was a marked improvement.
1906
- Researcher: Andrey Markov
- Work: Introduction of Markov chains (note: I do not read Russian, so I hope this is the correct paper).
I cannot find the original in English; however, it is reproduced in Appendix B of Howard’s 2012 book Dynamic Probabilistic Systems, Volume I: Markov Models.
Developed the mathematical foundation for describing state transitions in probability theory. The key property (known as the Markov Property) is crucial for RL frameworks: the next state depends only on the current state, not on the sequence of states that came before it.
Not a paper but a very cool web util to understand and tweak Markov Chains: https://setosa.io/blog/2014/07/26/markov-chains/index.html
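
To make the property concrete, here’s a toy two-state chain in Python (states and probabilities are invented, not from Markov’s paper): the next state is sampled from the current state alone, with no reference to the history.

```python
import random

# A toy two-state Markov chain. Only the current state is consulted when
# sampling the next one -- that independence from history is the Markov Property.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(current):
    states, probs = zip(*TRANSITIONS[current].items())
    return random.choices(states, weights=probs)[0]

state, history = "sunny", []
for _ in range(10):
    state = next_state(state)
    history.append(state)
print(history)
```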
1913
- Researcher: Edward Thorndike
- Work: “Educational Psychology: The Psychology of Learning Volume II”
Formalized three fundamental laws of learning; they still seem quite appropriate today.
- Law of Effect
- Law of Exercise
- Law of Readiness
Early Behavioral Sciences and Mathematics (1938-1952)
1938
- Researcher: B.F. Skinner
- Work: The Behavior of Organisms: An Experimental Analysis
Introduced operant conditioning and the systematic study of reinforcement. Minsky referenced his work. I need to read up some more on what he did.
1943
- Researchers: Warren McCulloch & Walter Pitts
- Work: A Logical Calculus of the Ideas Immanent in Nervous Activity
Proposed a logical model of the neuron, laying the groundwork for the perceptron.
1943
- Researcher: Clark Hull
- Work: Principles of Behavior
A researcher/scientist I have not read anything about yet. His work appears to be important: he is reported to have significantly advanced the mathematical representation of behaviour.
1951-1952
- Researcher: Clark Hull
- Work: A Behavior System
Refined mathematical framework for behavior, expanded behavioral equations.
Simulation: the Monte Carlo Method (1946-1949)
1946-1949
- Researchers: Stanislaw Ulam, John von Neumann, Nicholas Metropolis
- Work: The Monte Carlo Method
- History: Monte Carlo Method
Developed in secret at Los Alamos as an improvement to the simulations performed when developing nuclear weapons. Later became crucial for RL as a method for estimating value functions from experience.
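
As a flavour of how the idea later shows up in RL (my own toy illustration, not the Los Alamos method): estimate a value by sampling lots of complete episodes and averaging the returns you actually observed.

```python
import random

# Toy Monte Carlo value estimation: run many invented "episodes" from the
# same starting state and average the observed (discounted) returns.
def sample_return():
    total, discount = 0.0, 1.0
    while True:
        total += discount * random.uniform(0, 1)   # a made-up reward signal
        if random.random() < 0.5:                  # 50% chance the episode ends here
            return total
        discount *= 0.9

samples = [sample_return() for _ in range(100_000)]
print(sum(samples) / len(samples))   # the Monte Carlo estimate of the value
```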
Cybernetics, Artificial Intelligence, and the Perceptron (1948-1961)
1948
- Researcher: Norbert Wiener
- Work: Cybernetics
A book that brought together ideas and theories on computation, time, feedback, nervous systems, mental capability, and information and language. It was updated in 1961 to include a chapter on learning and self-reproduction, and another on brain waves and self-organising systems. Seems well worth a read.
1955
- Researchers: John McCarthy, Marvin Minsky, Nathaniel Rochester, and Claude Shannon
- Work: A Proposal for the Dartmouth Summer Research Project on Artificial Intelligence
“The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”
1958
- Researcher: Frank Rosenblatt
- Work: The Perceptron
Developed the first trainable neural network, capable of basic pattern recognition.
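
As a rough sketch of what “trainable” means here, the textbook perceptron learning rule on a toy AND problem (my example, not Rosenblatt’s original setup): nudge the weights whenever the prediction is wrong.

```python
# A minimal perceptron learning AND on two binary inputs (illustrative only).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b, lr = [0.0, 0.0], 0.0, 0.1

def predict(x):
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

for _ in range(20):                      # a few passes over the data
    for x, target in data:
        error = target - predict(x)      # 0 if correct, +/-1 if wrong
        w[0] += lr * error * x[0]        # nudge weights toward the target
        w[1] += lr * error * x[1]
        b += lr * error

print([predict(x) for x, _ in data])     # expect [0, 0, 0, 1]
```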
1961
- Researcher: Marvin Minsky
- Work: Steps Toward Artificial Intelligence
First use of the term “reinforcement learning”, connected behavioral and mathematical approaches.
Control Theory and Dynamic Programming Era (1957-1962)
1957
- Researcher: Richard Bellman
- Work: Dynamic Programming
Introduced dynamic programming, the important Bellman Equation (a state’s value can be written as the immediate reward plus the discounted value of what follows, so in a known system all the values can be computed recursively), and the concept of value functions; developed the Value Iteration algorithm.
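
A hedged sketch of that idea: a toy Value Iteration loop in Python on an invented three-state chain, where each state’s value is repeatedly backed up from its successors until nothing changes.

```python
# Toy value iteration on a made-up deterministic 3-state chain.
# Each entry maps an action to (next_state, reward); state 2 is terminal.
MDP = {
    0: {"right": (1, 0.0), "stay": (0, 0.0)},
    1: {"right": (2, 1.0), "stay": (1, 0.0)},
    2: {},  # terminal state
}
gamma = 0.9
V = {s: 0.0 for s in MDP}

for _ in range(50):  # sweep until the values settle (50 sweeps is plenty here)
    for s, actions in MDP.items():
        if actions:
            # Bellman backup: best action's reward plus discounted value of what follows
            V[s] = max(r + gamma * V[s2] for s2, r in actions.values())

print(V)  # roughly {0: 0.9, 1: 1.0, 2: 0.0}
```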
1960/1962
- Researcher: Ronald Howard
- Work:
Paper (1960): Dynamic Programming and Markov Processes
Book (1962): Dynamic Programming and Markov Processes
Developed Policy Iteration algorithm, formalized MDPs in decision-making.
Neural Network Development and First Winter (1969-1974)
1969
- Researchers: Marvin Minsky & Seymour Papert
- Work: Perceptrons
Demonstrated the limitations of single-layer networks; it appears to have been a strong influence on the start of the first AI winter.
1974
- Researcher: Paul Werbos
- Work: Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences
Developed the backpropagation algorithm; it didn’t really flourish in the (AI) winter climate.
Modern Neural Network Renaissance (1982-1986)
1982
- Researcher: John Hopfield
- Work: Neural networks and physical systems with emergent collective computational abilities
Introduced what became known as the Hopfield network: a brain-inspired algorithm for (associative) memory.
1986
- Researchers: David Rumelhart, Geoffrey Hinton, & Ronald Williams
- Work: Learning representations by back-propagating errors
A paper from the (now) Nobel Prize winner. Popularized backpropagation and demonstrated practical training of multilayer networks. Hinton went on to be involved with further famous and significant advances in the field.
1986
- Researchers: David Rumelhart & James McClelland
- Work: Parallel Distributed Processing Chapter 1 | Parallel Distributed Processing Chapter 2
I understand that this showed how multilayer perceptrons (MLPs) could overcome the earlier single-layer limitations. I need to look into this.
Modern RL Development (1987-1998)
1987
- Researchers: Richard Sutton & Andrew Barto
- Work: A temporal-difference model of classical conditioning
Introduced Temporal Difference learning, bridging Monte Carlo and Dynamic Programming approaches.
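
The core trick is updating a value estimate from a single step of real experience plus the current estimate of the next state, rather than waiting for a whole episode (Monte Carlo) or needing a model of the environment (Dynamic Programming). A tiny sketch with invented numbers:

```python
# TD(0) sketch: nudge V(state) toward reward + gamma * V(next_state)
# after every single step of experience.
alpha, gamma = 0.1, 0.9
V = {"A": 0.0, "B": 0.0}

# One observed transition (all numbers invented): A --reward 1.0--> B
state, reward, next_state = "A", 1.0, "B"
td_target = reward + gamma * V[next_state]
V[state] += alpha * (td_target - V[state])
print(V)   # {'A': 0.1, 'B': 0.0}
```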
1989
- Researcher: Christopher Watkins
- Work: Learning from Delayed Rewards
This is a very important advancement in reinforcement learning: the Q-Learning algorithm enables off-policy learning, i.e. learning about the best (greedy) policy while behaving according to a more exploratory one. This is worth digging into a lot (including concepts like Actualism and Possibilism).
1998
- Researchers: Richard Sutton & Andrew Barto
- Work: Reinforcement Learning: An Introduction (this is the second edition from 2018)
The Royale with Cheese. The Big Mac. Updated with a second edition in 2018.
Next up?
To be decided!