[RL Series 2/n] From Animals to Agents: Linking Psychology, Behaviour, Mathematics, and Decision Making

intro

Maths, computation, the mind, and related fields are a fascination for me.

I had thought I was quite well informed, and to a large degree I did know most of the science in more traditional Computer Science (it was my undergraduate degree…). What had passed me by was reinforcement learning, both its mathematical grounding and its practical value. If you’ve come from the previous post ([RL Series 1/n] Defining Artificial Intelligence and Reinforcement Learning) you know I’ve said something like that already. I’ll get over it and stop repeating myself soon, I’m sure 😉.

I’ve been told that an image at the top of a blog post is more engaging and I’d like you to feel welcome and engage, so here’s an image, generated by ChatGPT/DALLE, of the timeline in this post:

From Animals to Agents: Linking Psychology, Behaviour, Mathematics, and Decision Making

more preamble

So the whole agent-environment setup was new to me, introduced as part of a course I’m doing on AI Applications. We covered some theory on the basics of Reinforcement Learning and the following methods:

  • Markov Decision Process/Property
  • Dynamic Programming
  • Monte Carlo
  • Temporal Difference
  • SARSA (State-Action-Reward-State-Action)
  • Q-Learning

To close it out I developed an environment to train a Maze Solving agent (Q-Learning Maze Solving Agent project).

It’s been a brilliant ride. Check it out, I think it’ll make sense to an engineer as the algorithm is quite straightforward (a rough sketch of the loop in code follows the list):

  1. Roll a die (or similar)
  2. If it’s a 1, randomly choose a direction, otherwise take the best direction stored in your list
  3. Calculate the reward for the new location and update your list with that new reward
  4. Repeat steps 1 to 3
  5. Let me know what you think (Optional in the Q-Learning algorithm)
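Here’s a minimal sketch of that loop in Python, assuming a hypothetical little grid maze (the names, grid size, and rewards are all made up for illustration; the real code lives in the project linked above):

```python
import random
from collections import defaultdict

# A hypothetical 4x4 grid maze, purely for illustration: states are (row, col)
# cells and the goal sits in the bottom-right corner.
ACTIONS = ["up", "down", "left", "right"]
SIZE, GOAL = 4, (3, 3)

def step(state, action):
    """Move within the grid (bumping into a wall keeps you in place) and return (next_state, reward)."""
    r, c = state
    moves = {"up": (r - 1, c), "down": (r + 1, c), "left": (r, c - 1), "right": (r, c + 1)}
    nr, nc = moves[action]
    if not (0 <= nr < SIZE and 0 <= nc < SIZE):
        nr, nc = r, c
    return (nr, nc), (1.0 if (nr, nc) == GOAL else -0.01)

q = defaultdict(float)                    # the "list" of stored values: q[(state, action)]
alpha, gamma, epsilon = 0.1, 0.9, 1 / 6   # epsilon of 1/6 plays the role of "roll a die, explore on a 1"

for episode in range(500):
    state = (0, 0)
    while state != GOAL:
        # Steps 1 & 2: roll the die; explore at random or take the best stored direction.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        # Step 3: see the reward for the new location and update the stored value.
        next_state, reward = step(state, action)
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state                # Step 4: repeat until the goal is reached.
```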

In doing this project I researched further, and this post is the output of that effort: a timeline of the major publications in the fields related to reinforcement learning and neural networks, covering the key advancements up until the turn of the millennium.

After 2000, deep RL becomes a thing (Q-Learning plus neural networks), and some humans start to train other humans to continually scroll a website. I’m compartmentalising that until I understand (fully/better/enough of) what happened in getting humans (us) to that point.

So fill your boots. I hope to add more to this series: examples would be the Value and Policy Iteration algorithms, definitely the Q-Learning algorithm and code, and maybe Monte Carlo examples (I’m curious, but it’s not yet on my critical path).

Early Foundations (1894-1913)

1894

Conway Lloyd Morgan reportedly coined the term “trial and error” to describe how animals learn.

1898

Edward Thorndike was quite a character: he lived in a small apartment with the animals he studied, and his work bridged the gap between psychology and behaviourism (later developed by Skinner and others).

During this period he conducted his puzzle box experiments, providing the first empirical evidence of learning through consequences.

1903

Volume 1. From what I have read he got a lot of feedback on his experiments over the intervening ~5 years, and this work was a marked improvement.

1906

Andrey Markov developed the mathematical foundation for describing state transitions in probability theory. The key property (known as the Markov Property) is crucial for RL frameworks: the next state depends only on the current state, not on the sequence of states that came before it.

Not a paper but a very cool web util to understand and tweak Markov Chains: https://setosa.io/blog/2014/07/26/markov-chains/index.html
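If you’d rather poke at the property in code, here is a tiny sketch using a made-up two-state weather chain: the next state is sampled using only the current state, never the history.

```python
import random

# Made-up transition probabilities: each row depends only on the current state,
# which is exactly the Markov Property.
TRANSITIONS = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def next_state(current):
    """Sample the next state from the current state alone."""
    states, weights = zip(*TRANSITIONS[current].items())
    return random.choices(states, weights=weights)[0]

state, history = "sunny", []
for _ in range(10):
    history.append(state)
    state = next_state(state)   # note: `history` is never consulted
print(history)
```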

1913

Thorndike formalized three fundamental laws of learning; they still seem quite appropriate today.

  • Law of Effect
  • Law of Exercise
  • Law of Readiness

Early Behavioral Sciences and Mathematics (1938-1952)

1938

B. F. Skinner introduced operant conditioning and the systematic study of reinforcement. Minsky referenced his work; I need to read up some more on what he did.

1943

McCulloch and Pitts’ model of the artificial neuron laid the groundwork for the later building of the perceptron.

1943

A researcher/scientist I have not read anything about yet. His work appears to be important, as he is reported to have significantly advanced the mathematical representation of behaviour.

1951-1952

Refined the mathematical framework for behavior and expanded the behavioral equations.

Simulation: the Monte Carlo Method (1946 - 1949)

1946-1949

Developed in secret at Los Alamos to improve the simulations performed while developing nuclear weapons. It later became crucial for RL as a method for estimating value functions from experience.
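For a flavour of the idea (a toy example, nothing to do with the Los Alamos work itself), here is the classic sketch of estimating π by sampling random points; RL later uses the same trick of averaging over sampled experience to estimate value functions.

```python
import random

def estimate_pi(samples=100_000):
    """Estimate pi by sampling: the fraction of random points in the unit square
    that fall inside the quarter circle approaches pi/4."""
    inside = sum(
        1 for _ in range(samples)
        if random.random() ** 2 + random.random() ** 2 <= 1.0
    )
    return 4 * inside / samples

print(estimate_pi())   # roughly 3.14, and closer with more samples
```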

Cybernetics, Artificial Intelligence, and the Perceptron (1948 - 1961)

1948

A book that brought together ideas and theories on computation, time, feedback, nervous systems, mental capability, and information and language. It was updated in 1961 to include a chapter on learning and self-reproduction, and another on brain waves and self-organising systems. Seems well worth a read.

1955

From the proposal for the Dartmouth Summer Research Project on Artificial Intelligence: “The study is to proceed on the basis of the conjecture that every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate it.”

1958

Frank Rosenblatt developed the perceptron, the first trainable neural network, capable of basic pattern recognition.
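A minimal sketch of the perceptron learning rule, trained here on the logical AND function (a made-up, linearly separable toy task), just to show the “trainable” part:

```python
# Tiny perceptron learning the AND function.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b, lr = [0.0, 0.0], 0.0, 0.1

def predict(x):
    """Fire (1) if the weighted sum of inputs plus bias is positive, else 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

for _ in range(20):                      # a few passes over the data is plenty here
    for x, target in data:
        error = target - predict(x)      # 0 if correct, +1 or -1 if wrong
        w[0] += lr * error * x[0]
        w[1] += lr * error * x[1]
        b += lr * error

print([predict(x) for x, _ in data])     # [0, 0, 0, 1]
```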

1961

First use of the term “reinforcement learning”; it connected the behavioral and mathematical approaches.

Control Theory and Dynamic Programming Era (1957-1962)

1957

Richard Bellman introduced dynamic programming, the important Bellman Equation (if you know one value in a known system you can calculate them all), and the concept of value functions, and developed the Value Iteration algorithm.
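Here’s a minimal sketch of Value Iteration on a made-up three-state chain, where each state’s value is repeatedly rewritten from its neighbour via the Bellman Equation (there is only one action here, so the usual max over actions is trivial):

```python
# Made-up chain: state 0 -> state 1 -> state 2 (terminal).
# Stepping out of state 1 pays a reward of 1.0; everything else pays 0.
# Bellman update (single action, so no max needed): V(s) = r(s) + gamma * V(s + 1)
gamma = 0.9
rewards = {0: 0.0, 1: 1.0}
values = {0: 0.0, 1: 0.0, 2: 0.0}   # state 2 is terminal, its value stays 0

for sweep in range(50):             # repeatedly sweep over the non-terminal states
    for s in (0, 1):
        values[s] = rewards[s] + gamma * values[s + 1]

print(values)   # {0: 0.9, 1: 1.0, 2: 0.0}: knowing V(1) pins down V(0)
```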

1960/1962

Ronald Howard developed the Policy Iteration algorithm and formalized the use of MDPs in decision-making.

Neural Network Development and First Winter (1969-1974)

1969

  • Researchers: Marvin Minsky & Seymour Papert
  • Work: Perceptrons

Demonstrated the limitations of single-layer networks; this appears to have been a strong influence on the start of the first AI winter.

1974

Paul Werbos developed the backpropagation algorithm, which didn’t really flourish in the (AI) winter climate.

Modern Neural Network Renaissance (1982-1986)

1982

A brain-inspired algorithm for memory: the Hopfield network.

1986

Popularized backpropagation and demonstrated the practical training of multilayer networks. Hinton, one of the authors, went on to be involved with further famous and significant advances in the field, and later shared a Nobel Prize for foundational work on neural networks.

1986

I understand that this showed how multi-layer perceptrons (MLPs) could overcome previous limitations. I need to look into this.

Modern RL Development (1987-1998)

1987

Richard Sutton introduced Temporal Difference learning, bridging the Monte Carlo and Dynamic Programming approaches.
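A sketch of the TD(0) update on a made-up five-state random walk: the value estimate for a state is nudged towards the reward plus the estimate of the next state, rather than waiting for the end of the episode as a pure Monte Carlo method would.

```python
import random

# Made-up random walk: states 1..5, terminals at 0 and 6. Start in the middle,
# step left or right at random, reward 1.0 only for exiting on the right.
alpha, gamma = 0.1, 1.0
values = [0.0] * 7

for episode in range(5000):
    s = 3
    while s not in (0, 6):
        s_next = s + random.choice((-1, 1))
        reward = 1.0 if s_next == 6 else 0.0
        # TD(0): move V(s) towards the one-step target r + gamma * V(s_next).
        values[s] += alpha * (reward + gamma * values[s_next] - values[s])
        s = s_next

print([round(v, 2) for v in values[1:6]])   # roughly [0.17, 0.33, 0.5, 0.67, 0.83]
```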

1989

This is a very important advancement in reinforcement learning: Chris Watkins’ Q-Learning algorithm enables off-policy learning. This is worth digging into a lot (including concepts like Actualism and Possibilism).
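The off-policy part is easiest to see next to SARSA: SARSA updates towards the value of the action the agent actually takes next, while Q-Learning updates towards the best action available, whatever the behaviour policy ends up doing. A tiny sketch with made-up names, showing just the two update rules side by side:

```python
import random
from collections import defaultdict

ACTIONS = ["left", "right"]
alpha, gamma = 0.1, 0.9
q_sarsa, q_qlearn = defaultdict(float), defaultdict(float)

# One made-up transition: in state s the agent took action a, got reward r,
# landed in s2, and (for SARSA) then actually chose action a2.
s, a, r, s2 = "A", "right", 1.0, "B"
a2 = random.choice(ACTIONS)

# SARSA (on-policy): the target uses the action actually taken next.
q_sarsa[(s, a)] += alpha * (r + gamma * q_sarsa[(s2, a2)] - q_sarsa[(s, a)])

# Q-Learning (off-policy): the target uses the best action available in s2,
# regardless of what the behaviour policy actually does there.
q_qlearn[(s, a)] += alpha * (r + gamma * max(q_qlearn[(s2, b)] for b in ACTIONS) - q_qlearn[(s, a)])
```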

1998

The Royale with Cheese. The Big Mac. Sutton and Barto’s Reinforcement Learning: An Introduction, updated with a second edition in 2018.

Next up?

To be decided!