First test with a “reasoning” model, pleasantly surprised.

Not sure how to integrate it into my workflow though; it’s a big response!!

How do humans decipher reward in an uncertain state and environment?

Imitation seems most likely, supported by the way extended solitude usually leads to a depressed state.

Feels like a question to run a human Monte Carlo Tree Search on!

#BeingHuman #ReinforcementLearning #InverseReinforcementLearning

If I could answer any question in science, I’d find out what involvement the neurons in our heart and gut have in decision making and how we view ourselves.

What about you?

#BeingHuman #ThatsNotAWeekendProject 🙃

[RL Series 2/n] From Animals to Agents: Linking Psychology, Behaviour, Mathematics, and Decision Making

Maths, computation, the mind, and related fields are a fascination for me. I had thought I was quite well informed, and to a large degree I did know most of the science in more traditional Computer Science (it was my undergraduate degree…). What had passed me by was reinforcement learning, both its mathematical grounding and the value of its applications. If you’ve come from the previous post ([RL Series 1/n] Defining Artificial Intelligence and Reinforcement Learning) you’ll know I’ve said something like that already.

Continue reading →

The challenges of being human: mistaking prediction, narratives, and rhetoric for reasoning

I read an insightful comment amid the current wave of LLM reasoning hype, and it has stuck with me for at least two reasons: it reminded me of my view that AGI is already here in the guise of companies, and it’s also a valid answer to why I meditate and why Searle’s Chinese Room is mainly wrong. Back to the comment; paraphrased, it said: “the uncomfortable truth that these reasoning models show us is that a lot of activities we thought needed human reasoning simply need functional predictions”.

Continue reading →

[RL Series 1/n] Defining Artificial Intelligence and Reinforcement Learning

I’m learning about Reinforcement Learning; it’s an area that holds a lot of intrigue for me. The first I recall hearing of it was when ChatGPT was released and it was said that Reinforcement Learning from Human Feedback was the key to making its responses so fluent. Since then I’ve started studying AI and Data Science for a Masters, so I’m stepping back to understand the domain in greater detail.

Continue reading →

What is Off-Policy learning?

I’ve recently dug into Temporal Difference algorithms for Reinforcement Learning. The field’s history has been a ride, from animals in the late 1890s to control theory, agents, and back to animals in the 1990s (and on). It has culminated in me developing a Q-Learning agent, and learning about hyperparameter sweeps and statistical significance, all relevant to the efficiency of off-policy learning but topics for another day. I write this because it took me a moment to realise what off-policy learning actually is.
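The gist, for me: Q-Learning is off-policy because its update bootstraps from the greedy action in the next state, whatever action the behaviour policy actually takes. A minimal sketch in Python (the state/action names and values here are made up purely for illustration, not from my agent):

```python
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    """Off-policy TD update: the target uses the max over next actions
    (the greedy policy), regardless of what the behaviour policy does."""
    best_next = max(Q[s_next].values()) if Q[s_next] else 0.0
    td_target = r + gamma * best_next   # greedy bootstrapped target
    td_error = td_target - Q[s][a]      # surprise relative to current estimate
    Q[s][a] += alpha * td_error         # nudge the estimate toward the target
    return Q[s][a]

# Toy example: two states, two actions, hand-picked values.
Q = {"s0": {"left": 0.0, "right": 0.0},
     "s1": {"left": 1.0, "right": 2.0}}
new_value = q_learning_update(Q, "s0", "right", r=1.0, s_next="s1")
```

Even if the agent had been about to explore "left" in s1, the update above still learned from the greedy "right"; that gap between the acting policy and the learned-about policy is the whole of off-policy.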

Continue reading →

Are LLMs learning skills rather than being Stochastic Parrots?

A Theory for Emergence of Complex Skills in Language Models
Skill-Mix: a Flexible and Expandable Family of Evaluations for AI models
www.quantamagazine.org/new-theor…
youtu.be/fTMMsreAq…

Related to the authors:

Arora, S (arxiv.org/search/cs)
Was that Sarcasm?: A Literature Survey on Sarcasm Detection
Can Models Learn Skill Composition from Examples?
Instruct-SkillMix: A Powerful Pipeline for LLM Instruction Tuning

Goyal, A (arxiv.org/search/cs)
Learning Beyond Pattern Matching? Assaying Mathematical Understanding in LLMs
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving

Continue reading →

Domain Specific Languages

Ray Myers has started his Year of Domain Specific Languages 🎉 I listened to the first episode yesterday, on my bike because I’m getting fit again, and was reminded of when I did something similar. It got me wondering: was that a DSL? 🤔 Around 2007 I set up a CI/CD system for Pershing, using Microsoft Team Foundation Server, PowerShell, and MSI. I’m writing this to help remember the details (it probably needs a diagram), but the main principles:

Continue reading →

Finished reading: Red Mars by Kim Stanley Robinson 📚

A great book; other than it being a highly recommended space opera, I had little prior knowledge of it.

It’s a story of building a community and industry on Mars, starting with scientists. Told from the viewpoint of multiple characters, the protagonists are fascinating, and you read their stories of why they are on Mars and what they do once there!

The characters are great, each with a unique viewpoint and set of skills: the charismatic dreamer-leader John, grumpy geologist Ann, passionate leader Maya, supremely focused leader Frank, geeky terraformer Sax, enigmatic botanist Hiroko, rebellious Arkady, homesick psychologist Michel, and the pragmatic engineer Nadia. There are more as well.

The arcs made me feel for the characters, want them to succeed, and challenged my view of what was right for the group of Martians in surprising ways.

Had to stop myself from immediately picking up Green Mars so I could reflect on it.

Dopamine as temporal difference errors !! 🤯

I expect I’m sharing a dopamine burst that I experienced! 🤓 I’m listening to The Alignment Problem by Brian Christian 📚 and it explains how Dayan, Montague, and Sejnowski connected Wolfram Schultz’s work to the Temporal Difference algorithm (which is, of course, from Sutton and Barto). A quick search returns these to add to my maybe-reading list: Dopamine and Temporal Differences Learning (Montague, Dayan & Sejnowski, 1996); Dopamine and temporal difference learning: A fruitful relationship between neuroscience and AI (DeepMind, 2020).
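As I understand the link, the phasic dopamine signal behaves like the TD prediction error from Sutton and Barto’s formulation:

```latex
\delta_t = r_{t+1} + \gamma V(s_{t+1}) - V(s_t)
```

A reward that’s better than predicted gives a positive \(\delta_t\) (a burst); a fully predicted reward gives \(\delta_t \approx 0\) (no burst), which matches Schultz’s recordings of dopamine neurons.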

Continue reading →

It is possible for dopamine to write cheques that the environment cannot cash. At which point the value function must come back down.

Nice lunch time walk into the village

Nice summary and stark reminder of what’s happening right now.

Only CEOs are making the decisions… they have a vested interest.

Worth keeping in mind it’s not just computing but robotics that are progressing.

Stuart Russell at the World Knowledge Forum 2024

Stuart Russell on Wikipedia

[video] Crew.ai experiment with Cyber Threat Intelligence

Continue reading →

Agentic behaviours

My initial thoughts, expressed via the medium of sport, on agentic behaviours, plus a friend’s view, which I think is better (expected, as he’s the basketball player). Jackson is the ethical agent. Pippen is the organising agent. Harper is the redundant agent. Note: since looking into this I’m not sure agentic is the right term; I now think of them simply as components of a system.

Continue reading →

[short] Why use tools with an LLM?

Continue reading →

[short] AI Systems

Continue reading →

Project Euler meets Powershell - Problem #4

<#
A palindromic number reads the same both ways. The largest palindrome
made from the product of two 2-digit numbers is 9009 = 91 × 99.
Find the largest palindrome made from the product of two 3-digit numbers.
#>

# 998001 - so what's the largest palindrome number less than this one
# - then check if it's a product of 3-digit numbers

function ispalindrome {
    [cmdletbinding()]
    param(
        [int] $number
    )
    process {
        # Peel off the digits one at a time, least significant first
        $digits = @()
        $next_int = $number
        while ($next_int -gt 9) {
            $digit = $next_int % 10
            $digits = $digits + $digit
            $next_int = ($next_int / 10) - (($next_int % 10) / 10)  # e.g. (99099 / 10) - ((99099 % 10) / 10)
        }
        $digits = $digits + $next_int
        # Palindrome if the digit array reads the same reversed
        $reversed = $digits[($digits.Count - 1)..0]
        ($digits -join '') -eq ($reversed -join '')  # e.g. ispalindrome 9009 returns True
    }
}

Continue reading →

Project Euler meets Powershell - reworking factorial to avoid PowerShell 1000 recursions limit

Turns out it was the recursion in the factorial function - I’ve reworked it to use a for loop:

function factorial {
    [cmdletbinding()]
    param($x)
    if ($x -lt 1) {
        return "Has to be a positive integer"
    }
    Write-Verbose "Input is $x"
    $fact = 1
    for ($i = $x; $i -gt 0; $i -= 1) {
        #Write-Verbose "i: $i"
        $fact = [System.Numerics.BigInteger]::Multiply($i, $fact)
    }
    Write-Verbose "i equals $i"
    $fact
}

It’s still running for factorial 486847…

Continue reading →