Explanation of the getKeys Operator
Tuesday, November 11, 2025
An explanation of the getKeys operator from the paper “Introduction to AI Planning” by Marco Aiello and Ilche Georgievski (16 Dec 2024).
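I haven’t reproduced the paper’s exact formalisation here, but a STRIPS-style operator like getKeys is typically given by preconditions, add effects, and delete effects. A minimal Python sketch of that shape (the predicate names at/keysAt/hasKeys are my own illustrative assumptions, not necessarily the paper’s):

```python
# A guess at the shape of a STRIPS-style getKeys operator, in plain Python.
# Predicate names (at, keysAt, hasKeys) are my illustrative assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class Operator:
    name: str
    preconditions: frozenset  # facts that must hold before applying
    add_effects: frozenset    # facts made true
    del_effects: frozenset    # facts made false

def get_keys(location: str) -> Operator:
    """getKeys(l): pick up the keys at location l."""
    return Operator(
        name=f"getKeys({location})",
        preconditions=frozenset({("at", location), ("keysAt", location)}),
        add_effects=frozenset({("hasKeys",)}),
        del_effects=frozenset({("keysAt", location)}),
    )

def apply(op: Operator, state: frozenset) -> frozenset:
    """Apply op to state under STRIPS semantics."""
    assert op.preconditions <= state, "preconditions not satisfied"
    return (state - op.del_effects) | op.add_effects

state = frozenset({("at", "hall"), ("keysAt", "hall")})
print(apply(get_keys("hall"), state))  # {('at', 'hall'), ('hasKeys',)}
```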
Thursday, November 6, 2025
The more ways a system can achieve a function, the more robust and adaptable it becomes. I think it is fair to say we tend to think of “degenerate” as a pejorative: something broken, collapsing, or inferior. But in complex systems (biological, neural, or artificial) degeneracy means something far more interesting: different structures performing similar functions.
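A toy analogy in code (my own illustration, not from the post): two structurally different routines performing the same function, so the “system” still works if one route disappears.

```python
# Toy illustration of degeneracy: two structurally different routines
# that perform the same function (summing 1..n). If one "pathway" is
# unavailable, the other still achieves the behaviour.
def sum_iterative(n: int) -> int:
    total = 0
    for i in range(1, n + 1):
        total += i
    return total

def sum_closed_form(n: int) -> int:
    return n * (n + 1) // 2

routes = [sum_iterative, sum_closed_form]
assert all(f(100) == 5050 for f in routes)  # same function, different structure
```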
Thursday, November 6, 2025
Generalisation… I am comparing State Spaces and Solution Spaces and realised that I may be talking about generalisation… The post dives into the definitions to prompt some thought.
Wednesday, November 5, 2025
Planning is offline search.
Planning is…. Yup, it is offline search.
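What I mean in miniature (my own sketch, not from the post): given a model of the world, search for the whole plan before acting at all.

```python
# The minimal sense in which "planning is offline search": given a model
# (states + a successor function), search for a plan *before* acting.
# The toy grid domain below is my own example.
from collections import deque

def bfs_plan(start, goal, successors):
    """Breadth-first search over the model; returns a list of actions."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, plan = frontier.popleft()
        if state == goal:
            return plan
        for action, nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, plan + [action]))
    return None  # no plan exists

# Toy grid: move right/up on integer coordinates.
succ = lambda s: [("right", (s[0] + 1, s[1])), ("up", (s[0], s[1] + 1))]
print(bfs_plan((0, 0), (2, 1), succ))  # ['right', 'right', 'up']
```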
Wednesday, October 22, 2025
β"Domain Modelling is itself the process of learning, you cannot know it all at the start, and should expect to update aspects at any stage of the product development".
Monday, October 6, 2025
Do I need patience to learn to direct coding agents, or is it time to pick up a new language and develop in that?
Friday, October 3, 2025
This is about connection: both with a fellow human interested in and articulate about Artificial Intelligence, and between the information inputted, processed, and produced. The topic is LLM-as-a-Judge; we chat about the survey paper and how it can be applied to modern AI applications. There’s a human-written blog post, a YouTube video, and a NotebookLM to chat to. Fill your boots :)
Friday, October 3, 2025
Google Meet, YouTube, and NotebookLM make for great research utilities.
Thursday, October 2, 2025
Some great comments on pushing deeper into the tech stack, getting closer to GPUs, and keeping your eyes open; the next thing can come from anywhere.
Monday, September 1, 2025
I asked Claude to output the userStyle to the chat. Glad I did: I’m not on PhD-level topics, and it needed to change approach!
Thursday, August 14, 2025
Functional Information: a way to represent information that has come to be useful over time; that is, information that provides a function for itself or for other pieces of information (e.g. a crab!). Could it be used to evaluate what counts as AGI? It appears as an elegant law and equation that stands in opposition to the decay described by the second law of thermodynamics: a formula to evaluate the evolution of functional information in both the physical and digital worlds!
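For reference, the formulation I believe is being pointed at is Hazen et al.’s (2007) definition; it’s my assumption that this is the “elegant law and equation” meant here:

```latex
% Functional information with respect to a degree of function E_x
% (Hazen et al., 2007). F(E_x) is the fraction of all possible
% configurations of the system whose function meets or exceeds E_x.
I(E_x) = -\log_2\!\big[F(E_x)\big]
```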
Tuesday, August 12, 2025
Remastered broadcast of a 1964 lecture by Richard Feynman on the double-slit experiment. It finishes with a call to action on keeping open priors about the evidence we see from Mother Nature!
Monday, August 4, 2025
Linking entropy as a guide for when to use Principal Component Analysis.
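My reading of the link, sketched under my own assumptions (the post may frame it differently): treat the normalised eigenvalue spectrum of the covariance matrix as a distribution; low Shannon entropy means variance concentrated in a few components, which is exactly when PCA pays off.

```python
# Sketch: entropy of the (normalised) covariance eigenvalue spectrum as
# a guide. Low entropy => variance concentrated in few components =>
# PCA is likely to compress well. My framing, not necessarily the post's.
import numpy as np

def spectral_entropy(X: np.ndarray) -> float:
    """Shannon entropy (bits) of the normalised covariance eigenvalues."""
    cov = np.cov(X, rowvar=False)
    eigvals = np.linalg.eigvalsh(cov)
    p = eigvals / eigvals.sum()  # eigenvalues as a distribution
    p = p[p > 0]                 # drop numerically zero components
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
low_dim = rng.normal(size=(500, 1)) @ rng.normal(size=(1, 5))  # rank-1 data
iso = rng.normal(size=(500, 5))                                # isotropic data
print(spectral_entropy(low_dim), "<", spectral_entropy(iso))   # ~0 vs ~log2(5)
```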
Monday, August 4, 2025
Term sheet for key statistical ideas
Wednesday, July 23, 2025
This is a bit of a rant. I’ve memories of a senior manager shooting ideas down saying “correlation is not causation, I’ve done Stats at Uni and can prove anything is related to baked beans”. It grated sooooo much. Firstly because it was thoughtless rhetoric, purposefully or accidentally steamrolling ideas and immediately dismissing any attempt at constructive, data-driven decisions. Secondly it grated because I didn’t have the tools to show causation.
Thursday, July 10, 2025
A day vibe-coding, as a break from the normal routine of study. Done in a new environment with a language I’ve not used: a VS Code extension in TypeScript.
Friday, July 4, 2025
I am very torn between the possibilities:
- Building on the Q-Learning Maze Solving Agent I did for AI Applications (Q-Learning Maze Solving Agent) by adding a Neural Network (Sutton and Barto).
- Building on the Intelligent Agents work I did in AI by applying an Agent Decision Process (Self-Consistency LLM-Agent); the process is my interpretation of Russell and Norvig’s work.
- (The Douglas Adams extra option) Adding a cached “self-awareness” layer based on a Bayesian Learning Agent that stores its certainty in the answers it gives; see the sketch after this list.
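For that last option, a minimal sketch of what I have in mind, under my own assumptions (a Beta-Bernoulli update per question, nothing more):

```python
# Minimal sketch of the cached "self-awareness" idea: a Beta-Bernoulli
# model per question, updated whenever feedback on an answer arrives.
# My illustration of the option above, not finished design work.
from collections import defaultdict

class CertaintyCache:
    def __init__(self):
        # Beta(1, 1) prior: maximally uncertain about every question.
        self.params = defaultdict(lambda: [1.0, 1.0])  # [alpha, beta]

    def update(self, question: str, was_correct: bool) -> None:
        a, b = self.params[question]
        self.params[question] = [a + was_correct, b + (not was_correct)]

    def certainty(self, question: str) -> float:
        """Posterior mean probability the cached answer is correct."""
        a, b = self.params[question]
        return a / (a + b)

cache = CertaintyCache()
for ok in (True, True, False, True):
    cache.update("capital of France?", ok)
print(cache.certainty("capital of France?"))  # 0.666... (4/6)
```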
Friday, June 27, 2025
I wish I had time to finish:
- my research on the Evolution of Probabilistic Reasoning in AI, particularly Dempster-Shafer and Bayesian Networks
- how LLMs and Bayesian networks can be used for Risk Management
- a YouTube/Insta/TikTok vid for my latest post on the LLM Agent
But I don’t!! So this is me putting it to one side…
Friday, June 27, 2025
Polish is cheap in this Brave New World of AI. Being scrappy is a way of being authentic and, most importantly, Being Human!
Thursday, June 26, 2025
Building a Self-Consistency LLM-Agent: From PEAS Analysis to Production Code - a guide to designing an LLM-based agent.
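The self-consistency loop itself is tiny; here’s a minimal sketch (mine, not lifted from the guide; ask_model is a hypothetical placeholder for a real LLM call):

```python
# Self-consistency in miniature: sample several reasoning paths and
# majority-vote on the final answers.
from collections import Counter
import random

def ask_model(prompt: str) -> str:
    """Hypothetical placeholder: swap in your actual LLM call here.
    This fake just samples answers so the sketch runs end to end."""
    return random.choice(["42", "42", "42", "41"])  # noisy but mostly right

def self_consistent_answer(prompt: str, n_samples: int = 7) -> str:
    """Sample n reasoning paths; return the majority answer."""
    answers = [ask_model(prompt) for _ in range(n_samples)]
    winner, _votes = Counter(answers).most_common(1)[0]
    return winner

print(self_consistent_answer("What is 6 x 7?"))  # usually "42"
```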
Tuesday, May 20, 2025
Introduction In the previous post, I shared my view on “Why Study Logic?”; we looked at Knowledge Representation and highlighted the importance of Logic and Reasoning in storing and accessing Knowledge. In this post I’m going to highlight a section from the book “Introduction to Artificial Intelligence” by Wolfgang Ertel. His approach with this book was to make AI more accessible than Russell and Norvig’s 1000+ page bible. It worked for me.
Monday, May 19, 2025
Introduction The purpose of this article is to help me answer the question “Why am I studying Logic?”. If it helps you, that’d be great, let me know! The question comes from a nagging feeling of: why don’t I see logic used more in the ‘real world’? It could be a personal bias, as I more easily see the utility of Rosenblatt’s work, where he looked at both Symbolic Logic and Probability Theory to help solve a problem and chose Probability Theory ([NN Series 1/n] From Neurons to Neural Networks: The Perceptron). With that we had the birth of the Artificial Neuron, and the rest is history!
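Since the Perceptron comes up, here is the classic update rule in a few lines (a sketch of Rosenblatt’s rule as I understand it, not code from the linked post):

```python
# The classic perceptron update (Rosenblatt, 1958): nudge the weights
# toward examples the current weights misclassify.
import numpy as np

def train_perceptron(X, y, epochs=10, lr=1.0):
    """X: (n, d) inputs; y: labels in {-1, +1}. Returns (w, b)."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) <= 0:  # misclassified (or on boundary)
                w += lr * yi * xi
                b += lr * yi
    return w, b

# Linearly separable toy data: class is the sign of the first feature.
X = np.array([[2.0, 1.0], [1.5, -0.5], [-1.0, 0.5], [-2.0, -1.0]])
y = np.array([1, 1, -1, -1])
w, b = train_perceptron(X, y)
assert all(np.sign(X @ w + b) == y)
```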
Friday, May 16, 2025
“[IA Series 3/n] Intelligent Agents Term Sheet” breaks down essential AI terminology from Russell & Norvig’s seminal textbook. Learn what makes agents rational (or irrational), understand different agent types, and follow a structured 5-step design process from environment analysis to implementation. Perfect reference for AI practitioners and students. Coming next: how agents mirror human traits. #ArtificialIntelligence #IntelligentAgents #AIDesign
Saturday, May 10, 2025
First draft in public! What’s the best way for an agent to build a semantically sound and syntactically correct knowledge base? Dogfooding my course material means the first step is to define the task environment. /Checks notes Task Environment: the description of Performance, Environment, Actuators, and Sensors (PEAS). This provides a complete specification of the problem domain. So how can I implement this? First I need to think on the domain, something different to the examples (e.
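Picking up the PEAS thread, the task environment can be captured as a plain record before any agent code exists. A sketch of a starting point; the field contents are my placeholders:

```python
# PEAS as a plain record: Performance measure, Environment, Actuators,
# Sensors. Field values below are illustrative placeholders for the
# knowledge-base agent described above.
from dataclasses import dataclass

@dataclass
class TaskEnvironment:
    performance: list[str]  # what "doing well" means
    environment: list[str]  # what the agent is embedded in
    actuators: list[str]    # how it can act
    sensors: list[str]      # what it can perceive

kb_agent = TaskEnvironment(
    performance=["semantic soundness", "syntactic correctness of the KB"],
    environment=["source documents", "the growing knowledge base"],
    actuators=["assert fact", "retract fact", "query KB"],
    sensors=["document reader", "KB query results"],
)
```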
Tuesday, April 29, 2025
Here’s a “standard” progression of training methodologies:
- Pre-training - this is where the model gains broad knowledge, forming the foundation necessary for reasoning.
- CPT (Continued Pre-training) - makes the model knowledgeable about specific domains.
- SFT (Supervised Fine-Tuning) - makes the model skilled at specific tasks by leveraging knowledge it already has.
- RL (Reinforcement Learning) - uses methods like GRPO and DPO to align model behavior.
Reasoning traces play different roles at each stage:
Tuesday, April 29, 2025
Source: Off-Policy “zero RL” in simple terms. Results demonstrate that LUFFY encourages the model to imitate high-quality reasoning traces while maintaining exploration of its own sampling space. The authors introduce policy shaping via regularized importance sampling, which amplifies learning signals for low-probability yet crucial actions under off-policy guidance. The aspect that is still not clear to me is how there is any exploration of the solution space.
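My loose reading of “policy shaping via regularized importance sampling”, sketched with a saturating transform f(x) = x/(x + γ); the exact functional form here is my assumption, not necessarily the paper’s equation:

```python
# Loose sketch of the shaping idea as I read it: re-weight off-policy
# tokens so that low-probability but crucial tokens still carry signal.
# The transform f(x) = x / (x + gamma) is my assumption about the shape,
# not necessarily LUFFY's equation.
import numpy as np

def shaped_weights(pi_theta: np.ndarray, gamma: float = 0.1) -> np.ndarray:
    """pi_theta: current policy's probabilities of the off-policy tokens."""
    return pi_theta / (pi_theta + gamma)

p = np.array([0.9, 0.05, 0.001])  # likely, unlikely, very unlikely tokens
print(shaped_weights(p))          # [0.9, 0.333, 0.0099]
print(shaped_weights(p) / p)      # [1.0, 6.67, 9.9]: rare tokens boosted most
```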
Monday, April 28, 2025
Based on conventional zero-RL methods such as GRPO, LUFFY introduces off-policy reasoning traces (e.g., from DeepSeek-R1) and combines them with models' on-policy roll-outs before advantage computation. … However, naively combining off-policy traces can lead to overly rapid convergence and entropy collapse, causing the model to latch onto superficial patterns rather than acquiring genuine reasoning capabilities. …genuine reasoning capabilities… I am not certain if the implication is that DeepSeek-R1 can reason, or that it is a reminder that no model can genuinely reason.
Monday, April 28, 2025
Zero-RL applies reinforcement learning (RL) to a base LM directly, eliciting reasoning potential using the model’s own rollouts. A fundamental limitation worth highlighting: it is inherently “on-policy”, constraining learning exclusively to the model’s self-generated outputs through iterative trials and feedback cycles. Despite showing promising results, zero-RL is bounded by the base LLM itself. A key characteristic is that it means an LLM can be trained without Supervised Fine-Tuning (SFT).
Monday, April 28, 2025
You are doing Imitation Learning (specifically Behavioral Cloning) because the goal and mechanism involve mimicking the expert’s token sequences. You are doing Transfer Learning (specifically Knowledge Distillation) because you are transferring reasoning knowledge from a teacher model to a student model. You are not doing Off-Policy Reinforcement Learning because the learning process is supervised likelihood maximization, not reward maximization using RL algorithms. Although the data itself is “off-policy” (not generated by the model being trained), the learning paradigm is supervised imitation, not RL.
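The distinction shows up in the loss itself. A minimal PyTorch sketch of the supervised objective (my own illustration): minimising it maximises the likelihood of the expert’s tokens, and no reward appears anywhere.

```python
# The "supervised likelihood maximization" in one loss: cross-entropy of
# the student's next-token distribution against the expert's trace.
# No reward, no advantage; just imitation. Minimal sketch, my own code.
import torch
import torch.nn.functional as F

def behavioral_cloning_loss(student_logits: torch.Tensor,
                            expert_tokens: torch.Tensor) -> torch.Tensor:
    """student_logits: (seq, vocab); expert_tokens: (seq,) teacher trace."""
    return F.cross_entropy(student_logits, expert_tokens)

# Toy shapes: a 10-token expert trace over a 50-token vocabulary.
logits = torch.randn(10, 50, requires_grad=True)
trace = torch.randint(0, 50, (10,))
loss = behavioral_cloning_loss(logits, trace)
loss.backward()  # descending this loss ascends the trace's log-likelihood
```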
Saturday, April 26, 2025
Support Vector Machines (SVM) are a mathematical approach for classifying data by finding optimal separating hyperplanes, applicable even in non-linear scenarios using kernel methods.
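A minimal runnable illustration with scikit-learn (my example, not from the post): an RBF kernel separating data that no straight line can.

```python
# SVM with an RBF kernel on data that is not linearly separable:
# two concentric circles. The kernel trick finds the separating surface
# implicitly.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf").fit(X, y)

print(f"linear kernel accuracy: {linear.score(X, y):.2f}")  # near chance
print(f"RBF kernel accuracy:    {rbf.score(X, y):.2f}")     # near perfect
```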