: [zero-RL] Summarising what LUFFY offers Here’s a “standard” progression of training methodologies: PRE-Training - This is …
: [zero-RL] where is the exploration? Source: Off Policy “zero RL” in simple terms Results demonstrate that LUFFY encourages …
: [zero-RL] LUFFY: Learning to reason Under oFF policY guidance Based on conventional zero-RL methods such as GRPO, LUFFY introduces off-policy reasoning traces …
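To make the GRPO part concrete: GRPO samples a group of responses per prompt and scores each one. Each reward's position relative to its group becomes that response's advantage, so no learned critic is needed. Below is a minimal Python sketch of just that normalisation step, assuming scalar rule-based rewards. `group_relative_advantages` and the reward values are illustrative. Scoring an off-policy trace inside the same group is my assumption about how LUFFY mixes it in, not its actual code.

```python
import statistics

def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: each reward normalised against its own group.

    A_i = (r_i - mean(r)) / (std(r) + eps). Population std is used here;
    implementations differ on population vs sample std.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# Hypothetical group for one prompt: rule-based rewards (1.0 = verified correct).
on_policy_rewards = [0.0, 0.0, 1.0, 0.0]  # rollouts sampled from the model being trained
off_policy_rewards = [1.0]                # ASSUMPTION: an off-policy trace scored in the same group
advantages = group_relative_advantages(on_policy_rewards + off_policy_rewards)
print([round(a, 3) for a in advantages])
```

Suppose a correct off-policy trace lands in a group of mostly failed rollouts. It then gets a large positive advantage, which is one intuition for why the guidance helps when the model rarely solves a problem on its own.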
: [zero-RL] what is it? Zero-RL applies reinforcement learning (RL) directly to a base LM, eliciting reasoning potentials using …
: [zero-RL] When you SFT a smaller LM on the reasoning traces of a larger LM, you are doing Imitation Learning (specifically Behavioral Cloning) because the goal and mechanism …
: Notes and links on SVMs (WIP) Support Vector Machines (SVM) are a mathematical approach for classifying data by finding optimal …
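As a concrete anchor for the "optimal" boundary idea, here is a minimal Python sketch of a linear soft-margin SVM trained by full-batch subgradient descent on the hinge loss. It's an illustrative swap-in, not how libraries such as libsvm do it (they solve the dual problem with a dedicated solver). The function names and hyperparameters are assumptions. Labels must be +1/-1.

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimise lam/2 * ||w||^2 + mean(max(0, 1 - y_i * (w.x_i + b))).

    Full-batch subgradient descent; the bias b is not regularised.
    """
    n, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the L2 term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge is active
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def svm_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1

# Toy, linearly separable data (illustrative only).
X = [[2.0, 2.0], [3.0, 3.0], [2.0, 3.0], [-2.0, -2.0], [-3.0, -3.0], [-2.0, -3.0]]
y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, y)
print(w, b, [svm_predict(w, b, x) for x in X])
```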
: [IA Series 2/n] Search Algorithms and Intelligent Agents The document discusses various search algorithms used by Intelligent Agents for navigating mazes, …
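The maze-navigation idea maps neatly onto breadth-first search. BFS is the classic uninformed strategy, and it finds a shortest path when every move costs the same. A minimal Python sketch, assuming a grid maze written as strings with `#` for walls. `bfs_path` and the maze layout are illustrative, not taken from the post.

```python
from collections import deque

def bfs_path(maze, start, goal):
    """Breadth-first search on a grid maze ('#' = wall).

    Returns the shortest path as a list of (row, col) cells, or None if unreachable.
    """
    rows, cols = len(maze), len(maze[0])
    frontier = deque([start])
    came_from = {start: None}  # parent links; also serves as the explored set
    while frontier:
        current = frontier.popleft()
        if current == goal:
            path = []
            while current is not None:  # walk parent links back to the start
                path.append(current)
                current = came_from[current]
            return path[::-1]
        r, c = current
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and maze[nr][nc] != "#" and (nr, nc) not in came_from):
                came_from[(nr, nc)] = current
                frontier.append((nr, nc))
    return None

maze = ["S..#",
        ".#..",
        "...G"]
path = bfs_path(maze, (0, 0), (2, 3))
print(path)
```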
: [IA Series 1/n] AI Search - Terms and Algorithms This text introduces key concepts and algorithms related to intelligent agents in AI, focusing on …
: [Python Series 1/n] Modern Python Package Management: pipx and uv for Data Scientists This post is inspired by a conversation with a fellow Data Science and AI student. It’s from …
: Dystopia? It's already here and that's OK. Here's why. The text reflects on the misuse of technology and ethics in Silicon Valley, highlighting the …
: finally found something I wanted to use ChatGPT image generation for! On the fridge and the family …
: happiness is "django_cotton", "template_partials.apps.SimpleAppConfig", …
: How did America break itself? Ideological sabotage of the scientific method and how to counter it. Great podcast where she talks about why America is broken. Ideological sabotage (which surprised me, …
: China’s first heterogeneous humanoid robot training facility …
: Is the EU AI Act Killing Startups? A Medical Device Perspective The analysis concludes that while the EU AI Act does not obstruct startups, it presents both …
: The cold has fully kicked in now, and has a hint of covid about it… 😵😷 Plans to wire up the …
: “But who was learning, you or the machine?” “Well, I suppose we both were” …
: Clearly there are thoughtful, well spoken politicians in America. youtu.be/ubBnUCXj4… I hope …
: BBC news article is very clear… The Russian president has given the US leader just enough to …
: New wave of Innovators: why AI won't replace software engineering There’s a lot of change at the moment; my feed is all about foreign policies, US government …
: [NN Series 5/n] Regularisation: reducing the complexity of a model without compromising accuracy Regularisation is known to reduce overfitting when training a neural network. As with a lot of these …
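The mechanism is easiest to see on a linear model. An L2 penalty (ridge, or weight decay) adds λΣw² to the loss, so its gradient adds 2λw and pulls every weight toward zero. A minimal Python sketch, assuming mean-squared-error loss and no bias term. L2 is one common choice, the post may cover others, and the function name is illustrative.

```python
def ridge_loss_and_grad(w, X, y, lam):
    """MSE of a linear model (no bias) plus an L2 penalty lam * sum(w_j^2).

    Returns (loss, gradient with respect to w).
    """
    n = len(X)
    preds = [sum(wj * xj for wj, xj in zip(w, xi)) for xi in X]
    errors = [p - t for p, t in zip(preds, y)]
    mse = sum(e * e for e in errors) / n
    penalty = lam * sum(wj * wj for wj in w)
    grad = [
        2 * sum(e * xi[j] for e, xi in zip(errors, X)) / n  # data-fit term
        + 2 * lam * w[j]                                    # shrinkage term
        for j in range(len(w))
    ]
    return mse + penalty, grad

# A perfect fit still gets pushed toward smaller weights once lam > 0.
print(ridge_loss_and_grad([1.0], [[1.0], [2.0]], [1.0, 2.0], lam=0.5))
```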
: A speculative recipe for useful agentic behaviours: define actions by Promise Theory; train multiple neural nets to classify an action for a given input …
: Flow and decisions - almost a parable (I forget exactly but I’m pretty sure this is from an Alan Watts lecture). A farmer needs some …
: [NN Series 4/n] Feature Normalisation This is an interesting one as I’d thought it was quite academic, with limited utility. Then I …
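Standardisation (z-scores) is the most common form of feature normalisation. Subtract each feature's mean and divide by its standard deviation, so gradient descent sees features on comparable scales. A minimal Python sketch for one feature column; the function name is illustrative. In practice the mean and standard deviation come from the training data and are reused unchanged on new data.

```python
import statistics

def standardise(column):
    """z-score one feature column: (x - mean) / std, using the population std."""
    mu = statistics.fmean(column)
    sigma = statistics.pstdev(column)
    if sigma == 0:
        raise ValueError("constant feature: standard deviation is zero")
    return [(x - mu) / sigma for x in column]

print(standardise([2, 4, 4, 4, 5, 5, 7, 9]))
```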
: From Green Mars by Kim Stanley Robinson.
: [NN Series 3/n] Calculating the error before quantisation: Gradient Descent Next I’m looking at the Adaline in python code. This post is a mixture of what I’ve …
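The key difference from the Perceptron is where the error is measured. Adaline compares the target with the continuous net input, before the threshold (quantiser) is applied. That gives a differentiable cost to run gradient descent on. A minimal pure-Python sketch of batch gradient descent on the sum-of-squared-errors cost. The names, learning rate, and toy data are assumptions, not the post's code.

```python
def train_adaline(X, y, eta=0.1, epochs=50):
    """Adaline: full-batch gradient descent on J(w, b) = 1/2 * sum((y_i - net_i)^2).

    The error uses the linear net input, i.e. it is computed before quantisation.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    costs = []
    for _ in range(epochs):
        net = [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]
        errors = [t - o for t, o in zip(y, net)]
        # Step along the negative gradient: dJ/dw_j = -sum_i error_i * x_ij
        w = [wj + eta * sum(e * xi[j] for e, xi in zip(errors, X)) for j, wj in enumerate(w)]
        b += eta * sum(errors)
        costs.append(0.5 * sum(e * e for e in errors))
    return w, b, costs

def adaline_predict(w, b, x):
    # The quantiser is only applied when making a class prediction.
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1

X = [[-1.0], [-0.5], [0.5], [1.0]]
y = [-1, -1, 1, 1]
w, b, costs = train_adaline(X, y)
print(w, b, costs[0], costs[-1])
```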
: [NN Series 2/n] Circuits that can be trained to match patterns: The Adaline The text discusses the development and significance of the Adaline artificial neuron, highlighting …
: #BeingHuman - look after your << self >>: love is all it needs. The author shares personal reflections on self-kindness and positive thinking as tools for finding …
: #BeingHuman and a Dad. My wife and I have 3 main concerns with our daughter’s use of phones and Social Media: what her …
: Pondering Agency and Consciousness #BeingHuman Had a nice exchange about Agency with Paul Burchard on LinkedIn this morning. My thinking goes …
: [NN Series 1/n] From Neurons to Neural Networks: The Perceptron This post looks at the Perceptron, from Frank Rosenblatt’s original paper to a practical …
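The Perceptron learning rule is short enough to show in full. Predict with a hard threshold, and nudge the weights only when the prediction is wrong. A minimal Python sketch trained on logical AND, which is linearly separable, so the rule converges. The names and the choice of AND are illustrative. With zero initial weights, `eta` only rescales the weights and doesn't change the predictions.

```python
def perceptron_predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0.0 else -1

def train_perceptron(X, y, eta=1.0, epochs=10):
    """Perceptron rule: w += eta * (target - output) * x; labels are +1 / -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            update = eta * (target - perceptron_predict(w, b, xi))  # 0 when correct
            w = [wj + update * xj for wj, xj in zip(w, xi)]
            b += update
    return w, b

X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [-1, -1, -1, 1]  # logical AND
w, b = train_perceptron(X, y)
print(w, b, [perceptron_predict(w, b, x) for x in X])
```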
: This is not normal nor is it ok. Meta is now the pervy old man you have to teach your kids to avoid. …
: Nice opening. Looking forward to reading more! We can and must build intelligence …
: First test with a “reasoning” model, pleasantly surprised. Not sure how to integrate it …
: How do humans decipher reward in an uncertain state and environment? Imitation seems the most …
: If I could answer any question in science, I’d find out what involvement the neurons in our …
: [RL Series 2/n] From Animals to Agents: Linking Psychology, Behaviour, Mathematics, and Decision Making intro Maths, computation, the mind, and related fields are a fascination for me. I had thought I was …
: The challenges of being human: mistaking prediction, narratives, and rhetoric for reasoning I read an insightful comment within the current wave of LLM Reasoning hype. It has stuck with me. At …
: [RL Series 1/n] Defining Artificial Intelligence and Reinforcement Learning intro I’m learning about Reinforcement Learning; it’s an area that has a lot of intrigue …
: What is Off-Policy learning? I’ve recently dug into Temporal Difference algorithms for Reinforcement Learning. The field of …
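The off-policy distinction shows up in a single line of the TD update. Q-learning bootstraps from the best next action (the greedy target policy), whatever the behaviour policy actually does next. SARSA bootstraps from the action actually taken, which makes it on-policy. A minimal Python sketch of both one-step updates, assuming a tabular `Q` keyed by `(state, action)` and non-terminal next states. The function names and toy states are illustrative.

```python
from collections import defaultdict

def q_learning_update(Q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Off-policy TD(0) control: the target uses max over next actions."""
    best_next = max(Q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return td_error

def sarsa_update(Q, state, action, reward, next_state, next_action, alpha=0.1, gamma=0.9):
    """On-policy TD(0) control: the target uses the next action actually taken."""
    td_error = reward + gamma * Q[(next_state, next_action)] - Q[(state, action)]
    Q[(state, action)] += alpha * td_error
    return td_error

# Toy example: from s1, "right" looks valuable, but the behaviour policy explored "left".
Q = defaultdict(float)
Q[("s1", "right")] = 1.0
print(q_learning_update(Q, "s0", "right", 0.0, "s1", ["left", "right"], alpha=0.5))
```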
: Are LLMs learning skills rather than being Stochastic Parrots? A Theory for Emergence of Complex Skills in Language Models Skill-Mix: a Flexible and Expandable …
: Domain Specific Languages Ray Myers has started his Year of Domain Specific Languages 🎉 I listened to the first episode …
: Finished reading: Red Mars by Kim Stanley Robinson 📚 A great book, other than it being a highly …
: Dopamine as temporal difference errors !! 🤯 I expect I’m sharing a dopamine burst that I experienced! 🤓 I’m listening to The …
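For reference, the dopamine hypothesis links phasic dopamine firing to the TD(0) error. Here it is in standard textbook notation (Sutton and Barto); the symbols are the usual ones, not taken from the post: reward $r$, discount $\gamma$, step size $\alpha$, and state-value estimate $V$.

$$\delta_t = r_{t+1} + \gamma\, V(s_{t+1}) - V(s_t), \qquad V(s_t) \leftarrow V(s_t) + \alpha\, \delta_t$$

A positive $\delta_t$ means things turned out better than predicted. An expected reward that never arrives gives a negative $\delta_t$.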
: It is possible for dopamine to write cheques that the environment cannot cash. At which point the …
: Nice lunch time walk into the village
: Nice summary and stark reminder of what’s happening right now. Only CEOs are making the …
: [video] Crew.ai experiment with Cyber Threat Intelligence
: Agentic behaviours My initial thoughts, expressed via the medium of sport, on agentic behaviours plus friends view, …
: [short] Why use tools with an LLM?
: Project Euler meets Powershell - Problem #4 <# A palindromic number reads the same both ways. The largest palindrome made from the product of …
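A Python sketch, rather than the post's PowerShell, of a brute-force search with early exits. It assumes the problem's framing of products of two n-digit numbers, and the function names are illustrative. The 2-digit case from the problem statement (9009 = 91 × 99) makes a cheap sanity check.

```python
def is_palindrome(n):
    s = str(n)
    return s == s[::-1]

def largest_palindrome_product(digits):
    """Largest palindrome that is a product of two `digits`-digit numbers (a <= b)."""
    lo, hi = 10 ** (digits - 1), 10 ** digits - 1
    best = 0
    for a in range(hi, lo - 1, -1):
        if a * hi <= best:
            break  # no product with this a (or any smaller a) can beat best
        for b in range(hi, a - 1, -1):
            product = a * b
            if product <= best:
                break  # products only shrink as b decreases
            if is_palindrome(product):
                best = product
                break
    return best

print(largest_palindrome_product(2))
```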
: Project Euler meets Powershell - reworking factorial to avoid the PowerShell 1000-recursion limit Turns out it was the recursion in the factorial function; I’ve reworked it to use a for loop function …
: Project Euler meets Powershell - Problem #3 amonkeyseulersolutions: I’ve read that an integer p > 1 is prime if and only if the factorial (p …
: Project Euler meets Powershell - largest prime factor of a value? things to do… No more than the square root of the value. test each value? start from the square root …
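The square-root bound works because any composite n has a factor no larger than √n. The Python sketch below takes a slightly different route from starting at the square root. It trial-divides upwards from 2 and strips each factor out as it's found, so the bound shrinks with the remaining value and whatever is left at the end is prime. The function name is illustrative. 13195 → 29 is the worked example from Project Euler Problem 3.

```python
def largest_prime_factor(n):
    """Trial division, dividing out each factor so only candidates up to sqrt(remaining n) are tried."""
    if n < 2:
        raise ValueError("n must be >= 2")
    largest = None
    factor = 2
    while factor * factor <= n:
        while n % factor == 0:
            largest = factor
            n //= factor
        factor += 1 if factor == 2 else 2  # after 2, only odd candidates
    if n > 1:
        largest = n  # the remainder has no factor <= its square root, so it is prime
    return largest

print(largest_prime_factor(13195))
```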
: Project Euler meets Powershell - isprime I’ve read that an integer p > 1 is prime if and only if the factorial (p - 1)! + 1 is divisible …
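Here is Wilson's theorem in Python, as a sketch rather than the post's PowerShell. It multiplies up (p - 1)! in a loop, so there's no recursion depth to worry about. As an extra trick not in the post, it also reduces modulo p at every step so the numbers stay small. It's a fun test, but needing about p multiplications per number makes it much slower than trial division. The function name is illustrative.

```python
def is_prime_wilson(p):
    """Wilson's theorem: an integer p > 1 is prime iff (p - 1)! + 1 is divisible by p."""
    if p < 2:
        return False
    fact_mod = 1
    for k in range(2, p):  # iterative (p - 1)!, reduced mod p at each step
        fact_mod = (fact_mod * k) % p
    return (fact_mod + 1) % p == 0

print([n for n in range(30) if is_prime_wilson(n)])
```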
: Project Euler meets Powershell - factorial... function factorial { [cmdletbinding()] param([int64] $x) if ($x -lt 1) { return "Has to be on a …
: Project Euler - Problem 2 Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting …
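A Python sketch, not the post's PowerShell, that sums the even terms for Project Euler Problem 2. It walks the sequence with two running values instead of storing it. Terms start 1, 2, and the real problem's limit is four million. The function name is illustrative.

```python
def sum_even_fibonacci(limit):
    """Sum the even-valued Fibonacci terms not exceeding `limit` (sequence starting 1, 2)."""
    a, b = 1, 2
    total = 0
    while a <= limit:
        if a % 2 == 0:
            total += a
        a, b = b, a + b
    return total

print(sum_even_fibonacci(100))
```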
: Project Euler meets Powershell - Problem 1 If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9. The …
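Here are two Python sketches of Problem 1 (the posts use PowerShell). The first is the direct loop. The second uses inclusion-exclusion, summing each arithmetic series in closed form and subtracting multiples of 15, since those get counted once as multiples of 3 and again as multiples of 5. Names are illustrative, and both give 23 for the below-10 example in the problem statement.

```python
def sum_multiples_loop(limit):
    """Sum of natural numbers below `limit` that are multiples of 3 or 5."""
    return sum(n for n in range(limit) if n % 3 == 0 or n % 5 == 0)

def sum_multiples_closed_form(limit):
    """Same result via inclusion-exclusion: S(3) + S(5) - S(15)."""
    def sum_of_multiples(k):
        m = (limit - 1) // k  # number of positive multiples of k below limit
        return k * m * (m + 1) // 2
    return sum_of_multiples(3) + sum_of_multiples(5) - sum_of_multiples(15)

print(sum_multiples_loop(10), sum_multiples_closed_form(10))  # 23 23
```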