Context may be all you need

[IA Series 5/n] The Evolution from Logic to Probability to Deep Learning: A course correction to Transformers

Tuesday, May 20, 2025

Introduction In the previous post I shared my view on “Why Study Logic?”. We looked at Knowledge Representation and highlighted the importance of Logic and Reasoning in storing and accessing Knowledge. In this post I’m going to highlight a section from the book “Introduction to Artificial Intelligence” by Wolfgang Ertel. His approach with this book was to make AI more accessible than Russell and Norvig’s 1000+ page bible. It worked for me.

Continue reading →

[IA Series 4/n] A Big Question: Why Study Logic in a World of Probabilistic AI?

Monday, May 19, 2025

Introduction The purpose of this article is to help me answer the question “Why am I studying Logic?”. If it helps you, that’d be great, let me know! The question comes from a nagging feeling: why don’t I see logic used more in the ‘real world’? It could be a personal bias, as I more easily see the utility of Rosenblatt’s work, where he looked at both Symbolic Logic and Probability Theory to help solve a problem and chose Probability Theory ([NN Series 1/n] From Neurons to Neural Networks: The Perceptron). With that we had the birth of the Artificial Neuron, and the rest is history!

Continue reading →

"There must be an invisible sun, giving heat to everyone"

Sunday, May 18, 2025

“There must be an invisible sun, giving heat to everyone”

Continue reading →

[IA Series 3/n] Intelligent Agents Term Sheet

Friday, May 16, 2025

“[IA Series 3/n] Intelligent Agents Term Sheet” breaks down essential AI terminology from Russell & Norvig’s seminal textbook. Learn what makes agents rational (or irrational), understand different agent types, and follow a structured 5-step design process from environment analysis to implementation. Perfect reference for AI practitioners and students. Coming next: how agents mirror human traits. #ArtificialIntelligence #IntelligentAgents #AIDesign

Continue reading →

Building an Intelligent Agent

Saturday, May 10, 2025

First draft in public 😱 😆🤓 What’s the best way for an agent to build a semantically sound and syntactically correct knowledge base? Dogfooding my course material means the first step is to define the task environment. /Checks notes Task Environment: The description of Performance, Environment, Actuators, and Sensors (PEAS). This provides a complete specification of the problem domain. So how can I implement this 🤔 First I need to think about the domain, something different to the examples (e.

Continue reading →
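For the “Building an Intelligent Agent” draft above, a PEAS description could be captured as a small data structure. This is only a minimal sketch: the class name, the field types, and the example domain are all assumptions made for illustration, not the domain the post goes on to choose.

```python
from dataclasses import dataclass, field


@dataclass
class TaskEnvironment:
    """PEAS description of a task environment (names here are illustrative)."""
    performance: list[str] = field(default_factory=list)
    environment: list[str] = field(default_factory=list)
    actuators: list[str] = field(default_factory=list)
    sensors: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        # Treat the specification as usable only when all four parts are non-empty.
        return all([self.performance, self.environment, self.actuators, self.sensors])


# Hypothetical example values, chosen only to show the shape of a PEAS spec.
kb_builder = TaskEnvironment(
    performance=["facts are syntactically valid", "facts are mutually consistent"],
    environment=["source documents", "existing knowledge base"],
    actuators=["add fact", "retract fact"],
    sensors=["document reader", "knowledge base query"],
)
print(kb_builder.is_complete())
```

Writing the four parts down explicitly makes it easy to spot a missing one before any agent code is written.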

[zero-RL] Summarising what LUFFY offers

Tuesday, April 29, 2025

Here’s a “standard” progression of training methodologies: Pre-training - This is where the model gains broad knowledge, forming the foundation necessary for reasoning. CPT (Continued Pre-training) - Makes the model knowledgeable about specific domains. SFT (Supervised Fine-Tuning) - Makes the model skilled at specific tasks by leveraging knowledge it already has. RL (Reinforcement Learning) - Uses methods like GRPO and DPO to align model behavior. Reasoning traces play different roles at each stage:

Continue reading →

[zero-RL] where is the exploration?

Tuesday, April 29, 2025

Source: Off Policy “zero RL” in simple terms. Results demonstrate that LUFFY encourages the model to imitate high-quality reasoning traces while maintaining exploration of its own sampling space. The authors introduce policy shaping via regularized importance sampling, which amplifies learning signals for low-probability yet crucial actions under “off-policy” guidance. The aspect that is still not clear to me is how there is any exploration of the solution space.

Continue reading →

[zero-RL] LUFFY: Learning to reason Under oFF policY guidance

Monday, April 28, 2025

Based on conventional zero-RL methods such as GRPO, LUFFY introduces off-policy reasoning traces (e.g., from DeepSeek-R1) and combines them with models' on-policy roll-outs before advantage computation. … However, naively combining off-policy traces can lead to overly rapid convergence and entropy collapse, causing the model to latch onto superficial patterns rather than acquiring genuine reasoning capabilities. …genuine reasoning capabilities… I am not certain whether the implication is that DeepSeek-R1 can reason or whether it is a reminder that no model can genuinely reason.
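To make “combines them with on-policy roll-outs before advantage computation” concrete, here is a minimal sketch of a GRPO-style group-relative advantage computed over a mixed group. It is not LUFFY's implementation: the function name, the reward values, and the `eps` term are assumptions, and policy shaping and importance weights are left out entirely.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalise each reward by the mean and standard deviation of its group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps (an assumption here) avoids division by zero when all rewards are equal.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for one prompt: 1.0 = correct answer, 0.0 = incorrect.
on_policy_rewards = [0.0, 0.0, 1.0]  # the model's own roll-outs
off_policy_rewards = [1.0]           # a trace from a stronger model

# The off-policy trace joins the same group before advantages are computed.
advantages = group_relative_advantages(on_policy_rewards + off_policy_rewards)
print(advantages)
```

Because the off-policy trace shares the group baseline, a correct external trace raises the mean and makes the model's own failed roll-outs look worse by comparison.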

Continue reading →

[zero-RL] what is it?

Monday, April 28, 2025

Zero-RL applies reinforcement learning (RL) directly to a base LM, eliciting its reasoning potential using the model's own rollouts. A fundamental limitation worth highlighting: it is inherently “on-policy”, constraining learning exclusively to the model’s self-generated outputs through iterative trials and feedback cycles. Despite showing promising results, zero-RL is bounded by the base LLM itself. A key characteristic is that an LLM can be trained without Supervised Fine-Tuning (SFT).

Continue reading →

[zero-RL] When you SFT a smaller LM on the reasoning traces of a larger LM

Monday, April 28, 2025

You are doing Imitation Learning (specifically Behavioral Cloning) because the goal and mechanism involve mimicking the expert’s token sequences. You are doing Transfer Learning (specifically Knowledge Distillation) because you are transferring reasoning knowledge from a teacher model to a student model. You are not doing Off-Policy Reinforcement Learning because the learning process is supervised likelihood maximization, not reward maximization using RL algorithms. Although the data itself is “off-policy” (not generated by the model being trained), the learning paradigm is supervised imitation, not RL.
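The “supervised likelihood maximization” point can be illustrated with the loss itself. The sketch below computes the mean negative log-likelihood of a teacher trace under the student; the function name and the probabilities are made up for illustration. Note that no reward appears anywhere, which is what separates this from RL.

```python
import math


def behavioral_cloning_loss(token_probs):
    """Mean negative log-likelihood of the teacher's tokens under the student."""
    return -sum(math.log(p) for p in token_probs) / len(token_probs)


# Hypothetical probabilities the student assigns to each token of one teacher trace.
student_probs = [0.9, 0.5, 0.8]
loss = behavioral_cloning_loss(student_probs)
print(round(loss, 4))
```

Minimising this loss pushes the student to assign higher probability to exactly the tokens the teacher produced, which is imitation rather than reward maximization.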

Continue reading →

Notes and links on SVMs (WIP)

Saturday, April 26, 2025

Support Vector Machines (SVM) are a mathematical approach for classifying data by finding optimal separating hyperplanes, applicable even in non-linear scenarios using kernel methods.

Continue reading →

[IA Series 2/n] Search Algorithms and Intelligent Agents

Thursday, April 24, 2025

The document discusses various search algorithms used by Intelligent Agents for navigating mazes, detailing their types, characteristics, tradeoffs, and implementations.
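As a flavour of what such a search looks like, here is a minimal sketch of breadth-first search on a grid maze. The maze encoding (`#` for walls, any other character for open cells) and the function name are assumptions made for this example and are not taken from the post.

```python
from collections import deque


def bfs_shortest_path(maze, start, goal):
    """Return a shortest path as a list of (row, col) cells, or None if unreachable."""
    rows, cols = len(maze), len(maze[0])
    frontier = deque([start])   # FIFO queue: cells are expanded in order of distance
    parent = {start: None}      # doubles as the visited set
    while frontier:
        cell = frontier.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and maze[nr][nc] != "#" and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                frontier.append((nr, nc))
    return None


# Hypothetical 3x3 maze: S = start, G = goal, # = wall.
maze = [
    "S.#",
    ".##",
    "..G",
]
path = bfs_shortest_path(maze, (0, 0), (2, 2))
print(path)
```

Swapping the queue for a stack would turn this into depth-first search, which uses the same bookkeeping but no longer guarantees the shortest path.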

Continue reading →

[IA Series 1/n] AI Search - Terms and Algorithms

Thursday, April 24, 2025

This text introduces key concepts and algorithms related to intelligent agents in AI, focusing on search terms, uninformed and informed search strategies, and adversarial search techniques.

Continue reading →

[Python Series 1/n] Modern Python Package Management: pipx and uv for Data Scientists

Tuesday, April 22, 2025

This post is inspired by a conversation with a fellow Data Science and AI student. It grew out of that conversation and was co-authored with Claude. Hope it’s useful! Where to begin? When you’re starting your data science journey with Python, one of the first roadblocks you’ll encounter is package management. If you’ve tried conda and found it frustrating (as many do), you’re not alone. Let’s explore two modern tools that make Python package management simpler and more reliable: pipx and uv.

Continue reading →

Dystopia? It's already here and that's OK. Here's why.

Sunday, April 20, 2025

The text reflects on the misuse of technology and ethics in Silicon Valley, highlighting the importance of awareness and compassion amidst current challenges.

Continue reading →

Sunday, April 13, 2025 →

finally found something I wanted to use ChatGPT image generation for! On the fridge and the family loves it, going to be a busy week 👨🏻‍🌾

Sunday, April 13, 2025 →

happiness is

    "django_cotton",
    "template_partials.apps.SimpleAppConfig",

unhappiness (for what seems like an eternity) is:

    "template_partials.apps.SimpleAppConfig",
    "django_cotton",

How did America break itself? Ideological sabotage of the scientific method and how to counter it.

Friday, April 11, 2025

Great podcast where she talks about why America is broken. Ideological sabotage (which surprised me, I’d thought the initial perpetrators would have done it for money) of the scientific method to protect the “right of freedom” has done exactly the opposite… It really feels like some Americans are stuck fighting against foes that no longer exist, either the British tyranny and taxation without representation or the war between capitalism and communism.

Continue reading →

Thursday, April 10, 2025 →

China’s first heterogenous humanoid robot training facility

www.globaltimes.cn/page/2025…

During the first phase of the project, the robots will be trained with approximately 45 atomic skills such as grasping, picking, placing and transporting

A single action may need to be repeated up to 600 times a day by a data collector for the robots to learn from

10 key scenarios, including industrial, domestic, and tourism services

It is expected that the collection of over 10 million real-machine data entries will be achieved within the year

Is the EU AI Act Killing Startups? A Medical Device Perspective

Monday, April 7, 2025

The analysis concludes that while the EU AI Act does not obstruct startups, it presents both challenges and opportunities for innovation within a complex regulatory landscape.

Continue reading →