Definitions Reference
Working definitions to track and build into the documentation of my research. Most are already included in the framework or its extensions, though I need to learn more about Markov blankets, as I think they could form the boundary between the two state spaces: what the agent can sense and what it can act on.
Otherwise this post is ordered to follow the logic as it develops, growing from the initial treatment of agency through planning and learning toward potential full autonomy.
2.1 Foundational Definitions
2.1.1 Agency
- Agent Function (Russell & Norvig, 2020): A specification mapping percept history to action selection: f: P* → A. Defines what the agent decides.
- Agent Program (Russell & Norvig, 2020): An implementation of the agent function on a specific architecture, mapping the current percept to an action. Defines how the agent decides.
- Agent Taxonomy (Russell & Norvig, 2020): The progression distinguished by internal state maintenance and reasoning sophistication:
  - Simple Reflex: Condition → action rules; no internal state; responds only to the current percept.
  - Model-Based Reflex: Maintains internal state representing unobserved world aspects; condition → action rules operate on that internal state.
  - Goal-Based: Holds an explicit goal representation; uses a predictive model to simulate hypothetical action sequences; searches for paths achieving the goal before acting. Requires a generative model of world dynamics P(s'|s,a) to predict the outcomes of actions not yet taken.
  - Utility-Based: Goal-based with a preference ordering over outcomes; maximizes expected utility rather than satisfying a binary goal predicate.
- Weak Agency (Wooldridge & Jennings, 1995): A software system exhibiting autonomy, reactivity, and pro-activeness without implying consciousness or mental states.
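To make the first rungs of the taxonomy concrete, here is a minimal sketch (all names, percepts, and rules are illustrative assumptions, not part of the framework) contrasting a simple reflex agent with a model-based reflex agent; the only difference is that the second one's rules fire on maintained internal state rather than on the raw percept.

```python
# Minimal sketch: simple reflex vs. model-based reflex agents.
# Percepts, rules, and state keys are illustrative assumptions.

def simple_reflex_agent(percept: str) -> str:
    """Condition -> action rules over the current percept only; no internal state."""
    rules = {"obstacle": "turn", "clear": "forward"}
    return rules.get(percept, "noop")

class ModelBasedReflexAgent:
    """Keeps internal state for world aspects the current percept does not reveal."""

    def __init__(self) -> None:
        self.state = {"obstacle_ahead": False}

    def act(self, percept: str) -> str:
        # Update internal state from the percept (a stand-in for a world model).
        if percept in ("obstacle", "clear"):
            self.state["obstacle_ahead"] = (percept == "obstacle")
        # Rules operate on the internal state, not the raw percept.
        return "turn" if self.state["obstacle_ahead"] else "forward"
```

A goal-based agent would go one step further: it would need a transition model P(s'|s,a) to search over hypothetical action sequences before committing to one.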
2.1.2 Actor Functional Architecture (Ghallab, Nau & Traverso, 2024)
- Actor: A computational artifact capable of autonomous operation in its environment. Can be software-only or embodied with sensory-motor devices.
- Planning: Determining what to do. Open-loop search over predicted states using a predictive model. Synthesizes an organized set of actions leading to a goal. The designer/orchestrator holds the goal representation and performs goal-based search offline.
  “Planning is organized as an open-loop search, a look-ahead process based on predictions.”
- Acting: Determining how to do the chosen actions. Closed-loop process with feedback from observed effects. Progressive refinement of abstract actions into concrete commands given the current context.
  “Acting is a closed-loop process, with feedback from observed effects and events used as input for subsequent decisions.”
- Learning: Improving performance with greater autonomy and versatility. Two modes:
  - End-to-end: Reactive black-box function; effective but difficult to verify.
  - Model-based: Explicit predictive models; supports analysis and explanation.
  “An actor learns if it improves its performance with more autonomy and versatility, including ways to perform new tasks, and adaptation to new or changing environments.”
- Descriptive Model: Specifies what effects an action may have and when it is feasible. Relations from preconditions to effects. Used during planning.
- Operational Model: Specifies how to perform an action: what commands to execute in the current context. Used during acting (see the sketch after this list).
- Note: Agent Function/Agent Program parallels Descriptive/Operational models. Derivation rules formalize the transformation.
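A minimal sketch of the descriptive/operational split, with all names and structures assumed for illustration (not the book's notation): the descriptive model is the precondition → effects relation a planner searches over, while the operational model refines the same abstract action into concrete commands at acting time.

```python
from dataclasses import dataclass, field

# Illustrative only; class and field names are assumptions.

@dataclass
class DescriptiveModel:
    """What an action may do and when it is feasible (used during planning)."""
    action: str
    preconditions: set[str] = field(default_factory=set)
    effects: set[str] = field(default_factory=set)

    def applicable(self, state: set[str]) -> bool:
        return self.preconditions <= state

    def predict(self, state: set[str]) -> set[str]:
        return state | self.effects


class OperationalModel:
    """How to perform the same action: concrete commands for the current context (used while acting)."""

    def refine(self, action: str, context: dict) -> list[str]:
        # Progressive refinement of an abstract action into executable commands.
        if action == "deploy" and context.get("target") == "staging":
            return ["build --profile staging", "push", "rollout --wait"]
        return []
```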
2.1.3 Rationality Constraints
- Bounded Rationality (Simon, 1955): Rational agents under computational constraints do not optimize globally; they satisfice, selecting the first solution meeting the aspiration level within the available search budget.
  Mapping to AtomicGuard (see the sketch after this list):
  - Aspiration level = guard returning ⊤
  - Search budget = retry budget before ⊥_fatal
  - Satisfice = accept the first passing generation
- Control Boundary (Sutton & Barto, 1998): The agent comprises only the components modifiable by the control policy. Components outside this boundary constitute the environment.
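A minimal sketch of the satisficing mapping under assumed names (generate, guard, and MAX_RETRIES are illustrative, not the framework's API): the loop accepts the first generation the guard passes and escalates once the search budget is spent.

```python
# Satisficing as a guarded retry loop (illustrative sketch).
MAX_RETRIES = 3  # search budget

def satisfice(generate, guard, prompt: str) -> str:
    for _ in range(MAX_RETRIES):
        candidate = generate(prompt)
        if guard(candidate):      # aspiration level met: guard returns ⊤
            return candidate      # satisfice: accept the first passing generation
    raise RuntimeError("retry budget exhausted")  # ⊥_fatal: escalate
```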
2.1.4 Cooperation Models
- Promise Theory (Burgess, 2015): A model of voluntary cooperation where autonomous agents issue promises regarding intended behavior. The consumer bears responsibility for verifying promise fulfillment, replacing command-and-control assumptions.
  Application: The orchestrator treats LLM outputs as if they were promises, applying consumer-side verification via guards. The LLM lacks intentionality—promises are imputed by the architectural pattern.
2.1.5 Model Types
- Discriminative Model: Directly learns P(s|o) — a mapping from observations to hidden state estimates. Answers “given this data, what’s the classification?” without modeling how observations arise.
  - Cannot generate synthetic observations
  - Does not support counterfactual reasoning
  - Handles missing/partial observations poorly
- Generative Model: Learns the full joint P(o, s) = P(o|s) × P(s), then derives P(s|o) via Bayes' rule.
  - Can generate synthetic observations (sample s, then sample o|s)
  - Supports counterfactual reasoning
  - Handles missing/partial observations naturally
- Planning Implication: Goal-based planning requires a generative model to simulate hypothetical futures P(s'|s,a). Precondition-gated execution only needs discriminative guards P(valid|output). The table and sketch below contrast the two.
| Aspect | Discriminative | Generative |
|---|---|---|
| Learns | P(s|o) directly | P(o, s) = P(o|s) × P(s) |
| Can generate synthetic observations? | No | Yes |
| Supports counterfactual reasoning? | No | Yes |
| Handles missing/partial observations? | Poorly | Naturally |
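A tiny numeric sketch of the distinction (the binary state/observation spaces and all probabilities are assumptions chosen for illustration): the generative model stores P(s) and P(o|s), so it can both sample synthetic observations and recover P(s|o) by Bayes' rule; a discriminative model would only ever store the conditional it was trained on.

```python
import random

# Illustrative binary world: hidden state s ∈ {0, 1}, observation o ∈ {0, 1}.
# Generative model = prior P(s) plus likelihood P(o|s); all numbers are assumed.
P_s = {0: 0.7, 1: 0.3}
P_o_given_s = {0: {0: 0.9, 1: 0.1},   # P(o | s=0)
               1: {0: 0.2, 1: 0.8}}   # P(o | s=1)

def posterior(o: int) -> dict[int, float]:
    """Derive P(s|o) via Bayes' rule from the joint P(o, s) = P(o|s) × P(s)."""
    joint = {s: P_o_given_s[s][o] * P_s[s] for s in P_s}
    evidence = sum(joint.values())                 # P(o)
    return {s: joint[s] / evidence for s in joint}

def sample_observation() -> int:
    """Generate a synthetic observation: sample s ~ P(s), then o ~ P(o|s)."""
    s = 0 if random.random() < P_s[0] else 1
    return 0 if random.random() < P_o_given_s[s][0] else 1

print(posterior(1))  # ≈ {0: 0.226, 1: 0.774}
```

A discriminative guard, by contrast, is just a classifier over outputs: it answers P(valid|output) with no mechanism for sampling observations or reasoning counterfactually.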
2.2 Planning-Execution Separation
2.2.1 Offline Planning (Goal-Based Search)
- Workflow as Pre-Computed Plan: The workflow structure is the output of goal-based search performed at design time. The designer reasons about hypothetical action sequences to construct a state machine satisfying the goal predicates.
- Goal Representation: The designer/orchestrator holds explicit goal representations. Runtime agents inherit the goal structure implicitly through preconditions and guard predicates.
- Requires Generative Model: Planning simulates “what if I do A, then B?” using P(s'|s,a). This is goal-based reasoning per R&N.
- Outputs: Workflow state machine, guard specifications, action preconditions, postcondition assertions (see the sketch after this list).
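A minimal sketch of what those design-time outputs could look like as data (states, actions, and guard names are assumed for illustration, not the framework's schema): the plan is a static state machine the executing agent will follow, with a guard attached to every transition.

```python
# Illustrative pre-computed plan produced by offline, goal-based search.
WORKFLOW_PLAN = {
    "initial": "spec_drafted",
    "goal": "done",
    "transitions": {
        # (current_state, action): (guard_that_must_pass, next_state)
        ("spec_drafted", "generate_tests"): ("tests_cover_spec", "tests_written"),
        ("tests_written", "generate_code"): ("tests_pass", "code_written"),
        ("code_written", "review"):         ("review_approved", "done"),
    },
}
```

At runtime the executing agent never searches this structure; it only checks whether the guard on the current transition holds.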
2.2.2 Online Execution (Precondition-Gated Reflex)
- State Sensing: The agent observes the current environment state through predicate evaluation. Multiple facets may be sensed simultaneously (e.g., specification alignment, test coverage, code correctness).
- Action Applicability Function: φ: S × A → {applicable, blocked} determines which actions are available. Guards implement the applicability predicates at runtime.
- Guard-Verified Transitions: State commits only after guard validation. Invalid generations are rejected without polluting workflow state.
- No Runtime Goal Search: The executing agent follows the pre-computed structure. Goals are implicit in precondition ordering, not explicit representations the agent reasons about. This is model-based reflex, not goal-based.
- Uses Discriminative Model: Guards classify P(valid|output) without simulating futures (see the sketch after this list).
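A minimal sketch of the applicability function and a guard-verified transition (the state representation, predicate names, and guard signature are assumptions): the reflex agent checks φ before acting and commits a state change only when the guard accepts the generation.

```python
from typing import Callable

# Illustrative sketch; the workflow state is a set of satisfied predicates.

def phi(state: set[str], action: str, preconditions: dict[str, set[str]]) -> str:
    """Action applicability function φ: S × A → {applicable, blocked}."""
    return "applicable" if preconditions.get(action, set()) <= state else "blocked"

def guarded_transition(state: set[str], output: str,
                       guard: Callable[[str], bool], postcondition: str) -> set[str]:
    """Commit a new workflow state only after guard validation."""
    if guard(output):                   # discriminative check: P(valid | output)
        return state | {postcondition}  # state commits
    return state                        # invalid generation rejected; state unchanged
```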
2.2.3 Execution Trace (Directed Acyclic Graph)
- Structure: A DAG capturing the complete execution history, where:
  - Nodes: Generation events, guard evaluations, state snapshots, artifact versions
  - Edges: State transitions, retry branches, artifact dependencies, causal links
- Properties:
  - Append-only: History is never modified, only extended
  - Immutable: Past nodes and edges cannot be altered
  - Enables Replay: Any execution path can be reconstructed
- Retry Branching: Multiple generation attempts at the same workflow state produce sibling nodes. Only the branch reaching guard satisfaction (⊤) advances the workflow.
- Artifact Dependencies: Edges encode data flow (the output of A feeds the input of B), orthogonal to control flow.
- Relation to S_env: The execution trace is the information state.
- Bridge to Learning: Substrate for in-context learning; training data for model adaptation (see the sketch after this list).
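A minimal append-only sketch of the trace, with node and edge fields assumed for illustration (not the framework's schema): retry attempts at the same workflow state become sibling nodes under a common parent, and history is only ever extended, never mutated.

```python
from dataclasses import dataclass, field
import itertools

# Illustrative execution-trace DAG; field names and node kinds are assumptions.

@dataclass(frozen=True)           # immutable: past nodes cannot be altered
class TraceNode:
    node_id: int
    kind: str                     # "generation", "guard_eval", "state_snapshot", ...
    payload: str

@dataclass
class ExecutionTrace:
    nodes: list[TraceNode] = field(default_factory=list)
    edges: list[tuple[int, int, str]] = field(default_factory=list)  # (src, dst, label)
    _ids: itertools.count = field(default_factory=itertools.count)

    def append(self, kind: str, payload: str, parent: int | None = None,
               label: str = "causal") -> int:
        """Append-only: extend the history, never modify it."""
        node = TraceNode(next(self._ids), kind, payload)
        self.nodes.append(node)
        if parent is not None:
            self.edges.append((parent, node.node_id, label))
        return node.node_id

# Retry branching: two attempts at the same state are siblings of the same parent.
trace = ExecutionTrace()
state = trace.append("state_snapshot", "tests_written")
attempt_1 = trace.append("generation", "candidate A", parent=state, label="retry")
attempt_2 = trace.append("generation", "candidate B", parent=state, label="retry")
```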
2.3 Learning Modes
2.3.1 Intra-Episode (In-Context Learning)
- S_env accumulates generation history, guard feedback, and artifact provenance within a single execution episode.
- LLM conditions on prior attempts, guard failure reasons, and successful patterns without weight modification.
- Satisficing applies: learning continues until guard returns ⊤ or retry budget exhausted.
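A minimal sketch of how the accumulated S_env might be folded back into the next prompt (the message format and helper names are assumptions): each failed attempt and its guard feedback are appended to the context, so the LLM conditions on them without any weight update.

```python
# Illustrative in-context learning context builder; message roles and wording are assumed.

def build_context(task: str, history: list[dict]) -> list[dict]:
    """Condition the next generation on prior attempts and guard failure reasons."""
    messages = [{"role": "user", "content": task}]
    for attempt in history:
        messages.append({"role": "assistant", "content": attempt["output"]})
        messages.append({"role": "user",
                         "content": f"Guard rejected this attempt: {attempt['reason']}"})
    return messages

history = [{"output": "def add(a): ...", "reason": "missing second parameter"}]
context = build_context("Write add(a, b) returning the sum.", history)
```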
2.3.2 Inter-Episode (Model Adaptation)
- Execution traces provide training signal for:
- LoRA / adapter parameter updates
- Distillation from successful execution paths
- Reinforcement from guard feedback (⊤ as reward signal)
- Operates outside the intra-episode control boundary.
2.4 Architectural Definitions
2.4.1 Control Boundary (Generative Application)
Applying Sutton & Barto’s definition to LLM-based systems:
- Intra-Episode: The agent controls context composition (C) and workflow state transitions (S_workflow). The LLM is part of the environment.
- Inter-Episode: With sufficient compute, the agent may also control adapter parameters (LoRA) or distilled weights.
- Base Model: Pre-trained weights remain permanently in the environment, providing a stochastic generation oracle.
2.4.2 Dual-State Architecture
The system state space S separates into:
- S_workflow (Control State): A deterministic FSM tracking goal progress, guard satisfaction, and transition history. Commits only on guard success.
- S_env (Information State): The execution trace DAG. Append-only, versioned, accumulating all generations and guard feedback. Enables in-context learning without polluting control flow.
2.4.3 Atomic Action Pair
Generator-guard coupling ensuring deterministic control over stochastic generation:
- Generator: Produces a candidate output conditioned on context (prompt, S_env history).
- Guard: A deterministic predicate evaluating generation validity. Discriminative: classifies P(valid|output).
- Tri-State Semantics:
  - ⊤ (success): Commit to S_workflow
  - ⊥_retry (recoverable): Append to S_env, re-invoke generator
  - ⊥_fatal (unrecoverable): Escalate
- Satisficing Interpretation (see the sketch after this list):
  - ⊤ = aspiration level met (Simon)
  - Retry budget = search constraint
  - First ⊤ accepted; global optimum not sought
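Putting the pair together, a minimal sketch under assumed names (GuardResult, generator, guard): the guard's tri-state result drives what happens to the two state spaces: commit to S_workflow on ⊤, append to S_env and retry on ⊥_retry, escalate on ⊥_fatal.

```python
from enum import Enum, auto

# Illustrative sketch of the atomic action pair; all names are assumptions.

class GuardResult(Enum):
    TOP = auto()           # ⊤: commit to S_workflow
    BOTTOM_RETRY = auto()  # ⊥_retry: append to S_env, re-invoke generator
    BOTTOM_FATAL = auto()  # ⊥_fatal: escalate

def atomic_action(generator, guard, context: list, retry_budget: int = 3):
    s_env = []                                # information state (append-only)
    for _ in range(retry_budget):
        candidate = generator(context + s_env)
        result = guard(candidate)
        if result is GuardResult.TOP:
            return candidate                  # first ⊤ accepted (satisficing)
        if result is GuardResult.BOTTOM_FATAL:
            break
        s_env.append(candidate)               # recoverable failure kept in context
    raise RuntimeError("escalate: ⊥_fatal or retry budget exhausted")
```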
2.5 Active Inference Correspondence
2.5.1 Friston’s Active Inference
- Generative model: P(o, s) = P(o|s)P(s)
- Inference problem: Given observation o, infer hidden state s
- Free energy: F = E_Q[log Q(s) - log P(o, s)]
- Perception: Minimize F w.r.t. Q(s) → approximate P(s|o)
- Action: Minimize F w.r.t. π → select actions reducing expected surprise
2.5.2 AtomicGuard as Inversion
Notation:
- σ — workflow state (specifications, preconditions, postconditions) — known
- g — generated output from LLM — stochastic, unknown until produced
- G(·) — guard function (deterministic validation)
- L(·|σ) — LLM generation distribution conditioned on prompt derived from σ
Validation model: G(g, σ) → {0, 1}
Generation problem: Given known state σ, obtain g such that G(g, σ) = 1
2.5.3 Structural Comparison
| Active Inference | AtomicGuard |
|---|---|
| P(o|s) — likelihood | G(g, σ) — guard (inverse likelihood) |
| P(s) — prior over states | σ — deterministic specification |
| Q(s) — approximate posterior | L(g|σ) — generation distribution |
| Minimize D_KL[Q(s) || P(s|o)] | Retry until G(g, σ) = 1 |
| Surprise: -log P(o) | Validation failure: G(g, σ) = 0 |
2.5.4 Free Energy Analogue
Validation energy:
V(g, σ) = -log P(G(g, σ) = 1 | g, σ)
For deterministic guards: V(g, σ) = 0 if valid, ∞ if invalid.
Retry loop minimizes expected validation energy:
E_L[V(g, σ)] → 0 as retries → success
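A small supporting calculation (assuming, purely for illustration, that each draw g ~ L(·|σ) is independently valid with probability p > 0):

```latex
P\big(G(g,\sigma)=1\big) = p, \qquad
\mathbb{E}[\text{attempts to first } \top] = \frac{1}{p}, \qquad
P(\text{no } \top \text{ within budget } n) = (1-p)^{n}
```

So the expected validation energy reaches 0 within the retry budget with probability 1 - (1 - p)^n; the budget bounds the search exactly as in the satisficing reading above.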
2.5.5 The Inversion
- Friston: min_Q D_KL[Q(s) || P(s|o)] — adjust beliefs to match observations
- AtomicGuard: sample g ~ L(·|σ) until G(g, σ) = 1 — adjust outputs to match specifications
Epistemic uncertainty moves from state inference to output acceptance.
2.5.6 Markov Blanket Analogue
Guards form the boundary between deterministic workflow control (W) and stochastic content generation (I).
Information flow:
- W → prompt construction → LLM
- LLM → g → G(g, σ) → W (state transition or retry)
2.5.7 Proposition
AtomicGuard externalizes the generative model validation that Active Inference internalizes. Where Friston’s agent updates beliefs Q(s) to minimize surprise about observations, AtomicGuard’s guards reject outputs g that violate known specifications σ. Both achieve convergence toward consistency—Friston through belief revision, AtomicGuard through rejection sampling with retry budgets.
Inverted Epistemic Polarity: The world model is known (specifications), the generative process is unknown (LLM), and validation replaces inference.
Russell & Norvig ↔ Friston Mapping
| R&N Concept | Friston Equivalent | Role |
|---|---|---|
| Sensor model | P(o|s) | Likelihood mapping |
| Transition model P(s'|s,a) | P(s|π) | Policy-conditioned dynamics |
| Belief state | Q(s) | Approximate posterior |
| Prior | P(s), P(π) | Beliefs before observation |
Summary: Framework Position
| | Offline (Design) | Online (Execution) |
|---|---|---|
| Who | Human / orchestrator | Executing agent |
| Agent Type | Goal-based | Model-based reflex |
| Model Required | Generative P(s'|s,a) | Discriminative P(valid|output) |
| Searches | Hypothetical futures | N/A (follows pre-computed path) |
| Goals | Explicit representation | Implicit in preconditions |
| Friston Analogue | Planning as inference | Validation as rejection sampling |