[IA Series 8/n] Building a Self-Reflection LLM Agent: From Theory to Proof of Concept
Introduction
This post documents the complete development of a self-reflection LLM agent, from theoretical foundations to a proof of concept. The work represents:
- An implementation (potentially novel) of certainty-aware self-reflection in LLM agents
- A practical synthesis of established probability theory for AI applications
- A computational approach that lays the foundations for meta-reasoning, drawing on multiple established principles
- An engineering solution that makes these concepts operational in modern AI systems
The Origins of Self-Reflection in Artificial Agents
What is Self-Reflection and Why Does It Matter?
Self-reflection can be defined as “serious thought about one’s character and actions.” In artificial intelligence, this can be translated to an agent’s capacity for introspective analysis of its own knowledge, confidence levels, and consistency.
Awareness of these factors enables humans to decide whether to continue, seek more information, or stop gathering it; the decision to stop can be driven by either high or low confidence and consistency. This article looks at replicating that behavior in agents.
Core Self-Reflective Questions
The self-reflection agent embodies several key introspective capabilities:
- Minimum number of queries: “How much information should I gather?”
- Maximum number of queries: “What is the maximum amount of resources I should spend on this?”
- Confidence Assessment: “How confident am I in this answer?”
- Uncertainty Awareness: “Do I have any uncertainty about this conclusion?”
- Consensus Recognition: “Are my multiple reasoning attempts converging or divided between options?”
- Stopping Decision: “Have I gathered sufficient evidence to respond?”
The stopping decision itself admits many different approaches. Here it is based on confidence combined with a weighted view of uncertainty (i.e. entropy); weighting the entropy makes the tolerated amount of uncertainty configurable.
From Rational Psychology to Rational Agents
Jon Doyle’s work on Rational Psychology, more specifically his apologia for the field, established the idea of a theoretical foundation for discussing the characteristics of artificial intelligence. It has not, however, been developed into a method usable with modern AI, particularly LLMs. The approach here takes inspiration from his idea, linking characteristics such as confidence and uncertainty to mathematical representations.
Stuart Russell has been a strong advocate of building uncertainty into AI agents, especially uncertainty about the user’s goals. With this, an agent would defer to responsible humans, asking for guidance or approval before taking action. My goal is to quantify that uncertainty in a relatable way, hence the focus on self-reflection and the questions above.
As with the previous Self-Consistency Agent this Self-Reflective Agent will use my Agent Design Process, based on Russell and Norvig’s concept of a Rational Agent and the requirements to build one. A Rational Agent chooses actions that maximize expected utility based on its percept sequence and knowledge. When extended to self-reflection, this means the agent must reason about its own internal states as part of its decision-making process.
Mathematical Formulation of Self-Reflective Characteristics
Core Mathematical Relationships
The agent operates on several key mathematical principles:
Shannon Entropy: The fundamental measure of uncertainty in the probability distribution
H = -Σ(p_i * log₂(p_i))
Normalized Entropy: Entropy scaled to [0,1] range for consistent interpretation
H_norm = H / log₂(n)
where n = number of unique answers
Combined Stopping Score: A score that balances confidence and entropy, starting from confidence and then penalising it in proportion to the entropy-based uncertainty.
Score = confidence * (1 - entropy_weight * normalized_entropy)
Entropy Level Classification: Human-readable entropy categorization
normalized_entropy ≤ 0.2 → "concentrated"
0.2 < normalized_entropy ≤ 0.7 → "scattered"
normalized_entropy > 0.7 → "uniform"
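These relationships can be sketched directly in Python. This is a minimal illustration of the formulas above; the function names are mine, not the implementation’s:

```python
import math

def shannon_entropy(distribution):
    """H = -sum(p_i * log2(p_i)) over an answer probability distribution."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

def normalized_entropy(distribution):
    """Scale entropy into [0, 1] by dividing by log2(n) for n unique answers."""
    if len(distribution) <= 1:
        return 0.0
    return shannon_entropy(distribution) / math.log2(len(distribution))

def combined_score(confidence, norm_entropy, entropy_weight=0.3):
    """Start from confidence, then penalise it by the weighted uncertainty."""
    return confidence * (1 - entropy_weight * norm_entropy)

def entropy_level(norm_entropy):
    """Map normalized entropy onto a human-readable category."""
    if norm_entropy <= 0.2:
        return "concentrated"
    if norm_entropy <= 0.7:
        return "scattered"
    return "uniform"

# Worked example: the distribution used throughout the article
dist = {"129": 0.8, "128": 0.2}
print(round(normalized_entropy(dist), 2))  # 0.72
print(combined_score(0.8, normalized_entropy(dist)))
```

Note that for two unique answers the normalized entropy equals the raw entropy, since log2(2) = 1.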
Confidence Quantification
consensus_confidence = max(probability_distribution)
The agent calculates its confidence by identifying the maximum probability in its answer distribution. This represents the strength of consensus among its multiple reasoning attempts.
Consensus Classification
consensus_type = classify_distribution_pattern(probability_distribution)
The agent automatically recognizes patterns in its response distribution:
- Strong: 80%+ agreement (low entropy, high confidence)
- Emerging: 40-79% leading answer (medium entropy)
- Binary: Two roughly equal options (high entropy, no clear winner)
- Divided: No clear pattern (maximum entropy, high uncertainty)
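A small Python sketch of this pattern recognition, using the thresholds listed above (the function name is mine; the binary split is detected before the max-probability rules, since a near-even split should not be reported as “emerging”):

```python
def classify_consensus(distribution):
    """Classify an answer distribution as strong / emerging / binary / divided."""
    if not distribution:
        return "undefined"
    probs = sorted(distribution.values(), reverse=True)
    # Binary split first: two leading answers roughly equal
    if len(probs) >= 2 and probs[1] >= 0.35 and abs(probs[0] - probs[1]) <= 0.15:
        return "binary"
    if probs[0] >= 0.8:   # one answer dominates (80%+)
        return "strong"
    if probs[0] >= 0.4:   # leading but not dominant (40-79%)
        return "emerging"
    return "divided"      # no clear leader

print(classify_consensus({"129": 0.8, "128": 0.2}))  # strong
print(classify_consensus({"No": 0.5, "Yes": 0.5}))   # binary
```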
Four Entropy Modes
The implementation supports four distinct entropy modes:
- “off”: Traditional confidence-only stopping (legacy behavior from the self-consistency agent)
- “confidence_only”: Explicit confidence-only mode
- “entropy_only”: Pure entropy-based stopping (stop when entropy is low)
- “combined”: Hybrid approach with dual-threshold system
Early Stopping Decision Function
should_stop = evaluate_stopping_criteria(confidence, entropy, response_count)
The agent combines multiple factors to make intelligent stopping decisions:
- Confidence threshold achievement
- Entropy-based uncertainty assessment
- Minimum response requirements
- Combined confidence-entropy scoring
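As a sketch, these factors might combine as follows. The defaults mirror the configuration used elsewhere in the article, and the minimum/maximum response limits are folded into the same function for brevity:

```python
def should_stop(confidence, norm_entropy, n_responses, *, mode="combined",
                confidence_threshold=0.8, entropy_threshold=0.3,
                entropy_weight=0.3, min_responses=5, max_responses=10,
                min_entropy_samples=4):
    """Multi-mode stopping decision: True means stop querying the LLM."""
    if n_responses < min_responses:        # minimum response requirement
        return False
    if n_responses >= max_responses:       # resource limit
        return True
    if mode in ("off", "confidence_only") or n_responses < min_entropy_samples:
        return confidence >= confidence_threshold
    if mode == "entropy_only":             # stop only when answers concentrate
        return norm_entropy <= entropy_threshold
    # "combined": very high confidence overrides entropy concerns
    if confidence >= 0.9:
        return True
    if confidence >= confidence_threshold and (
            norm_entropy <= entropy_threshold or confidence >= 0.8):
        return True
    # Otherwise fall back to the combined confidence-entropy score
    score = confidence * (1.0 - entropy_weight * norm_entropy)
    return score >= confidence_threshold * 0.9

print(should_stop(0.8, 0.72, 5))                       # True
print(should_stop(0.8, 0.72, 5, mode="entropy_only"))  # False
```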
Agent Design Process
Environment specification: PEAS analysis
Element | Description |
---|---|
Performance | Return answer with confidence assessment and uncertainty quantification |
Environment | User + LLM + question context |
Actuators | LLM queries, user responses |
Sensors | User input, LLM response pairs |
In a deviation from Russell and Norvig’s approach, I am documenting the internal aspects for clarity:
Component | Description |
---|---|
Internal Sensors | Self-monitoring of confidence, entropy, convergence |
Internal Actuators | Distribution calculation, consensus classification, stopping decisions |
Environment Analysis
The task environment for this domain has the following characteristics:
- Partially Observable: The agent must send the (prompt, question) pair to the LLM m times and perform a final argmax to find the most frequent answer.
- Single Agent: Queries to the LLM can be processed in parallel or sequentially. The self-reflection is performed by a single agent once all queries have returned.
- Stochastic: The environment, specifically the solution space, is stochastic. The selection of the next token uses a random variable to pick the token from a probability distribution.
- Episodic: The final decision - i.e. which answer a is most frequent - is not dependent on other decisions made. It is stateless.
- Static: Neither the problem nor the solution space change during the task.
- Discrete: The output is a collection of strings of tokens.
- Known: Whilst the internals of the LLM are unknown and stochastic, the “physics” of the environment, i.e. Agent sends a prompt and a question m times, it receives m (reasoning, answer) responses.
The Agent Function
Define the ideal behaviour - what the agent ought to do - in abstract terms (mathematical mapping from percept sequences to actions)
The agent maintains an internal state with comprehensive entropy parameters and convergence tracking capabilities.
Percepts
True Percepts (inputs from the environment):
- Question input (from user)
- LLM response pairs (reasoning, answer) (from LLM)
Internal Percepts (derived by the agent from percepts):
- Entropy-based consensus intelligence
- Convergence evolution metrics
- Multi-mode configuration state
- Confidence levels
- Entropy measurements
- Consensus type classifications
Actions
True Actions (outputs to the environment):
- QUERY-LLM: Generate LLM responses with parsing
- REPLY-TO-USER: Return answer and self-reflection result
Internal Actions (actions taken internally):
- CALCULATE-DISTRIBUTION: Compute normalized probability distributions
- ASSESS-CONFIDENCE: Calculate consensus confidence (max probability)
- CALCULATE-ENTROPY: Compute Shannon entropy and normalized entropy
- CALCULATE-NORMALIZED-ENTROPY: Scale entropy to [0,1] range
- CLASSIFY-CONSENSUS: Determine consensus type with binary detection
- CLASSIFY-ENTROPY-LEVEL: Categorize entropy as concentrated/scattered/uniform
- EVALUATE-MULTI-MODE-STOPPING: Advanced stopping logic with entropy modes
- ASSESS-CONVERGENCE: Dual-track confidence and entropy evolution analysis
- SYNTHESIZE-ENHANCED-REFLECTION: Build comprehensive result with all metrics
State
What the agent tracks to make its decisions
- Minimum number of queries: “How much information should I gather?”
- Maximum number of queries: “What is the maximum amount of resources I should spend on this?”
- Confidence Assessment: “How confident am I in this answer?”
- Uncertainty Awareness: “Do I have any uncertainty about this conclusion?”
- Consensus Recognition: “Are my multiple reasoning attempts converging?”
- Stopping Decision: “Have I gathered sufficient evidence to respond?”
Percept Sequence with Actions
Here is an abstraction of the enhanced percept sequence demonstrating entropy-based intelligence with entropy_mode = "combined", confidence_threshold = 0.8, and entropy_threshold = 0.3 for the question “What is the sum of the first 10 prime numbers?”:
1. Basic Convergence Table
Normal happy path with early stopping due to confidence
Configuration: entropy_mode = "combined", confidence_threshold = 0.8, min_responses = 5, max_responses = 10
Question: “What is the sum of the first 10 prime numbers?”
External Percept | Internal Percept | Internal Action | External Action |
---|---|---|---|
Question: “What is the sum of the first 10 prime numbers?” | - | - | QUERY-LLM |
Response1: “129” | - | - | QUERY-LLM |
Response2: “129” | - | - | QUERY-LLM |
Response3: “129” | - | - | QUERY-LLM |
Response4: “128” | - | - | QUERY-LLM |
Response5: “129” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“129”: 0.8, “128”: 0.2} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.8 | CALCULATE-ENTROPY | - |
- | Entropy: 0.72, NormalizedEntropy: 0.72 | CLASSIFY-ENTROPY-LEVEL | - |
- | EntropyLevel: “uniform” | CLASSIFY-CONSENSUS | - |
- | ConsensusType: “strong” | ASSESS-CONVERGENCE | - |
- | ConvergenceAnalysis: {…} | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: true | SYNTHESIZE-REFLECTION | - |
- | ReflectionResult: {final_answer: “129”, confidence: 0.8, …} | - | REPLY-TO-USER(“129”) |
Result: Early stopping at 5 responses due to meeting confidence threshold with strong consensus.
2. Resource Limit Table
Agent hitting max_responses limit despite not reaching confidence threshold
Configuration: entropy_mode = "combined", confidence_threshold = 0.8, min_responses = 5, max_responses = 8
Question: “What is the most effective treatment for chronic lower back pain?”
External Percept | Internal Percept | Internal Action | External Action |
---|---|---|---|
Question: “What is the most effective treatment…” | - | - | QUERY-LLM |
Response1: “Physical therapy” | - | - | QUERY-LLM |
Response2: “Exercise and stretching” | - | - | QUERY-LLM |
Response3: “Physical therapy” | - | - | QUERY-LLM |
Response4: “Medication management” | - | - | QUERY-LLM |
Response5: “Exercise and stretching” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“Physical therapy”: 0.4, “Exercise”: 0.4, “Medication”: 0.2} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.4 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (confidence < 0.8) | - | QUERY-LLM |
Response6: “Physical therapy” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“Physical therapy”: 0.5, “Exercise”: 0.33, “Medication”: 0.17} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.5 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (confidence < 0.8) | - | QUERY-LLM |
Response7: “Exercise and stretching” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“Physical therapy”: 0.43, “Exercise”: 0.43, “Medication”: 0.14} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.43 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (confidence < 0.8) | - | QUERY-LLM |
Response8: “Physical therapy” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“Physical therapy”: 0.5, “Exercise”: 0.375, “Medication”: 0.125} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.5 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: true (max_responses reached) | SYNTHESIZE-REFLECTION | - |
- | ReflectionResult: {final_answer: “Physical therapy”, confidence: 0.5, uncertainty: “high”} | - | REPLY-TO-USER(“Physical therapy”) |
Result: Resource limit reached with moderate confidence. Agent acknowledges uncertainty in result.
3. Binary Consensus Detection Table
Controversial question with 50/50 split detection
Configuration: entropy_mode = "combined", confidence_threshold = 0.8, min_responses = 5, max_responses = 10
Question: “Is the number 1 considered prime?”
External Percept | Internal Percept | Internal Action | External Action |
---|---|---|---|
Question: “Is the number 1 considered prime?” | - | - | QUERY-LLM |
Response1: “No” | - | - | QUERY-LLM |
Response2: “Yes” | - | - | QUERY-LLM |
Response3: “No” | - | - | QUERY-LLM |
Response4: “Yes” | - | - | QUERY-LLM |
Response5: “No” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“No”: 0.6, “Yes”: 0.4} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.6 | CLASSIFY-CONSENSUS | - |
- | ConsensusType: “emerging” | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (confidence < 0.8) | - | QUERY-LLM |
Response6: “Yes” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“No”: 0.5, “Yes”: 0.5} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.5 | CLASSIFY-CONSENSUS | - |
- | ConsensusType: “binary” | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (binary split detected) | - | QUERY-LLM |
Response7: “No” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“No”: 0.57, “Yes”: 0.43} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.57 | CLASSIFY-CONSENSUS | - |
- | ConsensusType: “emerging” | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (confidence < 0.8) | - | QUERY-LLM |
Response8: “No” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“No”: 0.625, “Yes”: 0.375} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.625 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (confidence < 0.8) | - | QUERY-LLM |
Response9: “No” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“No”: 0.67, “Yes”: 0.33} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.67 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (confidence < 0.8) | - | QUERY-LLM |
Response10: “No” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“No”: 0.7, “Yes”: 0.3} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.7 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: true (max_responses reached) | SYNTHESIZE-REFLECTION | - |
- | ReflectionResult: {final_answer: “No”, confidence: 0.7, consensus_type: “emerging”} | - | REPLY-TO-USER(“No”) |
Result: Binary split detected and resolved through continued sampling, reaching moderate confidence.
4. Early High Confidence Table
Stopping at the minimum response count due to very high confidence (90%+)
Configuration: entropy_mode = "combined", confidence_threshold = 0.8, min_responses = 5, max_responses = 10
Question: “What is 2 + 2?”
External Percept | Internal Percept | Internal Action | External Action |
---|---|---|---|
Question: “What is 2 + 2?” | - | - | QUERY-LLM |
Response1: “4” | - | - | QUERY-LLM |
Response2: “4” | - | - | QUERY-LLM |
Response3: “4” | - | - | QUERY-LLM |
Response4: “4” | - | - | QUERY-LLM |
Response5: “4” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“4”: 1.0} | ASSESS-CONFIDENCE | - |
- | Confidence: 1.0 | CALCULATE-ENTROPY | - |
- | Entropy: 0.0, NormalizedEntropy: 0.0 | CLASSIFY-ENTROPY-LEVEL | - |
- | EntropyLevel: “concentrated” | CLASSIFY-CONSENSUS | - |
- | ConsensusType: “strong” | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: true (confidence ≥ 0.9 override) | SYNTHESIZE-REFLECTION | - |
- | ReflectionResult: {final_answer: “4”, confidence: 1.0, entropy_level: “concentrated”} | - | REPLY-TO-USER(“4”) |
Result: Perfect consensus achieved at minimum responses, high confidence override triggered.
5. Entropy-Only Mode Table
Same question but with entropy_mode = “entropy_only”
Configuration: entropy_mode = "entropy_only", entropy_threshold = 0.3, min_responses = 5, max_responses = 10
Question: “What is the sum of the first 10 prime numbers?”
External Percept | Internal Percept | Internal Action | External Action |
---|---|---|---|
Question: “What is the sum of the first 10 prime numbers?” | - | - | QUERY-LLM |
Response1: “129” | - | - | QUERY-LLM |
Response2: “129” | - | - | QUERY-LLM |
Response3: “129” | - | - | QUERY-LLM |
Response4: “128” | - | - | QUERY-LLM |
Response5: “129” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“129”: 0.8, “128”: 0.2} | CALCULATE-ENTROPY | - |
- | Entropy: 0.72, NormalizedEntropy: 0.72 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (normalized_entropy > 0.3) | - | QUERY-LLM |
Response6: “129” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“129”: 0.83, “128”: 0.17} | CALCULATE-ENTROPY | - |
- | Entropy: 0.65, NormalizedEntropy: 0.65 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (normalized_entropy > 0.3) | - | QUERY-LLM |
Response7: “129” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“129”: 0.86, “128”: 0.14} | CALCULATE-ENTROPY | - |
- | Entropy: 0.59, NormalizedEntropy: 0.59 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (normalized_entropy > 0.3) | - | QUERY-LLM |
Response8: “129” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“129”: 0.875, “128”: 0.125} | CALCULATE-ENTROPY | - |
- | Entropy: 0.54, NormalizedEntropy: 0.54 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (normalized_entropy > 0.3) | - | QUERY-LLM |
Response9: “129” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“129”: 0.89, “128”: 0.11} | CALCULATE-ENTROPY | - |
- | Entropy: 0.50, NormalizedEntropy: 0.50 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (normalized_entropy > 0.3) | - | QUERY-LLM |
Response10: “129” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“129”: 0.9, “128”: 0.1} | CALCULATE-ENTROPY | - |
- | Entropy: 0.47, NormalizedEntropy: 0.47 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: true (max_responses reached) | SYNTHESIZE-REFLECTION | - |
- | ReflectionResult: {final_answer: “129”, confidence: 0.9, entropy: 0.47} | - | REPLY-TO-USER(“129”) |
Result: Entropy-only mode requires very low entropy (≤ 0.3), so the agent used all responses despite high confidence.
6. Low Confidence Continuation Table
Agent continuing to gather evidence when confidence stays low
Configuration: entropy_mode = "combined", confidence_threshold = 0.8, min_responses = 5, max_responses = 10
Question: “What will be the most important technological breakthrough in the next decade?”
External Percept | Internal Percept | Internal Action | External Action |
---|---|---|---|
Question: “What will be the most important technological breakthrough…” | - | - | QUERY-LLM |
Response1: “Artificial General Intelligence” | - | - | QUERY-LLM |
Response2: “Quantum computing” | - | - | QUERY-LLM |
Response3: “Fusion energy” | - | - | QUERY-LLM |
Response4: “Brain-computer interfaces” | - | - | QUERY-LLM |
Response5: “Artificial General Intelligence” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“AGI”: 0.4, “Quantum”: 0.2, “Fusion”: 0.2, “BCI”: 0.2} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.4 | CLASSIFY-CONSENSUS | - |
- | ConsensusType: “emerging” | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (confidence < 0.8) | - | QUERY-LLM |
Response6: “Quantum computing” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“AGI”: 0.33, “Quantum”: 0.33, “Fusion”: 0.17, “BCI”: 0.17} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.33 | CLASSIFY-CONSENSUS | - |
- | ConsensusType: “divided” | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (confidence < 0.8) | - | QUERY-LLM |
Response7: “Fusion energy” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“AGI”: 0.29, “Quantum”: 0.29, “Fusion”: 0.29, “BCI”: 0.14} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.29 | CLASSIFY-CONSENSUS | - |
- | ConsensusType: “divided” | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (confidence < 0.8) | - | QUERY-LLM |
Response8: “Artificial General Intelligence” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“AGI”: 0.375, “Quantum”: 0.25, “Fusion”: 0.25, “BCI”: 0.125} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.375 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (confidence < 0.8) | - | QUERY-LLM |
Response9: “Artificial General Intelligence” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“AGI”: 0.44, “Quantum”: 0.22, “Fusion”: 0.22, “BCI”: 0.11} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.44 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: false (confidence < 0.8) | - | QUERY-LLM |
Response10: “Brain-computer interfaces” | - | CALCULATE-DISTRIBUTION | - |
- | Distribution: {“AGI”: 0.4, “Quantum”: 0.2, “Fusion”: 0.2, “BCI”: 0.2} | ASSESS-CONFIDENCE | - |
- | Confidence: 0.4 | EVALUATE-MULTI-MODE-STOPPING | - |
- | EarlyStop: true (max_responses reached) | SYNTHESIZE-REFLECTION | - |
- | ReflectionResult: {final_answer: “AGI”, confidence: 0.4, uncertainty: “high”} | - | REPLY-TO-USER(“AGI”) |
Result: Agent exhausted resources on complex question, acknowledges high uncertainty in final answer.
7. Mode Comparison Table
Side-by-side showing same question with different entropy modes
Question: “What is the sum of the first 10 prime numbers?” Base responses: [“129”, “129”, “129”, “128”, “129”] → Distribution: {“129”: 0.8, “128”: 0.2}, Confidence: 0.8, NormalizedEntropy: 0.72
Entropy Mode | Configuration | Stopping Decision | Reasoning |
---|---|---|---|
“off” | confidence_threshold = 0.8 | ✅ STOP | Confidence (0.8) meets threshold |
“confidence_only” | confidence_threshold = 0.8 | ✅ STOP | Confidence (0.8) meets threshold |
“entropy_only” | entropy_threshold = 0.3 | ❌ CONTINUE | NormalizedEntropy (0.72) > threshold (0.3) |
“combined” | confidence_threshold = 0.8, entropy_threshold = 0.3 | ✅ STOP | Confidence override: 0.8 ≥ 0.8 |
Analysis:
- Confidence-based modes stop immediately when threshold is met
- Entropy-only mode requires very concentrated responses (low entropy)
- Combined mode uses confidence override, but would apply entropy weighting in borderline cases
Internal State Data Structure
The agent tracks how its metrics evolve; this is an example of the internal state:
ConvergenceAnalysis: {
confidence_evolution: [0.5, 0.67, 0.75, 0.8, 0.8],
entropy_evolution: [1.0, 0.92, 0.81, 0.72, 0.72],
convergence_rate: 0.06, # Confidence increasing by 6% per response
final_stability: 1.0, # Perfectly stable final confidence
entropy_convergence_rate: -0.056, # Entropy decreasing by 5.6% per response
entropy_final_stability: 1.0 # Perfectly stable final entropy
}
This demonstrates how the agent achieves mathematical self-awareness through comprehensive entropy intelligence and stopping logic.
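The evolution lists above can be reproduced by re-running the distribution analysis over growing prefixes of the answer list. A minimal sketch (helper names are mine; the rate and stability formulas follow the pseudocode helpers later in the article):

```python
import math
from collections import Counter

def convergence_evolution(answers):
    """Track confidence and normalized entropy after each successive response."""
    conf_track, ent_track = [], []
    for i in range(1, len(answers) + 1):
        counts = Counter(answers[:i])
        probs = [c / i for c in counts.values()]
        conf_track.append(max(probs))
        if len(probs) > 1:
            h = -sum(p * math.log2(p) for p in probs)
            ent_track.append(h / math.log2(len(probs)))
        else:
            ent_track.append(0.0)  # single unique answer: no uncertainty
    n = len(conf_track)
    return {
        "confidence_evolution": conf_track,
        "entropy_evolution": ent_track,
        # Average change in confidence per response
        "convergence_rate": (conf_track[-1] - conf_track[0]) / n if n > 1 else 0.0,
        # 1.0 means the last three confidence values were identical
        "final_stability": 1.0 - (max(conf_track[-3:]) - min(conf_track[-3:]))
                           if n >= 3 else 1.0,
    }

track = convergence_evolution(["129", "128", "129", "129", "129"])
```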
Defining the Agent Function
The self-reflection agent incorporates entropy-based intelligence and multi-mode early stopping:
function SELF-REFLECTION-AGENT(percept) returns an action
persistent: state, agent state with entropy intelligence
responses, collection of (reasoning, answer) pairs
question, current question to answer
confidence_threshold, stopping threshold (default 0.8)
entropy_threshold, entropy stopping threshold (default 0.3)
entropy_weight, weight in combined scoring (default 0.3)
entropy_mode, stopping mode (default "combined")
min_responses, minimum responses required (default 5)
min_entropy_samples, minimum samples for entropy (default 4)
max_responses, maximum responses allowed (default 10)
# Update state with new percept
if percept contains question:
state.question ← question
state.responses ← []
return QUERY-LLM(state.question)
elif percept contains (reasoning, answer):
state.responses.append((reasoning, answer))
# Decision: Continue querying or provide final answer?
if should_continue_querying():
return QUERY-LLM(state.question)
else:
# Perform all internal analysis and return final result
final_result ← perform_complete_internal_analysis()
return REPLY-TO-USER(final_result)
function should_continue_querying() returns boolean
current_responses ← length(state.responses)
# Must have minimum responses
if current_responses < state.min_responses:
return true
# Must not exceed maximum responses
if current_responses ≥ state.max_responses:
return false
# Perform internal analysis to make stopping decision
perform_internal_analysis()
# Apply stopping logic based on entropy mode
return not evaluate_stopping_criteria()
function perform_internal_analysis()
# Internal computation: Calculate probability distribution
answers ← extract_answers_from(state.responses)
count ← {}
for answer in answers:
count[answer] ← count.get(answer, 0) + 1
total ← length(answers)
state.distribution ← {}
for answer, freq in count:
state.distribution[answer] ← freq / total
# Internal computation: Assess confidence
state.confidence ← max(state.distribution.values())
# Internal computation: Calculate entropy
state.entropy ← 0
for probability in state.distribution.values():
if probability > 0:
state.entropy ← state.entropy - (probability × log₂(probability))
# Internal computation: Calculate normalized entropy
if length(state.distribution) ≤ 1:
state.normalized_entropy ← 0.0
else:
max_entropy ← log₂(length(state.distribution))
state.normalized_entropy ← state.entropy / max_entropy if max_entropy > 0 else 0.0
# Internal computation: Classify entropy level
if state.normalized_entropy ≤ 0.2:
state.entropy_level ← "concentrated"
elif state.normalized_entropy ≤ 0.7:
state.entropy_level ← "scattered"
else:
state.entropy_level ← "uniform"
# Internal computation: Classify consensus type
state.consensus_type ← classify_consensus_type(state.distribution)
# Internal computation: Assess convergence
state.convergence_analysis ← assess_convergence_evolution()
function classify_consensus_type(distribution) returns consensus_type
if distribution is empty:
return "undefined"
probabilities ← sort(distribution.values(), descending=true)
max_prob ← probabilities[0]
# Binary split: Two main answers roughly equal (check this first)
if length(probabilities) ≥ 2 and probabilities[1] ≥ 0.35:
if abs(probabilities[0] - probabilities[1]) ≤ 0.15:
return "binary"
# Strong consensus: One answer dominates significantly (80%+)
if max_prob ≥ 0.8:
return "strong"
# Emerging consensus: One answer leading but not dominant (40-79%)
if max_prob ≥ 0.4:
return "emerging"
# Divided: No clear leader (under 40%)
return "divided"
function assess_convergence_evolution() returns convergence_analysis
if length(state.responses) < 2:
return {
confidence_evolution: [state.confidence] if state.responses else [],
entropy_evolution: [state.normalized_entropy] if state.responses else [],
convergence_rate: 0.0,
final_stability: 1.0,
entropy_convergence_rate: 0.0,
entropy_final_stability: 1.0
}
confidences_over_time ← []
entropies_over_time ← []
# Calculate confidence and entropy evolution
for i ← 1 to length(state.responses):
subset_responses ← state.responses[1:i]
subset_answers ← extract_answers_from(subset_responses)
subset_counts ← count_occurrences(subset_answers)
subset_total ← length(subset_answers)
# Calculate confidence for subset
if subset_total > 0:
max_count ← max(subset_counts.values())
confidence ← max_count / subset_total
else:
confidence ← 0.0
# Calculate entropy for subset
if subset_total > 0:
subset_distribution ← {}
for answer, count in subset_counts:
subset_distribution[answer] ← count / subset_total
entropy ← calculate_entropy(subset_distribution)
if length(subset_distribution) > 1:
normalized_entropy ← entropy / log₂(length(subset_distribution))
else:
normalized_entropy ← 0.0
else:
entropy ← 0.0
normalized_entropy ← 0.0
confidences_over_time.append(confidence)
entropies_over_time.append(normalized_entropy)
return {
confidence_evolution: confidences_over_time,
entropy_evolution: entropies_over_time,
convergence_rate: calculate_convergence_rate(confidences_over_time),
final_stability: assess_stability(confidences_over_time),
entropy_convergence_rate: calculate_entropy_convergence_rate(entropies_over_time),
entropy_final_stability: assess_entropy_stability(entropies_over_time)
}
function evaluate_stopping_criteria() returns boolean
current_responses ← length(state.responses)
# Handle different entropy modes
if state.entropy_mode = "off" or state.entropy_mode = "confidence_only":
# Traditional confidence-only stopping
return state.confidence ≥ state.confidence_threshold
# Need minimum samples for entropy to be meaningful
if current_responses < state.min_entropy_samples:
return state.confidence ≥ state.confidence_threshold
if state.entropy_mode = "entropy_only":
# Stop only based on entropy (low entropy = concentrated)
return state.normalized_entropy ≤ state.entropy_threshold
elif state.entropy_mode = "combined":
# Combined scoring: balance confidence and entropy
# High confidence overrides entropy concerns
if state.confidence ≥ 0.9:
return true
# Check confidence threshold first
if state.confidence ≥ state.confidence_threshold:
# High confidence + low entropy = strong consensus, stop early
if state.normalized_entropy ≤ state.entropy_threshold:
return true
# High confidence + high entropy = check if really confident
elif state.confidence ≥ 0.8:
return true
# Calculate combined score: confidence weighted by entropy concentration
entropy_factor ← 1.0 - (state.entropy_weight × state.normalized_entropy)
combined_score ← state.confidence × entropy_factor
# Use a slightly lower threshold for combined scoring
return combined_score ≥ (state.confidence_threshold × 0.9)
# Default fallback
return state.confidence ≥ state.confidence_threshold
function perform_complete_internal_analysis() returns reflection_result
# Ensure all internal analysis is complete
perform_internal_analysis()
# Internal computation: Synthesize final reflection result
final_answer ← argmax(state.distribution)
reflection_result ← {
final_answer: final_answer,
consensus_confidence: state.confidence,
answer_distribution: state.distribution,
uncertainty_level: categorize_uncertainty(state.confidence),
early_stopping: length(state.responses) < state.max_responses,
total_responses: length(state.responses),
convergence_analysis: state.convergence_analysis,
distribution_entropy: state.entropy,
normalized_entropy: state.normalized_entropy,
entropy_level: state.entropy_level,
consensus_type: state.consensus_type
}
return reflection_result
# External Actions (only these interact with environment)
function QUERY-LLM(question) returns response
prompt ← create_chain_of_thought_prompt(question)
response ← send_to_llm(prompt)
return response
function REPLY-TO-USER(reflection_result) returns formatted_response
formatted_response ← format_reflection_response(reflection_result)
return formatted_response
# Helper Functions for Internal Computations
function categorize_uncertainty(confidence) returns uncertainty_level
    if confidence ≥ 0.8:
        return "low"
    elif confidence ≥ 0.6:
        return "medium"
    else:
        return "high"

function calculate_convergence_rate(confidences) returns rate
    if length(confidences) < 2:
        return 0.0
    return (confidences[-1] - confidences[0]) / length(confidences)

function assess_stability(confidences) returns stability
    if length(confidences) < 3:
        return 1.0
    last_three ← confidences[-3:]
    return 1.0 - (max(last_three) - min(last_three))

function calculate_entropy_convergence_rate(entropies) returns rate
    if length(entropies) < 2:
        return 0.0
    return (entropies[-1] - entropies[0]) / length(entropies)

function assess_entropy_stability(entropies) returns stability
    if length(entropies) < 3:
        return 1.0
    last_three ← entropies[-3:]
    return 1.0 - (max(last_three) - min(last_three))

function calculate_entropy(distribution) returns entropy
    H ← 0
    for probability in distribution.values():
        if probability > 0:
            H ← H - (probability × log₂(probability))
    return H

function extract_answers_from(responses) returns answers
    answers ← []
    for (reasoning, answer) in responses:
        answers.append(answer)
    return answers

function count_occurrences(answers) returns counts
    counts ← {}
    for answer in answers:
        counts[answer] ← counts.get(answer, 0) + 1
    return counts

function argmax(distribution) returns max_key
    max_prob ← 0
    max_key ← ""
    for (key, prob) in distribution:
        if prob > max_prob:
            max_prob ← prob
            max_key ← key
    return max_key

function create_chain_of_thought_prompt(question) returns prompt
    return "Please think step by step and provide your reasoning: " + question

function format_reflection_response(result) returns formatted_response
    return "Final Answer: " + result.final_answer +
           " (Confidence: " + result.consensus_confidence +
           ", Uncertainty: " + result.uncertainty_level + ")"
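To sanity-check the entropy and stability helpers above, here is a direct Python translation. This is a sketch only; the function names simply mirror the pseudocode:

```python
import math

def calculate_entropy(distribution: dict[str, float]) -> float:
    """Shannon entropy in bits over an answer distribution."""
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

def calculate_convergence_rate(values: list[float]) -> float:
    """Average per-step change from first to last observation."""
    if len(values) < 2:
        return 0.0
    return (values[-1] - values[0]) / len(values)

def assess_stability(values: list[float]) -> float:
    """1.0 minus the spread of the last three observations."""
    if len(values) < 3:
        return 1.0
    last_three = values[-3:]
    return 1.0 - (max(last_three) - min(last_three))

# A 50/50 split over two answers yields exactly 1 bit of entropy.
print(calculate_entropy({"A": 0.5, "B": 0.5}))  # → 1.0
```

Note how a fully concentrated distribution (`{"A": 1.0}`) gives an entropy of 0, which is exactly what drives the early-stopping logic later on.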
This algorithm represents the agent’s complete self-reflective analysis: entropy-based intelligence combined with practical early-stopping strategies, giving a proof-of-concept framework for artificial self-awareness.
The Agent Program Section Layout
Background for Implementation Decisions
The decisions are the same as those I made for the Self-Consistency agent in Building a Self-Consistency LLM-Agent: From PEAS Analysis to…. I repeat them here as a reminder, and also because I am testing OpenHands/Mistral. Where possible I prefer Open Source and Open Weights for this research work; however, the work is still at an early stage and I do not wish to commit to either just yet.
Domain Driven Design and SOLID Principles
- Explanation of DDD approach for entropy-based intelligence
- SOLID principles application to self-reflection architecture
- Benefits of immutable domain entities for complex state tracking
- Separation of concerns: entropy calculation, consensus classification, convergence analysis
Development Tools and Methodology
- Python shall be used
- Claude Code usage for algorithm implementation in Python (pending testing with OpenHands and Mistral!)
- CLAUDE.md development documentation approach (see above!!)
- Testing strategy for entropy modes and edge cases
- Version control strategy for multi-mode configurations
- No linting, typing, or other formatting checks for the demo (this changes if it becomes a multi-developer project)
- No CI/CD for the demo (again, this changes if it becomes a multi-developer project)
- LiteLLM or OpenRouter to be used (a decision in the Cohere Labs ML Agent’s group that I wish to respect)
Complexity Analysis of Self-Reflection Operations
In defining the Agent Program for the Self-Consistency agent we had to be aware of complexity (due to questions around a particular mathematical notation) and how an incorrect choice of Python data type would result in a computational complexity of O(m²).
The mathematics differ for this agent, as it uses probability distributions and entropy. The analysis below shows why complexity is not a concern here.
Entropy Calculation Complexity
Core operations complexity analysis
- Distribution calculation: O(m) where m = number of responses
- Entropy calculation: O(k) where k = unique answers
- Convergence analysis: O(m) for evolution tracking
- Consensus classification: O(k log k) for sorting probabilities
- Total per decision cycle: O(m + k log k)
Real-World Impact
- Distribution Calculation O(m): For 10 responses, requires 10 operations to count answers - scales linearly with response volume
- Entropy Calculation O(k): For 3 unique answers, requires 3 logarithmic operations - very fast even with diverse responses
- Convergence Analysis O(m): For 10 responses, recalculates confidence/entropy 10 times - creates detailed evolution tracking
- Consensus Classification O(k log k): For 5 unique answers, requires ~12 operations to sort probabilities - negligible overhead
- Combined per decision: O(m + k log k): For 10 responses with 3 unique answers, approximately 25 operations total
Real-World Performance Implications
- Self-Consistency: Processes 1000 responses in milliseconds (simple counting)
- Self-Reflection: Processes 1000 responses in ~10 milliseconds (entropy calculations add minimal overhead)
- Bottleneck Reality: LLM query time (1-5 seconds) dominates computational overhead by 1000x
- Practical Impact: Complexity differences irrelevant compared to network/LLM latency
- Efficiency Trade-off: Having an early stop mechanism based on these calculations could save 50-70% of LLM calls (equating to microseconds of computation vs dollars of tokens)
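The "1000 responses in ~10 milliseconds" figure above is easy to sanity-check. Here is a rough micro-benchmark (the answer mix is hypothetical synthetic data, and absolute timings will vary by machine):

```python
import math
import time
from collections import Counter

# Synthetic workload: 1000 responses over 3 unique answers (hypothetical data).
answers = (["A"] * 600) + (["B"] * 300) + (["C"] * 100)

start = time.perf_counter()
counts = Counter(answers)                       # O(m) distribution calculation
total = len(answers)
distribution = {a: c / total for a, c in counts.items()}
entropy = -sum(p * math.log2(p)                 # O(k) entropy calculation
               for p in distribution.values() if p > 0)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"entropy={entropy:.3f} bits, computed in {elapsed_ms:.3f} ms")
```

On any modern machine this completes in well under a millisecond, which supports the point that LLM latency, not entropy arithmetic, is the bottleneck.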
Comparison with Self-Consistency Agent
| Operation | Self-Consistency | Self-Reflection |
| --- | --- | --- |
| Decision Logic | O(m) majority vote | O(m + k log k) entropy analysis |
| State Tracking | O(m) responses | O(m + k) comprehensive state |
| Stopping Criteria | O(1) simple threshold | O(k) multi-mode evaluation |
Domain Objects and Entities
Core Domain Entities
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

@dataclass(frozen=True)
class LLMResponse:
    """Immutable domain entity representing a single LLM response."""
    reasoning: str
    answer: str
    timestamp: datetime  # For convergence analysis

@dataclass(frozen=True)
class ConvergenceAnalysis:
    """Immutable analysis of response evolution."""
    confidence_evolution: List[float]
    entropy_evolution: List[float]
    convergence_rate: float
    final_stability: float
    entropy_convergence_rate: float
    entropy_final_stability: float

@dataclass(frozen=True)
class ReflectionResult:
    """Comprehensive result object with entropy intelligence."""
    final_answer: str
    consensus_confidence: float
    answer_distribution: Dict[str, float]
    uncertainty_level: str
    early_stopping: bool
    total_responses: int
    convergence_analysis: ConvergenceAnalysis
    distribution_entropy: float
    normalized_entropy: float
    entropy_level: str
    consensus_type: str
Configuration Domain Objects
@dataclass
class ReflectionConfig:
    """Configuration for entropy-based self-reflection."""
    llm_interface: LLMInterface
    confidence_threshold: float = 0.8
    entropy_threshold: float = 0.3
    entropy_weight: float = 0.3
    entropy_mode: str = "combined"  # "off", "confidence_only", "entropy_only", "combined"
    min_responses: int = 5
    min_entropy_samples: int = 4
    max_responses: int = 10
    prompt_template: str = ""

@dataclass
class AgentState:
    """Comprehensive state tracking for self-reflection."""
    question: str
    responses: List[LLMResponse]
    distribution: Dict[str, float]
    confidence: float
    entropy: float
    normalized_entropy: float
    entropy_level: str
    consensus_type: str
    convergence_analysis: ConvergenceAnalysis
Interface Design and Abstraction
LLM Interface with Enhanced Capabilities
from abc import ABC, abstractmethod

class LLMInterface(ABC):
    """Abstract interface for LLM interactions with entropy support."""

    @abstractmethod
    async def generate_response(self, prompt: str, question: str) -> LLMResponse:
        """Generate a single LLM response for entropy analysis."""
        pass

    @abstractmethod
    async def generate_batch_responses(self, prompt: str, question: str, count: int) -> List[LLMResponse]:
        """Generate multiple responses for parallel processing."""
        pass

class EnhancedLiteLLMAdapter(LLMInterface):
    """Enhanced LiteLLM adapter with entropy-optimized parameters."""

    def __init__(self, model: str, temperature: float = 0.7, **kwargs):
        self.model = model
        self.temperature = temperature
        self.entropy_optimized = kwargs.get('entropy_optimized', True)
        self.kwargs = kwargs
Entropy Intelligence Services
class EntropyCalculator:
    """Service for entropy calculations and analysis."""

    @staticmethod
    def calculate_shannon_entropy(distribution: Dict[str, float]) -> float:
        """Calculate Shannon entropy for the answer distribution."""
        pass

    @staticmethod
    def normalize_entropy(entropy: float, unique_answers: int) -> float:
        """Normalize entropy to the [0, 1] range."""
        pass

    @staticmethod
    def classify_entropy_level(normalized_entropy: float) -> str:
        """Classify entropy as concentrated/scattered/uniform."""
        pass

class ConsensusClassifier:
    """Service for consensus pattern recognition."""

    @staticmethod
    def classify_consensus(distribution: Dict[str, float]) -> str:
        """Classify consensus type with binary split detection."""
        pass

    @staticmethod
    def detect_binary_split(distribution: Dict[str, float]) -> bool:
        """Detect binary consensus patterns."""
        pass
Core Implementation Architecture
Main Self-Reflection Agent
class SelfReflectionAgent:
    """Main agent implementing entropy-based self-reflection."""

    def __init__(self, config: ReflectionConfig, question: str):
        self._config = config
        self._state = AgentState(question=question, responses=[], ...)
        self._entropy_calculator = EntropyCalculator()
        self._consensus_classifier = ConsensusClassifier()
        self._convergence_analyzer = ConvergenceAnalyzer()

    async def process_question(self) -> ReflectionResult:
        """Main processing loop with entropy-based stopping."""
        while self._should_continue_querying():
            response = await self._query_llm()
            self._state.responses.append(response)
            self._update_internal_state()
        return self._synthesize_final_result()

    def _should_continue_querying(self) -> bool:
        """Multi-mode stopping decision with entropy intelligence."""
        if len(self._state.responses) < self._config.min_responses:
            return True
        if len(self._state.responses) >= self._config.max_responses:
            return False
        return not self._evaluate_stopping_criteria()

    def _evaluate_stopping_criteria(self) -> bool:
        """Stopping logic with four entropy modes."""
        # Implementation of entropy modes: off, confidence_only, entropy_only, combined
        pass

    def _update_internal_state(self):
        """Update all internal state with the latest response."""
        self._calculate_distribution()
        self._assess_confidence()
        self._calculate_entropy()
        self._classify_consensus()
        self._assess_convergence()

    def _calculate_distribution(self):
        """Calculate probability distribution - O(m) complexity."""
        answers = [response.answer for response in self._state.responses]
        counts = Counter(answers)
        total = len(answers)
        self._state.distribution = {answer: count / total for answer, count in counts.items()}

    def _assess_confidence(self):
        """Assess consensus confidence - O(k) complexity."""
        self._state.confidence = max(self._state.distribution.values()) if self._state.distribution else 0.0

    def _calculate_entropy(self):
        """Calculate Shannon and normalized entropy - O(k) complexity."""
        self._state.entropy = self._entropy_calculator.calculate_shannon_entropy(self._state.distribution)
        unique_answers = len(self._state.distribution)
        self._state.normalized_entropy = self._entropy_calculator.normalize_entropy(
            self._state.entropy, unique_answers
        )
        self._state.entropy_level = self._entropy_calculator.classify_entropy_level(
            self._state.normalized_entropy
        )

    def _classify_consensus(self):
        """Classify consensus type - O(k log k) complexity."""
        self._state.consensus_type = self._consensus_classifier.classify_consensus(
            self._state.distribution
        )

    def _assess_convergence(self):
        """Assess convergence evolution - O(m) complexity."""
        self._state.convergence_analysis = self._convergence_analyzer.analyze_evolution(
            self._state.responses
        )
Convergence Analysis Implementation
class ConvergenceAnalyzer:
    """Service for analyzing response convergence patterns."""

    def analyze_evolution(self, responses: List[LLMResponse]) -> ConvergenceAnalysis:
        """Analyze dual-track confidence and entropy evolution."""
        if len(responses) < 2:
            return self._create_minimal_analysis(responses)

        confidences = []
        entropies = []
        for i in range(1, len(responses) + 1):
            subset = responses[:i]
            confidence, entropy = self._calculate_subset_metrics(subset)
            confidences.append(confidence)
            entropies.append(entropy)

        return ConvergenceAnalysis(
            confidence_evolution=confidences,
            entropy_evolution=entropies,
            convergence_rate=self._calculate_convergence_rate(confidences),
            final_stability=self._assess_stability(confidences),
            entropy_convergence_rate=self._calculate_convergence_rate(entropies),
            entropy_final_stability=self._assess_stability(entropies)
        )
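`analyze_evolution` relies on a `_calculate_subset_metrics` helper that is not shown above. A minimal sketch of what it could look like (my assumption: confidence as the majority share, entropy as Shannon entropy over the subset; for simplicity this standalone version takes a plain list of answer strings rather than `LLMResponse` objects):

```python
import math
from collections import Counter

def calculate_subset_metrics(answers: list[str]) -> tuple[float, float]:
    """Confidence (majority share) and Shannon entropy for a prefix of answers."""
    counts = Counter(answers)
    total = len(answers)
    distribution = [count / total for count in counts.values()]
    confidence = max(distribution)
    entropy = -sum(p * math.log2(p) for p in distribution if p > 0)
    return confidence, entropy

# Dual-track evolution over growing prefixes of five responses:
answers = ["A", "A", "B", "A", "A"]
evolution = [calculate_subset_metrics(answers[:i]) for i in range(1, len(answers) + 1)]
for i, (conf, ent) in enumerate(evolution, start=1):
    print(f"after {i} responses: confidence={conf:.2f}, entropy={ent:.3f}")
```

Recomputing the metrics for every prefix is what makes the convergence analysis O(m) per decision cycle, as claimed in the complexity section.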
Multi-Mode Stopping Logic Implementation
Entropy Mode Handlers
class StoppingCriteriaEvaluator:
    """Evaluator for multi-mode stopping decisions."""

    def __init__(self, config: ReflectionConfig):
        self._config = config

    def evaluate(self, state: AgentState) -> bool:
        """Evaluate stopping criteria based on the entropy mode."""
        handlers = {
            "off": self._evaluate_off_mode,
            "confidence_only": self._evaluate_confidence_only,
            "entropy_only": self._evaluate_entropy_only,
            "combined": self._evaluate_combined_mode
        }
        handler = handlers.get(self._config.entropy_mode, self._evaluate_combined_mode)
        return handler(state)

    def _evaluate_combined_mode(self, state: AgentState) -> bool:
        """Combined scoring with entropy weighting."""
        # High confidence override
        if state.confidence >= 0.9:
            return True
        # Dual-threshold check
        if (state.confidence >= self._config.confidence_threshold and
                state.normalized_entropy <= self._config.entropy_threshold):
            return True
        # Combined scoring
        entropy_factor = 1.0 - (self._config.entropy_weight * state.normalized_entropy)
        combined_score = state.confidence * entropy_factor
        return combined_score >= (self._config.confidence_threshold * 0.9)
Program Architectural Decisions Summary
The Agent Function defines the ideal behavior of our self-reflection agent in abstract terms. The following architectural decisions translate this theoretical framework into a practical, maintainable implementation:
Key Design Choices
- Entropy Intelligence: Four distinct modes for maximum flexibility
- Immutable Entities: All domain objects are frozen dataclasses
- Service-Oriented Architecture: Separate services for entropy, consensus, convergence
- O(m + k log k) Complexity: Optimized for response volume and answer diversity
- Comprehensive State Tracking: Full dual-track evolution monitoring
- Interface Abstraction: Clean separation between agent logic and LLM communication
Performance Optimizations
- Lazy Evaluation: Entropy calculations only when needed
- Efficient Distribution: Counter-based O(m) distribution calculation
- Minimal Memory: Reuse of calculation results where possible
- Parallel Processing: Batch response generation capability
Configuration Flexibility
- Four Entropy Modes: Complete range from traditional to entropy-pure
- Tunable Parameters: All thresholds and weights configurable
- Model Agnostic: Works with any LLM via interface abstraction
- Extensible: Easy to add new entropy modes or consensus types
Conclusion
Sorry, I don’t have a great conclusion at the moment! :) I have implemented a version of this to prove that it “works”; however, I have since redone all of the Agent Decision Process and will do the same with the code.
My plan is to implement this against Small Language Models - mainly due to cost - and see what the data looks like. Then I shall be clearer on conclusions!!
This was a great learning experience for me; I hope you get something from what I’m sharing here.
Other work?
I am uncertain whether this is an approach others have tried as well; however, this paper by Jekaterina Novikova et al., Consistency in Language Models: Current Landscape, Challenges, and Future Directions, leads me to think there isn’t much existing work on consistency analysis.
Let me know what you make of it!