[IA Series 8/n] Building a Self-Reflection LLM Agent: From Theory to Proof of Concept

Introduction

This post documents the complete development of a self-reflection LLM agent, from theoretical foundations to a proof of concept. The work represents:

  • An implementation (potentially novel) of certainty-aware self-reflection in LLM agents
  • A practical synthesis of established probability theory for AI applications
  • A computational approach that lays the foundations for meta-reasoning, using multiple established principles
  • An engineering solution that makes these concepts operational in modern AI systems

The Origins of Self-Reflection in Artificial Agents

What is Self-Reflection and Why Does It Matter?

Self-reflection can be defined as “serious thought about one’s character and actions.” In artificial intelligence, this can be translated to an agent’s capacity for introspective analysis of its own knowledge, confidence levels, and consistency.

Awareness of these factors enables humans to decide whether they should continue, search for more information, or stop gathering information altogether. Stopping can be due to either high or low values of confidence and consistency. This article looks at replicating this behavior in agents.

Core Self-Reflective Questions

The self-reflection agent embodies several key introspective capabilities:

  • Minimum number of queries: “How much information should I gather?”
  • Maximum number of queries: “What’s the most amount of resource I should spend on this?”
  • Confidence Assessment: “How confident am I in this answer?”
  • Uncertainty Awareness: “Do I have any uncertainty about this conclusion?”
  • Consensus Recognition: “Are my multiple reasoning attempts converging or divided between options?”
  • Stopping Decision: “Have I gathered sufficient evidence to respond?”

The stopping decision itself can lead to many different approaches. For this agent, the stopping decision is based on confidence and a weighted view of the uncertainty (i.e. entropy). The reason for weighting the entropy is to allow for a configurable balance between certainty and uncertainty.

From Rational Psychology to Rational Agents

Jon Doyle’s work on Rational Psychology, more specifically his apology for it, established the idea of a theoretical foundation for discussing the characteristics of Artificial Intelligence. It has not, however, been developed into a method that is usable with modern AI, particularly LLMs. The approach here takes inspiration from his idea, linking characteristics such as confidence and uncertainty to mathematical representations.

Stuart Russell has been a strong advocate of building uncertainty into AI agents, especially uncertainty about the user’s goals. With this, an agent would defer to responsible humans, asking for guidance or approval before taking action. My goal is to enable quantification of that uncertainty in a way that is relatable, hence the focus on self-reflection and the questions above.

As with the previous Self-Consistency Agent, this Self-Reflective Agent will use my Agent Design Process, based on Russell and Norvig’s concept of a Rational Agent and the requirements to build one. A Rational Agent chooses actions that maximize expected utility based on its percept sequence and knowledge. When extended to self-reflection, this means the agent must reason about its own internal states as part of its decision-making process.

Mathematical Formulation of Self-Reflective Characteristics

Core Mathematical Relationships

The agent operates on several key mathematical principles:

Shannon Entropy: The fundamental measure of uncertainty in the probability distribution

H = -Σ(p_i * log₂(p_i))

Normalized Entropy: Entropy scaled to [0,1] range for consistent interpretation

H_norm = H / log₂(n)

where n = number of unique answers

Combined Stopping Score: A score that balances confidence and entropy: it starts with confidence, then penalises it with the entropy-based uncertainty.

Score = confidence * (1 - entropy_weight * normalized_entropy)

Entropy Level Classification: Human-readable entropy categorization

normalized_entropy ≤ 0.2    → "concentrated"
0.2 < normalized_entropy ≤ 0.7 → "scattered"
normalized_entropy > 0.7     → "uniform"
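
To make these formulas concrete, here is a minimal Python sketch (the function names are my own; the authoritative versions appear in the agent function later in this post):

import math
from typing import Dict

def shannon_entropy(distribution: Dict[str, float]) -> float:
    # H = -Σ(p_i * log₂(p_i)), skipping zero-probability entries
    return -sum(p * math.log2(p) for p in distribution.values() if p > 0)

def normalized_entropy(distribution: Dict[str, float]) -> float:
    # Scale to [0,1]; a single unique answer carries no uncertainty
    n = len(distribution)
    return shannon_entropy(distribution) / math.log2(n) if n > 1 else 0.0

def combined_score(confidence: float, norm_entropy: float, entropy_weight: float = 0.3) -> float:
    # Start with confidence, then penalise with weighted entropy
    return confidence * (1 - entropy_weight * norm_entropy)

def entropy_level(norm_entropy: float) -> str:
    if norm_entropy <= 0.2:
        return "concentrated"
    if norm_entropy <= 0.7:
        return "scattered"
    return "uniform"

# Example with the 80/20 split used throughout this post:
dist = {"129": 0.8, "128": 0.2}
h = normalized_entropy(dist)                   # ≈ 0.72
score = combined_score(max(dist.values()), h)  # 0.8 * (1 - 0.3 * 0.72) ≈ 0.63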

Confidence Quantification

consensus_confidence = max(probability_distribution)

The agent calculates its confidence by identifying the maximum probability in its answer distribution. This represents the strength of consensus among its multiple reasoning attempts.

Consensus Classification

consensus_type = classify_distribution_pattern(probability_distribution)

The agent automatically recognizes patterns in its response distribution:

  • Strong: 80%+ agreement (low entropy, high confidence)
  • Emerging: 40-79% leading answer (medium entropy)
  • Binary: Two roughly equal options (high entropy, no clear winner)
  • Divided: No clear pattern (maximum entropy, high uncertainty)
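
A quick illustration of these rules in Python, including the binary-split check (second-place probability ≥ 0.35 with a gap ≤ 0.15) that the agent function below applies first:

def classify(distribution):
    probs = sorted(distribution.values(), reverse=True)
    # Binary split takes priority over the percentage bands
    if len(probs) >= 2 and probs[1] >= 0.35 and probs[0] - probs[1] <= 0.15:
        return "binary"
    if probs[0] >= 0.8:
        return "strong"
    if probs[0] >= 0.4:
        return "emerging"
    return "divided"

classify({"4": 1.0})                      # "strong"
classify({"129": 0.8, "128": 0.2})        # "strong" (0.2 < 0.35, so not binary)
classify({"No": 0.5, "Yes": 0.5})         # "binary"
classify({"A": 0.4, "B": 0.3, "C": 0.3})  # "emerging"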

Four Entropy Modes

The implementation supports four distinct entropy modes:

  1. “off”: Traditional confidence-only stopping (legacy behavior from the self-consistency agent)
  2. “confidence_only”: Explicit confidence-only mode
  3. “entropy_only”: Pure entropy-based stopping (stop when entropy is low)
  4. “combined”: Hybrid approach with dual-threshold system

Early Stopping Decision Function

should_stop = evaluate_stopping_criteria(confidence, entropy, response_count)

The agent combines multiple factors to make intelligent stopping decisions:

  • Confidence threshold achievement
  • Entropy-based uncertainty assessment
  • Minimum response requirements
  • Combined confidence-entropy scoring

Agent Design Process

Environment specification: PEAS analysis

Element Description
Performance Return answer with confidence assessment and uncertainty quantification
Environment User + LLM + question context
Actuators LLM queries, user responses
Sensors User input, LLM response pairs

In a deviation from Russell and Norvig’s approach, I am documenting the internal aspects for clarity:

Component Description
Internal Sensors Self-monitoring of confidence, entropy, convergence
Internal Actuators Distribution calculation, consensus classification, stopping decisions

Environment Analysis

The task environment for this domain has the following characteristics:

  • Partially Observable: The agent must send m (prompt, question) to the LLM and perform a final argmax to find the most frequent answer.
  • Single Agent: Queries to the LLM can be processed in parallel or sequentially. The self-reflection is done by a single agent once all queries have returned.
  • Stochastic: The environment, specifically the solution space, is stochastic. The selection of the next token uses a random variable to pick the token from a probability distribution.
  • Episodic: The final decision - i.e. which answer a is most frequent - is not dependent on other decisions made. It is stateless.
  • Static: Neither the problem nor the solution space change during the task.
  • Discrete: The output is a collection of strings of tokens.
  • Known: Whilst the internals of the LLM are unknown and stochastic, the “physics” of the environment are known: the agent sends a prompt and a question m times and receives m (reasoning, answer) responses.

The Agent Function

Define the ideal behaviour - what the agent ought to do - in abstract terms (mathematical mapping from percept sequences to actions)

The agent maintains an internal state with comprehensive entropy parameters and convergence tracking capabilities.

Percepts

True Percepts (inputs from the environment):

  • Question input (from user)
  • LLM response pairs (reasoning, answer) (from LLM)

Internal Percepts (derived by the agent from percepts):

  • Entropy-based consensus intelligence
  • Convergence evolution metrics
  • Multi-mode configuration state
  • Confidence levels
  • Entropy measurements
  • Consensus type classifications

Actions

True Actions (outputs to the environment):

  • QUERY-LLM: Generate LLM responses with parsing
  • REPLY-TO-USER: Return answer and self-reflection result

Internal Actions (actions taken internally):

  • CALCULATE-DISTRIBUTION: Compute normalized probability distributions
  • ASSESS-CONFIDENCE: Calculate consensus confidence (max probability)
  • CALCULATE-ENTROPY: Compute Shannon entropy and normalized entropy
  • CALCULATE-NORMALIZED-ENTROPY: Scale entropy to [0,1] range
  • CLASSIFY-CONSENSUS: Determine consensus type with binary detection
  • CLASSIFY-ENTROPY-LEVEL: Categorize entropy as concentrated/scattered/uniform
  • EVALUATE-MULTI-MODE-STOPPING: Advanced stopping logic with entropy modes
  • ASSESS-CONVERGENCE: Dual-track confidence and entropy evolution analysis
  • SYNTHESIZE-REFLECTION: Build comprehensive result with all metrics

State

What the agent tracks to make its decisions:

  • Minimum number of queries: “How much information should I gather?”
  • Maximum number of queries: “What’s the most amount of resource I should spend on this?”
  • Confidence Assessment: “How confident am I in this answer?”
  • Uncertainty Awareness: “Do I have any uncertainty about this conclusion?”
  • Consensus Recognition: “Are my multiple reasoning attempts converging?”
  • Stopping Decision: “Have I gathered sufficient evidence to respond?”

Percept Sequence with Actions

Here is an abstraction of the enhanced percept sequence demonstrating entropy-based intelligence with entropy_mode = "combined", confidence_threshold = 0.8, and entropy_threshold = 0.3 for the question “What is the sum of the first 10 prime numbers?":

1. Basic Convergence Table

Normal happy path with early stopping due to confidence

Configuration: entropy_mode = "combined", confidence_threshold = 0.8, min_responses = 5, max_responses = 10
Question: “What is the sum of the first 10 prime numbers?”

External Percept Internal Percept Internal Action External Action
Question: “What is the sum of the first 10 prime numbers?” - - QUERY-LLM
Response1: “129” - - QUERY-LLM
Response2: “129” - - QUERY-LLM
Response3: “129” - - QUERY-LLM
Response4: “128” - - QUERY-LLM
Response5: “129” - CALCULATE-DISTRIBUTION -
- Distribution: {“129”: 0.8, “128”: 0.2} ASSESS-CONFIDENCE -
- Confidence: 0.8 CALCULATE-ENTROPY -
- Entropy: 0.72, NormalizedEntropy: 0.72 CLASSIFY-ENTROPY-LEVEL -
- EntropyLevel: “uniform” CLASSIFY-CONSENSUS -
- ConsensusType: “strong” ASSESS-CONVERGENCE -
- ConvergenceAnalysis: {…} EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: true SYNTHESIZE-REFLECTION -
- ReflectionResult: {final_answer: “129”, confidence: 0.8, …} - REPLY-TO-USER(“129”)

Result: Early stopping at 5 responses due to meeting confidence threshold with strong consensus.
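
For reference, the entropy figures in this table follow directly from the formulas introduced earlier:

import math

dist = {"129": 4/5, "128": 1/5}
confidence = max(dist.values())                    # 0.8
H = -sum(p * math.log2(p) for p in dist.values())  # ≈ 0.722
H_norm = H / math.log2(len(dist))                  # ≈ 0.72 (log₂(2) = 1), i.e. > 0.7 → "uniform"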


2. Resource Limit Table

Agent hitting max_responses limit despite not reaching confidence threshold

Configuration: entropy_mode = "combined", confidence_threshold = 0.8, min_responses = 5, max_responses = 8
Question: “What is the most effective treatment for chronic lower back pain?”

External Percept Internal Percept Internal Action External Action
Question: “What is the most effective treatment…” - - QUERY-LLM
Response1: “Physical therapy” - - QUERY-LLM
Response2: “Exercise and stretching” - - QUERY-LLM
Response3: “Physical therapy” - - QUERY-LLM
Response4: “Medication management” - - QUERY-LLM
Response5: “Exercise and stretching” - CALCULATE-DISTRIBUTION -
- Distribution: {“Physical therapy”: 0.4, “Exercise”: 0.4, “Medication”: 0.2} ASSESS-CONFIDENCE -
- Confidence: 0.4 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (confidence < 0.8) - QUERY-LLM
Response6: “Physical therapy” - CALCULATE-DISTRIBUTION -
- Distribution: {“Physical therapy”: 0.5, “Exercise”: 0.33, “Medication”: 0.17} ASSESS-CONFIDENCE -
- Confidence: 0.5 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (confidence < 0.8) - QUERY-LLM
Response7: “Exercise and stretching” - CALCULATE-DISTRIBUTION -
- Distribution: {“Physical therapy”: 0.43, “Exercise”: 0.43, “Medication”: 0.14} ASSESS-CONFIDENCE -
- Confidence: 0.43 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (confidence < 0.8) - QUERY-LLM
Response8: “Physical therapy” - CALCULATE-DISTRIBUTION -
- Distribution: {“Physical therapy”: 0.5, “Exercise”: 0.375, “Medication”: 0.125} ASSESS-CONFIDENCE -
- Confidence: 0.5 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: true (max_responses reached) SYNTHESIZE-REFLECTION -
- ReflectionResult: {final_answer: “Physical therapy”, confidence: 0.5, uncertainty: “high”} - REPLY-TO-USER(“Physical therapy”)

Result: Resource limit reached with moderate confidence. Agent acknowledges uncertainty in result.


3. Binary Consensus Detection Table

Controversial question with 50/50 split detection

Configuration: entropy_mode = "combined", confidence_threshold = 0.8, min_responses = 5, max_responses = 10
Question: “Is the number 1 considered prime?”

External Percept Internal Percept Internal Action External Action
Question: “Is the number 1 considered prime?” - - QUERY-LLM
Response1: “No” - - QUERY-LLM
Response2: “Yes” - - QUERY-LLM
Response3: “No” - - QUERY-LLM
Response4: “Yes” - - QUERY-LLM
Response5: “No” - CALCULATE-DISTRIBUTION -
- Distribution: {“No”: 0.6, “Yes”: 0.4} ASSESS-CONFIDENCE -
- Confidence: 0.6 CLASSIFY-CONSENSUS -
- ConsensusType: “emerging” EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (confidence < 0.8) - QUERY-LLM
Response6: “Yes” - CALCULATE-DISTRIBUTION -
- Distribution: {“No”: 0.5, “Yes”: 0.5} ASSESS-CONFIDENCE -
- Confidence: 0.5 CLASSIFY-CONSENSUS -
- ConsensusType: “binary” EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (binary split detected) - QUERY-LLM
Response7: “No” - CALCULATE-DISTRIBUTION -
- Distribution: {“No”: 0.57, “Yes”: 0.43} ASSESS-CONFIDENCE -
- Confidence: 0.57 CLASSIFY-CONSENSUS -
- ConsensusType: “binary” EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (binary split detected) - QUERY-LLM
Response8: “No” - CALCULATE-DISTRIBUTION -
- Distribution: {“No”: 0.625, “Yes”: 0.375} ASSESS-CONFIDENCE -
- Confidence: 0.625 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (confidence < 0.8) - QUERY-LLM
Response9: “No” - CALCULATE-DISTRIBUTION -
- Distribution: {“No”: 0.67, “Yes”: 0.33} ASSESS-CONFIDENCE -
- Confidence: 0.67 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (confidence < 0.8) - QUERY-LLM
Response10: “No” - CALCULATE-DISTRIBUTION -
- Distribution: {“No”: 0.7, “Yes”: 0.3} ASSESS-CONFIDENCE -
- Confidence: 0.7 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: true (max_responses reached) SYNTHESIZE-REFLECTION -
- ReflectionResult: {final_answer: “No”, confidence: 0.7, consensus_type: “emerging”} - REPLY-TO-USER(“No”)

Result: Binary split detected and resolved through continued sampling, reaching moderate confidence.


4. Early High Confidence Table

Stopping at the minimum number of responses due to very high confidence (90%+)

Configuration: entropy_mode = "combined", confidence_threshold = 0.8, min_responses = 5, max_responses = 10
Question: “What is 2 + 2?”

External Percept Internal Percept Internal Action External Action
Question: “What is 2 + 2?” - - QUERY-LLM
Response1: “4” - - QUERY-LLM
Response2: “4” - - QUERY-LLM
Response3: “4” - - QUERY-LLM
Response4: “4” - - QUERY-LLM
Response5: “4” - CALCULATE-DISTRIBUTION -
- Distribution: {“4”: 1.0} ASSESS-CONFIDENCE -
- Confidence: 1.0 CALCULATE-ENTROPY -
- Entropy: 0.0, NormalizedEntropy: 0.0 CLASSIFY-ENTROPY-LEVEL -
- EntropyLevel: “concentrated” CLASSIFY-CONSENSUS -
- ConsensusType: “strong” EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: true (confidence ≥ 0.9 override) SYNTHESIZE-REFLECTION -
- ReflectionResult: {final_answer: “4”, confidence: 1.0, entropy_level: “concentrated”} - REPLY-TO-USER(“4”)

Result: Perfect consensus achieved at minimum responses, high confidence override triggered.


5. Entropy-Only Mode Table

Same question but with entropy_mode = “entropy_only”

Configuration: entropy_mode = "entropy_only", entropy_threshold = 0.3, min_responses = 5, max_responses = 10
Question: “What is the sum of the first 10 prime numbers?”

External Percept Internal Percept Internal Action External Action
Question: “What is the sum of the first 10 prime numbers?” - - QUERY-LLM
Response1: “129” - - QUERY-LLM
Response2: “129” - - QUERY-LLM
Response3: “129” - - QUERY-LLM
Response4: “128” - - QUERY-LLM
Response5: “129” - CALCULATE-DISTRIBUTION -
- Distribution: {“129”: 0.8, “128”: 0.2} CALCULATE-ENTROPY -
- Entropy: 0.72, NormalizedEntropy: 0.72 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (normalized_entropy > 0.3) - QUERY-LLM
Response6: “129” - CALCULATE-DISTRIBUTION -
- Distribution: {“129”: 0.83, “128”: 0.17} CALCULATE-ENTROPY -
- Entropy: 0.65, NormalizedEntropy: 0.65 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (normalized_entropy > 0.3) - QUERY-LLM
Response7: “129” - CALCULATE-DISTRIBUTION -
- Distribution: {“129”: 0.86, “128”: 0.14} CALCULATE-ENTROPY -
- Entropy: 0.59, NormalizedEntropy: 0.59 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (normalized_entropy > 0.3) - QUERY-LLM
Response8: “129” - CALCULATE-DISTRIBUTION -
- Distribution: {“129”: 0.875, “128”: 0.125} CALCULATE-ENTROPY -
- Entropy: 0.54, NormalizedEntropy: 0.54 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (normalized_entropy > 0.3) - QUERY-LLM
Response9: “129” - CALCULATE-DISTRIBUTION -
- Distribution: {“129”: 0.89, “128”: 0.11} CALCULATE-ENTROPY -
- Entropy: 0.50, NormalizedEntropy: 0.50 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (normalized_entropy > 0.3) - QUERY-LLM
Response10: “129” - CALCULATE-DISTRIBUTION -
- Distribution: {“129”: 0.9, “128”: 0.1} CALCULATE-ENTROPY -
- Entropy: 0.47, NormalizedEntropy: 0.47 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: true (max_responses reached) SYNTHESIZE-REFLECTION -
- ReflectionResult: {final_answer: “129”, confidence: 0.9, entropy: 0.47} - REPLY-TO-USER(“129”)

Result: Entropy-only mode requires very low entropy (≤ 0.3), so the agent used all responses despite high confidence.


6. Low Confidence Continuation Table

Agent continuing to gather evidence when confidence stays low

Configuration: entropy_mode = "combined", confidence_threshold = 0.8, min_responses = 5, max_responses = 10
Question: “What will be the most important technological breakthrough in the next decade?”

External Percept Internal Percept Internal Action External Action
Question: “What will be the most important technological breakthrough…” - - QUERY-LLM
Response1: “Artificial General Intelligence” - - QUERY-LLM
Response2: “Quantum computing” - - QUERY-LLM
Response3: “Fusion energy” - - QUERY-LLM
Response4: “Brain-computer interfaces” - - QUERY-LLM
Response5: “Artificial General Intelligence” - CALCULATE-DISTRIBUTION -
- Distribution: {“AGI”: 0.4, “Quantum”: 0.2, “Fusion”: 0.2, “BCI”: 0.2} ASSESS-CONFIDENCE -
- Confidence: 0.4 CLASSIFY-CONSENSUS -
- ConsensusType: “emerging” EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (confidence < 0.8) - QUERY-LLM
Response6: “Quantum computing” - CALCULATE-DISTRIBUTION -
- Distribution: {“AGI”: 0.33, “Quantum”: 0.33, “Fusion”: 0.17, “BCI”: 0.17} ASSESS-CONFIDENCE -
- Confidence: 0.33 CLASSIFY-CONSENSUS -
- ConsensusType: “divided” EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (confidence < 0.8) - QUERY-LLM
Response7: “Fusion energy” - CALCULATE-DISTRIBUTION -
- Distribution: {“AGI”: 0.29, “Quantum”: 0.29, “Fusion”: 0.29, “BCI”: 0.14} ASSESS-CONFIDENCE -
- Confidence: 0.29 CLASSIFY-CONSENSUS -
- ConsensusType: “divided” EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (confidence < 0.8) - QUERY-LLM
Response8: “Artificial General Intelligence” - CALCULATE-DISTRIBUTION -
- Distribution: {“AGI”: 0.375, “Quantum”: 0.25, “Fusion”: 0.25, “BCI”: 0.125} ASSESS-CONFIDENCE -
- Confidence: 0.375 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (confidence < 0.8) - QUERY-LLM
Response9: “Artificial General Intelligence” - CALCULATE-DISTRIBUTION -
- Distribution: {“AGI”: 0.44, “Quantum”: 0.22, “Fusion”: 0.22, “BCI”: 0.11} ASSESS-CONFIDENCE -
- Confidence: 0.44 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: false (confidence < 0.8) - QUERY-LLM
Response10: “Brain-computer interfaces” - CALCULATE-DISTRIBUTION -
- Distribution: {“AGI”: 0.4, “Quantum”: 0.2, “Fusion”: 0.2, “BCI”: 0.2} ASSESS-CONFIDENCE -
- Confidence: 0.4 EVALUATE-MULTI-MODE-STOPPING -
- EarlyStop: true (max_responses reached) SYNTHESIZE-REFLECTION -
- ReflectionResult: {final_answer: “AGI”, confidence: 0.4, uncertainty: “high”} - REPLY-TO-USER(“AGI”)

Result: Agent exhausted resources on complex question, acknowledges high uncertainty in final answer.


7. Mode Comparison Table

Side-by-side showing same question with different entropy modes

Question: “What is the sum of the first 10 prime numbers?”
Base responses: [“129”, “129”, “129”, “128”, “129”] → Distribution: {“129”: 0.8, “128”: 0.2}, Confidence: 0.8, NormalizedEntropy: 0.72

Entropy Mode Configuration Stopping Decision Reasoning
“off” confidence_threshold = 0.8 STOP Confidence (0.8) meets threshold
“confidence_only” confidence_threshold = 0.8 STOP Confidence (0.8) meets threshold
“entropy_only” entropy_threshold = 0.3 CONTINUE NormalizedEntropy (0.72) > threshold (0.3)
“combined” confidence_threshold = 0.8, entropy_threshold = 0.3 STOP Confidence override: 0.8 ≥ 0.8

Analysis:

  • Confidence-based modes stop immediately when threshold is met
  • Entropy-only mode requires very concentrated responses (low entropy)
  • Combined mode uses confidence override, but would apply entropy weighting in borderline cases

Internal State Data Structure

The agent tracks how these metrics evolve; this is an example of the internal state:

ConvergenceAnalysis: {
    confidence_evolution: [0.5, 0.67, 0.75, 0.8, 0.8],
    entropy_evolution: [1.0, 0.92, 0.81, 0.72, 0.72],
    convergence_rate: 0.06,           # Confidence increasing by 6% per response
    final_stability: 0.95,            # Final confidence varies by only 0.05 over the last three responses
    entropy_convergence_rate: -0.056, # Entropy decreasing by 5.6% per response
    entropy_final_stability: 0.91     # Final entropy varies by only 0.09 over the last three responses
}

This demonstrates how the agent achieves mathematical self-awareness through comprehensive entropy intelligence and stopping logic.
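
These metrics can be reproduced directly from the evolution lists using the helper formulas defined in the agent function below (a quick sanity check, not part of the agent itself):

conf = [0.5, 0.67, 0.75, 0.8, 0.8]
ent = [1.0, 0.92, 0.81, 0.72, 0.72]

convergence_rate = (conf[-1] - conf[0]) / len(conf)              # (0.8 - 0.5) / 5 = 0.06
final_stability = 1.0 - (max(conf[-3:]) - min(conf[-3:]))        # 1.0 - 0.05 = 0.95
entropy_convergence_rate = (ent[-1] - ent[0]) / len(ent)         # -0.28 / 5 = -0.056
entropy_final_stability = 1.0 - (max(ent[-3:]) - min(ent[-3:]))  # 1.0 - 0.09 = 0.91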

Defining the Agent Function

The self-reflection agent incorporates entropy-based intelligence and multi-mode early stopping:

function SELF-REFLECTION-AGENT(percept) returns an action
    persistent: state, agent state with entropy intelligence
                responses, collection of (reasoning, answer) pairs
                question, current question to answer
                confidence_threshold, stopping threshold (default 0.8)
                entropy_threshold, entropy stopping threshold (default 0.3)
                entropy_weight, weight in combined scoring (default 0.3)
                entropy_mode, stopping mode (default "combined")
                min_responses, minimum responses required (default 5)
                min_entropy_samples, minimum samples for entropy (default 4)
                max_responses, maximum responses allowed (default 10)

    # Update state with new percept
    if percept contains question:
        state.question ← question
        state.responses ← []
        return QUERY-LLM(state.question)
    elif percept contains (reasoning, answer):
        state.responses.append((reasoning, answer))
        
        # Decision: Continue querying or provide final answer?
        if should_continue_querying():
            return QUERY-LLM(state.question)
        else:
            # Perform all internal analysis and return final result
            final_result ← perform_complete_internal_analysis()
            return REPLY-TO-USER(final_result)

function should_continue_querying() returns boolean
    current_responses ← length(state.responses)
    
    # Must have minimum responses
    if current_responses < state.min_responses:
        return true
    
    # Must not exceed maximum responses
    if current_responses ≥ state.max_responses:
        return false
    
    # Perform internal analysis to make stopping decision
    perform_internal_analysis()
    
    # Apply stopping logic based on entropy mode
    return not evaluate_stopping_criteria()

function perform_internal_analysis()
    # Internal computation: Calculate probability distribution
    answers ← extract_answers_from(state.responses)
    count ← {}
    for answer in answers:
        count[answer] ← count.get(answer, 0) + 1
    
    total ← length(answers)
    state.distribution ← {}
    for answer, freq in count:
        state.distribution[answer] ← freq / total
    
    # Internal computation: Assess confidence
    state.confidence ← max(state.distribution.values())
    
    # Internal computation: Calculate entropy
    state.entropy ← 0
    for probability in state.distribution.values():
        if probability > 0:
            state.entropy ← state.entropy - (probability × log₂(probability))
    
    # Internal computation: Calculate normalized entropy
    if length(state.distribution) ≤ 1:
        state.normalized_entropy ← 0.0
    else:
        max_entropy ← log₂(length(state.distribution))
        state.normalized_entropy ← state.entropy / max_entropy if max_entropy > 0 else 0.0
    
    # Internal computation: Classify entropy level
    if state.normalized_entropy ≤ 0.2:
        state.entropy_level ← "concentrated"
    elif state.normalized_entropy ≤ 0.7:
        state.entropy_level ← "scattered"
    else:
        state.entropy_level ← "uniform"
    
    # Internal computation: Classify consensus type
    state.consensus_type ← classify_consensus_type(state.distribution)
    
    # Internal computation: Assess convergence
    state.convergence_analysis ← assess_convergence_evolution()

function classify_consensus_type(distribution) returns consensus_type
    if distribution is empty:
        return "undefined"
    
    probabilities ← sort(distribution.values(), descending=true)
    max_prob ← probabilities[0]
    
    # Binary split: Two main answers roughly equal (check this first)
    if length(probabilities) ≥ 2 and probabilities[1] ≥ 0.35:
        if abs(probabilities[0] - probabilities[1]) ≤ 0.15:
            return "binary"
    
    # Strong consensus: One answer dominates significantly (80%+)
    if max_prob ≥ 0.8:
        return "strong"
    
    # Emerging consensus: One answer leading but not dominant (40-79%)
    if max_prob ≥ 0.4:
        return "emerging"
    
    # Divided: No clear leader (under 40%)
    return "divided"

function assess_convergence_evolution() returns convergence_analysis
    if length(state.responses) < 2:
        return {
            confidence_evolution: [state.confidence] if state.responses else [],
            entropy_evolution: [state.normalized_entropy] if state.responses else [],
            convergence_rate: 0.0,
            final_stability: 1.0,
            entropy_convergence_rate: 0.0,
            entropy_final_stability: 1.0
        }
    
    confidences_over_time ← []
    entropies_over_time ← []
    
    # Calculate confidence and entropy evolution
    for i ← 1 to length(state.responses):
        subset_responses ← state.responses[1:i]
        subset_answers ← extract_answers_from(subset_responses)
        subset_counts ← count_occurrences(subset_answers)
        subset_total ← length(subset_answers)
        
        # Calculate confidence for subset
        if subset_total > 0:
            max_count ← max(subset_counts.values())
            confidence ← max_count / subset_total
        else:
            confidence ← 0.0
        
        # Calculate entropy for subset
        if subset_total > 0:
            subset_distribution ← {}
            for answer, count in subset_counts:
                subset_distribution[answer] ← count / subset_total
            
            entropy ← calculate_entropy(subset_distribution)
            if length(subset_distribution) > 1:
                normalized_entropy ← entropy / log₂(length(subset_distribution))
            else:
                normalized_entropy ← 0.0
        else:
            entropy ← 0.0
            normalized_entropy ← 0.0
        
        confidences_over_time.append(confidence)
        entropies_over_time.append(normalized_entropy)
    
    return {
        confidence_evolution: confidences_over_time,
        entropy_evolution: entropies_over_time,
        convergence_rate: calculate_convergence_rate(confidences_over_time),
        final_stability: assess_stability(confidences_over_time),
        entropy_convergence_rate: calculate_entropy_convergence_rate(entropies_over_time),
        entropy_final_stability: assess_entropy_stability(entropies_over_time)
    }

function evaluate_stopping_criteria() returns boolean
    current_responses ← length(state.responses)
    
    # Handle different entropy modes
    if state.entropy_mode = "off" or state.entropy_mode = "confidence_only":
        # Traditional confidence-only stopping
        return state.confidence ≥ state.confidence_threshold
    
    # Need minimum samples for entropy to be meaningful
    if current_responses < state.min_entropy_samples:
        return state.confidence ≥ state.confidence_threshold
    
    if state.entropy_mode = "entropy_only":
        # Stop only based on entropy (low entropy = concentrated)
        return state.normalized_entropy ≤ state.entropy_threshold
    
    elif state.entropy_mode = "combined":
        # Combined scoring: balance confidence and entropy
        
        # High confidence overrides entropy concerns
        if state.confidence ≥ 0.9:
            return true
        
        # Check confidence threshold first
        if state.confidence ≥ state.confidence_threshold:
            # High confidence + low entropy = strong consensus, stop early
            if state.normalized_entropy ≤ state.entropy_threshold:
                return true
            # High confidence + high entropy = check if really confident
            elif state.confidence ≥ 0.8:
                return true
        
        # Calculate combined score: confidence weighted by entropy concentration
        entropy_factor ← 1.0 - (state.entropy_weight × state.normalized_entropy)
        combined_score ← state.confidence × entropy_factor
        
        # Use a slightly lower threshold for combined scoring
        return combined_score ≥ (state.confidence_threshold × 0.9)
    
    # Default fallback
    return state.confidence ≥ state.confidence_threshold

function perform_complete_internal_analysis() returns reflection_result
    # Ensure all internal analysis is complete
    perform_internal_analysis()
    
    # Internal computation: Synthesize final reflection result
    final_answer ← argmax(state.distribution)
    
    reflection_result ← {
        final_answer: final_answer,
        consensus_confidence: state.confidence,
        answer_distribution: state.distribution,
        uncertainty_level: categorize_uncertainty(state.confidence),
        early_stopping: length(state.responses) < state.max_responses,
        total_responses: length(state.responses),
        convergence_analysis: state.convergence_analysis,
        distribution_entropy: state.entropy,
        normalized_entropy: state.normalized_entropy,
        entropy_level: state.entropy_level,
        consensus_type: state.consensus_type
    }
    
    return reflection_result

# External Actions (only these interact with environment)
function QUERY-LLM(question) returns response
    prompt ← create_chain_of_thought_prompt(question)
    response ← send_to_llm(prompt)
    return response

function REPLY-TO-USER(reflection_result) returns formatted_response
    formatted_response ← format_reflection_response(reflection_result)
    return formatted_response

# Helper Functions for Internal Computations
function categorize_uncertainty(confidence) returns uncertainty_level
    if confidence ≥ 0.8:
        return "low"
    elif confidence ≥ 0.6:
        return "medium"
    else:
        return "high"

function calculate_convergence_rate(confidences) returns rate
    if length(confidences) < 2:
        return 0.0
    return (confidences[-1] - confidences[0]) / length(confidences)

function assess_stability(confidences) returns stability
    if length(confidences) < 3:
        return 1.0
    last_three ← confidences[-3:]
    return 1.0 - (max(last_three) - min(last_three))

function calculate_entropy_convergence_rate(entropies) returns rate
    if length(entropies) < 2:
        return 0.0
    return (entropies[-1] - entropies[0]) / length(entropies)

function assess_entropy_stability(entropies) returns stability
    if length(entropies) < 3:
        return 1.0
    last_three ← entropies[-3:]
    return 1.0 - (max(last_three) - min(last_three))

function calculate_entropy(distribution) returns entropy
    H ← 0
    for probability in distribution.values():
        if probability > 0:
            H ← H - (probability × log₂(probability))
    return H

function extract_answers_from(responses) returns answers
    answers ← []
    for (reasoning, answer) in responses:
        answers.append(answer)
    return answers

function count_occurrences(answers) returns counts
    counts ← {}
    for answer in answers:
        counts[answer] ← counts.get(answer, 0) + 1
    return counts

function argmax(distribution) returns max_key
    max_prob ← 0
    max_key ← ""
    for key, prob in distribution:
        if prob > max_prob:
            max_prob ← prob
            max_key ← key
    return max_key

function create_chain_of_thought_prompt(question) returns prompt
    return "Please think step by step and provide your reasoning: " + question

function format_reflection_response(result) returns formatted_response
    return "Final Answer: " + result.final_answer + 
           " (Confidence: " + result.consensus_confidence + 
           ", Uncertainty: " + result.uncertainty_level + ")"

This algorithm represents the agent’s complete self-reflective analysis with entropy-based intelligence, combining mathematical rigor with practical early stopping strategies to create a proof of concept framework for artificial self-awareness.

The Agent Program Section Layout

Background for Implementation Decisions

The decisions are the same as those I made for the Self-Consistency agent in Building a Self-Consistency LLM-Agent: From PEAS Analysis to…. I repeat them here as a reminder, and also because I am testing OpenHands/Mistral. Where possible I prefer Open Source and Open Weights for this research work; however, the work is at an initial stage and I do not wish to commit to either right now.

Domain Driven Design and SOLID Principles

  • Explanation of DDD approach for entropy-based intelligence
  • SOLID principles application to self-reflection architecture
  • Benefits of immutable domain entities for complex state tracking
  • Separation of concerns: entropy calculation, consensus classification, convergence analysis

Development Tools and Methodology

  • Python shall be used
  • Claude Code usage for algorithm implementation in Python (pending testing with OpenHands and Mistral!)
  • CLAUDE.md development documentation approach (see above!!)
  • Testing strategy for entropy modes and edge cases
  • Version control strategy for multi-mode configurations
  • No need for linting, typing, or other formatting checks for the demo (this changes if it becomes multi-developer)
  • No CI/CD for the demo (again, this changes if it becomes multi-developer)
  • LiteLLM or OpenRouter to be used (a decision in the Cohere Labs ML Agent’s group that I wish to respect)

Complexity Analysis of Self-Reflection Operations

In defining the Agent Program for the Self-Consistency agent we had to be aware of the complexity (due to questions around a particular mathematical notation) and how an incorrect choice of Python data type would result in a computational complexity of O(m²).

The mathematics differ with this agent, as it uses probability distributions and entropy. The analysis below highlights why complexity is not a concern here.

Entropy Calculation Complexity

Core operations complexity analysis
  • Distribution calculation: O(m) where m = number of responses
  • Entropy calculation: O(k) where k = unique answers
  • Convergence analysis: O(m) for evolution tracking
  • Consensus classification: O(k log k) for sorting probabilities
  • Total per decision cycle: O(m + k log k)
Real World Impact
  • Distribution Calculation O(m): For 10 responses, requires 10 operations to count answers - scales linearly with response volume
  • Entropy Calculation O(k): For 3 unique answers, requires 3 logarithmic operations - very fast even with diverse responses
  • Convergence Analysis O(m): For 10 responses, recalculates confidence/entropy 10 times - creates detailed evolution tracking
  • Consensus Classification O(k log k): For 5 unique answers, requires ~12 operations to sort probabilities - negligible overhead
  • Combined per decision: O(m + k log k): For 10 responses with 3 unique answers, approximately 25 operations total

Real-World Performance Implications

  • Self-Consistency: Processes 1000 responses in milliseconds (simple counting)
  • Self-Reflection: Processes 1000 responses in ~10 milliseconds (entropy calculations add minimal overhead)
  • Bottleneck Reality: LLM query time (1-5 seconds) dominates computational overhead by 1000x
  • Practical Impact: Complexity differences irrelevant compared to network/LLM latency
  • Efficiency Trade-off: Having an early stop mechanism based on these calculations could save 50-70% of LLM calls (equating to microseconds of computation vs dollars of tokens)
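
As a rough sanity check on these claims, here is a throwaway benchmark of the two dominant operations (counting and entropy) on 1000 synthetic responses; absolute numbers will vary by machine, but they land comfortably in the sub-millisecond range:

import math
import time
from collections import Counter

answers = ["129"] * 800 + ["128"] * 150 + ["131"] * 50  # 1000 synthetic responses, 3 unique answers

start = time.perf_counter()
counts = Counter(answers)                          # O(m) distribution counting
dist = {a: c / len(answers) for a, c in counts.items()}
H = -sum(p * math.log2(p) for p in dist.values())  # O(k) entropy
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Analysed {len(answers)} responses in {elapsed_ms:.3f} ms")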

Comparison with Self-Consistency Agent

Operation Self-Consistency Self-Reflection
Decision Logic O(m) majority vote O(m + k log k) entropy analysis
State Tracking O(m) responses O(m + k) comprehensive state
Stopping Criteria O(1) simple threshold O(k) multi-mode evaluation

Domain Objects and Entities

Core Domain Entities

from dataclasses import dataclass
from datetime import datetime
from typing import Dict, List

@dataclass(frozen=True)
class LLMResponse:
    """Immutable Domain entity representing a single LLM response."""
    reasoning: str
    answer: str
    timestamp: datetime  # For convergence analysis
    
@dataclass(frozen=True)
class ConvergenceAnalysis:
    """Immutable analysis of response evolution."""
    confidence_evolution: List[float]
    entropy_evolution: List[float]
    convergence_rate: float
    final_stability: float
    entropy_convergence_rate: float
    entropy_final_stability: float

@dataclass(frozen=True)
class ReflectionResult:
    """Comprehensive result object with entropy intelligence."""
    final_answer: str
    consensus_confidence: float
    answer_distribution: Dict[str, float]
    uncertainty_level: str
    early_stopping: bool
    total_responses: int
    convergence_analysis: ConvergenceAnalysis
    distribution_entropy: float
    normalized_entropy: float
    entropy_level: str
    consensus_type: str

Configuration Domain Objects

@dataclass
class ReflectionConfig:
    """Configuration for entropy-based self-reflection."""
    llm_interface: LLMInterface
    confidence_threshold: float = 0.8
    entropy_threshold: float = 0.3
    entropy_weight: float = 0.3
    entropy_mode: str = "combined"  # "off", "confidence_only", "entropy_only", "combined"
    min_responses: int = 5
    min_entropy_samples: int = 4
    max_responses: int = 10
    prompt_template: str = ""

@dataclass
class AgentState:
    """Comprehensive state tracking for self-reflection."""
    question: str
    responses: List[LLMResponse]
    distribution: Dict[str, float]
    confidence: float
    entropy: float
    normalized_entropy: float
    entropy_level: str
    consensus_type: str
    convergence_analysis: ConvergenceAnalysis

Interface Design and Abstraction

LLM Interface with Enhanced Capabilities

from abc import ABC, abstractmethod

class LLMInterface(ABC):
    """Abstract interface for LLM interactions with entropy support."""
    
    @abstractmethod
    async def generate_response(self, prompt: str, question: str) -> LLMResponse:
        """Generate a single LLM response for entropy analysis."""
        pass
    
    @abstractmethod
    async def generate_batch_responses(self, prompt: str, question: str, count: int) -> List[LLMResponse]:
        """Generate multiple responses for parallel processing."""
        pass

class EnhancedLiteLLMAdapter(LLMInterface):
    """Enhanced LiteLLM adapter with entropy-optimized parameters."""
    
    def __init__(self, model: str, temperature: float = 0.7, **kwargs):
        self.model = model
        self.temperature = temperature
        self.entropy_optimized = kwargs.get('entropy_optimized', True)
        self.kwargs = kwargs
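
A plausible completion of generate_response, assuming LiteLLM’s async acompletion call (which mirrors the OpenAI chat-completions API) plus import litellm and the datetime import from the domain entities; parse_response is a hypothetical helper that splits the model’s reasoning from its final answer:

    async def generate_response(self, prompt: str, question: str) -> LLMResponse:
        response = await litellm.acompletion(
            model=self.model,
            temperature=self.temperature,
            messages=[{"role": "user", "content": f"{prompt}\n\n{question}"}],
        )
        text = response.choices[0].message.content
        reasoning, answer = parse_response(text)  # hypothetical: extract (reasoning, answer) pair
        return LLMResponse(reasoning=reasoning, answer=answer, timestamp=datetime.now())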

Entropy Intelligence Services

import math

class EntropyCalculator:
    """Service for entropy calculations and analysis."""
    
    @staticmethod
    def calculate_shannon_entropy(distribution: Dict[str, float]) -> float:
        """Calculate Shannon entropy for answer distribution."""
        return -sum(p * math.log2(p) for p in distribution.values() if p > 0)
    
    @staticmethod
    def normalize_entropy(entropy: float, unique_answers: int) -> float:
        """Normalize entropy to [0,1] range."""
        if unique_answers <= 1:
            return 0.0
        return entropy / math.log2(unique_answers)
    
    @staticmethod
    def classify_entropy_level(normalized_entropy: float) -> str:
        """Classify entropy as concentrated/scattered/uniform."""
        if normalized_entropy <= 0.2:
            return "concentrated"
        if normalized_entropy <= 0.7:
            return "scattered"
        return "uniform"

class ConsensusClassifier:
    """Service for consensus pattern recognition."""
    
    @staticmethod
    def classify_consensus(distribution: Dict[str, float]) -> str:
        """Classify consensus type with binary split detection."""
        if not distribution:
            return "undefined"
        if ConsensusClassifier.detect_binary_split(distribution):
            return "binary"
        max_prob = max(distribution.values())
        if max_prob >= 0.8:
            return "strong"
        if max_prob >= 0.4:
            return "emerging"
        return "divided"
    
    @staticmethod
    def detect_binary_split(distribution: Dict[str, float]) -> bool:
        """Detect binary consensus patterns: two roughly equal leading answers."""
        probs = sorted(distribution.values(), reverse=True)
        return len(probs) >= 2 and probs[1] >= 0.35 and abs(probs[0] - probs[1]) <= 0.15

Core Implementation Architecture

Main Self-Reflection Agent

from collections import Counter

class SelfReflectionAgent:
    """Main agent implementing entropy-based self-reflection."""
    
    def __init__(self, config: ReflectionConfig, question: str):
        self._config = config
        self._state = AgentState(question=question, responses=[], ...)
        self._entropy_calculator = EntropyCalculator()
        self._consensus_classifier = ConsensusClassifier()
        self._convergence_analyzer = ConvergenceAnalyzer()
    
    async def process_question(self) -> ReflectionResult:
        """Main processing loop with entropy-based stopping."""
        while self._should_continue_querying():
            response = await self._query_llm()
            self._state.responses.append(response)
            self._update_internal_state()
        
        return self._synthesize_final_result()
    
    def _should_continue_querying(self) -> bool:
        """Multi-mode stopping decision with entropy intelligence."""
        if len(self._state.responses) < self._config.min_responses:
            return True
        
        if len(self._state.responses) >= self._config.max_responses:
            return False
        
        return not self._evaluate_stopping_criteria()
    
    def _evaluate_stopping_criteria(self) -> bool:
        """Stopping logic with four entropy modes."""
        # Implementation of entropy modes: off, confidence_only, entropy_only, combined
        pass
    
    def _update_internal_state(self):
        """Update all internal state with latest response."""
        self._calculate_distribution()
        self._assess_confidence()
        self._calculate_entropy()
        self._classify_consensus()
        self._assess_convergence()
    
    def _calculate_distribution(self):
        """Calculate probability distribution - O(m) complexity."""
        answers = [response.answer for response in self._state.responses]
        counts = Counter(answers)
        total = len(answers)
        self._state.distribution = {answer: count/total for answer, count in counts.items()}
    
    def _assess_confidence(self):
        """Assess consensus confidence - O(k) complexity."""
        self._state.confidence = max(self._state.distribution.values()) if self._state.distribution else 0.0
    
    def _calculate_entropy(self):
        """Calculate Shannon and normalized entropy - O(k) complexity."""
        self._state.entropy = self._entropy_calculator.calculate_shannon_entropy(self._state.distribution)
        unique_answers = len(self._state.distribution)
        self._state.normalized_entropy = self._entropy_calculator.normalize_entropy(
            self._state.entropy, unique_answers
        )
        self._state.entropy_level = self._entropy_calculator.classify_entropy_level(
            self._state.normalized_entropy
        )
    
    def _classify_consensus(self):
        """Classify consensus type - O(k log k) complexity."""
        self._state.consensus_type = self._consensus_classifier.classify_consensus(
            self._state.distribution
        )
    
    def _assess_convergence(self):
        """Assess convergence evolution - O(m) complexity."""
        self._state.convergence_analysis = self._convergence_analyzer.analyze_evolution(
            self._state.responses
        )

Convergence Analysis Implementation

class ConvergenceAnalyzer:
    """Service for analyzing response convergence patterns."""
    
    def analyze_evolution(self, responses: List[LLMResponse]) -> ConvergenceAnalysis:
        """Analyze dual-track confidence and entropy evolution."""
        if len(responses) < 2:
            return self._create_minimal_analysis(responses)
        
        confidences = []
        entropies = []
        
        for i in range(1, len(responses) + 1):
            subset = responses[:i]
            confidence, entropy = self._calculate_subset_metrics(subset)
            confidences.append(confidence)
            entropies.append(entropy)
        
        return ConvergenceAnalysis(
            confidence_evolution=confidences,
            entropy_evolution=entropies,
            convergence_rate=self._calculate_convergence_rate(confidences),
            final_stability=self._assess_stability(confidences),
            entropy_convergence_rate=self._calculate_convergence_rate(entropies),
            entropy_final_stability=self._assess_stability(entropies)
        )
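
The _calculate_subset_metrics helper is not shown above; one implementation consistent with the per-subset logic in the agent function (assuming imports of math, Counter, and Tuple) would be:

    def _calculate_subset_metrics(self, subset: List[LLMResponse]) -> Tuple[float, float]:
        """Return (confidence, normalized entropy) for a prefix of the responses."""
        counts = Counter(r.answer for r in subset)
        dist = {a: c / len(subset) for a, c in counts.items()}
        confidence = max(dist.values())
        entropy = -sum(p * math.log2(p) for p in dist.values())
        normalized = entropy / math.log2(len(dist)) if len(dist) > 1 else 0.0
        return confidence, normalized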

Multi-Mode Stopping Logic Implementation

Entropy Mode Handlers

class StoppingCriteriaEvaluator:
    """Evaluator for multi-mode stopping decisions."""
    
    def __init__(self, config: ReflectionConfig):
        self._config = config
    
    def evaluate(self, state: AgentState) -> bool:
        """Evaluate stopping criteria based on entropy mode."""
        handlers = {
            "off": self._evaluate_off_mode,
            "confidence_only": self._evaluate_confidence_only,
            "entropy_only": self._evaluate_entropy_only,
            "combined": self._evaluate_combined_mode
        }
        
        handler = handlers.get(self._config.entropy_mode, self._evaluate_combined_mode)
        return handler(state)
    
    def _evaluate_combined_mode(self, state: AgentState) -> bool:
        """Combined scoring with entropy weighting."""
        # High confidence override
        if state.confidence >= 0.9:
            return True
        
        # Dual-threshold check
        if (state.confidence >= self._config.confidence_threshold and 
            state.normalized_entropy <= self._config.entropy_threshold):
            return True
        
        # Combined scoring
        entropy_factor = 1.0 - (self._config.entropy_weight * state.normalized_entropy)
        combined_score = state.confidence * entropy_factor
        
        return combined_score >= (self._config.confidence_threshold * 0.9)
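
The three simpler handlers are omitted above. They follow directly from evaluate_stopping_criteria in the agent function, so a minimal sketch would be:

    def _evaluate_off_mode(self, state: AgentState) -> bool:
        # Legacy self-consistency behavior: confidence threshold only
        return state.confidence >= self._config.confidence_threshold
    
    def _evaluate_confidence_only(self, state: AgentState) -> bool:
        return state.confidence >= self._config.confidence_threshold
    
    def _evaluate_entropy_only(self, state: AgentState) -> bool:
        # Fall back to confidence until entropy has enough samples to be meaningful
        if len(state.responses) < self._config.min_entropy_samples:
            return state.confidence >= self._config.confidence_threshold
        return state.normalized_entropy <= self._config.entropy_threshold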

Program Architectural Decisions Summary

The Agent Function defines the ideal behavior of our self-reflection agent in abstract terms. The following architectural decisions translate this theoretical framework into a practical, maintainable implementation:

Key Design Choices

  1. Entropy Intelligence: Four distinct modes for maximum flexibility
  2. Immutable Entities: All domain objects are frozen dataclasses
  3. Service-Oriented Architecture: Separate services for entropy, consensus, convergence
  4. O(m + k log k) Complexity: Optimized for response volume and answer diversity
  5. Comprehensive State Tracking: Full dual-track evolution monitoring
  6. Interface Abstraction: Clean separation between agent logic and LLM communication

Performance Optimizations

  1. Lazy Evaluation: Entropy calculations only when needed
  2. Efficient Distribution: Counter-based O(m) distribution calculation
  3. Minimal Memory: Reuse of calculation results where possible
  4. Parallel Processing: Batch response generation capability

Configuration Flexibility

  1. Four Entropy Modes: Complete range from traditional to entropy-pure
  2. Tunable Parameters: All thresholds and weights configurable
  3. Model Agnostic: Works with any LLM via interface abstraction
  4. Extensible: Easy to add new entropy modes or consensus types

Conclusion

Sorry, I don’t have a great conclusion at the moment! :) I have implemented a version of this to prove that it “works”; however, I have since redone the whole Agent Design Process and will do the same with the code.

My plan is to implement this against Small Language Models - mainly due to cost - and see what the data looks like. Then I shall be clearer on conclusions!!

This was a great learning experience for me; I hope you get something from what I’m sharing here.

Other work?

I am uncertain whether this is an approach others have tried as well; however, this paper by Jekaterina Novikova et al., Consistency in Language Models: Current Landscape, Challenges, and Future Directions, leads me to think there isn’t much in the way of consistency analysis.

Let me know what you make of it!