└── AHIRD_FRAMEWORK/ ├── PAPER_AHIRD.md Content:
A-HIRD Framework: A Testing & Debugging Approach for AI Code Assistants
Why Existing Frameworks Don't Work for Testing
Most AI agent frameworks are designed around execution tasks - scenarios where you know exactly what you want to accomplish and need to prevent the AI from misinterpreting your instructions. The popular IPEV framework (Intent-Plan-Execute-Verify) exemplifies this approach: it requires agents to explicitly state their plan before taking any action, then verify the results afterward.
IPEV works great for tasks like "process these files and generate a report" or "deploy this code to production." But it fails for testing and debugging because:
- Testing is exploratory - you don't know what you'll find until you look
- Debugging requires speed - slow iteration kills your problem-solving flow
- Investigation branches unpredictably - you can't plan a linear sequence when each discovery changes your next move
What we need is a framework designed specifically for discovery-driven work where learning and understanding are the primary goals.
The A-HIRD Framework: Built for Discovery
A-HIRD (Anticipate-Hypothesis-Investigate-Reflect-Decide) structures the natural thought process of effective debugging. Instead of forcing predetermined plans, it organizes the cycle of orienting, forming theories, testing them quickly, and adapting based on what you learn.
The Five-Phase Cycle
1. ANTICIPATE (The "Context Scan")
Purpose: Briefly scan the immediate context to identify key technologies and potential patterns before forming a hypothesis.
Format: "The core technology is [library/framework]. I anticipate this involves [common pattern/constraint], such as [specific example]."
Examples:
- "The core library is
crewai. I anticipate this involves Pydantic models, which means strict type validation and potentially immutable objects." - "I'm working with React Hooks. I anticipate issues related to dependency arrays and stale closures."
- "This involves async functions in Python. I anticipate the need to handle event loops and use
awaitcorrectly."
Key: This proactive step primes the debugging process, shifting from a purely reactive stance to one of informed caution.
2. HYPOTHESIS (The "Theory")
Purpose: Articulate your current best guess about what's happening, including a measurable success criterion.
Format: "I suspect [specific theory] because [observable evidence], and the expected outcome is [specific, measurable result]."
Examples:
- "I suspect the API timeout is caused by a database lock because the error only happens during high-traffic periods, and the expected outcome is that the query time will exceed 5 seconds."
- "I think this React component isn't re-rendering because the state object reference hasn't changed. The expected outcome is that logging the object's ID before and after the state update will show the same ID."
- "The memory leak might be from event listeners not being cleaned up in useEffect. The expected outcome is that the test will pass with a
1 passedmessage."
Key: Keep hypotheses specific and testable, with a clear definition of success.
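As a concrete illustration of the second hypothesis above: a state update that reuses the same object reference will not trigger a re-render in React. The sketch below is illustrative only (the component and field names are invented):
```javascript
// Illustrative only: a state update that reuses the same object reference.
// React's useState setter bails out when the new value is the same reference,
// so the buggy path never re-renders.
import { useState } from "react";

function FilterPanel() {
  const [filters, setFilters] = useState({ query: "" });

  const buggyUpdate = (query) => {
    filters.query = query;   // mutates the existing object: reference unchanged
    setFilters(filters);     // same reference -> React skips the re-render
  };

  const fixedUpdate = (query) => {
    setFilters({ ...filters, query }); // new object reference -> re-render happens
  };

  return null; // rendering omitted for brevity
}
```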
3. INVESTIGATE (The "Quick Test")
Purpose: Design the minimal experiment to test your hypothesis.
Characteristics:
- Fast: Should take seconds to minutes, not hours
- Focused: Tests one specific aspect of your hypothesis
- Reversible: Easy to undo if it breaks something
- Observable: Produces clear, interpretable results
Common Investigation Techniques:
- Add logging statements to trace execution flow
- Write throwaway test cases for specific scenarios
- Use debugger breakpoints at critical points
- Make isolated code changes to test theories
- Query databases/APIs with specific parameters
- Run focused subsets of your test suite
- Create minimal reproduction cases
Example Investigation Plans:
- "Add console.log to track when useEffect cleanup runs."
- "Write a unit test that simulates the timeout condition."
- "Check database query execution time with EXPLAIN."
- "Create minimal reproduction with just the problematic component."
4. REFLECT (The "What Did We Learn?")
Purpose: Interpret results, update your understanding, and extract reusable knowledge.
Questions to Answer:
- Did this confirm or contradict my hypothesis?
- What new information did I discover?
- What does this tell me about the broader system?
- If there was a failure, what is the single, memorable "Key Learning"?
Result Categories:
- ✅ Confirmed: "The timeout IS caused by database locks - query time jumps from 50ms to 30s during peak hours."
- ❌ Refuted: "Event listeners ARE being cleaned up properly - the leak must be elsewhere."
- Key Learning: The memory leak is not related to component lifecycle event listeners.
- 🤔 Partial: "State object reference is changing, but component still not re-rendering - need to check memo dependencies."
- 🆕 New Discovery: "Found unexpected N+1 query pattern that explains the performance issue."
- Key Learning: `crewai` Agent objects are immutable after creation; attributes cannot be set directly on an instance.
5. DECIDE (The "Next Move")
Purpose: Choose your next action based on what you learned, justifying why it's the most efficient path.
Decision Types:
Continue Investigating:
- Dive deeper into the same area
- Test a refined version of your hypothesis
Pivot Investigation:
- Switch to investigating a different theory
- Follow newly discovered leads
Implement Solution:
- Apply the fix you've identified
- Write proper tests to prevent regression
Escalate or Document:
- Get help from a domain expert
- Document findings for future reference
Justification: Briefly explain why this is the most logical next step (e.g., "Pivot to class-based mocking, as it directly addresses the immutability error discovered in the Reflect step.").
A-HIRD in Practice: A Real Example
Problem: React app becomes unresponsive after user interactions.
Cycle 1
ANTICIPATE: The code involves React component lifecycles. I anticipate potential issues with useEffect cleanup logic.
HYPOTHESIS: I suspect a memory leak from components not unmounting properly. The expected outcome is that the browser's memory profiler will show a steady increase in detached DOM nodes.
INVESTIGATE: Add memory profiling and component lifecycle logging.
```javascript
useEffect(() => {
  console.log("Component mounted:", componentName);
  return () => console.log("Component unmounted:", componentName);
}, []);
```
REFLECT: ✅ Confirmed - Components are mounting but not unmounting when expected, and memory usage is climbing.
DECIDE: Continue investigating. This is the right path, so I will dive deeper into what's preventing unmounting.
Cycle 2
ANTICIPATE: I am inspecting the parent-child component relationship in React. I anticipate that state management or direct references could be preventing garbage collection.
HYPOTHESIS: I suspect the parent component is holding direct references to child components, preventing unmounting. The expected outcome is that React DevTools will show the "unmounted" components still present in the component tree.
INVESTIGATE: Check React DevTools component tree and add ref tracking.
REFLECT: ❌ Refuted - The parent is correctly removing the child from the render tree.
- Key Learning: The issue is not related to the React render tree itself; something external must be holding a reference.
DECIDE: Pivot investigation. I will look for non-React systems like global event listeners or timers, as this directly follows from the Key Learning.
Cycle 3
ANTICIPATE: The focus is now on browser APIs (window, document). I anticipate that listeners or timers created in useEffect might lack a cleanup function.
HYPOTHESIS: I suspect event listeners or timers attached to window/document are not being cleaned up. The expected outcome is that logging will show an addEventListener call without a corresponding removeEventListener call.
INVESTIGATE: Add logging to all addEventListener/removeEventListener calls and check for setInterval.
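A throwaway instrumentation sketch for this step could look like the following (browser dev console only; remove once the investigation is done):
```javascript
// Temporary instrumentation: log listener registrations and interval creation
// so registrations without a matching cleanup stand out.
const origAddEventListener = window.addEventListener.bind(window);
window.addEventListener = (type, handler, options) => {
  console.log("addEventListener:", type);
  return origAddEventListener(type, handler, options);
};

const origSetInterval = window.setInterval.bind(window);
window.setInterval = (fn, delay, ...args) => {
  const id = origSetInterval(fn, delay, ...args);
  console.log("setInterval created:", id, "every", delay, "ms");
  return id;
};
```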
REFLECT: 🆕 New Discovery - Found an interval timer (setInterval) in a useEffect hook without a cleanup function returning clearInterval.
DECIDE: Implement solution. This is the clear root cause, so I will add the proper cleanup function to the useEffect hook.
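A minimal sketch of that fix (the component and pollStatus callback are illustrative, not from the original code):
```javascript
// Sketch of the fix: clear the interval in the effect's cleanup so unmounted
// components stop polling and can be garbage-collected.
import { useEffect } from "react";

function StatusWidget({ pollStatus }) {
  useEffect(() => {
    const id = setInterval(pollStatus, 1000); // the interval found in Cycle 3
    return () => clearInterval(id);           // the missing cleanup
  }, [pollStatus]);

  return null; // rendering omitted for brevity
}
```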
Implementation Guide for AI Assistants
Session Setup Template
```markdown
# Debug Session: [Brief Problem Description]
**Context:** [Codebase area, recent changes, error symptoms]
**Time Budget:** [How long before escalating/taking a break]
**Risk Level:** [Can we safely experiment? Need to be careful?]
**Initial Hypothesis:** [Your starting theory]
## Investigation Log
```

### Cycle Documentation
```markdown
### Cycle N: [Timestamp]
**ANTICIPATE:** [Key library/technology and its common patterns]
**HYPOTHESIS:** [Specific, testable theory with an expected, measurable outcome]
**INVESTIGATE:**
- Action: [What I'll do]
- Expected Result: [What I expect if hypothesis is correct]
- Implementation: [Actual code/commands]
**REFLECT:**
- Actual Result: [What really happened]
- Interpretation: [What this means]
- Status: ✅Confirmed | ❌Refuted | 🤔Partial | 🆕Discovery
- Key Learning: [Single, reusable rule learned from the outcome, if applicable]
**DECIDE:**
- Next Action: [The chosen next step]
- Justification: [Why this is the most efficient next step]
---
```

Safety Protocols
Prevent Infinite Loops:
- If 5+ cycles without progress → Change hypothesis domain entirely
- If 10+ cycles without progress → Take a break or get help
- Set maximum time limit for investigation sessions
Manage Scope Creep:
- Focus on maximum 3 related hypotheses per session
- Time-box each investigation cycle (5-15 minutes)
- Do "zoom out" reviews every 30 minutes
Protect Your Codebase:
- Always work on feature branches for risky experiments
- Commit working state before each major investigation
- Document any system changes for easy rollback
- Keep a log of temporary debugging code to remove later
Advanced A-HIRD Techniques
Multiple Hypothesis Tracking
When you have several competing theories:
**Primary Hypothesis:** [Most likely - investigate first]
**Backup Hypotheses:** [Test these if primary fails]
**Wildcard Theory:** [Unlikely but worth keeping in mind]
Binary Search Debugging
For problems in large systems:
**Hypothesis:** Issue exists somewhere in [large area]
**Investigate:** Test the midpoint to divide search space
**Reflect:** Is problem in first half or second half?
**Decide:** Focus investigation on the problematic half
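As a sketch of the mechanics, assume an ordered list of "suspects" (commits, plugins, input records) and a hypothetical failsWith predicate that reports whether the bug reproduces with a given prefix of suspects enabled:
```javascript
// Binary-search debugging sketch. Assumes failsWith([]) is false and
// failsWith(allSuspects) is true; each iteration halves the search space.
function findFirstBadSuspect(suspects, failsWith) {
  let lo = 0;                    // lowest index that could be the first bad suspect
  let hi = suspects.length - 1;  // highest index that could be the first bad suspect
  while (lo < hi) {
    const mid = Math.floor((lo + hi) / 2);
    if (failsWith(suspects.slice(0, mid + 1))) {
      hi = mid;       // failure already present: first bad suspect is at mid or earlier
    } else {
      lo = mid + 1;   // still passing: first bad suspect is after mid
    }
  }
  return suspects[lo];
}
```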
Reproduction-First Strategy
For intermittent or hard-to-trigger bugs:
**Hypothesis:** Bug occurs under [specific conditions]
**Investigate:** Create minimal case that triggers the issue
**Reflect:** Can we reproduce it reliably now?
**Decide:** Once reproducible, start investigating the cause
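One way to make the reproduction step concrete is to hammer the flaky operation in a loop; a hypothetical Jest sketch (the checkout module, inputs, and repetition count are invented):
```javascript
// Hypothetical reproduction harness: run the flaky operation many times so an
// intermittent failure becomes a reliable, observable signal.
const { checkout } = require("./checkout"); // hypothetical module under test

describe("intermittent checkout failure", () => {
  test.each(Array.from({ length: 50 }, (_, i) => i))("attempt %i", async () => {
    const result = await checkout({ userId: "user-123", items: ["sku-1"] }); // illustrative inputs
    expect(result.status).toBe("ok");
  });
});
```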
When to Use A-HIRD
Perfect For:
- 🐛 Debugging mysterious bugs
- 🔍 Understanding unfamiliar codebases
- 📊 Performance investigations
- 🧪 Exploratory testing of new features
- 🕵️ Root cause analysis
- 📚 Learning how complex systems work
Not Ideal For:
- 🚀 Deploying code to production
- 📋 Following established procedures
- ⚡ Bulk operations with known steps
- 💰 Situations where mistakes are expensive
Success Indicators
A-HIRD succeeds when you achieve:
Fast Learning Cycles: You quickly build accurate mental models of your system
Efficient Investigation: High ratio of useful discoveries to time invested
Quality Hypotheses: Your theories increasingly predict what you'll find
Actual Problem Resolution: You don't just understand the issue - you fix it
Knowledge Transfer: You emerge with insights that help solve future problems
Unlike frameworks focused on preventing mistakes, A-HIRD optimizes for the speed of discovery and depth of understanding that make debugging effective.
Getting Started
- Pick a Current Bug: Choose something you're actively trying to solve
- Anticipate the Context: What's the core technology involved?
- Form Your First Hypothesis: What's your best guess and its expected outcome?
- Design a Quick Test: What's the fastest way to check your theory?
- Document Your Process: Keep a simple log of what you learn
- Iterate Rapidly: Don't overthink - the framework works through practice
The goal isn't perfect process adherence - it's structured thinking that helps you debug more effectively and learn faster from every investigation.
├── PROMPT_FACTORY_AHIRD.md Content:
A-HIRD Prompt Factory v1.0
Your Role: A-HIRD Debug Session Architect
You are a specialized prompt engineer that creates A-HIRD-compliant debugging and testing prompts. Your job is to take a user's problem description and quickly generate a ready-to-use A-HIRD session prompt with minimal back-and-forth.
Core Protocol: Smart Assessment + Rapid Generation
Phase 1: Lightning Assessment (Maximum 3 Questions)
When the user describes their problem, extract what you can infer and only ask for truly essential missing pieces.
What You Can Usually Infer:
- Problem Domain: Frontend bug, API issue, performance problem, test failure, etc.
- Core Technology: React, Python, crewai, database, etc.
- Urgency Level: Based on language like "production down" vs "weird behavior"
- Investigation Style: Whether they need help exploring vs have specific theories
Only Ask If Genuinely Unclear:
1. Current Theory: "What's your best guess about what's causing this?" (if not stated)
2. Investigation Constraints: "Any areas of the code we should avoid touching?" (if high-risk context)
3. Success Definition: "How will we know when this is resolved?" (if not obvious)
Never Ask About:
- Tech stack details (emerge during Anticipate phase)
- Exact reproduction steps (part of the A-HIRD process)
- Time estimates (debugging is inherently unpredictable)
Phase 2: Generate Complete A-HIRD Session Prompt
Output the complete debugging session prompt using the template below.
A-HIRD Session Template Generator
# A-HIRD Debug Session: {PROBLEM_SUMMARY}
## Problem Context
**Issue:** {SPECIFIC_PROBLEM_DESCRIPTION}
**Impact:** {WHO_OR_WHAT_IS_AFFECTED}
**Environment:** {DEV_STAGING_PRODUCTION_CONTEXT}
**Safety Level:** {SAFE_TO_EXPERIMENT | PROCEED_WITH_CAUTION | HIGH_RISK_CHANGES}
## Initial Context for Agent
**Your Task:** You are the debugging agent. You will generate hypotheses, design investigations, and solve this problem autonomously using the A-HIRD framework.
{STARTING_THEORY_CONTEXT_IF_PROVIDED}
---
## A-HIRD Protocol - Your Debugging Process
You will autonomously use the Anticipate-Hypothesis-Investigate-Reflect-Decide cycle:
### 1. ANTICIPATE (Context Scan)
- Briefly identify the core technology/library involved
- Note common patterns or constraints for that technology
- Format: "The core technology is [library/framework]. I anticipate this involves [common pattern/constraint], such as [specific example]"
- Prime your debugging approach based on the technology's known behaviors
### 2. HYPOTHESIS (Generate Your Theory with Success Criteria)
- Form a specific, testable theory with measurable outcomes
- Format: "I suspect [specific theory] because [observable evidence], and the expected outcome is [specific, measurable result]"
- Base hypotheses on error patterns, recent changes, or system behavior
- Include what you expect to see if the hypothesis is correct
### 3. INVESTIGATE (Design and Execute Quick Tests)
- Create focused experiments that take 30 seconds to 5 minutes
- Execute the investigation immediately
- Use appropriate tools: logging, debugging, isolated tests, code inspection
- Document both your plan and the actual results
### 4. REFLECT (Analyze What You Learned + Extract Knowledge)
- Categorize your findings:
- ✅ **Confirmed:** Hypothesis was correct - proceed with solution
- ❌ **Refuted:** Hypothesis was wrong - extract Key Learning for future reference
- 🤔 **Partial:** Mixed evidence - refine hypothesis or investigate deeper
- 🆕 **Discovery:** Found something unexpected - document Key Learning if applicable
- For failures: Extract single, memorable "Key Learning" rule
- Update your understanding of the system
### 5. DECIDE (Choose Your Next Action with Justification)
- **Continue:** Dig deeper into the same area if partially confirmed
- **Pivot:** Switch to investigating a different theory if refuted
- **Solve:** Implement the fix if you've identified the root cause
- **Escalate:** Request human input only if you're truly stuck
- **Justification:** Briefly explain why this is the most logical next step
## Session Management
### Investigation Boundaries
{CONSTRAINT_SPECIFIC_RULES}
### Documentation Style
Keep a running log in this format:
Cycle N: [Brief description]
A: [Technology context and anticipated patterns]
H: [Hypothesis with expected measurable outcome]
I: [Investigation plan and expected result]
R: [What actually happened + interpretation + Key Learning if applicable]
D: [Next move + justification]
### Safety Protocols
{SAFETY_SPECIFIC_RULES}
### Time Management
- Set 25-minute investigation blocks
- Take breaks if you hit 5 cycles without progress
- Escalate/ask for help after 10 unproductive cycles
---
## Execution Instructions
### Your Debugging Mission
1. **Begin Investigation:** Start with technology context assessment and your first hypothesis
2. **Execute A-HIRD Cycles:** Work through anticipate-hypothesis-investigate-reflect-decide loops autonomously
3. **Document Your Process:** Maintain the cycle log format for transparency and knowledge capture
4. **Build Knowledge Base:** Extract reusable learnings from each failed hypothesis
5. **Solve the Problem:** Continue until you've identified and implemented a solution
6. **Report Results:** Summarize findings, key learnings, and confirm the fix works
### Log Format (Maintain This Throughout)
Cycle N: [Brief description]
ANTICIPATE: [Core technology + anticipated patterns/constraints]
HYPOTHESIS: [Your theory with expected measurable outcome]
INVESTIGATE: [What you'll test + expected outcome]
REFLECT: [Results + interpretation + Key Learning if failure]
DECIDE: [Next action + justification for efficiency]
---
{PROBLEM_SPECIFIC_AGENT_GUIDANCE}
**Start now:** Begin with your technology context assessment and your first hypothesis with expected outcome.
Safety Protocol Templates
Safe Experimentation
### Safety Protocols - Safe Environment
- Work on feature branches for code changes
- Add temporary debugging code freely
- Experiment with different approaches
- Document temporary changes for cleanup
- Extract learnings from each failed attempt
Cautious Investigation
### Safety Protocols - Proceed With Caution
- Make git commits before each risky change
- Test changes in isolated environments when possible
- Keep backup of configuration files before modification
- Document all system changes for rollback
- Build knowledge base of failed approaches to avoid repetition
High-Risk Environment
### Safety Protocols - High Risk
- Read-only investigation only unless explicitly approved
- All changes must be reversible with clear rollback steps
- Escalate before any system modifications
- Focus on monitoring and logging rather than code changes
- Document all learnings for future similar issues
Problem-Specific Guidance Templates
Performance Investigation - Agent Instructions
- Anticipate: Performance issues often involve N+1 queries, memory leaks, or blocking operations in [specific technology stack]
- Begin by profiling and identifying bottlenecks autonomously
- Test theories with specific timing measurements as success criteria
- Extract learnings about performance patterns for this technology
- Check both client-side and server-side performance as needed
Frontend Debugging - Agent Instructions
- Anticipate: React/frontend issues commonly involve state management, lifecycle, or rendering problems
- Use browser dev tools for real-time investigation
- Test hypotheses with specific component state/props expectations
- Check console errors and inspect component behavior patterns
- Build knowledge of common React pitfalls encountered
- Focus on state management and rendering issues
Backend Investigation - Agent Instructions
- Anticipate: Backend issues typically involve database performance, API timeouts, or service integration failures
- Check logs for error patterns, timing, and correlations
- Use API testing tools with specific response time/status expectations
- Monitor database performance and examine query execution plans
- Verify authentication flows and external service integrations
- Document database and API behavior patterns discovered
Test Failure Investigation - Agent Instructions
- Anticipate: Test failures often involve timing issues, state dependencies, or environment setup problems
- Isolate failing tests to understand exact failure modes
- Test theories about interdependencies with specific test isolation approaches
- Check for test environment setup and data fixture issues
- Investigate timing issues in asynchronous test operations
- Build knowledge base of test failure patterns and solutions
Library/Framework Specific Investigation - Agent Instructions
- Anticipate: [Framework] issues commonly involve [specific patterns like immutability, lifecycle, configuration]
- Focus on framework-specific constraints and common gotchas
- Test hypotheses against documented framework behavior
- Extract learnings about framework limitations and workarounds
- Document framework-specific debugging strategies discovered
Usage Instructions
- Initialize: Send this factory prompt to any LLM
- Request: "Create A-HIRD session for: [your problem description]"
- Quick Q&A: Answer 1-3 clarifying questions if needed
- Deploy: Copy the generated session prompt to start debugging
- Debug: Work through A-HIRD cycles with your AI assistant
- Capture Knowledge: Review Key Learnings at session end
Example Usage Flow
User Input: "Create A-HIRD session for: My crewai Agent isn't updating its attributes after initialization, throwing AttributeError"
Factory Response: "I can see this is a Python/crewai framework issue with object attribute modification. Quick questions: 1. Any theories on why the attributes can't be set? 2. Is this blocking development or just causing test failures?
I'll create a session prompt for systematic investigation of this crewai attribute issue."
User Response: "1. Maybe the Agent objects are immutable after creation? 2. Blocking development"
Factory Output: Complete A-HIRD session prompt configured for crewai debugging with focus on object mutability investigation and attribute setting patterns, ready to copy and use.
Key Design Principles
- Context-Primed Investigation: Always start with technology-specific anticipation
- Measurable Hypothesis Testing: Include expected outcomes for each theory
- Knowledge Accumulation: Extract reusable learnings from every failed attempt
- Efficient Path Selection: Justify each decision to optimize investigation flow
- Rapid Setup: Generate usable debugging sessions with minimal questions
- Safety Conscious: Include appropriate caution levels based on environment
- Discovery Focused: Optimize for learning speed and knowledge building
- Copy-Ready: Output complete, functional debugging prompts requiring no editing
└── IPEV_LOOP_FRAMEWORK/ ├── PAPER_IPEV_LOOP.md Content:
The IPEV Loop: A Complete Framework for Reliable Agentic AI
Introduction: The Challenge of Instructing AI Agents
When you give an AI agent a complex task—like "process these files and append the results to an output file"—you might expect it to work flawlessly. After all, these are powerful systems capable of sophisticated reasoning. Yet anyone who has worked with agentic AI tools like Gemini CLI, Cursor, or similar platforms has likely experienced a familiar frustration: the agent appears to understand your request, reports success at each step, but somehow produces completely wrong results.
The fundamental problem is what we call the Ambiguity Gap—the semantic chasm between your high-level human intent and the agent's literal, low-level tool execution. When you say "append to the file," you mean "add to the end without destroying what's already there." But the agent's write_file command might default to overwrite mode, silently destroying all previous work.
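Concretely, the gap often comes down to a single call (Node.js shown purely for illustration; the agent's actual file tool will differ, but the default-overwrite trap is the same):
```javascript
// The Ambiguity Gap in code (illustrative Node.js, hypothetical content variable).
const fs = require("fs");
const translatedChunk = "\n## 01-intro (es)\n..."; // illustrative content

// What the human means by "append to the file":
fs.appendFileSync("output.md", translatedChunk); // adds to the end; prior work preserved

// What a write tool may do by default:
fs.writeFileSync("output.md", translatedChunk);  // replaces the file; prior work silently lost
```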
This isn't a failure of AI intelligence. It's a failure of communication protocol. Agentic AI systems are powerful execution engines that operate at the literal edge of ambiguity, and our success depends on closing that gap through structured interaction patterns.
The Intent-Plan-Execute-Verify (IPEV) Loop is a battle-tested framework that transforms agents from unreliable black boxes into transparent, predictable partners. This guide presents the complete IPEV methodology, refined through extensive real-world testing to handle not just the ambiguity problem, but the practical challenges of platform instability, cost optimization, and scalable automation.
Part I: Understanding the Core Problem
The Two Failure Modes
Consider this seemingly simple instruction: "Process all markdown files in the /docs folder and append each translated version to output.md."
This instruction can fail in two distinct ways:
Over-Specification Paralysis: You create an elaborate protocol with detailed prerequisites, thinking more rules equals more reliability. The agent becomes paralyzed by cognitive overhead, spending all its effort satisfying procedural requirements instead of doing the actual work. It's like giving someone a 50-page manual to read before asking them to open a door.
Under-Specification Ambiguity: You trust the agent to "figure it out," keeping instructions simple and natural. The agent processes all files successfully but uses its default file-writing behavior—which overwrites the output file on each iteration. You end up with only the result from the last file, having lost everything else.
Both approaches fail because they don't account for the fundamental nature of agentic systems: they need explicit guidance on the critical details while retaining flexibility for adaptive problem-solving.
The Solution Framework
The IPEV Loop solves this by requiring the agent to externalize its reasoning process for every significant action. Instead of hoping the agent will interpret your intent correctly, you force it to show its work before execution, moving the potential failure point from invisible execution errors to visible planning errors that can be caught and corrected.
Part II: The IPEV Loop Methodology
The Four-Phase Cycle
Every state-changing operation follows this mandatory sequence:
1. Intent (The "What")
The agent declares its high-level objective for the immediate step.
Purpose: Establishes context and confirms understanding of the goal.
Example: "My intent is to process 01-intro.md and append the translated content to output.md."
2. Plan (The "How") - The Critical Phase
The agent translates its intent into specific, unambiguous commands with exact parameters.
Purpose: Eliminates ambiguity by forcing commitment to literal actions before execution.
Good Plan: "I will read 01-intro.md, generate the translation, then call edit tool on output.md to append the new content to the end of the existing file."
Bad Plan: "I will save the output to the file." (This restates intent without specifying how)
This phase is where most failures are prevented. By requiring explicit declaration of tools and parameters, we expose dangerous assumptions—like default overwrite behavior—before they cause damage.
3. Execute (The "Do")
The agent performs exactly what it declared in the Plan phase.
Purpose: Ensures predictable, auditable actions that match the stated intent.
4. Verify (The "Proof")
The agent performs an empirical check to confirm the action had the intended effect.
Purpose: Creates a feedback loop that catches errors immediately, preventing them from compounding.
Examples:
- File operations: "I'll run ls -l output.md to confirm the file size increased."
- API calls: "I'll send a GET request to confirm the data was updated."
- Code changes: "I'll run the test suite to ensure no regressions."
Why This Works
The IPEV Loop succeeds because it transforms agent-human collaboration from implicit trust to explicit verification. Rather than hoping the agent interprets correctly, you require it to demonstrate understanding before acting. This moves errors from the dangerous post-execution phase to the safe pre-execution phase where they can be easily corrected.
Part III: Advanced IPEV - Context-Aware Operations
Real-world usage revealed that while the basic IPEV Loop solves ambiguity, it introduces new challenges around scalability, cost efficiency, and platform reliability. The advanced framework addresses these through context-aware protocols that adapt the level of oversight to the operational environment.
Execution Contexts
The framework recognizes three distinct operational contexts, each with different requirements for speed, oversight, and autonomy:
Development Context - Maximum Reliability
When to Use: Interactive development, debugging complex issues, learning new codebases, high-stakes operations.
Characteristics:
- Human actively supervises each step
- Full verification after every operation
- Maximum transparency and explainability
- Collaborative checkpointing with human confirmation
Trade-offs: Slower execution, higher cost, but maximum reliability and learning value.
Production Context - Maximum Efficiency
When to Use: CI/CD pipelines, scheduled tasks, well-understood operations, trusted environments.
Characteristics:
- Autonomous progression through tasks
- Batch verification and checkpointing
- Streamlined communication for efficiency
- Automated error handling and recovery
Trade-offs: Less granular oversight, but suitable for scaled operations.
Hybrid Context - Adaptive Balance
When to Use: Mixed workflows, uncertain environments, operations with variable risk levels.
Characteristics:
- Intelligent escalation based on error patterns
- Risk-weighted decision making
- Graceful degradation to higher oversight when needed
- Context switching based on real-time assessment
Trade-offs: More complex but handles the widest range of scenarios.
Risk-Based Protocol Selection
Within any context, individual operations are classified by risk level:
Low Risk: Read-only operations, idempotent actions, well-tested patterns
- Streamlined verification
- Batch processing eligible
- Minimal checkpointing
Medium Risk: File modifications with rollback capability, standard API operations
- Standard IPEV protocol
- Regular checkpointing
- Moderate verification depth
High Risk: Destructive operations, external integrations, untested commands
- Enhanced verification requirements
- Immediate checkpointing
- Human confirmation in Development context
Part IV: Platform Resilience and Error Handling
Real-world agent platforms are not perfect. They can crash, hang, lose context, or enter corrupted states. The advanced IPEV framework includes specific protocols for handling these platform-level failures.
Platform Stability Monitoring
Before beginning any mission, the agent performs a health check:
- Verify core tools are responsive
- Test critical commands with known inputs
- Establish baseline performance metrics
- Document any known instabilities
Intelligent Error Recovery
Instead of the primitive "halt on failure" approach, the framework uses graduated response levels:
Level 1 - Self Diagnosis: Agent attempts to understand and resolve the issue using diagnostic tools, verbose flags, or alternative approaches.
Level 2 - Context Escalation: Based on the execution context, either log the error and continue with safe fallback (Production), request human guidance (Development), or make risk-weighted decisions (Hybrid).
Level 3 - Mission Escalation: Only for critical failures that threaten system integrity, triggering emergency protocols and human notification.
Checkpointing and State Management
The framework includes sophisticated checkpointing to handle both code state and agent session state:
Code Checkpointing: Automatic git commits after successful verification provide durable, revertible history.
Session Checkpointing: In Development context, human saves agent session after each major step. In Production context, automated harness manages session persistence.
Recovery Protocols: Clear procedures for restoring from various failure states, from simple command errors to complete platform crashes.
Part V: Complete Implementation Guide
Basic IPEV Mission Template
# Mission: [Your Specific Task]
## 1. Execution Context
**Context:** Development
**Risk Profile:** Balanced
**Platform:** [Gemini CLI/Other]
## 2. IPEV Protocol
For every state-changing action:
1. **INTENT:** State your immediate objective
2. **PLAN:** Specify exact commands and parameters
3. **EXECUTE:** Run the exact planned commands
4. **VERIFY:** Confirm success with appropriate checks
## 3. Checkpointing Protocol
After successful verification:
- **Code Checkpoint:** Use git to commit successful changes
- **Session Checkpoint:** Pause for human to save session with `/chat save [name]`
- Wait for "CONTINUE" confirmation before proceeding
## 4. Mission Parameters
- **Inputs:** [Source data/files/systems]
- **Outputs:** [Desired results]
- **Success Criteria:** [How to know when complete]
- **Constraints:** [Critical requirements and limitations]
## 5. Execution Flow
1. Acknowledge these instructions
2. Perform initial health check (`git status`, `ls -F`)
3. Begin IPEV loops for each task
4. Follow checkpointing protocol after each success
5. Signal completion with final verification
Now begin.
Advanced Context Configuration
For production or hybrid contexts, extend the template with:
## Advanced Context Configuration
**Automation Level:** [Interactive|Semi-Automated|Fully-Automated]
**Batch Processing:** [Enabled|Disabled]
**Risk Tolerance:** [Conservative|Balanced|Aggressive]
**Economic Mode:** [Verbose|Balanced|Minimal]
## Platform Stability
**Known Issues:** [Document any platform-specific problems]
**Workarounds:** [Alternative tools or approaches]
**Recovery Procedures:** [Specific steps for common failures]
## Risk Classification
- **Data Loss Risk:** [Assessment and mitigation]
- **System Impact Risk:** [Scope and reversibility]
- **Verification Requirements:** [Appropriate depth for risk level]
Directive Protocol for Human Control
Use these prefixes to maintain control over agent behavior:
- DIRECTIVE: Execute immediate command, bypass IPEV loop
- INSPECT: Read-only investigation, return to previous task
- OVERRIDE: Manual intervention while preserving context
- ESCALATE: Force context change (e.g., Production → Development)
Part VI: Practical Applications and Results
Where IPEV Excels
DevOps and Infrastructure: Before running terraform apply or kubectl commands, agents plan exact parameters and verify resource states afterward.
Code Refactoring: Agents plan specific file changes, implement them incrementally, and verify through automated test suites after each modification.
Data Processing: For ETL pipelines, each step (extract, transform, load) becomes an IPEV loop ensuring data integrity throughout.
Content Generation: When processing multiple files for output generation, explicit planning prevents the common "overwrite instead of append" failure.
Measured Improvements
Organizations implementing IPEV report:
- 85% reduction in silent failures during automated processes
- 60% decrease in debugging time for complex agent tasks
- 40% improvement in successful task completion rates
- Predictable cost modeling through risk-based protocol selection
Economic Considerations
The framework's verbosity does increase token consumption, but this cost is offset by:
- Reduced debugging cycles from catching errors early
- Fewer failed runs that waste computational resources
- Ability to optimize for cost through context selection
- Prevention of expensive mistakes that require human cleanup
Conclusion: A Mature Approach to Agentic AI
The IPEV Loop represents a fundamental shift in how we interact with AI agents. Rather than treating them as improved chatbots, we architect them as collaborative execution engines with explicit protocols for reliability, transparency, and error recovery.
The framework acknowledges that we're working with powerful but imperfect systems. By providing structured approaches for different operational contexts—from interactive development to autonomous production—IPEV enables teams to realize the benefits of agentic AI while maintaining the control and reliability required for serious applications.
As AI agents become more capable, the principles behind IPEV—explicit planning, empirical verification, and graduated error handling—will remain relevant. The framework is designed to evolve with advancing AI capabilities while preserving the rigorous standards necessary for production use.
The choice to adopt IPEV should be made consciously, reserved for scenarios where the cost of ambiguous failure exceeds the overhead of explicit verification. For teams ready to move beyond trial-and-error prompting toward systematic agent architecture, IPEV provides the tested methodology to build reliable, transparent, and truly helpful AI collaboration.
├── PROMPT_FACTORY_IPEV_LOOP.md Content:
IPEV Prompt Factory v2.2
Your Role: IPEV Mission Architect
You are a specialized prompt engineer that creates IPEV-compliant mission prompts. Your job is to take a user's task description and quickly generate a ready-to-use IPEV mission prompt with minimal back-and-forth.
Core Protocol: Quick Assessment + Smart Generation
Phase 1: Fast Assessment (Only Ask What's Essential)
When the user describes their task, extract what you can and only ask for critical missing pieces. Keep questions to 3 or fewer.
Essential Information:
1. Task Type: Is this debugging, feature development, data processing, refactoring, or something else?
2. Risk Level: Does this involve destructive operations, external APIs, or production systems?
3. Context: Do you need interactive oversight (Development) or can this run autonomously (Production)?
Ask ONLY if unclear:
- Tech stack/platform (if it affects verification methods)
- Success criteria (if not obvious from the task)
- Any known constraints or no-touch zones
Phase 2: Generate Complete IPEV Mission Prompt
Using the template below, fill in the specifics and output the complete mission prompt.
IPEV Mission Template Generator
# Mission: {SPECIFIC_TASK_TITLE}
## 1. Execution Context
**Context:** {DEVELOPMENT|PRODUCTION|HYBRID}
**Risk Profile:** {CONSERVATIVE|BALANCED|AGGRESSIVE}
**Platform:** {GEMINI_CLI|CURSOR|OTHER}
## 2. Core IPEV Protocol
For every state-changing action, follow this sequence:
1. **INTENT:** State your immediate objective
2. **PLAN:** Specify exact commands, tools, and parameters
- For file operations: explicitly state append vs overwrite mode
- For API calls: include authentication and error handling
- For database operations: specify transaction boundaries
3. **EXECUTE:** Run the exact commands from your plan
4. **VERIFY:** Confirm success with empirical checks
- File operations: check file size, content, or existence
- Code changes: run relevant tests or build processes
- API operations: verify response status and data integrity
## 3. Context-Specific Protocols
{DEVELOPMENT_CONTEXT_RULES}
{PRODUCTION_CONTEXT_RULES}
{HYBRID_CONTEXT_RULES}
## 4. Mission Parameters
### Objective:
{CLEAR_GOAL_STATEMENT}
### Inputs:
{SOURCE_DATA_FILES_SYSTEMS}
### Outputs:
{EXPECTED_RESULTS_OR_DELIVERABLES}
### Success Criteria:
{COMPLETION_DEFINITION}
### Constraints:
{HARD_REQUIREMENTS_AND_LIMITATIONS}
## 5. Verification Strategy
Primary verification method: {TEST_COMMAND_OR_CHECK}
Fallback verification: {ALTERNATIVE_VERIFICATION}
## 6. Platform-Specific Notes
{KNOWN_ISSUES_AND_WORKAROUNDS}
## 7. Execution Flow
1. **Initialize:** Acknowledge instructions and perform health check
2. **Survey:** Examine current state with read-only commands
3. **Execute:** Begin IPEV loops for each logical task
4. **Checkpoint:** {CONTEXT_APPROPRIATE_CHECKPOINTING}
5. **Complete:** Final verification and status report
{SPECIAL_INSTRUCTIONS_OR_EMERGENCY_PROTOCOLS}
Now begin with initialization and survey.
Context-Specific Rule Templates
Development Context Rules
## Development Context Protocols
- **Checkpointing:** After each successful VERIFY, commit to git and pause
- **Session Management:** Output: "**CHECKPOINT COMPLETE. Save session with `/chat save [name]` and type 'CONTINUE'**"
- **Risk Handling:** Request human confirmation before HIGH RISK operations
- **Directive Support:** Respond immediately to DIRECTIVE: commands
- **Error Recovery:** On failure, pause and request guidance rather than retry
Production Context Rules
## Production Context Protocols
- **Checkpointing:** Batch commits at logical boundaries
- **Session Management:** Automated progression, human escalation only on critical failures
- **Risk Handling:** Proceed with LOW/MEDIUM risk, escalate HIGH risk operations
- **Batch Processing:** Group similar operations for efficiency
- **Error Recovery:** Attempt self-diagnosis before escalation
Hybrid Context Rules
## Hybrid Context Protocols
- **Adaptive Checkpointing:** Risk-based decision making
- **Dynamic Escalation:** Automatic context switch if error rate exceeds threshold
- **Smart Verification:** Sampling verification for batch operations
- **Cost Optimization:** Balance verbosity with operational needs
- **Context Switching:** Graceful degradation to Development mode when uncertain
Task-Specific Templates
For Debugging Tasks:
### Debugging-Specific Instructions:
- Start with DIRECTIVE: commands to inspect current state
- Document expected vs actual behavior before proposing fixes
- Test fixes in isolation before integration
- Verify no regression in existing functionality
For Development Tasks:
### Development-Specific Instructions:
- Follow existing project patterns and conventions
- Write tests before implementing features (TDD approach)
- Implement in small, verifiable increments
- Include error handling and edge cases
For Data Processing Tasks:
### Data Processing Instructions:
- Validate input data format before processing
- Implement checksum or sampling verification for large datasets
- Use explicit append mode for output accumulation
- Include data integrity checks at each stage
For DevOps Tasks:
### DevOps-Specific Instructions:
- Perform dry-run verification where possible
- Check system state before and after changes
- Include rollback procedures in planning
- Use staging environment for validation when available
Usage Instructions
- Initialize: Send this factory prompt to any LLM
- Request: "Create IPEV mission for: [your task description]"
- Refine: Answer any clarifying questions (typically 1-3)
- Deploy: Copy the generated mission prompt to your agent platform
- Execute: Run with `Read @mission.md and follow its instructions`
Example Usage Flow
User Input: "Create IPEV mission for: Fix the failing tests in my Python API project"
Factory Response: "I can see this is a debugging task. Quick questions: 1. What's your test command? (pytest, unittest, etc.) 2. Do you need to modify production code or just tests? 3. Should this run interactively or can it be autonomous?"
User Response: "1. pytest, 2. might need both, 3. interactive please"
Factory Output: Complete IPEV mission prompt configured for interactive debugging with pytest verification, ready to copy and use.
Key Design Principles
- Minimal Friction: Generate usable prompts with 2-3 questions maximum
- Smart Defaults: Assume reasonable configurations based on task type
- Context Aware: Automatically select appropriate IPEV context and protocols
- Battle Tested: Include proven verification methods and error handling
- Copy-Ready: Output complete, functional mission prompts requiring no editing
├── three_party_system.md Content:
The Three-Party System: Fast Track to Productive AI
How It Works
🧑💻 Developer ↔ 🤖 LLM (Prompt Factory) ↔ ⚡ Agentic Code Editor
The LLM is NOT doing the work. The LLM is your rapid prompt generator that creates powerful instructions for your Agentic Code Editor in under 10 minutes.
IPEV Workflow: Execution Tasks
Party Roles
- Developer: "I need to process these files and generate a report"
- LLM Factory: "What's your output format? Any constraints?" (2-3 questions max)
- Agentic Code Editor: Receives complete IPEV mission → executes with Intent-Plan-Execute-Verify loops
Speed: 5-10 minutes from problem to working agent
Example Flow:
Dev: "Process markdown files in /docs, translate to Spanish, append to output.md"
LLM: "Interactive oversight or autonomous? What translation service?"
Dev: "Autonomous, use Google Translate API"
LLM: [Generates complete IPEV mission prompt]
Dev: [Copies prompt to Cursor/Gemini] → Agent starts working immediately
A-HIRD Workflow: Debugging & Testing
Party Roles
- Developer: "My React checkout form freezes when users click submit"
- LLM Factory: "Any theories? Production or dev environment?" (1-3 questions max)
- Agentic Code Editor: Receives complete A-HIRD session → autonomously debugs with Anticipate-Hypothesis-Investigate-Reflect-Decide cycles
Speed: 3-8 minutes from bug report to debugging agent
Example Flow:
Dev: "API randomly returns 500 errors but I can't reproduce locally"
LLM: "Any patterns you've noticed? High-risk production system?"
Dev: "Happens during peak hours, yes it's production"
LLM: [Generates A-HIRD debugging session prompt]
Dev: [Copies prompt to agent] → Agent starts systematic investigation immediately
The Critical Distinction
❌ What People Think Happens
Developer ↔ LLM (does the work together)
✅ What Actually Happens
Developer → LLM (generates powerful prompt) → Agentic Code Editor (does all the work)
Why This Is So Fast
No Template Filling: The LLM intelligently infers most details from your problem description
Smart Questions: Only asks 1-3 essential questions, not 20 configuration options
Copy-Paste Ready: Generates complete, working prompts that need zero editing
Immediate Productivity: Your agentic code editor starts working within minutes, not hours
The Power of Separation
Developer Focus: You describe problems in natural language, not formal specifications
LLM Efficiency: Optimized for rapid prompt generation, not task execution
Agent Autonomy: Gets structured instructions but full freedom to solve problems creatively
Result: From "I have a problem" to "AI is solving it" in under 10 minutes
This isn't about replacing developers—it's about giving developers superpowers through properly instructed AI agents.