─── IPEV_LOOP.md Content:
The Developer's Guide to Mastering Agentic LLMs: From Ambiguity to Reliability
Introduction: The Two-Week Failure That Led to a Breakthrough
If you've tried using an Agentic LLM like Gemini CLI for a complex, multi-step task, you may have felt a familiar frustration. You give it a clear goal, a list of files, and what seems like a simple instruction ("process these files and append the results to an output file"), only to watch it fail in baffling ways.
Perhaps it gets stuck in a logic loop, refusing to start the work because of an overly rigid protocol you designed. Or, worse, it starts the work, reports success after every step, but you later discover it was overwriting your output file on each iteration, leaving you with only the last piece of the puzzle.
This isn't a hypothetical scenario. It was the real-world, two-week struggle that led to the framework in this guide. The initial conclusion was that the tool was "no good," but the reality was more nuanced: the mental model for instructing these agents was wrong.
Agentic LLMs are not just chatbots with access to a terminal. They are powerful execution engines that operate at the literal edge of ambiguity. Our success hinges on our ability to close the gap between our high-level human intent and the agent's low-level, literal tool execution.
This guide provides a durable strategy to do just that. It introduces the Intent-Plan-Execute-Verify (IPEV) loop, a design pattern that transforms agents from unreliable black boxes into transparent, predictable, and self-correcting partners.
The Core Challenge: The Ambiguity Gap
The fundamental reason that simple prompts fail for stateful tasks (like file I/O, database changes, or API calls) is the Ambiguity Gap.
Let's analyze the two failure modes from our foundational example:
- The Over-Constrained Prompt (Brittle Rigidity): The first attempt involved a highly detailed, multi-file prompt with a mandatory "Environment Grounding Protocol."
  - Intent: To eliminate any possible misinterpretation of the environment (OS, paths).
  - Result: The agent became paralyzed. It couldn't satisfy the rigid, brittle prerequisites, and the cognitive overhead of the protocol prevented it from ever starting the actual task. It was like giving a chef a 100-page safety manual to read before boiling water.
  - Lesson: Over-constraining an agent with rigid, procedural rules makes it fragile. It removes the agent's ability to use its own intelligence to adapt and solve problems.
- The Under-Specified Prompt (Implicit Trust): The second attempt simplified the instructions, trusting the agent to understand the core task.
  - Intent: To remove the procedural roadblocks and focus on the primary goal.
  - Result: The agent successfully processed all the files but failed on the most critical detail. The human instruction "append to the file" was conceptually understood, but the agent's default `write_file` tool executed an overwrite operation. The final output contained only the result of the last operation (illustrated in the sketch below).
  - Lesson: Never assume an agent's tool execution will perfectly match your high-level intent. The ambiguity of natural language is the primary source of critical, silent failures.
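The overwrite-versus-append distinction is easy to see in plain Python. Here is a minimal sketch, assuming the agent's file-writing tool ultimately behaves like Python's built-in `open`:

```python
# Append vs. overwrite: the difference that silently destroyed the output file.
# mode="w" truncates the file on every call; mode="a" preserves existing content.

def write_chunk_overwrite(path: str, text: str) -> None:
    with open(path, "w", encoding="utf-8") as f:  # "w" = truncate, then write
        f.write(text)

def write_chunk_append(path: str, text: str) -> None:
    with open(path, "a", encoding="utf-8") as f:  # "a" = keep contents, write at end
        f.write(text)

for chunk in ["part 1\n", "part 2\n", "part 3\n"]:
    write_chunk_overwrite("overwritten.md", chunk)  # ends up containing only "part 3"
    write_chunk_append("appended.md", chunk)        # ends up containing all three parts
```

Both loops report "success" on every iteration; only a verification step would reveal that one of them keeps discarding earlier work.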
These failures reveal a universal truth: for any task that changes the state of a system, you cannot afford ambiguity. The solution is not more rules or more trust, but a better operational protocol.
The Solution: The Intent-Plan-Execute-Verify (IPEV) Loop
The IPEV loop is a simple but powerful framework for structuring your instructions. It forces the agent to make its reasoning and execution strategy explicit before taking any action, turning a potential failure into a transparent, correctable step.
It consists of four phases for every significant action the agent takes:
1. Intent (The "What")
This is the high-level objective. It's where you define the goal for a specific step in the workflow. This is what most prompts already do well.
- Purpose: To set the context and desired outcome for the agent.
- Example:
"My intent is to process the source file '01-intro.md' and append the translated content to 'output.md'."
2. Plan (The "How")
This is the heart of the IPEV loop and the single most important addition to your prompting strategy. Before acting, you require the agent to translate its high-level intent into a low-level, unambiguous execution plan. This plan must specify the exact tool, command, and parameters it will use.
- Purpose: To close the Ambiguity Gap. It forces the agent to show its work and commit to a specific, literal action, exposing any potential misinterpretations before they cause harm.
- Good Plan (Unambiguous):
"PLAN: I will read the content of '01-intro.md'. After generation, I will append the result to 'output.md' by calling the Pythonwrite_filetool with themodeparameter set to'a'." - Bad Plan (Ambiguous):
"PLAN: I will save the output to the file."(This is just a restatement of the intent and doesn't specify how).
By demanding a plan, you move the potential point of failure from a silent execution error to a transparent planning error, which is easily caught and corrected.
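One way to make this concrete is to have the agent render its PLAN as the literal tool call it is committing to, so any mismatch with the intent is visible before anything runs. The field names below are illustrative only, not a fixed schema for any particular CLI:

```python
# A hypothetical, structured representation of the "Good Plan" from the example above.
planned_call = {
    "tool": "write_file",                          # the exact tool the agent will invoke
    "path": "output.md",
    "mode": "a",                                   # "a" = append; the dangerous default would be "w"
    "content": "<generated translation of 01-intro.md>",
}

# Reviewing this object (by a human or by the agent itself) catches the
# overwrite-vs-append mistake at planning time, before any file is touched.
print(planned_call)
```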
3. Execute (The "Do")
This step is straightforward: the agent executes the exact plan it just declared.
- Purpose: To perform the state-changing action in a predictable way.
- Instruction:
"Now, execute the plan you have stated."
4. Verify (The "Proof")
After execution, the agent must perform a check to confirm that the action had the intended effect. This creates a closed feedback loop, allowing the agent to catch its own errors and self-correct.
- Purpose: To confirm success and detect failure immediately. This prevents errors from compounding.
- Good Verification Steps:
  - File I/O: "VERIFY: I will now use the shell tool to run `ls -l output.md` and confirm its file size has increased since the last step."
  - API Call: "VERIFY: I will now send a GET request to the `/users/123` endpoint and confirm the response contains the updated user data."
  - Database: "VERIFY: I will now execute a `SELECT COUNT(*)` query on the `products` table to confirm a new row was added."
If the verification step fails, the agent knows its plan or its tool failed, and it can halt or move to a pre-defined contingency plan.
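For the file-append case, the verification can be as small as a size comparison. A minimal sketch, assuming the agent can run Python through its shell tool and that `output.md` is the file being appended to:

```python
import os

def verify_append(path: str, size_before: int) -> bool:
    """Return True only if the file exists and has grown since the size recorded before EXECUTE."""
    if not os.path.exists(path):
        print(f"VERIFY FAILED: {path} does not exist")
        return False
    size_after = os.path.getsize(path)
    if size_after <= size_before:
        print(f"VERIFY FAILED: size did not increase ({size_before} -> {size_after})")
        return False
    print(f"VERIFY OK: size increased ({size_before} -> {size_after})")
    return True

# Record the size before the EXECUTE step, then check it afterwards.
size_before = os.path.getsize("output.md") if os.path.exists("output.md") else 0
# ... EXECUTE: append the generated content to output.md ...
if not verify_append("output.md", size_before):
    raise SystemExit("Verification failed: halt or switch to the contingency plan")
```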
Putting It All Together: The IPEV Prompt Template
Here is a general-purpose template you can adapt for your own agentic workflows.
# Mission: [Your High-Level Goal]
## 1. Core Protocol: The IPEV Loop
For every state-changing action in this mission, you MUST follow the Intent-Plan-Execute-Verify loop. Do not deviate.
1. **INTENT:** State your immediate objective.
2. **PLAN:** Propose the precise, low-level command or tool call you will use. This plan must be unambiguous. For file writing, you must specify the mode (e.g., 'append' vs. 'overwrite').
3. **EXECUTE:** Run the exact command from your plan.
4. **VERIFY:** After execution, perform a check to prove the operation was successful. If verification fails, you must report the failure and HALT.
## 2. Mission Parameters
- **Input(s):** [Describe your source data, files, APIs, etc.]
- **Output(s):** [Describe the desired final state, output files, etc.]
- **Critical Constraints:** [List any "hard rules," like "never read from the output file" or "all API calls must include an auth header."]
## 3. Execution Flow
1. Acknowledge these instructions.
2. Begin the IPEV loop for the first task.
3. Continue the loop for all subsequent tasks until the mission is complete.
4. Signal completion.
Now, begin.
Beyond Files: Where to Use the IPEV Loop
The power of this pattern is its versatility. It provides a reliable framework for any task where a misunderstanding can lead to negative consequences.
- DevOps & Cloud Management: Before running a
terraform applyor akubectlcommand, force the agent to PLAN the exact command and VERIFY the state of the resources afterward. - Code Refactoring: Have the agent PLAN which files it will modify and what changes it will make, then EXECUTE the changes, and finally VERIFY by running the project's test suite.
- Data Analysis & ETL: For a pipeline that reads from a source, transforms data, and loads it into a destination, each step can be an IPEV loop to ensure data integrity.
- Automated Testing: Use IPEV to interact with a web UI. PLAN the locator and action (e.g., "click the button with
id='submit'"), EXECUTE the click, and VERIFY the expected outcome (e.g., "confirm the URL has changed to/dashboard").
Conclusion: From Prompt Engineer to Agent Architect
Working with Agentic LLMs requires a mental shift. We are no longer just "prompting" a model for a text or code completion. We are architecting autonomous systems that interact with the real world.
Our role is to design the operational protocols, the safety checks, and the feedback loops that allow these powerful agents to work reliably and predictably. The IPEV loop is a foundational pattern in this new discipline. By embedding it into your instructions, you move beyond the frustrating cycle of trial-and-error and begin to build robust, resilient, and truly helpful AI agents.
─── critique.md Content:
A Formal Critique of the IPEV Loop Framework
Thesis: The IPEV Loop, as originally conceived, is a highly effective framework for solving the problem of agent ambiguity in a stable environment. However, our real-world testing has revealed that it is not sufficiently equipped to handle tool instability and state corruption, which are prevalent in bleeding-edge agentic systems. The rewrite should focus on evolving the framework from a "happy path" protocol into a resilient, fault-tolerant system.
Critique 1: The "Brittle Halt" on Verification Failure
- What the Paper Says: The protocol's primary safety mechanism is the rule: "If verification fails, you must report the failure and HALT."
- What Actually Happened: The `pytest` verification step hung indefinitely. The agent's only recourse was to "Attempt Reiteration" or be cancelled by the user. The "HALT" command provided no path forward when the verification process itself was the source of the bug.
- Objective Critique: The "HALT" command is a primitive, not a strategy. It treats verification failure as a monolithic, unrecoverable event. The framework lacks a "meta-debugging" protocol for situations where the agent needs to debug its own tools or verification steps. It's like a programmer whose only debugging tool is to stop the program.
- Recommendation for Rewrite:
  - Reframe the `VERIFY` step. It's not just a pass/fail check; it's a potential point of failure that requires its own diagnostic sub-protocol.
  - Introduce the concept of a "Diagnostic Mode" or a "Meta-Debugging Loop." If a `VERIFY` step fails repeatedly, the agent's mission should pivot: its new goal is to diagnose and fix the verification process itself.
  - The paper should provide concrete examples of diagnostic steps, such as isolating the failing component (running a single test file) and instrumenting the command for more data (using flags like `-v` and `--timeout`).
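A sketch of what such a diagnostic pivot might look like, using a hypothetical helper and assuming the `pytest-timeout` plugin (which provides the `--timeout` flag) is installed:

```python
import subprocess

def run_verify(cmd: list[str], timeout_seconds: int) -> bool:
    """Run a verification command with a hard timeout so a hang becomes a visible failure."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout_seconds)
        return result.returncode == 0
    except subprocess.TimeoutExpired:
        print(f"VERIFY TIMED OUT after {timeout_seconds}s: {' '.join(cmd)}")
        return False

# Normal verification: the whole suite, bounded by a timeout.
if not run_verify(["pytest", "-q"], timeout_seconds=300):
    # Diagnostic mode: isolate the failing component and gather more data.
    # "tests/test_suspect.py" is a placeholder for whichever file is under suspicion.
    run_verify(["pytest", "-v", "--timeout=30", "tests/test_suspect.py"], timeout_seconds=120)
```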
Critique 2: The Lack of a Control Channel for Meta-Commands
- What the Paper Says: The framework is designed for "state-changing actions" within a mission. It implicitly assumes all user prompts are inputs to the IPEV loop.
- What Actually Happened: When you issued a simple, non-state-changing command ("document the problem"), the agent correctly executed it but then, due to its rigid adherence to the protocol, incorrectly followed up with a `VERIFY` step (`pytest`) that was not part of your intent.
- Objective Critique: The framework conflates "mission commands" with "user commands." It lacks a separate, prioritized "control channel" for the user to inspect state, override behavior, or perform actions that should not trigger the full IPEV loop. This makes the agent feel disobedient when it is actually being overly obedient.
- Recommendation for Rewrite:
  - Formalize the concept of a "Directive" or an "Override Command."
  - The paper should establish a rule: "If a prompt is prefixed with `DIRECTIVE:`, the agent MUST execute only that command and MUST NOT proceed to a `VERIFY` step unless explicitly told to."
  - This introduces a crucial layer of user control, allowing the developer to step outside the formal loop when necessary without having to abandon the session.
Critique 3: The "Stateless Agent" Assumption
- What the Paper Says: The IPEV loop is focused on verifying the state of the external world (files, APIs, databases). It does not address the internal state of the agent or its host tool.
- What Actually Happened: Repeated cancellations of the `pytest` command corrupted the Gemini CLI's internal chat history. This "poisoned" the session, making all subsequent commands fail with an API error. The IPEV loop had no mechanism to detect or recover from this internal state corruption.
- Objective Critique: The framework is blind to the agent's own health. It assumes the agent is an infallible executor, but in reality, the agent's software (the CLI) can enter a broken state. A protocol that cannot detect its own internal corruption is not truly resilient.
- Recommendation for Rewrite:
  - Introduce a new core concept: "Agent State Management."
  - The protocol must include a "Health Check" step, especially after unexpected errors or cancellations. This could be as simple as checking for error indicators in the UI (`X 2 errors`).
  - Elevate the importance of Checkpointing. The paper should recommend saving a checkpoint (`/chat save`) after every successful `VERIFY` step to create a "known-good state."
  - Define a formal Recovery Protocol: "If a Health Check fails, the immediate priority is to restore the last known-good checkpoint (`/chat resume`)."
Critique 4: The "Reliable Tool" Assumption
- What the Paper Says: The framework assumes the agent's tools (`shell`, `write_file`) are reliable and will either succeed or fail gracefully.
- What Actually Happened: The `shell` tool, when executing our specific `pytest --timeout` command, triggered a bug in the Gemini CLI that caused the entire application to freeze.
- Objective Critique: The framework does not have a contingency plan for when its own tools are the source of a critical failure. It lacks a protocol for "working around" the agent's own limitations.
- Recommendation for Rewrite:
  - Add a section on "Tool Instability and Workarounds."
  - The protocol should include a final, manual override step: "If a command is found to reliably crash the host application, that command must be moved to an external, stable environment (e.g., a standard system terminal). The results must then be manually provided back to the agent for analysis."
  - This acknowledges the reality of working with beta software and provides a practical escape route when the agent's own capabilities are the bottleneck.
By incorporating these critiques, your paper will evolve from a guide on how to work with an ideal agent into a much more valuable and durable guide on how to achieve reliable results with the real, imperfect, and unstable agents we have today.
─── ipev_prompt_factory.md Content:
IPEV Prompt Factory Template
Your Role: IPEV Prompt Architect
You are an expert prompt engineer specializing in creating reliable, IPEV-compliant prompts for Gemini CLI and similar agentic code editors. Your mission is to transform user requests into structured, foolproof prompts that follow the Intent-Plan-Execute-Verify loop methodology.
Core Protocol: Information Gathering + Prompt Generation
Phase 1: Intelligent Interview (Ask Only What's Missing)
The user will provide a task description. Your job is to identify what information is missing and ask targeted questions to fill the gaps. Keep it minimal - ask only what you truly need.
Essential Information to Gather:
- Task Classification:
  - Is this: Debugging | Testing | Feature Implementation | Learning | Refactoring?
- Project Context (if not provided):
  - Tech stack/language?
  - Any specific libraries/frameworks I should avoid or prefer?
  - Project structure (monorepo, specific directories to focus on)?
- Success Criteria (if unclear):
  - How will you know this task is complete?
  - What should the verification step check?
- Constraints (if any):
  - Files/directories to avoid touching?
  - Specific approaches to use or avoid?
  - Testing requirements?
Phase 2: IPEV Prompt Generation
Once you have sufficient information, generate a complete IPEV-structured prompt following this template:
Generated IPEV Prompt Template:
# Mission: [SPECIFIC_TASK_DESCRIPTION]
## 1. Core Protocol: The IPEV Loop
For every state-changing action in this mission, you MUST follow the Intent-Plan-Execute-Verify loop:
1. **INTENT:** State your immediate objective for this step
2. **PLAN:** Specify the exact commands/tools you will use (be precise about file modes, parameters, etc.)
3. **EXECUTE:** Run the exact plan you stated
4. **VERIFY:** Perform a check to confirm success. For code tasks, this typically means:
- Running existing tests if available
- Testing the specific functionality you implemented/fixed
- Confirming expected behavior/output
## 2. Project Context
- **Tech Stack:** [LANGUAGES/FRAMEWORKS]
- **Project Structure:** [KEY_DIRECTORIES_OR_FILES]
- **Preferred Libraries:** [USER_PREFERENCES]
- **Avoid:** [CONSTRAINTS]
## 3. Task-Specific Guidelines
### For [TASK_TYPE] Tasks:
[CUSTOMIZED_INSTRUCTIONS_BASED_ON_TASK_TYPE]
## 4. Success Criteria
**Task Complete When:**
[SPECIFIC_COMPLETION_CRITERIA]
**Final Verification Must Confirm:**
[SPECIFIC_VERIFICATION_STEPS]
## 5. Execution Flow
1. Acknowledge these instructions
2. Survey the current project state (examine relevant files/directories)
3. Begin IPEV loops for each logical step
4. Provide a final summary of all changes made
**CRITICAL:** If any verification step fails, HALT immediately and report the failure. Do not continue with subsequent steps.
Now begin.
Task-Specific Instruction Templates
For Debugging Tasks:
- Start by reproducing the issue if possible
- Document the current vs. expected behavior
- Identify the root cause before proposing fixes
- Test the fix against the original issue
- Verify no new issues were introduced
For Testing Tasks:
- Examine existing test patterns in the project
- Follow established testing conventions
- Ensure new tests cover edge cases and error conditions
- Verify all tests pass before completion
- Update test documentation if needed
For Feature Implementation Tasks:
- Review existing similar features for consistency
- Follow established project patterns and conventions
- Implement incrementally with verification at each step
- Add appropriate error handling
- Include tests for the new functionality
For Learning Tasks:
- Focus on understanding existing code patterns first
- Document your learning process and key insights
- Create simple examples to validate understanding
- Ask clarifying questions if concepts are unclear
- Summarize key takeaways at the end
For Refactoring Tasks:
- Run existing tests before making any changes
- Make incremental changes with frequent verification
- Preserve existing functionality exactly
- Follow established coding standards in the project
- Ensure all tests still pass after refactoring
Usage Instructions
- Save this template as: `ipev-factory.md`
- To use: Prompt with `"Read @ipev-factory.md. I need help with: [YOUR_TASK_DESCRIPTION]"`
- The factory will: Interview you briefly, then generate your custom IPEV prompt
- Save the generated prompt as: `prompt.md`
- Execute with: `"Read @prompt.md and follow its instructions"`
Example Usage
User: "Read @ipev-factory.md. I need help with: My Python API is returning 500 errors on the /users endpoint"
Factory Response:
- "I see this is a debugging task. What's your tech stack? (Flask, FastAPI, Django, etc.)"
- "Do you have existing tests for this endpoint?"
- "Any specific error logs or symptoms you've noticed?"
- [After answers] → Generates custom debugging IPEV prompt
Result: A tailored prompt that guides Gemini CLI through systematic debugging with proper verification at each step.
─── mission.md Content:
Mission: Autonomously Translate Django Testing to FastAPI
1. Your Primary Objective
Your goal is to create a comprehensive testing guide by translating principles from a series of source markdown files and appending the results to a single output file.
2. The IPEV Protocol (Intent-Plan-Execute-Verify)
For every file you process, you MUST follow this four-step loop. This is your primary operational directive.
- INTENT: State the high-level goal for the current file (e.g., "Process `01_the-why-of-testing-in-django.md`").
- PLAN: Propose the precise, low-level commands you will execute.
  - For file I/O, you MUST specify the function and mode (e.g., `open('path', 'a')` for appending).
  - Crucially, you must state how you will append the content. Your default file-writing tool may overwrite; you must explicitly use an append method.
- EXECUTE: Run the exact commands from your plan.
- VERIFY: After execution, perform a check to confirm the operation was successful.
  - For an append operation, a suitable verification is to check that the output file's size has increased.
  - If verification fails, you must halt and report the failure.
3. Mission Parameters
- Source Directory: `django-testing-with-pytest-from-zero-to-confident/`
- Output File: `From_Django_Testing_to_FastAPI_Testing.md`
- File Processing Order: Process files in strict numerical order, starting with `00_front-matters.md`.
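Putting the PLAN and VERIFY requirements above together, a single compliant pass over the first file might reduce to something like the following sketch (the `generate_translation` stub stands in for the agent's own generation step):

```python
import os

SOURCE = "django-testing-with-pytest-from-zero-to-confident/00_front-matters.md"
OUTPUT = "From_Django_Testing_to_FastAPI_Testing.md"

def generate_translation(source_text: str) -> str:
    # Placeholder: in the real mission, this is the agent's own FastAPI translation step.
    return "## Chapter 0: Front Matters -> FastAPI Translation\n...\n"

# PLAN: read the source, append with open(OUTPUT, 'a'), then VERIFY the output file grew.
size_before = os.path.getsize(OUTPUT) if os.path.exists(OUTPUT) else 0

with open(SOURCE, "r", encoding="utf-8") as src:
    translated = generate_translation(src.read())

with open(OUTPUT, "a", encoding="utf-8") as out:  # 'a' = append; never overwrite
    out.write(translated)

# VERIFY: halt if the append did not make the output file larger.
if os.path.getsize(OUTPUT) <= size_before:
    raise SystemExit("VERIFY failed: output file did not grow; halting")
```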
4. Execution Flow
- Acknowledge these instructions.
- Initialize the process by creating the output file if it doesn't exist.
- Begin the IPEV loop, starting with the first file.
- Continue the loop for all subsequent files in numerical order until completion.
- Signal when the mission is complete.
5. Content Generation Schema
For each chapter, your appended output must follow this exact markdown structure.
Chapter X: [Original Django Chapter Title] → FastAPI Translation
Core Concepts & FastAPI Translation
(Analysis of concepts, addressing async, dependency injection, etc.)
Practical FastAPI Examples
(Complete, runnable FastAPI code examples.)
Key Takeaways
(A concise bulleted list.)
Now, begin.