An AI that writes code is interesting. An AI that writes code, runs it, reads the errors, and fixes itself is useful. Here is how to build one.
Code generation with LLMs is unreliable. GPT-4o generates working code about 60-70% of the time for non-trivial tasks. The other 30-40% contains syntax errors, wrong API usage, missing imports, or logic bugs. The fix is not a better model. The fix is a feedback loop: generate code, execute it, read the error, and regenerate. This is exactly the pattern that LangGraph's cyclical graphs are designed for. This tutorial walks through building a self-correcting coding agent from scratch.
The architecture is a cycle with three parts: a write node that generates or fixes code, a test node that executes it in a sandbox, and a conditional edge that decides whether to finish or loop back.
Why LangGraph, Not LangChain?
LangChain processes data through a linear chain: A -> B -> C -> Done. Self-correction requires cycles: Write -> Test -> Fix -> Test -> Fix -> Test -> Pass. LangGraph supports these cycles natively with conditional edges. Learn more in our LangChain vs LangGraph comparison.
```python
# State tracks the code, errors, and iteration count
from typing import TypedDict

class CoderState(TypedDict):
    requirement: str      # user's task description
    code: str             # current generated code
    test_output: str      # stdout + stderr from execution
    test_passed: bool     # did all tests pass?
    error_history: list   # previous errors for context
    iteration: int        # current iteration count
    max_iterations: int   # safety limit (default: 5)
```
```python
# Write node: generates or fixes code
def write_code(state: CoderState) -> CoderState:
    if state["iteration"] == 0:
        prompt = f"Write Python code for: {state['requirement']}"
    else:
        prompt = (
            f"Fix this code:\n{state['code']}\n\n"
            f"Error:\n{state['test_output']}\n\n"
            f"Previous errors: {state['error_history']}"
        )
    code = llm.invoke(prompt)
    return {"code": extract_code(code), "iteration": state["iteration"] + 1}
```
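The `extract_code` helper used above is not part of any library; a minimal sketch might strip a markdown fence from the model's reply and fall back to the raw text (the regex and fallback behavior are assumptions):

```python
import re

def extract_code(reply: str) -> str:
    """Pull the first fenced code block out of an LLM reply.

    Falls back to the raw reply when no fence is present.
    """
    match = re.search(r"```(?:python)?\n(.*?)```", reply, re.DOTALL)
    return match.group(1).strip() if match else reply.strip()
```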
```python
# Test node: executes code in sandbox
def test_code(state: CoderState) -> CoderState:
    result = sandbox.execute(state["code"], timeout=30)
    passed = result.exit_code == 0
    errors = state["error_history"].copy()
    if not passed:
        errors.append(result.stderr)
    return {
        "test_output": result.stdout + result.stderr,
        "test_passed": passed,
        "error_history": errors,
    }
```
```python
# Build the self-correction cycle
from langgraph.graph import StateGraph, END

graph = StateGraph(CoderState)
graph.add_node("write", write_code)
graph.add_node("test", test_code)
graph.add_edge("write", "test")
graph.add_conditional_edges(
    "test",
    lambda state: (
        END if state["test_passed"]
        else END if state["iteration"] >= state["max_iterations"]
        else "write"  # loop back to fix
    ),
)
graph.set_entry_point("write")
agent = graph.compile()
```
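To see the cycle behave end to end without an API key or sandbox service, the same write -> test -> decide control flow can be exercised with a stubbed model and a plain subprocess runner (`fake_llm` and `run` are illustrative stand-ins, not real components, and a bare subprocess is NOT a safe sandbox):

```python
import subprocess
import sys
from typing import TypedDict

class CoderState(TypedDict):
    requirement: str
    code: str
    test_output: str
    test_passed: bool
    error_history: list
    iteration: int
    max_iterations: int

def fake_llm(prompt: str) -> str:
    # Stub model: returns broken code first, the fix on retry
    if "Fix this code" in prompt:
        return "print('hello')"
    return "print(hello)"  # NameError on purpose

def run(code: str) -> subprocess.CompletedProcess:
    # Stand-in for the sandbox: a plain subprocess (unsafe for untrusted code)
    return subprocess.run([sys.executable, "-c", code],
                          capture_output=True, text=True, timeout=30)

state: CoderState = {
    "requirement": "print hello", "code": "", "test_output": "",
    "test_passed": False, "error_history": [], "iteration": 0,
    "max_iterations": 5,
}

# The same write -> test -> decide cycle the graph encodes
while not state["test_passed"] and state["iteration"] < state["max_iterations"]:
    if state["iteration"] == 0:
        prompt = f"Write Python code for: {state['requirement']}"
    else:
        prompt = f"Fix this code:\n{state['code']}\n\nError:\n{state['test_output']}"
    state["code"] = fake_llm(prompt)
    state["iteration"] += 1
    result = run(state["code"])
    state["test_passed"] = result.returncode == 0
    state["test_output"] = result.stdout + result.stderr
    if not state["test_passed"]:
        state["error_history"].append(result.stderr)
```

With this stub, the first iteration fails with a NameError, the second iteration submits the fix, and the loop exits with `test_passed` set after two iterations.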
Never execute LLM-generated code directly on your server. Use a sandboxed environment such as a Docker container or a hosted sandbox like E2B.
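One way to get that isolation with plain Docker is sketched below. The hardening flags shown are a reasonable baseline rather than an exhaustive set, and `docker_command` and `execute` are illustrative helpers, not library APIs:

```python
import os
import subprocess
import tempfile

def docker_command(script_path: str, image: str = "python:3.12-slim") -> list:
    """Build a `docker run` invocation that isolates untrusted code."""
    return [
        "docker", "run", "--rm",   # auto-remove the container afterwards
        "--network=none",          # disable network access
        "--memory=256m",           # cap memory usage
        "--pids-limit=64",         # prevent fork bombs
        "-v", f"{script_path}:/app/main.py:ro",  # read-only mount
        image, "python", "/app/main.py",
    ]

def execute(code: str, timeout: int = 30) -> subprocess.CompletedProcess:
    """Write generated code to a temp file and run it in the container."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "main.py")
        with open(path, "w") as f:
            f.write(code)
        return subprocess.run(docker_command(path), capture_output=True,
                              text=True, timeout=timeout)
```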
The self-correction loop raises the effective success rate from 65% to 93%. The remaining 7% typically involves tasks that require architectural changes the LLM cannot figure out from error messages alone. For those cases, add a human-in-the-loop step using LangGraph's interrupt/resume.
The basic loop above needs several additions before it is production-ready.
For the complete production architecture, see our production agent blueprint.
Isn't running AI-generated code dangerous?
Only if you execute code without a sandbox. With Docker containers or E2B, the generated code runs in complete isolation. Network access should be disabled by default and file system access restricted to a temporary directory.
Can this replace a human developer?
No. It handles well-defined, testable programming tasks (write a function that..., parse this data format, implement this algorithm). It does not handle system design, architecture decisions, or ambiguous requirements. Use it as a coding assistant, not an autonomous developer.
Which languages does this work for?
Python yields the best results because LLMs have the most Python training data, and Python's error messages are descriptive. The pattern works for JavaScript and TypeScript as well. For compiled languages (Go, Rust), the compilation step adds latency to each iteration. Read more about why Python dominates AI development.