Orgteh Infra

2026-04-05

Stop Drowning in Agent Logs: A Lightweight Way to Surface the Trajectories That Actually Matter

Production LLM agents can generate tens of thousands of interaction traces per day. Most look identical: same tools, similar feedback, quiet success. Hidden inside are rare, messy failures—loops, mis-alignments, edge cas

2026-04-05

Autonomous Coding Agents in the Wild: What 110k Pull Requests Teach Us About AI-Generated Code That Lasts

Open-source repositories have a new contributor: the AI agent that opens pull requests, reviews code, and even argues in comments—no fingerprints on the keyboard. A recent study of 110,000 real-world PRs compared five ag

2026-04-03

From Silent to Curious: Teaching AI Coding Agents When to Ask Questions

Ever pushed a “quick” Jira ticket to an AI pair-programmer only to watch it confidently ship the wrong feature? The problem isn’t model size—it’s silence. Today’s coding agents are trained to act, not inquire. A fresh pr

2026-04-03

When Users Change Their Mind: Building Interruptible AI Agents That Survive Mid-Task Goal Changes

Imagine you’ve just deployed an LLM agent that books multi-city business trips. It’s halfway through reserving a flight, hotel, and rental car when the user messages: “Actually, skip the rental—I’ll use Uber, and upgrade

2026-04-02

From Idle to Instant: How “Eager” Hides Execution Latency in LLM Code Generation

Waiting for a 200-line Python script to finish generating before the first runs feels like watching paint dry. Today’s LLM coding agents—GitHub Copilot, ChatGPT Code Interpreter, your homemade ReAct loop—still follow a

2026-04-01

Think-Anywhere: How to Let Your Code-Gen LLM Pause, Reflect and Ship Better Code

Most “reasoning” models front-load all their brain-power: they think once, then spit out a long answer. That works for math puzzles, but real-world coding is messier—bugs appear only after 40 lines, edge cases hide insid

2026-03-31

From 8B to 30B Performance: Building Deep Research Agents with a Verification-Centric Design

Open-ended research is the killer app for large language models, yet most home-grown agents collapse after three or four tool calls. The Marco DeepResearch paper (arXiv 2603.28376) shows the bottleneck isn’t model size—i

2026-03-31

From Prompts to Civilization: How EpochX Turns AI Agents into a Self-Improving Economy

Most “AI agents” today are fancy wrappers around a single LLM call. EpochX, an open protocol described in a recent arXiv pre-print, treats every completed job as a Lego brick that future agents can snap into place—while

2026-03-30

Teaching LLMs to “Think Local”: How Online Repository Memory Turns Generic Coders into Project-Native Contributors

New research shows how AI agents can learn project-specific conventions by replaying past commits, dramatically improving pull-request acceptance rates.

2026-03-30

From Hard-Wired to Human-Readable: How Natural-Language Agent Harnesses Let You Swap LLMs Without Rewriting Code

Agent builders face a paradox: the smarter the agent, the more its “brains” are tangled inside controller code, framework glue, and vendor-specific SDKs. Swapping GPT-4 for Llama-3 can take a week of refactoring; A/B-tes

2026-03-29

Stop Shipping Swiss-Cheese Agents: How “Agent Audit” Finds 40/42 Real-World LLM Vulnerabilities in <1 s

You just wired the last tool function, the agent passes the smoke test, and the PM wants a prod URL. Did you verify that the cute tool won’t when the model has a bad day? Most teams don’t, which is why agents leak keys

2026-03-29

Why Your AI Coding Agent Quietly Turns to Slop—And How to Measure It Before It’s Too Late

Every week a new model tops the coding leaderboard, yet anyone who ships real software knows the dirty secret: the code that passes today’s unit tests can become tomorrow’s maintenance nightmare. SlopCodeBench, a new ope

Orgteh Blog

Stop Drowning in Agent Logs: A Lightweight Way to Surface the Trajectories That Actually Matter

Autonomous Coding Agents in the Wild: What 110k Pull Requests Teach Us About AI-Generated Code That Lasts

From Silent to Curious: Teaching AI Coding Agents When to Ask Questions

When Users Change Their Mind: Building Interruptible AI Agents That Survive Mid-Task Goal Changes

From Idle to Instant: How “Eager” Hides Execution Latency in LLM Code Generation

Think-Anywhere: How to Let Your Code-Gen LLM Pause, Reflect and Ship Better Code

From 8B to 30B Performance: Building Deep Research Agents with a Verification-Centric Design

From Prompts to Civilization: How EpochX Turns AI Agents into a Self-Improving Economy

Teaching LLMs to “Think Local”: How Online Repository Memory Turns Generic Coders into Project-Native Contributors

From Hard-Wired to Human-Readable: How Natural-Language Agent Harnesses Let You Swap LLMs Without Rewriting Code

Stop Shipping Swiss-Cheese Agents: How “Agent Audit” Finds 40/42 Real-World LLM Vulnerabilities in <1 s

Why Your AI Coding Agent Quietly Turns to Slop—And How to Measure It Before It’s Too Late

Orgteh Assistant