The Wasted Learning Signal — The Gap Between AI Drafts and What You Actually Publish
Introducing Self-Tuning Loop: capture implicit feedback from human edit diffs, analyze patterns periodically, and auto-evolve prompt guidelines. Includes academic gap analysis (DSPy, TextGrad, POHF).
If you use AI to write, this routine is familiar.
1. Ask AI for a draft → 2. Read the result → 3. Fix the tone, delete unnecessary sentences, reorder paragraphs → 4. Publish → 5. Repeat tomorrow from step 1
The judgments embedded in step 3 — “this tone is too stiff,” “the intro is too long,” “show the numbers first” — are exactly the learning signals AI needs to adapt to you.
But you throw them away every single time.
The Structural Problem with AI Writing Today
The Limits of Custom Instructions
ChatGPT’s memory, Claude’s Custom Style, Notion AI’s Voice settings. Most AI tools offer explicit configuration as the way to reflect user style.
| Tool | "Learning" Method | Reality |
|---|---|---|
| ChatGPT Memory | Auto-extract insights from chat | Stores facts (job, preferences). Not writing style |
| Claude Custom Style | Upload sample text | One-time analysis. No evolution during use |
| Jasper Brand Voice | Upload existing content → profile | Static after initial setup |
| Grammarly | Track suggestion accept/reject patterns | Limited to sentence-level suggestions |
| Notion AI | Workspace context awareness | Document reference, not learning |
See the pattern? All of them are initial setup → static forever. None actually improve with use.
Grammarly comes closest, but even it only tracks “accept/reject of sentence-level suggestions” — not the diff between a full draft and the final version.
Why No One Does This
It’s not technically impossible. Two reasons:
First, capturing diffs requires both the “draft” and “final” to exist in the same system. If you get a draft from ChatGPT and edit it in Google Docs, the two systems don’t know about each other. The draft lives in ChatGPT, the final in Docs — the connection is broken.
Second, even with diffs, there’s no pipeline to turn them into “learning.” Manually tweaking prompts is currently the only method.
The Value of Wasted Signals
I ran an experiment. Over two weeks, I collected the diffs between AI-generated LinkedIn drafts and the actually published versions, then had an LLM analyze them all at once.
Results:
- Changed question-style openings to declarative statements — 7 times
- Expanded hashtags from 3 → 5 — consistent pattern
- Changed formal endings to casual tone — 11 out of 14 drafts
- Added English terms in parentheses after technical jargon — every time
These patterns were never explicitly requested. I never wrote “use declarative openings” in any Custom Instructions. I just kept editing. But when the diffs were collected, clear preferences emerged.
This is implicit feedback — learning signals embedded in the editing behavior itself, without the user being conscious of it.
Self-Tuning Loop: A System Where Prompts Evolve
The solution is structurally simple.
graph TD
G["1. Generate — Create draft using current guidelines"]
C["2. Capture — Store diff between draft and final"]
A["3. Analyze — Extract repeating patterns from N diffs"]
E["4. Evolve — Auto-update guidelines"]
G --> C
C --> A
A --> E
E -->|"next draft gets closer to what you want"| G
Self-Tuning Loop. A 4-step cycle.
Step 1: Generate
AI creates a draft referencing current guidelines. Initially, these can be just a few lines.
## Writing Guidelines v1
- Tone: Professional but not stiff
- Length: ~1,500 characters
- Structure: Lead with the point, background later
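As a minimal sketch of this step (assuming the OpenAI Python SDK; the model name, file path, and function name are illustrative, and any chat-capable LLM works the same way):

```python
# Step 1 sketch: inject the current guidelines into the system prompt.
# Assumes the OpenAI Python SDK; the file path and model are placeholders.
from pathlib import Path

from openai import OpenAI

client = OpenAI()

def generate_draft(topic: str, guidelines_path: str = "guidelines.md") -> str:
    """Create a draft that follows whatever the guidelines file says today."""
    guidelines = Path(guidelines_path).read_text(encoding="utf-8")
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"Follow these writing guidelines:\n\n{guidelines}"},
            {"role": "user", "content": f"Write a LinkedIn post about: {topic}"},
        ],
    )
    return response.choices[0].message.content
```

Because the guidelines live in a plain text file, Step 4 can rewrite them without touching this code.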
Step 2: Capture
When the user edits the draft into a final version, both are stored. The key is extracting the diff.
[Draft] "The advancement of AI technology is bringing great changes to our society."
[Final] "Enterprise adoption rose 34% within 3 months of GPT-4o's launch."
[Diff] Abstract statement → concrete data + proper nouns
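Capturing the pair needs nothing beyond the standard library. A sketch, assuming a local JSONL log (any store that keeps draft, final, and diff together works):

```python
# Step 2 sketch: store the draft/final pair plus a unified diff.
# The JSONL file is an assumed storage format, not a requirement.
import difflib
import json
import time

def capture(draft: str, final: str, log_path: str = "diffs.jsonl") -> None:
    """Append one draft/final pair and its line-level diff to the log."""
    diff = "\n".join(difflib.unified_diff(
        draft.splitlines(), final.splitlines(),
        fromfile="draft", tofile="final", lineterm="",
    ))
    record = {"ts": time.time(), "draft": draft, "final": final, "diff": diff}
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")
```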
Step 3: Analyze
Once enough diffs have accumulated (a 7-day window, 10 items, whatever threshold fits), request pattern analysis from an LLM.
## Analysis Prompt
Below are the diffs from the last 10 AI drafts vs. user final versions.
Extract repeating editing patterns.
For each pattern:
- Pattern description (what changes and how)
- Frequency (X out of 10)
- Inferred reason (why this change is made)
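In code, this step is just assembling that prompt from the stored diffs once the threshold is met. A sketch (the 10-item trigger and client setup mirror the earlier snippets and are illustrative):

```python
# Step 3 sketch: batch the accumulated diffs into one analysis request.
import json

from openai import OpenAI

client = OpenAI()  # same client as in Step 1

ANALYSIS_PROMPT = """Below are the diffs from the last {n} AI drafts vs. user final versions.
Extract repeating editing patterns.
For each pattern:
- Pattern description (what changes and how)
- Frequency (X out of {n})
- Inferred reason (why this change is made)

{diffs}"""

def analyze(log_path: str = "diffs.jsonl", min_items: int = 10) -> str | None:
    """Run pattern analysis only after enough diffs have accumulated."""
    with open(log_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f]
    if len(records) < min_items:
        return None  # not enough signal yet; keep collecting
    diffs = "\n\n---\n\n".join(r["diff"] for r in records[-min_items:])
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": ANALYSIS_PROMPT.format(n=min_items, diffs=diffs)}],
    )
    return response.choices[0].message.content
```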
Step 4: Evolve
Apply extracted patterns to the guidelines. With a safety mechanism:
| Classification | Criteria | Action |
|---|---|---|
| Safe | 70%+ frequency, style/tone related | Auto-apply |
| Risky | Below 70% frequency, or structural changes | Suggest only (human approves) |
Auto-applying only Safe changes makes guidelines gradually converge toward the user’s preferences.
## Writing Guidelines v3 (auto-evolved)
- Tone: Professional but not stiff
- Length: ~1,500 characters
- Structure: Lead with the point, background later
+ - Openings: Start with specific data or proper nouns (avoid abstractions)
+ - Ending style: Use casual tone (avoid formal endings)
+ - Technical terms: Add English in parentheses on first mention
+ - Hashtags: Use 5 (3 is too few)
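The Safe/Risky gate itself is a few lines. A sketch of the table above, where `Pattern` is an assumed structure for what the analysis step emits:

```python
# Step 4 sketch: auto-apply Safe patterns, queue Risky ones for review.
from dataclasses import dataclass

@dataclass
class Pattern:
    description: str  # e.g. "Openings: start with specific data or proper nouns"
    frequency: float  # fraction of diffs showing the pattern (0.0-1.0)
    structural: bool  # True if it changes structure rather than style/tone

def classify(pattern: Pattern) -> str:
    """Safe = high-frequency style change; everything else needs a human."""
    if pattern.frequency >= 0.7 and not pattern.structural:
        return "auto-apply"
    return "suggest-only"

def evolve(patterns: list[Pattern], guidelines_path: str = "guidelines.md") -> list[Pattern]:
    """Append Safe patterns to the guidelines file; return the rest for approval."""
    pending = []
    with open(guidelines_path, "a", encoding="utf-8") as f:
        for p in patterns:
            if classify(p) == "auto-apply":
                f.write(f"- {p.description}\n")
            else:
                pending.append(p)
    return pending
```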
That’s all it takes. No fine-tuning, no ML infrastructure — just evolving prompt text makes the next draft better.
Relationship to Academic Research
Automatic prompt optimization is an active research area. Here’s how Self-Tuning Loop differs from existing work.
Existing Approaches: Metric-Based Optimization
| Research | Core Mechanism | Feedback Signal |
|---|---|---|
| APE (Zhou et al., 2023) | LLM generates instruction candidates → evaluate → select best | Task accuracy |
| OPRO (DeepMind, 2023) | LLM as optimizer, referencing score history | Benchmark scores |
| DSPy (Stanford, 2024) | Declare prompts as modules, compiler auto-optimizes | Developer-defined metrics |
| EvoPrompt (2024) | Genetic algorithm evolves prompt population | Dev set accuracy |
All optimize automatic benchmark scores. “Accuracy 85% → 91%.” Irrelevant to individual style preferences.
Closest Research: TextGrad and POHF
TextGrad (Stanford, 2024, published in Nature) has the most similar structure.
graph LR
subgraph TextGrad
O1["AI output"] --> C1["LLM generates critique"]
C1 --> U1["Use critique as gradient"]
U1 --> O1
end
subgraph Self-Tuning Loop
O2["AI draft"] --> C2["Human edits"]
C2 --> U2["Use edit diff as gradient"]
U2 --> O2
end
One difference. In TextGrad, the gradient comes from AI’s automatic critique. In Self-Tuning Loop, it comes from actual human edits.
POHF (Lin et al., ICML 2024) is the closest human-feedback research. It shows users A/B outputs and asks “which is better?” But requiring explicit comparison judgments every time creates high user friction.
The Gap: What No One Has Done
From a February 2025 comprehensive survey (A Systematic Survey of Automatic Prompt Optimization Techniques, arXiv:2502.16923):
| Feedback Type | Used By | Notes |
|---|---|---|
| Numeric score (accuracy etc.) | APE, OPRO, DSPy, EvoPrompt, PromptBreeder | Most common |
| LLM synthetic critique | TextGrad, Self-Refine, Constitutional AI | AI evaluating AI |
| Human A/B preference | POHF | Explicit feedback (high friction) |
| Human edit diffs | No existing research | White space |
A system that uses human edit diffs as implicit feedback to automatically evolve prompts does not exist — academically or commercially.
Why This Beats Fine-Tuning
“Why not just fine-tune the model?” — A reasonable question.
| Criteria | Fine-tuning | Self-Tuning Loop |
|---|---|---|
| Cost | GPU hours + data pipeline | $0 (just prompt text changes) |
| Data needed | Hundreds to thousands of pairs | First improvement from as few as 3 diffs |
| Interpretability | Weight changes — black box | Guideline change history — human-readable |
| Rollback | Restore checkpoint (complex) | Delete one line |
| Model independence | Locked to specific model | Works with any LLM |
| Time to start | Days building pipeline | Immediate |
Fine-tuning suits large-scale pattern changes. But for learning that “this person writes short intros” or “uses 5 hashtags” — it’s overkill.
Self-Tuning Loop evolves the prompt, not the model. Switch models and the guidelines still work. That’s its biggest advantage.
Where Can This Be Applied?
This loop applies to any repetitive task where AI creates a draft and a human refines it.
| Domain | Generate | Capture (diff) | Analyze | Evolve |
|---|---|---|---|---|
| Email | AI drafts reply | Sent mail vs draft | "Less formal," "remove greeting" | Update tone guide |
| Blog | AI drafts post | Published vs draft | "Shorter intro," "data first" | Patch writing guide |
| Contracts | AI drafts clauses | Lawyer's edits | "Always add indemnity clause" | Evolve clause templates |
| Code review | AI generates comments | Accept/ignore patterns | "Only accept naming, ignore structure" | Adjust review criteria |
| Proposals | AI drafts slides | Final version diff | "Always add data backing" | Evolve proposal prompt |
| News curation | AI selects articles | 👍/👎 feedback | "Prefers consulting firm news" | Patch selection criteria |
See the common structure? The 4-step loop is identical regardless of domain.
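One way to see that: the whole loop reduces to a small interface that each domain implements its own way. A sketch (all names illustrative):

```python
# Domain-agnostic shape of the loop: email, blog, contracts, and code review
# differ only in how they implement these four methods.
from typing import Callable, Protocol

class SelfTuningLoop(Protocol):
    def generate(self, request: str, guidelines: str) -> str: ...
    def capture(self, draft: str, final: str) -> None: ...
    def analyze(self) -> list[str]: ...                # detected edit patterns
    def evolve(self, patterns: list[str]) -> str: ...  # updated guidelines

def run_cycle(loop: SelfTuningLoop, request: str, guidelines: str,
              edit: Callable[[str], str]) -> str:
    """One pass through the loop; `edit` is the human doing their normal work."""
    draft = loop.generate(request, guidelines)
    final = edit(draft)           # zero extra friction: the human just edits
    loop.capture(draft, final)
    patterns = loop.analyze()     # may be empty until enough diffs accumulate
    return loop.evolve(patterns) if patterns else guidelines
```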
Personal RLHF
In the Ralph Loop series, we covered the structure of RLHF (Reinforcement Learning from Human Feedback) — the technology OpenAI invested billions of dollars into for ChatGPT.
RLHF’s core goal is “reflecting human preferences into the model.” Self-Tuning Loop achieves the same goal at a completely different layer.
| Dimension | RLHF | Self-Tuning Loop |
|---|---|---|
| Layer | Model weights | Prompt text |
| Cost | GPU clusters + labelers | $0 |
| Feedback collection | Professional labelers do A/B comparison | User edits as usual (zero friction) |
| Personalization | Average across all users | Individual unique preferences |
| Update frequency | Model release (months) | Weekly, automatic |
| Interpretability | Weight changes (unreadable) | Natural language guidelines (readable) |
| Rollback | Restore checkpoint | Delete one line |
If RLHF bakes the average preference of all users into model weights, Self-Tuning Loop accumulates one person’s unique judgment into a prompt.
Next: The Actual System
Concepts end here. The next part dissects a system where this loop is actually running in production.
- 35 sources auto-collected, AI-curated, with a weekly feedback loop for auto-improvement
- Send a photo from Telegram and a blog draft is auto-generated
- LinkedIn drafts created daily, editing patterns reflected in tone guides
- All of this running at $0 additional cost
Cron, Telegram, Claude — the architecture of a self-improving operations system built with three tools.
This Series
- Part 1: The Wasted Learning Signal (this post)
- Part 2: Cron + Telegram + Claude System Anatomy (coming soon)
- Part 3: Build Your Own Self-Tuning Loop (coming soon)
Related Posts
- Ralph Loop: Why Context Rot — From RLHF to context collapse
- Claude Code Anatomy: Architecture — Inside a production agent system
- My Memory System — Another approach to prompt evolution
References
- Zhou et al., "Large Language Models Are Human-Level Prompt Engineers" (APE), ICLR 2023
- Yang et al., "Large Language Models as Optimizers" (OPRO), ICLR 2024
- Khattab et al., "DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines," Stanford, 2024
- Yuksekgonul et al., "TextGrad: Automatic Differentiation via Text," Nature, 2025
- Lin et al., "Prompt Optimization with Human Feedback" (POHF), ICML 2024
- Fernando et al., "PromptBreeder: Self-Referential Self-Improvement," DeepMind, ICML 2024
- "A Systematic Survey of Automatic Prompt Optimization Techniques," arXiv:2502.16923, 2025