The Wasted Learning Signal — The Gap Between AI Drafts and What You Actually Publish

Introducing Self-Tuning Loop: capture implicit feedback from human edit diffs, analyze patterns periodically, and auto-evolve prompt guidelines. Includes academic gap analysis (DSPy, TextGrad, POHF).

If you use AI to write, this routine is familiar.

1. Ask AI for a draft → 2. Read the result → 3. Fix the tone, delete unnecessary sentences, reorder paragraphs → 4. Publish → 5. Repeat tomorrow from step 1

The judgments embedded in step 3 — “this tone is too stiff,” “the intro is too long,” “show the numbers first” — are exactly the learning signals AI needs to adapt to you.

But you throw them away every single time.


The Structural Problem with AI Writing Today

The Limits of Custom Instructions

ChatGPT’s memory, Claude’s Custom Style, Notion AI’s Voice settings. Most AI tools offer explicit configuration as the way to reflect user style.

| Tool | "Learning" Method | Reality |
|---|---|---|
| ChatGPT Memory | Auto-extracts insights from chat | Stores facts (job, preferences), not writing style |
| Claude Custom Style | Upload sample text | One-time analysis; no evolution during use |
| Jasper Brand Voice | Upload existing content → profile | Static after initial setup |
| Grammarly | Tracks suggestion accept/reject patterns | Limited to sentence-level suggestions |
| Notion AI | Workspace context awareness | Document reference, not learning |

See the pattern? All of them are initial setup → static forever. None actually improve with use.

Grammarly comes closest, but even it only tracks “accept/reject of sentence-level suggestions” — not the diff between a full draft and the final version.

Why No One Does This

It’s not that it’s technically impossible. There are two reasons:

First, capturing diffs requires both the “draft” and “final” to exist in the same system. If you get a draft from ChatGPT and edit it in Google Docs, the two systems don’t know about each other. The draft lives in ChatGPT, the final in Docs — the connection is broken.

Second, even with diffs, there’s no pipeline to turn them into “learning.” Manually tweaking prompts is currently the only method.


The Value of Wasted Signals

I ran an experiment. Over two weeks, I collected the diffs between AI-generated LinkedIn drafts and the actually published versions, then had an LLM analyze them all at once.

Results:

  • Changed question-style openings to declarative statements — 7 times
  • Expanded hashtags from 3 → 5 — consistent pattern
  • Changed formal endings to casual tone — 11 out of 14 drafts
  • Added English terms in parentheses after technical jargon — every time

These patterns were never explicitly requested. I never wrote “use declarative openings” in any Custom Instructions. I just kept editing. But when the diffs were collected, clear preferences emerged.

This is implicit feedback — learning signals embedded in the editing behavior itself, without the user being conscious of it.


Self-Tuning Loop: A System Where Prompts Evolve

The solution is structurally simple.

graph TD
    G["1. Generate — Create draft using current guidelines"]
    C["2. Capture — Store diff between draft and final"]
    A["3. Analyze — Extract repeating patterns from N diffs"]
    E["4. Evolve — Auto-update guidelines"]

    G --> C
    C --> A
    A --> E
    E -->|"next draft gets closer to what you want"| G

Self-Tuning Loop. A 4-step cycle.

Step 1: Generate

AI creates a draft referencing current guidelines. Initially, these can be just a few lines.

## Writing Guidelines v1
- Tone: Professional but not stiff
- Length: ~1,500 characters
- Structure: Lead with the point, background later
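Concretely, this step can be a few lines of code. A minimal Python sketch, assuming the guidelines live in a guidelines.md file and call_llm() stands in for whatever model API you use (both names are illustrative, not part of any real system):

# generate.py — sketch of step 1
from pathlib import Path

GUIDELINES = Path("guidelines.md")  # assumed location of the guideline file

def call_llm(prompt: str) -> str:
    """Placeholder: wire in any LLM API here. The loop is model-agnostic."""
    raise NotImplementedError

def generate_draft(topic: str) -> str:
    """1. Generate: prepend the current guidelines to the drafting prompt."""
    guidelines = GUIDELINES.read_text(encoding="utf-8")
    return call_llm(
        f"Follow these writing guidelines:\n{guidelines}\n\n"
        f"Write a draft about: {topic}"
    )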

Step 2: Capture

When the user edits the draft into a final version, both are stored. The key is extracting the diff.

[Draft] "The advancement of AI technology is bringing great changes to our society."
[Final] "Enterprise adoption rose 34% within 3 months of GPT-4o's launch."
[Diff]  Abstract statement → concrete data + proper nouns
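Capturing this needs nothing exotic. A minimal sketch using Python's standard difflib, with diffs.jsonl as an assumed append-only store:

# capture.py — sketch of step 2
import difflib, json, time
from pathlib import Path

LOG = Path("diffs.jsonl")  # assumed store for draft/final pairs

def capture(draft: str, final: str) -> None:
    """2. Capture: store both versions plus a unified diff between them."""
    diff = "\n".join(difflib.unified_diff(
        draft.splitlines(), final.splitlines(),
        fromfile="draft", tofile="final", lineterm=""))
    record = {"ts": time.time(), "draft": draft, "final": final, "diff": diff}
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")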

Step 3: Analyze

Once diffs have accumulated past a threshold (7 days, 10 items, etc.), request pattern analysis from an LLM.

## Analysis Prompt
Below are the diffs from the last 10 AI drafts vs. user final versions.
Extract repeating editing patterns.
For each pattern:
- Pattern description (what changes and how)
- Frequency (X out of 10)
- Inferred reason (why this change is made)
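A minimal sketch of the trigger and the analysis call, reusing the diffs.jsonl store from the capture sketch (call_llm() is again a placeholder):

# analyze.py — sketch of step 3
import json
from pathlib import Path

LOG = Path("diffs.jsonl")   # same store the capture step appends to
MIN_DIFFS = 10              # trigger threshold ("7 days, 10 items, etc.")

def call_llm(prompt: str) -> str:
    """Placeholder: any LLM API works here."""
    raise NotImplementedError

def analyze() -> str | None:
    """3. Analyze: feed the latest diffs to an LLM and ask for patterns."""
    if not LOG.exists():
        return None
    records = [json.loads(line) for line in LOG.read_text(encoding="utf-8").splitlines()]
    if len(records) < MIN_DIFFS:
        return None  # not enough signal yet
    diffs = "\n\n".join(r["diff"] for r in records[-MIN_DIFFS:])
    prompt = (
        "Below are the diffs from the last 10 AI drafts vs. user final versions.\n"
        "Extract repeating editing patterns. For each pattern give:\n"
        "- Pattern description (what changes and how)\n"
        "- Frequency (X out of 10)\n"
        "- Inferred reason (why this change is made)\n\n" + diffs
    )
    return call_llm(prompt)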

Step 4: Evolve

Apply extracted patterns to the guidelines. With a safety mechanism:

| Classification | Criteria | Action |
|---|---|---|
| Safe | 70%+ frequency, style/tone related | Auto-apply |
| Risky | Below 50% frequency, or structural changes | Suggest only (human approves) |

Auto-applying only Safe changes makes guidelines gradually converge toward the user’s preferences.
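A minimal sketch of that gate, assuming the analysis step returns patterns with a description, a frequency, and a structural flag (all names here are illustrative):

# evolve.py — sketch of step 4
from dataclasses import dataclass
from pathlib import Path

GUIDELINES = Path("guidelines.md")

@dataclass
class Pattern:
    description: str   # e.g. "Openings: start with specific data or proper nouns"
    frequency: float   # share of recent diffs showing the pattern (0.0–1.0)
    structural: bool   # structural changes are never auto-applied

def evolve(patterns: list[Pattern]) -> list[Pattern]:
    """4. Evolve: append Safe patterns to the guidelines, return Risky ones."""
    risky = []
    for p in patterns:
        if p.frequency >= 0.7 and not p.structural:   # Safe: auto-apply
            with GUIDELINES.open("a", encoding="utf-8") as f:
                f.write(f"- {p.description}\n")
        else:                                          # Risky: human approves
            risky.append(p)
    return risky

After a few cycles, the guideline file might look like this: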

## Writing Guidelines v3 (auto-evolved)
- Tone: Professional but not stiff
- Length: ~1,500 characters
- Structure: Lead with the point, background later
+ - Openings: Start with specific data or proper nouns (avoid abstractions)
+ - Ending style: Use casual tone (avoid formal endings)
+ - Technical terms: Add English in parentheses on first mention
+ - Hashtags: Use 5 (3 is too few)

That’s all it takes. No fine-tuning, no ML infrastructure — just evolving prompt text makes the next draft better.
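For completeness, here is one full cycle wired together, reusing the functions from the step sketches above plus a hypothetical parse_patterns() that turns the LLM's analysis text into Pattern objects:

# loop.py — one cycle of the Self-Tuning Loop (sketch)
from typing import Callable

def parse_patterns(report: str) -> list:
    """Hypothetical: parse the LLM's analysis report into Pattern objects."""
    raise NotImplementedError

def run_cycle(topic: str, edit: Callable[[str], str]) -> None:
    draft = generate_draft(topic)                  # 1. Generate
    final = edit(draft)                            # the human edits as usual
    capture(draft, final)                          # 2. Capture
    report = analyze()                             # 3. Analyze (None until enough diffs)
    if report:
        for p in evolve(parse_patterns(report)):   # 4. Evolve
            print("Needs approval:", p.description)  # Risky patterns surface here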


Relationship to Academic Research

Automatic prompt optimization is an active research area. Here’s how Self-Tuning Loop differs from existing work.

Existing Approaches: Metric-Based Optimization

| Research | Core Mechanism | Feedback Signal |
|---|---|---|
| APE (Zhou et al., 2023) | LLM generates instruction candidates → evaluate → select best | Task accuracy |
| OPRO (DeepMind, 2023) | LLM as optimizer, referencing score history | Benchmark scores |
| DSPy (Stanford, 2024) | Declare prompts as modules, compiler auto-optimizes | Developer-defined metrics |
| EvoPrompt (2024) | Genetic algorithm evolves prompt population | Dev set accuracy |

All optimize automatic benchmark scores. “Accuracy 85% → 91%.” Irrelevant to individual style preferences.

Closest Research: TextGrad and POHF

TextGrad (Stanford, 2024, published in Nature) has the most similar structure.

graph LR
    subgraph TextGrad
        O1["AI output"] --> C1["LLM generates critique"]
        C1 --> U1["Use critique as gradient"]
        U1 --> O1
    end

    subgraph STL["Self-Tuning Loop"]
        O2["AI draft"] --> C2["Human edits"]
        C2 --> U2["Use edit diff as gradient"]
        U2 --> O2
    end

One difference. In TextGrad, the gradient comes from AI’s automatic critique. In Self-Tuning Loop, it comes from actual human edits.

POHF (Lin et al., ICML 2024) is the closest human-feedback research. It shows users A/B outputs and asks “which is better?” But requiring explicit comparison judgments every time creates high user friction.

The Gap: What No One Has Done

From a comprehensive survey published in February 2025 (A Systematic Survey of Automatic Prompt Optimization Techniques, arXiv:2502.16923):

| Feedback Type | Used By | Notes |
|---|---|---|
| Numeric score (accuracy etc.) | APE, OPRO, DSPy, EvoPrompt, PromptBreeder | Most common |
| LLM synthetic critique | TextGrad, Self-Refine, Constitutional AI | AI evaluating AI |
| Human A/B preference | POHF | Explicit feedback (high friction) |
| Human edit diffs | No existing research | White space |

A system that uses human edit diffs as implicit feedback to automatically evolve prompts does not exist — academically or commercially.


Why This Beats Fine-Tuning

“Why not just fine-tune the model?” — A reasonable question.

| Criteria | Fine-tuning | Self-Tuning Loop |
|---|---|---|
| Cost | GPU hours + data pipeline | $0 (just prompt text changes) |
| Data needed | Hundreds to thousands of pairs | First improvement from as few as 3 diffs |
| Interpretability | Weight changes — black box | Guideline change history — human-readable |
| Rollback | Restore checkpoint (complex) | Delete one line |
| Model independence | Locked to specific model | Works with any LLM |
| Time to start | Days building pipeline | Immediate |

Fine-tuning suits large-scale pattern changes. But for learning that “this person writes short intros” or “uses 5 hashtags” — it’s overkill.

Self-Tuning Loop evolves the prompt, not the model. Switch models and the guidelines still work. That’s its biggest advantage.


Where Can This Be Applied?

This loop applies to any repetitive task where AI creates a draft and a human refines it.

| Domain | Generate | Capture (diff) | Analyze | Evolve |
|---|---|---|---|---|
| Email | AI drafts reply | Sent mail vs. draft | "Less formal," "remove greeting" | Update tone guide |
| Blog | AI drafts post | Published vs. draft | "Shorter intro," "data first" | Patch writing guide |
| Contracts | AI drafts clauses | Lawyer's edits | "Always add indemnity clause" | Evolve clause templates |
| Code review | AI generates comments | Accept/ignore patterns | "Only accept naming, ignore structure" | Adjust review criteria |
| Proposals | AI drafts slides | Final version diff | "Always add data backing" | Evolve proposal prompt |
| News curation | AI selects articles | 👍/👎 feedback | "Prefers consulting firm news" | Patch selection criteria |

See the common structure? The 4-step loop is identical regardless of domain.


Personal RLHF

In the Ralph Loop series, we covered the structure of RLHF (Reinforcement Learning from Human Feedback), the technique OpenAI invested heavily in to build ChatGPT.

RLHF’s core goal is “reflecting human preferences into the model.” Self-Tuning Loop achieves the same goal at a completely different layer.

| Dimension | RLHF | Self-Tuning Loop |
|---|---|---|
| Layer | Model weights | Prompt text |
| Cost | GPU clusters + labelers | $0 |
| Feedback collection | Professional labelers do A/B comparisons | User edits as usual (zero friction) |
| Personalization | Average across all users | Individual unique preferences |
| Update frequency | Model release (months) | Weekly, automatic |
| Interpretability | Weight changes (unreadable) | Natural language guidelines (readable) |
| Rollback | Restore checkpoint | Delete one line |

If RLHF bakes the average preference of all users into model weights, Self-Tuning Loop accumulates one person’s unique judgment into a prompt.


Next: The Actual System

Concepts end here. The next part dissects a system where this loop is actually running in production.

  • 35 sources auto-collected, AI-curated, with a weekly feedback loop for auto-improvement
  • Send a photo from Telegram and a blog draft is auto-generated
  • LinkedIn drafts created daily, editing patterns reflected in tone guides
  • All of this running at $0 additional cost

Cron, Telegram, Claude — the architecture of a self-improving operations system built with three tools.


This Series

  • Part 1: The Wasted Learning Signal (this post)
  • Part 2: Cron + Telegram + Claude System Anatomy (coming soon)
  • Part 3: Build Your Own Self-Tuning Loop (coming soon)

References

  • Zhou et al., “Large Language Models Are Human-Level Prompt Engineers” (APE), ICLR 2023
  • Yang et al., “Large Language Models as Optimizers” (OPRO), ICLR 2024
  • Khattab et al., “DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines,” Stanford, 2024
  • Yuksekgonul et al., “TextGrad: Automatic ‘Differentiation’ via Text,” Nature, 2024
  • Lin et al., “Prompt Optimization with Human Feedback” (POHF), ICML 2024
  • Fernando et al., “Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution,” DeepMind, ICML 2024
  • “A Systematic Survey of Automatic Prompt Optimization Techniques,” arXiv:2502.16923, 2025