The Wasted Learning Signal — The Gap Between AI Drafts and What You Actually Publish

Introducing Self-Tuning Loop: capture implicit feedback from human edit diffs, analyze patterns periodically, and auto-evolve prompt guidelines. Includes academic gap analysis (DSPy, TextGrad, POHF).

If you use AI to write, this routine is familiar.

1. Ask AI for a draft → 2. Read the result → 3. Fix the tone, delete unnecessary sentences, reorder paragraphs → 4. Publish → 5. Repeat tomorrow from step 1

The judgments embedded in step 3 — “this tone is too stiff,” “the intro is too long,” “show the numbers first” — are exactly the learning signals AI needs to adapt to you.

But you throw them away every single time.


The Structural Problem with AI Writing Today

The Limits of Custom Instructions

ChatGPT’s memory, Claude’s Custom Style, Notion AI’s Voice settings. Most AI tools offer explicit configuration as the way to reflect user style.

| Tool | "Learning" Method | Reality |
|---|---|---|
| ChatGPT Memory | Auto-extracts insights from chat | Stores facts (job, preferences), not writing style |
| Claude Custom Style | Upload sample text | One-time analysis; no evolution during use |
| Jasper Brand Voice | Upload existing content → profile | Static after initial setup |
| Grammarly | Tracks suggestion accept/reject patterns | Limited to sentence-level suggestions |
| Notion AI | Workspace context awareness | Document reference, not learning |

See the pattern? All of them are initial setup → static forever. None actually improve with use.

Grammarly comes closest, but even it only tracks “accept/reject of sentence-level suggestions” — not the diff between a full draft and the final version.

Why No One Does This

It’s not that it’s technically impossible. There are two reasons:

First, capturing diffs requires both the “draft” and “final” to exist in the same system. If you get a draft from ChatGPT and edit it in Google Docs, the two systems don’t know about each other. The draft lives in ChatGPT, the final in Docs — the connection is broken.

Second, even with diffs, there’s no pipeline to turn them into “learning.” Manually tweaking prompts is currently the only method.


The Value of Wasted Signals

I ran an experiment. Over two weeks, I collected the diffs between AI-generated LinkedIn drafts and the actually published versions, then had an LLM analyze them all at once.

Results:

  • Changed question-style openings to declarative statements — 7 times
  • Expanded hashtags from 3 → 5 — consistent pattern
  • Changed formal endings to casual tone — 11 out of 14 drafts
  • Added English terms in parentheses after technical jargon — every time

These patterns were never explicitly requested. I never wrote “use declarative openings” in any Custom Instructions. I just kept editing. But when the diffs were collected, clear preferences emerged.

This is implicit feedback — learning signals embedded in the editing behavior itself, without the user being conscious of it.


Self-Tuning Loop: A System Where Prompts Evolve

The solution is structurally simple.

graph TD
    G["1. Generate — Create draft using current guidelines"]
    C["2. Capture — Store diff between draft and final"]
    A["3. Analyze — Extract repeating patterns from N diffs"]
    E["4. Evolve — Auto-update guidelines"]

    G --> C
    C --> A
    A --> E
    E -->|"next draft gets closer to what you want"| G

Self-Tuning Loop. A 4-step cycle.

Step 1: Generate

AI creates a draft referencing current guidelines. Initially, these can be just a few lines.

## Writing Guidelines v1
- Tone: Professional but not stiff
- Length: ~1,500 characters
- Structure: Lead with the point, background later
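Concretely, this step can be a few lines of code. A minimal Python sketch, assuming the guidelines live in a guidelines.md file and call_llm() stands in for whatever model API you use (both names are illustrative, not part of any real system):

# generate.py — sketch of step 1
from pathlib import Path

GUIDELINES = Path("guidelines.md")  # assumed location of the guideline file

def call_llm(prompt: str) -> str:
    """Placeholder: wire in any LLM API here. The loop is model-agnostic."""
    raise NotImplementedError

def generate_draft(topic: str) -> str:
    """1. Generate: prepend the current guidelines to the drafting prompt."""
    guidelines = GUIDELINES.read_text(encoding="utf-8")
    return call_llm(
        f"Follow these writing guidelines:\n{guidelines}\n\n"
        f"Write a draft about: {topic}"
    )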

Step 2: Capture

When the user edits the draft into a final version, both are stored. The key is extracting the diff.

[Draft] "The advancement of AI technology is bringing great changes to our society."
[Final] "Enterprise adoption rose 34% within 3 months of GPT-4o's launch."
[Diff]  Abstract statement → concrete data + proper nouns
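Capturing this needs nothing exotic. A minimal sketch using Python's standard difflib, with diffs.jsonl as an assumed append-only store:

# capture.py — sketch of step 2
import difflib, json, time
from pathlib import Path

LOG = Path("diffs.jsonl")  # assumed store for draft/final pairs

def capture(draft: str, final: str) -> None:
    """2. Capture: store both versions plus a unified diff between them."""
    diff = "\n".join(difflib.unified_diff(
        draft.splitlines(), final.splitlines(),
        fromfile="draft", tofile="final", lineterm=""))
    record = {"ts": time.time(), "draft": draft, "final": final, "diff": diff}
    with LOG.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")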

Step 3: Analyze

Once diffs have accumulated past a threshold (7 days, 10 items, etc.), request pattern analysis from an LLM.

## Analysis Prompt
Below are the diffs from the last 10 AI drafts vs. user final versions.
Extract repeating editing patterns.
For each pattern:
- Pattern description (what changes and how)
- Frequency (X out of 10)
- Inferred reason (why this change is made)
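A minimal sketch of the trigger and the analysis call, reusing the diffs.jsonl store from the capture sketch (call_llm() is again a placeholder):

# analyze.py — sketch of step 3
import json
from pathlib import Path

LOG = Path("diffs.jsonl")   # same store the capture step appends to
MIN_DIFFS = 10              # trigger threshold ("7 days, 10 items, etc.")

def call_llm(prompt: str) -> str:
    """Placeholder: any LLM API works here."""
    raise NotImplementedError

def analyze() -> str | None:
    """3. Analyze: feed the latest diffs to an LLM and ask for patterns."""
    if not LOG.exists():
        return None
    records = [json.loads(line) for line in LOG.read_text(encoding="utf-8").splitlines()]
    if len(records) < MIN_DIFFS:
        return None  # not enough signal yet
    diffs = "\n\n".join(r["diff"] for r in records[-MIN_DIFFS:])
    prompt = (
        "Below are the diffs from the last 10 AI drafts vs. user final versions.\n"
        "Extract repeating editing patterns. For each pattern give:\n"
        "- Pattern description (what changes and how)\n"
        "- Frequency (X out of 10)\n"
        "- Inferred reason (why this change is made)\n\n" + diffs
    )
    return call_llm(prompt)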

Step 4: Evolve

Apply extracted patterns to the guidelines. With a safety mechanism:

| Classification | Criteria | Action |
|---|---|---|
| Safe | 70%+ frequency, style/tone related | Auto-apply |
| Risky | Below 50% frequency, or structural changes | Suggest only (human approves) |

Auto-applying only Safe changes makes guidelines gradually converge toward the user’s preferences.
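A minimal sketch of that gate, assuming the analysis step returns patterns with a description, a frequency, and a structural flag (all names here are illustrative):

# evolve.py — sketch of step 4
from dataclasses import dataclass
from pathlib import Path

GUIDELINES = Path("guidelines.md")

@dataclass
class Pattern:
    description: str   # e.g. "Openings: start with specific data or proper nouns"
    frequency: float   # share of recent diffs showing the pattern (0.0–1.0)
    structural: bool   # structural changes are never auto-applied

def evolve(patterns: list[Pattern]) -> list[Pattern]:
    """4. Evolve: append Safe patterns to the guidelines, return Risky ones."""
    risky = []
    for p in patterns:
        if p.frequency >= 0.7 and not p.structural:   # Safe: auto-apply
            with GUIDELINES.open("a", encoding="utf-8") as f:
                f.write(f"- {p.description}\n")
        else:                                          # Risky: human approves
            risky.append(p)
    return risky

After a few cycles, the guideline file might look like this: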

## Writing Guidelines v3 (auto-evolved)
- Tone: Professional but not stiff
- Length: ~1,500 characters
- Structure: Lead with the point, background later
+ - Openings: Start with specific data or proper nouns (avoid abstractions)
+ - Ending style: Use casual tone (avoid formal endings)
+ - Technical terms: Add English in parentheses on first mention
+ - Hashtags: Use 5 (3 is too few)

That’s all it takes. No fine-tuning, no ML infrastructure — just evolving prompt text makes the next draft better.
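For completeness, here is one full cycle wired together, reusing the functions from the step sketches above plus a hypothetical parse_patterns() that turns the LLM's analysis text into Pattern objects:

# loop.py — one cycle of the Self-Tuning Loop (sketch)
from typing import Callable

def parse_patterns(report: str) -> list:
    """Hypothetical: parse the LLM's analysis report into Pattern objects."""
    raise NotImplementedError

def run_cycle(topic: str, edit: Callable[[str], str]) -> None:
    draft = generate_draft(topic)                  # 1. Generate
    final = edit(draft)                            # the human edits as usual
    capture(draft, final)                          # 2. Capture
    report = analyze()                             # 3. Analyze (None until enough diffs)
    if report:
        for p in evolve(parse_patterns(report)):   # 4. Evolve
            print("Needs approval:", p.description)  # Risky patterns surface here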


Relationship to Academic Research

Automatic prompt optimization is an active research area. Here’s how Self-Tuning Loop differs from existing work.

Existing Approaches: Metric-Based Optimization

| Research | Core Mechanism | Feedback Signal |
|---|---|---|
| APE (Zhou et al., 2023) | LLM generates instruction candidates → evaluate → select best | Task accuracy |
| OPRO (DeepMind, 2023) | LLM as optimizer, referencing score history | Benchmark scores |
| DSPy (Stanford, 2024) | Declare prompts as modules, compiler auto-optimizes | Developer-defined metrics |
| EvoPrompt (2024) | Genetic algorithm evolves prompt population | Dev set accuracy |

All optimize automatic benchmark scores. “Accuracy 85% → 91%.” Irrelevant to individual style preferences.

Closest Research: TextGrad and POHF

TextGrad (Stanford, 2024, published in Nature) has the most similar structure.

graph LR
    subgraph TextGrad
        O1["AI output"] --> C1["LLM generates critique"]
        C1 --> U1["Use critique as gradient"]
        U1 --> O1
    end

    subgraph STL["Self-Tuning Loop"]
        O2["AI draft"] --> C2["Human edits"]
        C2 --> U2["Use edit diff as gradient"]
        U2 --> O2
    end

One difference. In TextGrad, the gradient comes from AI’s automatic critique. In Self-Tuning Loop, it comes from actual human edits.

POHF (Lin et al., ICML 2024) is the closest human-feedback research. It shows users A/B outputs and asks “which is better?” But requiring explicit comparison judgments every time creates high user friction.

The Gap: What No One Has Done

From a comprehensive survey published in February 2025 (A Systematic Survey of Automatic Prompt Optimization Techniques, arXiv:2502.16923):

| Feedback Type | Used By | Notes |
|---|---|---|
| Numeric score (accuracy etc.) | APE, OPRO, DSPy, EvoPrompt, PromptBreeder | Most common |
| LLM synthetic critique | TextGrad, Self-Refine, Constitutional AI | AI evaluating AI |
| Human A/B preference | POHF | Explicit feedback (high friction) |
| Human edit diffs | No existing research | White space |

A system that uses human edit diffs as implicit feedback to automatically evolve prompts does not exist — academically or commercially.


Why This Beats Fine-Tuning

“Why not just fine-tune the model?” — A reasonable question.

| Criteria | Fine-tuning | Self-Tuning Loop |
|---|---|---|
| Cost | GPU hours + data pipeline | $0 (just prompt text changes) |
| Data needed | Hundreds to thousands of pairs | First improvement from as few as 3 diffs |
| Interpretability | Weight changes — black box | Guideline change history — human-readable |
| Rollback | Restore checkpoint (complex) | Delete one line |
| Model independence | Locked to specific model | Works with any LLM |
| Time to start | Days building pipeline | Immediate |

Fine-tuning suits large-scale pattern changes. But for learning that “this person writes short intros” or “uses 5 hashtags” — it’s overkill.

Self-Tuning Loop evolves the prompt, not the model. Switch models and the guidelines still work. That’s its biggest advantage.


Where Can This Be Applied?

This loop applies to any repetitive task where AI creates a draft and a human refines it.

| Domain | Generate | Capture (diff) | Analyze | Evolve |
|---|---|---|---|---|
| Email | AI drafts reply | Sent mail vs. draft | "Less formal," "remove greeting" | Update tone guide |
| Blog | AI drafts post | Published vs. draft | "Shorter intro," "data first" | Patch writing guide |
| Contracts | AI drafts clauses | Lawyer's edits | "Always add indemnity clause" | Evolve clause templates |
| Code review | AI generates comments | Accept/ignore patterns | "Only accept naming, ignore structure" | Adjust review criteria |
| Proposals | AI drafts slides | Final version diff | "Always add data backing" | Evolve proposal prompt |
| News curation | AI selects articles | 👍/👎 feedback | "Prefers consulting firm news" | Patch selection criteria |

See the common structure? The 4-step loop is identical regardless of domain.


Personal RLHF

In the Ralph Loop series, we covered the structure of RLHF (Reinforcement Learning from Human Feedback), the technique OpenAI invested heavily in to build ChatGPT.

RLHF’s core goal is “reflecting human preferences into the model.” Self-Tuning Loop achieves the same goal at a completely different layer.

| Dimension | RLHF | Self-Tuning Loop |
|---|---|---|
| Layer | Model weights | Prompt text |
| Cost | GPU clusters + labelers | $0 |
| Feedback collection | Professional labelers do A/B comparisons | User edits as usual (zero friction) |
| Personalization | Average across all users | Individual unique preferences |
| Update frequency | Model release (months) | Weekly, automatic |
| Interpretability | Weight changes (unreadable) | Natural language guidelines (readable) |
| Rollback | Restore checkpoint | Delete one line |

If RLHF bakes the average preference of all users into model weights, Self-Tuning Loop accumulates one person’s unique judgment into a prompt.


Next: The Actual System

Concepts end here. The next part dissects a system where this loop is actually running in production.

  • 35 sources auto-collected, AI-curated, with a weekly feedback loop for auto-improvement
  • Send a photo from Telegram and a blog draft is auto-generated
  • LinkedIn drafts created daily, editing patterns reflected in tone guides
  • All of this running at $0 additional cost

Cron, Telegram, Claude — the architecture of a self-improving operations system built with three tools.


This Series

  • Part 1: The Wasted Learning Signal (this post)
  • Part 2: Cron + Telegram + Claude System Anatomy (coming soon)
  • Part 3: Build Your Own Self-Tuning Loop (coming soon)

References

  • Zhou et al., “Large Language Models Are Human-Level Prompt Engineers” (APE), ICLR 2023
  • Yang et al., “Large Language Models as Optimizers” (OPRO), ICLR 2024
  • Khattab et al., “DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines,” Stanford, 2024
  • Yuksekgonul et al., “TextGrad: Automatic ‘Differentiation’ via Text,” Nature, 2024
  • Lin et al., “Prompt Optimization with Human Feedback” (POHF), ICML 2024
  • Fernando et al., “Promptbreeder: Self-Referential Self-Improvement via Prompt Evolution,” DeepMind, ICML 2024
  • “A Systematic Survey of Automatic Prompt Optimization Techniques,” arXiv:2502.16923, 2025