Generate a manual from screenshots

A common AI-agent workflow: walk a feature, capture screenshots at each step, attach annotations + redactions, ship the bundle as a how-to document. With annot-mcp@0.2.0’s encode option, the bytes shrink ~3–5× per screenshot — practical for 30–50 slide manuals that would otherwise be too heavy.

For the kept-in-sync flavour of this same flow — where the manual stays current with the live UI because a Playwright tour re-captures every screen on every PR — see the living-product-docs recipe.

Encode preset cheat sheet

For manual generation, the recommended defaults:

"encode": {
  "format": "smart",            // PNG-8 for UI, JPEG for photos
  "saveSizePreset": "standard"  // 1920px max (mobile-friendly + print-friendly)
}

For mobile-first manuals (smaller files acceptable):

"encode": {
  "format": "smart",
  "saveSizePreset": "light"     // 1280px max — ~30–40% additional savings
}

For archival / reference manuals where lossless matters:

"encode": {
  "format": "png",              // PNG-32 only, no PNG-8 quantization
  "saveSizePreset": "highQuality"  // 2560px max, preserves UI fidelity
}

Agent transcript: 5-step onboarding manual

USER: Walk the signup → first-project onboarding flow on
      staging.example.com. Capture each step with a callout
      explaining the action, and produce a manual.

AGENT → annot.annot_annotate_url({
  url: "https://staging.example.com/signup",
  annotations: [
    { type: "rect", locator: "input[name=email]", intent: "info" },
    { type: "callout",
      atLocator: "h1",
      targetLocator: "input[name=email]",
      content: "Step 1: Enter your email" }
  ],
  encode: { format: "smart", saveSizePreset: "standard" }
})
        ← 180 KB PNG-8 (~60% smaller than raw PNG-32)

[ playwright.browser_navigate + fill + click to advance ]

AGENT → annot.annot_annotate_url({
  url: "https://staging.example.com/signup/verify",
  annotations: [
    { type: "rect", locator: "[data-testid=otp-input]", intent: "info" },
    { type: "callout",
      at: { x: 40, y: 40 },
      targetLocator: "[data-testid=otp-input]",
      content: "Step 2: Enter the code from your inbox" }
  ],
  encode: { format: "smart", saveSizePreset: "standard" }
})
        ← 95 KB PNG-8 (sparse form, even smaller)

[ ... 3 more steps ... ]

After 5 annotated steps:

Encoding	Total bundle size
PNG-32 raw (default 0.1.x)	~2.5 MB
`encode: smart + standard`	~750 KB

That’s the difference between “this manual is too heavy for the LMS” and “this manual loads instantly on a phone.”

Composing with redaction

For manuals that walk through authenticated flows, redact sensitive fields per-step:

AGENT → annot.annot_redact_url({
  url: "https://staging.example.com/profile",
  regions: [
    { locator: "[data-testid=ssn]", style: "blur" },
    { locator: "[data-testid=card-last4]", style: "solid", color: "#000" }
  ],
  encode: { format: "smart", saveSizePreset: "standard" }
})
        ← redacted + encoded in one MCP call

The redactions burn destructively into the bitmap before the encode pipeline, so the encoded output (whether PNG-8 or JPEG) carries no recoverable original pixels.

Why the smart heuristic helps here

Manual screenshots are typically:

UI-heavy — solid backgrounds, sharp text, limited palette. Smart picks PNG-8 → ~60% file-size reduction.
Occasionally photo-mixed (e.g. a step shows a product carousel). Smart’s photo-heavy fallback picks JPEG → ~80% reduction on those specific images, no quality loss the reader notices.

The agent doesn’t have to think about which encoding to pick per screenshot — format: "smart" handles the decision tree based on actual pixel content.

License posture

@ingcreators/annot-annotator is Apache-2.0 end to end. The bundled Median Cut quantizer powering the PNG-8 path is pure TypeScript; no GPL inclusion, no separate install step.