fireworks-design
An open-source Claude Code workflow that replicates ClaudeDesign — fan out many distinct design directions, judge them like a panel, synthesize the best, and adversarially refine to a single world-class frontend page.
One-sentence pitch: Stop rolling the dice on a single model output.
fireworks-design explores 6–8 independent aesthetics in parallel, scores each across 6 design dimensions, grafts the winners together, then runs a critique-and-fix loop on the result until it's shippable.
📦 Install
In Claude Code, just say:
Install fireworks-design into this project. The repo is yizhiyanhua-ai/fireworks-design.
Claude drops the workflow into .claude/workflows/ and you're done.
Prefer the shell?
    mkdir -p .claude/workflows
    curl -fsSL -o .claude/workflows/fireworks-design.js \
      https://raw.githubusercontent.com/yizhiyanhua-ai/fireworks-design/main/fireworks-design.js

🚀 Usage
That's the whole setup. Now just talk to Claude in plain language — say what you want and where to save it:
Use fireworks-design to build a landing page for a coffee shop. Save it to ~/design/coffee.
Claude runs the workflow in the background (watch it live with /workflows) and hands you a finished final.html. Tweak it conversationally whenever:
…explore 6 directions, refine 3 rounds, and use brand color #7c3aed.
What you get: a single self-contained final.html (plus the explored draft-*.html directions) and a summary of which aesthetic won and why.
Prefer to call it directly?
    Workflow({
      name: "fireworks-design",
      args: { prompt: "...", outputDir: "/abs/path", variants: 6, refineRounds: 2 }
    })

`prompt` and `outputDir` (absolute path) are required; `variants` (3–8), `refineRounds`, `brand`, and `lenses` are optional. Full reference in the Arguments section below.
✨ Featured outputs (results walkthrough)
14 real pages across totally different domains — all live on GitHub Pages · full table + deep-dives. Four highlights:
| | Page | Winner & why it fits | Signature moment |
|---|---|---|---|
| 🎬 | LUMIÈRE (movie rating) | Dark Premium: theatrical immersion beat editorial; antique gold rationed as "prestige currency" (only ratings/CTA/top ranks) | one-shot gold projector beam sweeps the 9.2 score |
| 🎵 | NOVA · AURORA (album) | Bold Editorial: oversized Didone wordmark, midnight-violet-teal 60/30/10 | generative cover + 32-bar visualizer + play-state changes the whole room |
| 🎨 | OBJECT & ECHO (studio) | Bold Editorial: gallery-zine, kinetic grotesque "object" vs ghosted italic "echo" | spatial afterimage echo behind the hero wordmark |
| | AZORES (travel) | Bold Editorial: photography-as-product, NatGeo-meets-Cereal | interactive islands map with breathing halo + cross-fade detail panel |
Winner diversity proves the point: across 14 briefs, Bold Editorial won ×8, Dark Premium ×3 (movie/restaurant/ecommerce), Swiss Minimal ×2 (fitness/edtech), Editorial ×1 (nonprofit). Different briefs crown different winners, which is why we judge instead of generating once. The full results walkthrough (winning rationale, signature moments, real bugs the refine/polish pass caught) is in examples/README.md.
🖼️ Gallery — all 14 live examples
LUMIÈRE, MAISON NOIR, OBJECT & ECHO, NOVA · AURORA, PULSE, AZORES, LUMEN, ECHOES OF THE VOID, Brightwater, AURA ONE, BUILD/2026, vector, tideline, Lin Hua
🧠 How it works — six phases
① Brief — distill a design system
One agent turns your prompt (+ optional brand) into a shared creative brief: product framing, audience, required sections, and concrete design tokens (font pairings, palette hexes, mood, references). These tokens are injected into every subsequent agent, so the whole pipeline stays on-brand.
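To make "tokens injected into every subsequent agent" concrete, here is a minimal sketch of prepending a shared brief to a task prompt. The field names (`product`, `audience`, `sections`, `tokens`), the sample values, and the helper `withBrief` are assumptions made for illustration; the workflow's actual brief schema may look different.

```javascript
// Hypothetical brief shape; the real schema may use different fields.
const brief = {
  product: "Neighborhood coffee shop landing page",
  audience: "Local commuters and remote workers",
  sections: ["hero", "menu", "location", "hours"],
  tokens: {
    fonts: { display: "Fraunces", body: "Inter" },
    palette: ["#7c3aed", "#1c1917", "#fafaf9"],
    mood: "warm, unhurried",
  },
};

// Prepend the shared brief to any agent prompt so every phase sees the same tokens.
function withBrief(sharedBrief, task) {
  return [
    "## Shared creative brief (stay on-brand)",
    JSON.stringify(sharedBrief, null, 2),
    "## Your task",
    task,
  ].join("\n\n");
}

const prompt = withBrief(brief, "Build the page in the Dark Premium aesthetic.");
console.log(prompt);
```

Because the brief is serialized once and reused verbatim, every generator, judge, and fixer works from identical palette and font values.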
② Diverge — wide exploration (the quality core)
Each agent commits fully to one aesthetic and produces a complete, self-contained HTML file. Each generator internally plans → self-critiques → produces, rather than dumping a first draft.
③ Judge — panel scoring
Every direction × every design dimension, scored 1–10 by independent critics (~36 critiques in parallel for 6 directions). Each verdict also returns its single highest-leverage fix, which feeds the next phase.
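One plausible way to turn the panel's verdicts into a ranking is to average each direction's scores across dimensions and sort. The sketch below assumes a verdict shape of `{ variant, dimension, score, fix }` and a helper named `rankDirections`; both are illustrative, not the workflow's actual code.

```javascript
// Hypothetical verdict shape: { variant, dimension, score (1-10), fix }.
function rankDirections(verdicts) {
  const byVariant = new Map();
  for (const v of verdicts) {
    if (!byVariant.has(v.variant)) byVariant.set(v.variant, { scores: [], fixes: [] });
    const entry = byVariant.get(v.variant);
    entry.scores.push(v.score);
    entry.fixes.push(v.fix); // keep each critic's highest-leverage fix for later phases
  }
  return [...byVariant.entries()]
    .map(([variant, { scores, fixes }]) => ({
      variant,
      mean: scores.reduce((sum, s) => sum + s, 0) / scores.length,
      fixes,
    }))
    .sort((a, b) => b.mean - a.mean); // best direction first
}

const ranking = rankDirections([
  { variant: "editorial", dimension: "typography", score: 9, fix: "Tighten hero leading" },
  { variant: "editorial", dimension: "motion", score: 7, fix: "Add reduced-motion fallback" },
  { variant: "minimal", dimension: "typography", score: 8, fix: "Increase body size" },
  { variant: "minimal", dimension: "motion", score: 6, fix: "Ease the card hover" },
]);
console.log(ranking.map((r) => `${r.variant}: ${r.mean}`)); // [ 'editorial: 8', 'minimal: 7' ]
```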
④ Synthesize — graft the best
Reads the Top-3 directions' source, uses the strongest as the base, folds in the best elements of the others, and fixes every issue the judges flagged. The output must clearly beat any single direction.
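The grafting decision can be pictured as: pick the top three by mean score, use the best as the base, and borrow a dimension from a runner-up only where that runner-up outscores the base. This is a sketch under that assumption; `planGraft` and its input shape are hypothetical, and the real synthesis step is performed by an agent reading the HTML sources rather than by a lookup like this.

```javascript
// Hypothetical input: per-direction scores keyed by dimension.
function planGraft(scoresByVariant, topN = 3) {
  const mean = (dims) => {
    const values = Object.values(dims);
    return values.reduce((sum, s) => sum + s, 0) / values.length;
  };
  const top = Object.entries(scoresByVariant)
    .sort(([, a], [, b]) => mean(b) - mean(a))
    .slice(0, topN);
  const [[base, baseDims], ...runnersUp] = top;

  // For each dimension, borrow from a runner-up only if it beats the base.
  const grafts = {};
  for (const dim of Object.keys(baseDims)) {
    let best = { variant: base, score: baseDims[dim] };
    for (const [variant, dims] of runnersUp) {
      if (dims[dim] > best.score) best = { variant, score: dims[dim] };
    }
    if (best.variant !== base) grafts[dim] = best.variant;
  }
  return { base, grafts };
}

const plan = planGraft({
  editorial: { typography: 9, motion: 6, color: 9 },
  "dark-premium": { typography: 7, motion: 9, color: 7 },
  minimal: { typography: 8, motion: 5, color: 7 },
  brutalist: { typography: 5, motion: 5, color: 5 },
});
console.log(plan); // { base: 'editorial', grafts: { motion: 'dark-premium' } }
```

Here the base keeps its own typography and color, and only motion is donated, which matches the idea of grafting strengths rather than averaging everything.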
⑤ Refine — adversarial polishing
A ruthless reviewer returns prioritized issues (severity-tagged); a fixer applies them surgically. Looped refineRounds times (default 2). This is ClaudeDesign's "fine-grained controls," engineered.
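A small sketch of how severity-tagged issues might be ordered before the fixer sees them. The severity labels (`critical`, `major`, `minor`), the per-round cap `maxPerRound`, and the helper `prioritize` are all assumptions for illustration; the workflow's critique schema may name and handle these differently.

```javascript
// Assumed severity labels, most urgent first.
const SEVERITY_ORDER = { critical: 0, major: 1, minor: 2 };

// Sort issues so the fixer handles the most severe first, and cap the batch
// so each round stays a targeted edit rather than a rewrite.
function prioritize(issues, maxPerRound = 5) {
  return [...issues]
    .sort((a, b) => SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity])
    .slice(0, maxPerRound);
}

const batch = prioritize(
  [
    { severity: "minor", issue: "Footer links lack a hover state" },
    { severity: "critical", issue: "Hero text fails contrast on mobile" },
    { severity: "major", issue: "Pricing cards overflow at 375px" },
  ],
  2
);
console.log(batch.map((i) => i.severity)); // [ 'critical', 'major' ]
```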
⑥ Polish — ship-ready QA
A final gate checks and fixes: responsiveness (375/768/1280+), all interactive states, prefers-reduced-motion, semantic HTML + ARIA, WCAG AA contrast, no console errors, no leftover placeholders — then writes final.html.
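Of the checks above, WCAG AA contrast is the one with a precise formula, so it can be verified in code. The sketch below follows the WCAG 2.x definitions of relative luminance and contrast ratio; the function names are illustrative and not taken from the workflow, which delegates this check to an agent.

```javascript
// Relative luminance of a #rrggbb color, per the WCAG 2.x definition.
function luminance(hex) {
  const channels = [1, 3, 5].map((i) => parseInt(hex.slice(i, i + 2), 16) / 255);
  const [r, g, b] = channels.map((c) =>
    c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4
  );
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors, from 1 (identical) up to about 21 (black on white).
function contrastRatio(a, b) {
  const [lighter, darker] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (lighter + 0.05) / (darker + 0.05);
}

// WCAG AA thresholds: 4.5:1 for normal text, 3:1 for large text.
function passesAA(fg, bg, largeText = false) {
  return contrastRatio(fg, bg) >= (largeText ? 3 : 4.5);
}

console.log(contrastRatio("#000000", "#ffffff").toFixed(2));
console.log(passesAA("#7c3aed", "#ffffff"));
console.log(passesAA("#cccccc", "#ffffff"));
```

Running a check like this over the brief's palette pairs before writing `final.html` is one way to catch contrast regressions introduced by grafting.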
✨ Why this exists
Even experienced designers ration exploration — there's rarely time to prototype a dozen directions, so you settle for two. From the Claude Design announcement:
"Even experienced designers have to ration exploration — there's rarely time to prototype a dozen directions, so you limit yourself to a few."
A single LLM generation is one draw from a distribution. Its taste, mood, and prompt interpretation are locked into that one version. fireworks-design turns that variance into a quality floor by:
- Exploring widely — N parallel agents, each committed to a distinct aesthetic.
- Judging independently — a panel scores every direction across separate design dimensions.
- Synthesizing — taking the winner as the skeleton and grafting the best of the runners-up.
- Refining adversarially — critique → fix, looped until it clears the bar.
🧩 Built on Claude Code's Dynamic Workflow
fireworks-design is not a prompt, a library, or a hosted service. It's a single JavaScript file that the Claude Code Workflow runtime executes as a Dynamic Workflow: deterministic code that spawns model agents, runs them in parallel, and composes their schema-validated outputs. Control flow (loops, fan-out, barriers) belongs to the script, not to the model, so the orchestration is identical on every run and a run can be resumed. Bring your own model; the workflow brings the orchestration.
The whole pipeline is ~270 lines with a shape you can read at a glance:
    // fireworks-design.js, condensed to its bones
    phase('Brief')
    const brief = await agentRetry(briefPrompt, { schema: BRIEF_SCHEMA })

    phase('Diverge') // fan-out: N directions
    // Parenthesize the await so .filter runs on the resolved array, not on the promise.
    const variants = (await parallel(LENSES.map(l => () =>
      agent(generate(l), { schema: VARIANT_SCHEMA })))).filter(Boolean)

    phase('Judge') // panel: N × 6 dimensions
    const verdicts = await parallel(variants.flatMap(v =>
      DIMS.map(d => () => agent(judge(v, d), { schema: SCORE_SCHEMA }))))

    phase('Synthesize') // graft the best
    await agentRetry(synthesize(top(variants, verdicts)))

    for (let i = 0; i < REFINE_ROUNDS; i++) { // critique ↔ fix loop
      const issues = await agentRetry(critique, { schema: CRITIQUE_SCHEMA })
      await agentRetry(fix(issues))
    }

    phase('Polish') // ship-ready QA
    await agentRetry(polish)
    return { outputPath: FINAL_PATH, winner, ranking, summary: polish }

What makes this a dynamic workflow rather than a script that calls a model:
- Deterministic orchestration: `parallel()` is a real barrier, `phase()` groups live progress, the `for` loop is yours. The model never decides what runs next.
- Schema-validated agents: every `agent()` returns typed JSON, so the pipeline composes data, not prose. No regex-parsing model output.
- Resumable: edit a prompt and re-run; the unchanged prefix replays from cache (`resumeFromRunId`).
Full file: fireworks-design.js.
🔬 Under the hood — orchestration, not iteration
This isn't a "generate, then ask the same model to improve it" loop. It's a deterministic multi-agent pipeline that composes several modern inference-time techniques into one reproducible run — built on the Claude Code Workflow runtime (parallel() / pipeline() / agent() primitives, schema-validated returns, resumable execution).
| Technique | Where it lives | Why it matters |
|---|---|---|
| Best-of-N + self-consistency | Diverge → Judge | Generate N directions, keep the one with the highest average across independent scores — quality rises with N, not with luck. |
| LLM-as-judge panel | Judge | 6 dimensions × N directions, scored by critics that never see each other's answers. Kills single-judge bias. |
| Diverse generation (diverse beams) | Diverge lenses | Each agent commits to a distinct aesthetic, so the N samples cover the design space instead of clustering on one idea. |
| Critique-and-revise (Self-Refine / Reflexion) | Refine | A dedicated critic emits severity-tagged issues; a fixer applies them surgically; loop until the bar clears. |
| Synthesis / grafting | Synthesize | The winner is the skeleton, runners-up donate their best parts — not a merge of averages. |
| Structured tool-use | every agent | Returns are schema-validated JSON, composed deterministically — no brittle regex parsing of model prose. |
| Context isolation | per-agent | Each agent runs in its own context with only the tokens it needs; the shared brief is injected (cacheable), not re-read. |
| Resumable execution | runtime | resumeFromRunId replays cached results for the unchanged prefix — edit a prompt mid-run without paying to re-run everything. |
| Model-agnostic | `agent()` omits `model` | Inherits the session model. Swap Opus ↔ Sonnet ↔ anything; the pipeline is identical. |
| Budget- & rate-limit-aware | `budget` global, retries | Fan-out can scale to a token budget; agents auto-retry on 429s instead of crashing the run. |
The net effect: you're not hoping the model has a good day. You're engineering a quality floor out of sampling, judging, and refinement — the same inference-time-scaling ideas behind best-of-N and self-consistency, applied to design instead of math.
⚙️ Arguments
| Argument | Required | Default | Description |
|---|---|---|---|
| `prompt` | ✅ | — | What to build (natural language). |
| `outputDir` | ✅ | — | Absolute path to write files. |
| `variants` | ❌ | `6` | Number of directions (3–8). |
| `refineRounds` | ❌ | `2` | Critique→fix loops. |
| `brand` | ❌ | — | Brand notes / existing design system / references. |
| `lenses` | ❌ | all | Restrict to specific aesthetics by key. |
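The defaults and constraints in the table can be summarized as a small normalization step. This is a sketch only: `normalizeArgs` is a hypothetical helper, and clamping `variants` into 3–8 is an assumption (the workflow itself may reject out-of-range values instead).

```javascript
// A sketch of argument normalization; the workflow's own validation may differ.
function normalizeArgs(args) {
  const { prompt, outputDir, variants = 6, refineRounds = 2, brand, lenses } = args;
  if (typeof prompt !== "string" || prompt.trim() === "") {
    throw new Error("prompt is required");
  }
  // Accept POSIX ("/abs") and Windows ("C:/abs") style absolute paths.
  if (typeof outputDir !== "string" || !/^(\/|[A-Za-z]:[\\/])/.test(outputDir)) {
    throw new Error("outputDir must be an absolute path");
  }
  return {
    prompt,
    outputDir,
    variants: Math.min(8, Math.max(3, Math.trunc(variants))), // assumed clamp to 3-8
    refineRounds: Math.max(0, Math.trunc(refineRounds)),
    brand,  // optional; undefined when omitted
    lenses, // optional; undefined means "all lenses"
  };
}

console.log(
  normalizeArgs({ prompt: "Coffee shop landing page", outputDir: "/tmp/coffee", variants: 12 })
);
```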
🎚️ Customization
Aesthetic lenses (LENSES)
The distinct directions explored in phase ②. Edit the LENSES array to add your own styles:
| Key | Style |
|---|---|
| `editorial` | Bold Editorial: high-contrast, expressive display type, magazine grid |
| `minimal` | Swiss Minimal: grid-obsessed, restrained, Inter/Geist |
| `gradient` | Vibrant Gradient: mesh gradients, glassmorphism, neon accents |
| `dark-premium` | Dark Premium: near-black canvas, gold/violet accent, cinematic |
| `organic` | Soft Organic: rounded forms, warm palette, approachable motion |
| `brutalist` | Neo-Brutalist: raw borders, hard shadows, mono, high-energy |
| `glass` | Glass Aurora: translucent layers, aurora blobs, backdrop blur |
| `mono-tech` | Mono Tech: monospace accents, terminal/data-forward |
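A sketch of how the `lenses` argument could restrict this list, and how a custom style might be appended. The keys and style names are copied from the lens table; the `{ key, style }` entry shape, the `selectLenses` helper, and the `retro-print` lens are assumptions for illustration, so check the actual `LENSES` array in fireworks-design.js before editing it.

```javascript
// Keys and styles from the lens table; the { key, style } shape is assumed.
const LENSES = [
  { key: "editorial", style: "Bold Editorial" },
  { key: "minimal", style: "Swiss Minimal" },
  { key: "gradient", style: "Vibrant Gradient" },
  { key: "dark-premium", style: "Dark Premium" },
  { key: "organic", style: "Soft Organic" },
  { key: "brutalist", style: "Neo-Brutalist" },
  { key: "glass", style: "Glass Aurora" },
  { key: "mono-tech", style: "Mono Tech" },
];

// Keep only the requested lenses, preserving list order; no filter means all.
function selectLenses(all, keys) {
  if (!keys || keys.length === 0) return all;
  const unknown = keys.filter((k) => !all.some((l) => l.key === k));
  if (unknown.length > 0) throw new Error(`Unknown lens: ${unknown.join(", ")}`);
  return all.filter((l) => keys.includes(l.key));
}

// Adding a custom style is one more entry (hypothetical example lens).
const withCustom = [...LENSES, { key: "retro-print", style: "Retro Print" }];
const chosen = selectLenses(withCustom, ["editorial", "dark-premium"]);
console.log(chosen.map((l) => l.key)); // [ 'editorial', 'dark-premium' ]
```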
Judge dimensions (DIMS)
What the panel scores on: hierarchy · typography · color/contrast · motion · engineering craft · delight/originality.
🤖 Model choices
Every agent() call omits the model parameter, so all subagents inherit the current session model. Run it on Opus for top quality, on Sonnet for speed, or on any model your harness exposes — no code changes needed.
💼 Example cases
A few prompts to try (all outputDir set to an absolute path):
| Type | Prompt | Args |
|---|---|---|
| SaaS landing | Pricing + landing for an open-source vector DB; speed benchmarks, code hero, comparison table. | variants: 6 |
| OSS homepage | MIT CLI tool; install command, 3 feature cards, terminal demo. | variants: 4, brand: "mono, #10b981" |
| Portfolio | Designer one-pager; asymmetric editorial, large type, works grid. | variants: 8, refineRounds: 3 |
| Event | One-day AI conference; countdown hero, speakers, schedule, register CTA. | variants: 6, brand: "#ea580c" |
More quick recipes
| Goal | Suggested args |
|---|---|
| Fast first draft | variants: 4, refineRounds: 1 |
| Maximum quality | variants: 8, refineRounds: 3 |
| Brand-locked | pass brand: with hexes + fonts |
| Specific aesthetics only | lenses: ["editorial","dark-premium"] |
🔗 Mapping to ClaudeDesign
| ClaudeDesign principle | This workflow |
|---|---|
| Wide exploration (a dozen directions) | ② Diverge — 6–8 parallel aesthetics |
| Strongest vision model judges | ③ Judge — 6 dimensions × N directions |
| Distill & merge the best | ④ Synthesize — Top-3 graft |
| Fine-grained iterative controls | ⑤ Refine — critique↔fix loop |
| Design system throughout | ① Brief — tokens injected everywhere |
| Export deliverable HTML | ⑥ Polish — QA → final.html |
💰 Cost & considerations
- 6 variants × full pipeline ≈ 40+ agent calls. This is deliberate — quality is the point.
- Token cost scales with `variants`, `refineRounds`, and page complexity.
- Workflow agents read full HTML files for judging/synthesis, so very long pages cost more.
- Want budget-aware scaling? Tie `VARIANT_COUNT` to `budget.total` (the workflow exposes `budget`).
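For a rough sense of where the "40+" comes from, the call count can be tallied per phase. This sketch assumes no retries, one critic call plus one fixer call per refine round, and six judge dimensions; `estimateAgentCalls` is a hypothetical helper, not part of the workflow.

```javascript
// Rough agent-call count for one run, assuming no retries and one critic
// plus one fixer call per refine round.
function estimateAgentCalls({ variants = 6, refineRounds = 2, dimensions = 6 } = {}) {
  const brief = 1;
  const diverge = variants;            // one generator per direction
  const judge = variants * dimensions; // one critic per direction per dimension
  const synthesize = 1;
  const refine = refineRounds * 2;     // critique + fix each round
  const polish = 1;
  return brief + diverge + judge + synthesize + refine + polish;
}

console.log(estimateAgentCalls());                                  // 49
console.log(estimateAgentCalls({ variants: 8, refineRounds: 3 }));  // 65
console.log(estimateAgentCalls({ variants: 4, refineRounds: 1 }));  // 33
```

The judge phase dominates (36 of 49 calls at the defaults), so lowering `variants` cuts cost much faster than lowering `refineRounds`.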
❓ FAQ
Do I need a specific model? No. It inherits your session model. A strong model (Opus-class) gives the best results.
Can I use just one direction? Then you don't need this workflow — run a normal frontend-design pass instead. The value is the parallel exploration + judging.
Where do the draft files go? In your outputDir. They're not committed (see .gitignore); keep or delete them freely.
Can I plug in my design system? Yes — pass it via brand. Brief will fold your tokens into every agent.
🤝 Contributing
Contributions welcome — new aesthetic lenses, judge dimensions, or smarter synthesis. Open an issue first to discuss scope. See CONTRIBUTING.md.
📄 License
MIT © yizhiyanhua-ai
Built with Claude Code · Quality is the only thing that matters.









