AI industry builder / agent workflows

Brad Zhang

Longform AI writing, forum-grade X thinking, and open-source proof for founders, operators, and early AI teams.

X / @teach_fireworks

Topic dossier

Inference

Latency, serving, evaluation, and the real operating cost of useful model behavior.

When models leave the lab and enter service, what becomes visible, how quickly, and at what confidence threshold?

Inference is framed here as a publishing problem. Latency, evaluation, and serving all become clearer once the questions of what gets shown and who answers for it replace infrastructure metrics chased for their own sake.

Abstract editorial cover for fireworks-tech-graph

Lead essay

Inference as a Publishing Problem

Every production model is also a publishing decision about what deserves to become visible fast.

Serving, latency, and evaluation become clearer when treated as editorial constraints instead of abstract infra metrics.

X Thread / February 2026 / 9 min read

Project evidence

Pull the topic back to execution.

Diagram-system spread for fireworks-tech-graph

Diagramming as editorial infrastructure

fireworks-tech-graph

A project that treats architecture diagrams not as cosmetic add-ons, but as durable explanatory surfaces for AI systems work.

  • Built for architecture, flow, and agent-memory diagrams
  • Opinionated visual system rather than generic chart output
  • Useful wherever systems explanation must survive beyond a meeting

Recovery spread for fireworks-sessions-saver

Session continuity without continuity theater

fireworks-sessions-saver

A checkpointing layer for coding sessions, built to survive interruption, context loss, and handoff without pretending memory never breaks.

  • Checkpoint model for coding CLI tools
  • Designed around interruption, recovery, and handoff
  • Makes context survivable instead of merely longer