Topic desk

Inference

What becomes visible, how quickly, and under what confidence threshold when models leave the lab and enter service?

Inference is framed here as a publishing problem. Latency, evaluation, and serving become clearer once visibility and responsibility replace abstract infrastructure vanity.


Longform


OpenAI Proactively Discloses Long-Range Agent Tasks Going Out of Bounds: The Safety Steering Behind PR #287

A single tool approval starts to lose its global meaning once a model can run for hours on end. On July 20, 2026, OpenAI unveiled an in-place pause. A generic model that...

Jul 21, 2026 · 12 min read

Everyone is talking about GPT 5.6 sol's output speed being 12 times that of Kimi K3.

Let me list the relevant materials to read together: Group 1: Where...

Jul 20, 2026 · 2 min read

The output speed of GPT 5.6 sol is approximately 12 times that of Kimi K3!

Everyone feels that K3 is much slower than GPT 5.6 sol. In my experience, this is indeed the case...

Jul 20, 2026 · 2 min read

An agent's memory should not be stuffed into the context window

When people discuss coding agents such as Codex and Claude Code, their first thought is usually model capability: whether the agent can read code, fix bugs, and...

Jul 17, 2026 · 8 min read

I connected Codex and Cursor: an automatic memory bus lets the two agents communicate freely

Recently I have been working on something very specific: using the source code of pi agent and a set of small exercises. I want Codex to be responsible for reading the history, combing...

Jul 14, 2026 · 11 min read

I tested GPT-5.5 and GPT-5.6 SOL on 5 real front-end scenarios: SOL won 4 of them but was 13% slower

On July 10, 2026, I put GPT-5.5 and GPT-5.6 SOL into the same Codex execution chain and ran five sets of front-end tasks in succession. Five scenarios, ten pages, and...

Jul 11, 2026 · 12 min read

Anthropic’s 85-minute Fable 5 workshop explains how to build next-generation AI agents

Video links: Opening keynote: https://www.youtube.com/watch?v=GMIWm5y90xA | The capability curve: https://www.youtube.com/watch?v=tP4MGcJ80Y0 | How to get to production...

Jul 8, 2026 · 18 min read

After pre-training, what does the next training paradigm for AI look like?

Source: original video "What does the next training paradigm look like?" from the channel Dwarkesh Patel (link: https://www.youtube.com/watch?v=20p5-kQXF_Q). In this video, Dwarkesh Patel...

Jul 2, 2026 · 15 min read

If you want to truly understand how LLMs are trained, CS336 is the open course most worth studying systematically.

It builds a modern LLM engineering stack from scratch and covers...

Jun 18, 2026 · 2 min read

Let's say you're the head of technology at a technology company

Last year, you used GraphRAG to build an internal knowledge base and ran a PoC on 20,000 contracts. The...

Jun 17, 2026 · 17 min read

When dynamic workflows meet front-end design, the effects are unexpected!

Sharing an open source project: fireworks-design. This project is easily misunderstood as...

Jun 16, 2026 · 5 min read

The Agent Loop is hot🔥: between an agent demo and production sits a loop

There has been a heated debate in overseas AI circles recently. Matt Van Horn laid it out in “WTF Is a Loop? Peter Steinberger vs. Boris Cherny”: You shouldn’t be p...

Jun 9, 2026 · 10 min read

Five Traps of Agent Harnesses

1. Self-evaluation is a trap. Use an adversarial evaluator. Many teams do this: Agent g...

Jun 8, 2026 · 3 min read

Agentic workflows are the most common enterprise scenario

Today I read a write-up on how to define a stable, reliable agentic workflow within an enterprise. To summarize br...

Jun 5, 2026 · 2 min read

From Device Plugin to DRA: I systematically reviewed how Kubernetes device resource management has evolved

A hardcore post from AI Infra! Bookmark it first and then...

Jun 3, 2026 · 3 min read

Asynchronous Agent Era: From Code Assistant to Software Factory

I have reorganized the content of this Latent Space podcast episode. Its in-depth discussion of asynchronous agents is worth a look. I also...

May 31, 2026 · 22 min read

From synchronous to asynchronous agents: I systematically sorted out the relevant material!

The watershed between an async agent and an ordinary agent is not "whe...

May 31, 2026 · 4 min read

What I explored and practiced over the past two months: a minimum engineering closed loop for long-task agents

A diagram shared by a Claude engineer accurately captures the three most common problems of long-task agents: the inability to keep track of state, the inability...

May 25, 2026 · 27 min read

Ten times better, not ten times harder: the logic of growth in the AI era

One thing has always struck me as a little strange: many people regard AI as a tool for speed. Writing an email used to take half an hour; now...

May 23, 2026 · 7 min read

Agent CLIs and harnesses will soon have their core rebuilt in Golang

It is foreseeable that, with the rapid development of agent CLIs and harnesses, their interaction and logic core will soon be rebuilt in Golang, which is very suitable for...

May 19, 2026 · 1 min read

6-11: 6. Evals: LLM-as-judge + human evals

Official Document | OpenAI Evaluation Best Practices: learn eval rubrics, when LLM-as-judge applies, the role of huma...

May 11, 2026 · 4 min read

On an expert's recommendation, I compiled a list of high-quality AI Engineer learning materials worth bookmarking and studying!

Packed with substance, too...

May 11, 2026 · 3 min read

Using Codex goals well can truly unleash the potential of the GPT 5 model!

OpenAI's recently disclosed plans for Codex clearly point in the following directions: - Lo...

May 1, 2026 · 6 min read

Many people underestimate one problem: the real danger of an agent is not the occasional wrong answer, but "systematic drift"

Many people who discuss agents still focus on: - Is the answer correct this time? - Was the tool called correctly this time? - Is the task completed this...

Apr 30, 2026 · 9 min read

RLMs are the new reasoning models

Reasoning models are the first clear demonstration that language model capabilities can be extended with test-time computation. Recursive lang...

Apr 21, 2026 · 2 min read

Lin Junyang’s latest must-read piece! AI has shifted from "reasoning thinking" to "agentic thinking".

🏀The frontier of competition is shifting from "better...

Mar 26, 2026 · 3 min read

What will AI-driven travel and transportation look like? Cybercab, Robovan and FSD

In Musk’s blueprint for autonomous transportation, Cybercab is a vehicle that completely eliminates the steering wheel and pedals and relies entirely on Tesla’s FSD...

Jan 7, 2026 · 12 min read

A historic moment for the world's first publicly listed large-model company

On January 8, 2026, as the world's first...

Jan 7, 2026 · 17 min read

The Scientific and Commercial Apotheosis of Zhipu AI

The ascension of Zhipu AI to the status of the world's first publicly traded large-model company on January 8, 2026, represents a seminal juncture in the history of co...

Jan 7, 2026 · 10 min read

A treasure trove of open source tools for drawing software design diagrams with AI: next-ai-draw-io, which teacher Xiao Ke also just recommended in our group

I am ba...

Dec 30, 2025 · 1 min read

Hard-working people: it seems the days since the pandemic have been passing faster and faster.

I have grown used to reading books and writing notes on my commute. When I...

Dec 25, 2025 · 3 min read

Ilya clarified how people have misunderstood some of his views

He believes the current scaling route can keep making progress without stagnating, but based on...

Nov 28, 2025 · 3 min read

This year I saw many similar AI products focused on memory or virtual companionship, which reminded me of a TV series from a few years ago.

The general plot of "Upload to R...

Nov 20, 2025 · 2 min read

Here we go: based on the worldview of Westworld, create a web game with a similar aesthetic

My requirement is that when I give her a command, the screen can respond in...

Nov 19, 2025 · 1 min read

Inference as a Publishing Problem

Every production model is also a publishing decision about what deserves to become visible fast.

February 2026 · 9 min read

Field Notes


Live judgment for this desk is filed by date in the field-notes ledger; browse the full ledger by month.

Jul 15, 2026

Inkling, the general open source model from former OpenAI CTO Mira's startup: 975B total parameters, 41B active

Jul 14, 2026

What's going on? GPT 5.6 sol has deleted user data for a second time! I feel that there must be a fatal...

Jul 11, 2026

What is backpressure in an agent, and what should be done about it? On the surface, the system keeps bu...

Lab proofs


fireworks-tech-graph

Proof: Technical diagrams as durable product and distribution infrastructure.

fireworks-sessions-saver

Proof: Checkpointing for coding sessions that will inevitably be interrupted.