Reading List - September 5, 2025
Is this a newsletter?

Read what we've read this week - but keep it private 😉
Inside vLLM: Anatomy of a High-Throughput LLM Inference System
By: Aleksa Gordić
A truly elegant description of how large language model inference works - super accessible for such a technical topic!
A Language Model Built for the Public Good 🇨🇭
By: ETH Zurich
Huge open model release from ETH Zurich and the Swiss AI Initiative, with fully open data to boot!
1965 Cryptanalysis Training Workbook Released by the NSA
By: Bruce Schneier
We love cryptography history - great to see some original methods released to the public!
Your Face Tomorrow
By: Michael W. Clune
Facial recognition tech raises real concerns about privacy and surveillance; Clune forces us to confront how everyday human interactions already discipline and reshape us through the faces we show to others!
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
By: Huang et al.
A little technical, but worth it: UltraMemV2 is an interesting memory-layer redesign that dramatically improves long-context learning without incurring high memory access costs - helpful for MoE models!
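If you want to poke at the general idea, here's a minimal NumPy sketch of a product-key memory lookup - the broad family of memory layers that designs like UltraMemV2 build on. Everything below is our own illustration, not the paper's actual architecture:

```python
import numpy as np

# Product-key memory sketch: a huge table of value vectors is addressed by
# pairs of sub-keys, so a query only touches k*k of the 65,536 slots.
rng = np.random.default_rng(0)

d, n_sub, k = 64, 256, 4                           # hidden dim, sub-keys per half, top-k
sub_keys_a = rng.standard_normal((n_sub, d // 2))
sub_keys_b = rng.standard_normal((n_sub, d // 2))
values = rng.standard_normal((n_sub * n_sub, d))   # 65,536 memory slots

def memory_lookup(q):
    # Score each half of the query against its own sub-key table.
    qa, qb = q[: d // 2], q[d // 2 :]
    top_a = np.argsort(sub_keys_a @ qa)[-k:]
    top_b = np.argsort(sub_keys_b @ qb)[-k:]
    # Combine the k x k best sub-key pairs into full-key scores,
    # reading only k*k value rows instead of all 65,536.
    scores = (sub_keys_a[top_a] @ qa)[:, None] + (sub_keys_b[top_b] @ qb)[None, :]
    idx = (top_a[:, None] * n_sub + top_b[None, :]).ravel()
    w = np.exp(scores.ravel() - scores.max())
    w /= w.sum()
    return w @ values[idx]

out = memory_lookup(rng.standard_normal(d))
print(out.shape)  # (64,)
```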
MoE Inference Economics from First Principles
By: Piotr Mazurek and Eric Schreiber
The piece breaks down how running MoE models changes the economics of inference compared to traditional “dense” models. MoEs can be much cheaper per token - but only if deployed at large scale with smart batching and communication strategies.
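To make the batching point concrete, here's a back-of-envelope sketch in Python; every number below is our own illustrative assumption, not a figure from the article:

```python
# Dense vs. MoE cost argument, back of the envelope.
dense_params = 70e9                   # dense model: all 70B weights used per token
moe_total, moe_active = 400e9, 20e9   # MoE: 400B total weights, ~20B active per token

# Compute per token scales with *active* parameters (~2 FLOPs per parameter).
flops_dense = 2 * dense_params
flops_moe = 2 * moe_active
print(f"compute per token: MoE is {flops_dense / flops_moe:.1f}x cheaper")

# But *all* weights must stream from GPU memory. At small batch sizes
# inference is bandwidth-bound, so you pay for total params, not active ones;
# big batches amortize that weight traffic across many tokens.
bytes_per_param = 2  # fp16/bf16
for batch in (1, 64):
    dense_bytes = dense_params * bytes_per_param / batch
    moe_bytes = moe_total * bytes_per_param / batch   # worst case: all experts touched
    print(f"batch {batch:>3}: weight bytes/token  dense={dense_bytes/1e9:.1f}GB  "
          f"moe={moe_bytes/1e9:.1f}GB")
```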
Predicting the Order of Upcoming Tokens Improves Language Modeling
By: Zayd M.K. Zuhri, Erland Hillman Fuadi, and Alham Fikri Aji (MBZUAI)
We can’t see the future (yet), but these researchers show that training a model to also predict the order in which upcoming tokens will appear improves plain next-token language modeling!
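For the curious, here's a toy sketch of what an "order of upcoming tokens" target could look like; this is our hedged reconstruction of the general idea, not the paper's exact formulation:

```python
import numpy as np

# For each position, score vocabulary items by how soon they next appear
# in a lookahead window (sooner -> larger score, absent -> 0).
vocab_size, window = 10, 4
tokens = np.array([3, 1, 4, 1, 5, 9, 2, 6])

def order_targets(t, pos):
    target = np.zeros(vocab_size)
    future = t[pos + 1 : pos + 1 + window]
    for dist, tok in enumerate(future, start=1):
        if target[tok] == 0:                  # keep the first occurrence only
            target[tok] = window - dist + 1   # sooner -> larger score
    return target

print(order_targets(tokens, 0))  # token 1 scores highest: it appears next
# A ranking loss over these scores on an extra head would then be added
# to the usual next-token cross-entropy during training.
```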
Last week, Swyx sparked a Twitter debate about whether AI model evals are necessary - below are two articles, one in favor of evals and one against.
In Defense of AI Evals, for Everyone
By: Shreya Shankar
Shreya makes the case that companies building AI products on top of the big model providers degrade the quality of those products when they skip evals.
Why evals haven't landed (yet): lessons from building them for Copilot
By: Julia Neagu
Julia makes the case that, in her experience, evals do not improve AI product quality enough to justify the production bottlenecks they create.
Tweet of the Week
Snarky, but precise. Salute!
Amusing how 99% of people trying to explain LLMs forget that they don't generate the next token, they generate a probability distribution over the entire vocabulary space that the end application is free to sample from

You are very often not presented with the Most Likely Token https://t.co/7RMM4KkeDo

— emozilla (@theemozilla) September 5, 2025
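For anyone who'd like the point in code, here's a toy sampler showing how an application might turn one set of logits into different tokens (illustrative only):

```python
import numpy as np

# The model emits logits over the whole vocabulary; the application decides
# how to turn them into a single token.
rng = np.random.default_rng(42)
logits = np.array([2.0, 1.5, 1.3, 0.2, -1.0])  # toy 5-token vocabulary

def sample(logits, temperature=1.0, top_k=None):
    z = logits / temperature
    if top_k is not None:                  # mask everything but the k best
        cutoff = np.sort(z)[-top_k]
        z = np.where(z >= cutoff, z, -np.inf)
    p = np.exp(z - z.max())                # softmax over the surviving logits
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

greedy = int(np.argmax(logits))
draws = [sample(logits, temperature=0.8, top_k=3) for _ in range(10)]
print(greedy, draws)  # the sampled token often isn't the argmax
```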