Reading List - September 5, 2025
Is this a newsletter?

Read what we've read this week - but keep it private 😉
Inside vLLM: Anatomy of a High-Throughput LLM Inference System
By: Aleksa Gordić
A truly elegant description of how large language model inference works - super accessible for such a technical topic!
A Language Model Built for the Public Good 🇨🇭
By: ETH Zurich
Huge open model release from ETH Zurich and the Swiss AI Initiative, with fully open data to boot!
1965 Cryptanalysis Training Workbook Released by the NSA
By: Bruce Schneier
We love cryptography history - great to see some original methods released to the public!
Your Face Tomorrow
By: Michael W. Clune
Facial recognition tech raises real concerns about privacy and surveillance; Clune forces us to confront how everyday human interactions already discipline and reshape us through the faces we show to others!
UltraMemV2: Memory Networks Scaling to 120B Parameters with Superior Long-Context Learning
By: Huang et al.
A little technical, but worth it: UltraMemV2 is an interesting memory-layer redesign that dramatically improves long-context learning without incurring high memory access costs - helpful for MoE models!
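If you want to poke at the general idea, here's a minimal NumPy sketch of a product-key memory lookup - the broad family of memory layers that designs like UltraMemV2 build on. Everything below is our own illustration, not the paper's actual architecture:

```python
import numpy as np

# Product-key memory sketch: a huge table of value vectors is addressed by
# pairs of sub-keys, so a query only touches k*k of the 65,536 slots.
rng = np.random.default_rng(0)

d, n_sub, k = 64, 256, 4                           # hidden dim, sub-keys per half, top-k
sub_keys_a = rng.standard_normal((n_sub, d // 2))
sub_keys_b = rng.standard_normal((n_sub, d // 2))
values = rng.standard_normal((n_sub * n_sub, d))   # 65,536 memory slots

def memory_lookup(q):
    # Score each half of the query against its own sub-key table.
    qa, qb = q[: d // 2], q[d // 2 :]
    top_a = np.argsort(sub_keys_a @ qa)[-k:]
    top_b = np.argsort(sub_keys_b @ qb)[-k:]
    # Combine the k x k best sub-key pairs into full-key scores,
    # reading only k*k value rows instead of all 65,536.
    scores = (sub_keys_a[top_a] @ qa)[:, None] + (sub_keys_b[top_b] @ qb)[None, :]
    idx = (top_a[:, None] * n_sub + top_b[None, :]).ravel()
    w = np.exp(scores.ravel() - scores.max())
    w /= w.sum()
    return w @ values[idx]

out = memory_lookup(rng.standard_normal(d))
print(out.shape)  # (64,)
```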
MoE Inference Economics from First Principles
By: Piotr Mazurek and Eric Schreiber
The piece breaks down how running MoE models changes the economics of inference compared to traditional “dense” models. MoEs can be much cheaper per token - but only if deployed at large scale with smart batching and communication strategies.
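To make the batching point concrete, here's a back-of-envelope sketch in Python; every number below is our own illustrative assumption, not a figure from the article:

```python
# Dense vs. MoE cost argument, back of the envelope.
dense_params = 70e9                   # dense model: all 70B weights used per token
moe_total, moe_active = 400e9, 20e9   # MoE: 400B total weights, ~20B active per token

# Compute per token scales with *active* parameters (~2 FLOPs per parameter).
flops_dense = 2 * dense_params
flops_moe = 2 * moe_active
print(f"compute per token: MoE is {flops_dense / flops_moe:.1f}x cheaper")

# But *all* weights must stream from GPU memory. At small batch sizes
# inference is bandwidth-bound, so you pay for total params, not active ones;
# big batches amortize that weight traffic across many tokens.
bytes_per_param = 2  # fp16/bf16
for batch in (1, 64):
    dense_bytes = dense_params * bytes_per_param / batch
    moe_bytes = moe_total * bytes_per_param / batch   # worst case: all experts touched
    print(f"batch {batch:>3}: weight bytes/token  dense={dense_bytes/1e9:.1f}GB  "
          f"moe={moe_bytes/1e9:.1f}GB")
```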
Predicting the Order of Upcoming Tokens Improves Language Modeling
By: Zayd M.K. Zuhri, Erland Hillman Fuadi, and Alham Fikri Aji (MBZUAI)
We can’t see the future (yet), but these researchers show that training a model to also predict the order in which upcoming tokens will appear improves plain next-token language modeling!
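For the curious, here's a toy sketch of what an "order of upcoming tokens" target could look like; this is our hedged reconstruction of the general idea, not the paper's exact formulation:

```python
import numpy as np

# For each position, score vocabulary items by how soon they next appear
# in a lookahead window (sooner -> larger score, absent -> 0).
vocab_size, window = 10, 4
tokens = np.array([3, 1, 4, 1, 5, 9, 2, 6])

def order_targets(t, pos):
    target = np.zeros(vocab_size)
    future = t[pos + 1 : pos + 1 + window]
    for dist, tok in enumerate(future, start=1):
        if target[tok] == 0:                  # keep the first occurrence only
            target[tok] = window - dist + 1   # sooner -> larger score
    return target

print(order_targets(tokens, 0))  # token 1 scores highest: it appears next
# A ranking loss over these scores on an extra head would then be added
# to the usual next-token cross-entropy during training.
```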
Last week, Swyx sparked a Twitter debate about whether AI model evals are necessary - below are two articles, one in favor of evals and one against.
In Defense of AI Evals, for Everyone
By: Shreya Shankar
Shreya makes the case that companies building AI products on top of the big model providers degrade the quality of those products when they skip evals.
Why evals haven't landed (yet): lessons from building them for Copilot
By: Julia Neagu
Julia makes the case that, in her experience, evals do not improve AI product quality enough to justify the production bottlenecks they create.
Tweet of the Week
Snarky, but precise. Salute!
Amusing how 99% of people trying to explain LLMs forget that they don't generate the next token, they generate a probability distribution over the entire vocabulary space that the end application is free to sample from

You are very often not presented with the Most Likely Token https://t.co/7RMM4KkeDo

— emozilla (@theemozilla) September 5, 2025
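For anyone who'd like the point in code, here's a toy sampler showing how an application might turn one set of logits into different tokens (illustrative only):

```python
import numpy as np

# The model emits logits over the whole vocabulary; the application decides
# how to turn them into a single token.
rng = np.random.default_rng(42)
logits = np.array([2.0, 1.5, 1.3, 0.2, -1.0])  # toy 5-token vocabulary

def sample(logits, temperature=1.0, top_k=None):
    z = logits / temperature
    if top_k is not None:                  # mask everything but the k best
        cutoff = np.sort(z)[-top_k]
        z = np.where(z >= cutoff, z, -np.inf)
    p = np.exp(z - z.max())                # softmax over the surviving logits
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

greedy = int(np.argmax(logits))
draws = [sample(logits, temperature=0.8, top_k=3) for _ in range(10)]
print(greedy, draws)  # the sampled token often isn't the argmax
```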