Efficient Knowledge
An interactive primer on reading and generation speed.
How humans and models actually do
Human factors research has been studying the limits of reading and listening speeds, comprehension, and typing speeds for many years. There's a wide distribution from novice to expert; I've used figures for the average adult native-English speaker.
Models don't "read" and "type" like humans, but they do have analogues. The two we care about are token output speed and input rate limits (the tokens per minute a single account can push through). Models and providers vary in time to first token (TTFT) and in reasoning level, so the outputs table reports a model's average rate as its answer tokens divided by the total response time.
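As a rough sketch of that averaging: divide the answer tokens by the full wall-clock response time, so that time to first token and any reasoning phase count against the model. The function name, parameter names, and sample numbers are hypothetical, chosen only for illustration.

```python
def effective_tokens_per_minute(answer_tokens: int, total_response_seconds: float) -> float:
    """Answer tokens divided by total response time, in tokens per minute.

    Assumption: total_response_seconds covers the whole wait, including
    time to first token, any reasoning phase, and streaming of the answer.
    """
    if total_response_seconds <= 0:
        raise ValueError("total_response_seconds must be positive")
    return answer_tokens / total_response_seconds * 60.0


# Hypothetical response: 600 answer tokens, delivered 60 s after the request
# (for example, 30 s of reasoning followed by 30 s of streaming).
rate = effective_tokens_per_minute(answer_tokens=600, total_response_seconds=60.0)
print(rate)  # 600.0
```

Measured this way, a model that streams quickly but thinks for a long time first scores lower than its raw streaming speed would suggest.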
| Output source | WPM (avg) | WPM (max) |
|---|---|---|
| Typing (human) | 50 | 120+ |
| Speaking (human) | 150 | 200 |
| GPT-5.5 | ~185 | ~1900 |
| Gemini 3.5 Flash | ~920 | ~8000 |
| Claude Opus 4.7 | ~1000 | ~2000 |

| Input source | WPM (avg) | WPM (max) |
|---|---|---|
| Reading (human, silent) | ~250 | 500+ |
| Listening (human) | 170 | 275 |
| Gemini API, free tier | ~192k | |
| Anthropic Opus, Tier 1 | ~385k | |
| Anthropic Opus, Tier 4 | ~7.7M | |
| GPT-5.5, Tier 5 | ~31M | |
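Provider limits are quoted in tokens per minute, while the tables show words per minute. The ratio behind the tables isn't stated here, so the sketch below assumes about 0.75 words per token, a common rule of thumb for English text. The constant and function names are hypothetical.

```python
# Assumption: roughly 0.75 English words per token. The real ratio depends on
# the tokenizer and the text, so treat this as a tunable parameter.
WORDS_PER_TOKEN = 0.75


def tpm_to_wpm(tokens_per_minute: float, words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert a tokens-per-minute rate to an approximate words-per-minute rate."""
    return tokens_per_minute * words_per_token


def wpm_to_tpm(words_per_minute: float, words_per_token: float = WORDS_PER_TOKEN) -> float:
    """Convert a words-per-minute rate to an approximate tokens-per-minute rate."""
    return words_per_minute / words_per_token


# A hypothetical 1,000,000 TPM limit expressed in words per minute.
print(tpm_to_wpm(1_000_000))  # 750000.0
```

Passing a different `words_per_token` lets you rerun the comparison for another tokenizer or language.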
The effective output rate of high-reasoning models (~185 to ~1000 WPM) is surprisingly close to human reading speed (~250 WPM on average, 500+ at the top end). I'd be curious whether this is a deliberate product decision. Although we often ask for faster models, we mostly consume their outputs indirectly, in async-friendly deliverables such as artifact-driven reviews and rich HTML documents.
Measure yourself
How does your own reading and typing compare? Pick a passage. Read it. Type it back. If you like, test your comprehension of the text to see how fast you can read before the details start to blur.
Takeaways
A few things stand out from the comparison:
- Most people can speak 3x faster than they can type, and dictation has gotten really good. If your bottleneck is getting words out of your head, the microphone is the best tool we have.
- The model input channel is practically unlimited. A Tier 4 Anthropic Opus account can ingest roughly 7.7 million words per minute, which would take a human more than 500 hours to read at ~250 WPM.
- Most people top out at around 250 WPM of comprehension, even when they read or listen at higher speeds. That's the same order of magnitude as the effective output rate of frontier high-reasoning models, which is why a high-reasoning response still feels readable in real time.
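The arithmetic behind the first two takeaways can be checked in a few lines, using the averages from the tables above. The variable names are mine.

```python
# Averages taken from the tables above.
TYPING_WPM = 50
SPEAKING_WPM = 150
HUMAN_READING_WPM = 250
TIER4_OPUS_INPUT_WPM = 7_700_000

# Speaking versus typing.
speak_to_type_ratio = SPEAKING_WPM / TYPING_WPM

# How long a human would need to read one minute of Tier 4 input.
reading_minutes = TIER4_OPUS_INPUT_WPM / HUMAN_READING_WPM
reading_hours = reading_minutes / 60

print(speak_to_type_ratio)     # 3.0
print(reading_minutes)         # 30800.0
print(f"{reading_hours:.0f}")  # 513
```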
Models are very fast at the parts of work where the unit cost was already relatively low: moving characters from one buffer to another, restating something at a different length, retrieving a passage from a long document. We can turn these strengths into better tools. Reasoning mode is one example of this trade: the model exchanges many low-value tokens for a higher-value restatement.
The performance measure worth caring about is closer to value retained per minute of attention: how much of what passed in front of you actually changed how you make the next decision. Speed reading does not help if the comprehension loss outpaces the speed gain, and generating more tokens is a waste if those tokens don't move the work forward. The interesting question is not how many words per minute, but whether those minutes help make better decisions.
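One crude way to make the speed-reading point concrete is to multiply reading speed by the fraction of the material retained. This is a toy model, not a measured result: the comprehension fractions below are hypothetical numbers picked only to illustrate the trade-off, and the function name is mine.

```python
def retained_words_per_minute(words_per_minute: float, comprehension: float) -> float:
    """Toy model: reading speed scaled by the fraction of content retained (0 to 1)."""
    if not 0.0 <= comprehension <= 1.0:
        raise ValueError("comprehension must be between 0 and 1")
    return words_per_minute * comprehension


# Hypothetical reader: careful reading at 250 WPM retaining 80%,
# versus skimming at 500 WPM retaining 35%.
careful = retained_words_per_minute(250, 0.80)
skimming = retained_words_per_minute(500, 0.35)

# Doubling the speed loses ground once comprehension drops by more than half.
print(skimming < careful)  # True
```

The same framing applies to generated tokens: a higher raw rate only helps if the share of useful output doesn't fall faster than the rate rises.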
Attention and outcomes are the new scarce resources, so I expect more abstractions in model and agent oversight. Instead of reviewing plans and transcripts step by step, we'll look at high-level outcomes and tune those to be easily reviewable. We'll manage the agent's performance more like delegating to a professional, focusing on deliverables instead of the process. The labs know that delegation requires trust, and I expect them to continue differentiating on trust and speed. For some models, feedback time will be a compelling advantage, but many labs will decide it's not worth optimizing for speed if the model can be trusted to work independently.
— sethrylan.org · 2026