2026W20

A few interesting articles I read over the past few days

May 24, 2026

This is the output of an automated process. Every Sunday, a script retrieves the articles I've saved and read, uses AI to expand my quick notes into something more coherent, then publishes the result. This post is one of those weekly roundups.

We’ve made the world too complicated — The piece names something I keep circling back to — most of us operate daily within systems we can’t meaningfully inspect. We write code on machines we don’t fully understand, live under laws we haven’t read, eat food from supply chains we can’t trace. The author doesn’t pretend to have a fix, and that honesty is the whole point. The temptation is always to either despair at the complexity or act like you’ve got a handle on it. The harder position is admitting you don’t, and still trying.
Our billing pipeline was suddenly slow. The culprit was a hidden bottleneck in ClickHouse — The detail that got me: more than half of query duration was spent waiting on a mutex that only protected a read operation. Nobody was writing — every thread just needed to look at the parts list, but they were all queuing up for an exclusive lock. The fix sequence is textbook — shared locking, deferred copying, binary search — but the diagnostic path is the interesting part. Flame graphs from trace_log pointed straight to the synchronization layer, not I/O or compute. A good reminder that the bottleneck is almost never where you assume it is, especially after a migration that “shouldn’t have changed anything.”
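To make the locking point concrete, here is a minimal Python sketch of two of the three fixes mentioned above: a shared (reader-writer) lock so that lookups no longer queue behind each other, and a binary search over a sorted parts list. It is an illustration of the general pattern, not ClickHouse's actual code. The `SharedLock` and `PartsList` classes and the part names are hypothetical, and the deferred-copying step is not shown.

```python
import threading
from bisect import bisect_left
from contextlib import contextmanager


class SharedLock:
    """Minimal reader-writer lock: many readers or one writer.

    Hypothetical helper for illustration; it makes no fairness guarantees,
    so a steady stream of readers could starve a writer.
    """

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writing = False

    @contextmanager
    def shared(self):
        # Readers only wait while a writer is active, never for each other.
        with self._cond:
            while self._writing:
                self._cond.wait()
            self._readers += 1
        try:
            yield
        finally:
            with self._cond:
                self._readers -= 1
                if self._readers == 0:
                    self._cond.notify_all()

    @contextmanager
    def exclusive(self):
        # A writer waits until there are no readers and no other writer.
        with self._cond:
            while self._writing or self._readers:
                self._cond.wait()
            self._writing = True
        try:
            yield
        finally:
            with self._cond:
                self._writing = False
                self._cond.notify_all()


class PartsList:
    """Sorted list of part names guarded by a shared lock (illustrative only)."""

    def __init__(self, names):
        self._lock = SharedLock()
        self._names = sorted(names)

    def contains(self, name):
        # Read path: shared lock plus binary search instead of a linear scan.
        with self._lock.shared():
            i = bisect_left(self._names, name)
            return i < len(self._names) and self._names[i] == name

    def add(self, name):
        # Write path: the only place that needs exclusive access.
        with self._lock.exclusive():
            self._names.insert(bisect_left(self._names, name), name)
```

With an exclusive mutex on the read path, every `contains` call would serialize even though nothing is being modified. With the shared lock, readers only block while a writer is actually active.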
How to achieve truly serverless GPUs — The number that reframes everything: typical GPU utilization in inference sits at 10-20%. That’s not a technical problem, it’s an economic one — you over-provision because cold starts take minutes. Modal attacks the startup chain at every layer: pre-allocated buffers, lazy filesystem loading, CPU snapshots via gVisor, GPU memory checkpoints. The result is 40x faster scaling, from ~2,000 seconds down to ~50. What I find most interesting is the GPU snapshot trick — checkpointing CUDA graph compilation and Torch compiler state so you skip minutes of initialization on each new replica. If this holds at scale, the pricing model for inference changes fundamentally.
Why senior developers fail to communicate their expertise — The two-loops framing clicked for me. The business runs on reducing uncertainty fast — ship, test the market, learn. Senior developers run on managing complexity — keep the system stable so you can still ship next quarter. Both are right, and the failure isn’t disagreement, it’s that they’re solving different problems in the same conversation. The proposed escape hatch — “can we try something quicker?” — is deceptively simple. It acknowledges the speed need while redirecting toward a smaller experiment instead of bolting more onto the production system. I’ve been in that exact meeting, arguing for stability while everyone else wants velocity, and I wish I’d had that framing earlier.
Local AI Needs to be the Norm. — The argument that landed: modern devices ship with neural engines that sit mostly idle while apps stream data to servers in Virginia. For tasks like summarization, extraction, and classification, you don’t need frontier models — you need something fast and private that runs on hardware the user already owns. The Brutalist Report example is a good proof point: article summaries generated entirely on-device, no server, no API key, no billing. I’m not sure this generalizes to everything people want from AI, but for the 80% of features that are really data transformation rather than reasoning, the case for local-first is hard to argue against.
There was only mom — This one is hard to comment on without being reductive. José writes about caring for his mother during her last days with cancer, and the specificity is what gives it weight — asking for forgiveness and her writing “I forgive you” because she couldn’t speak, changing her diaper and recognizing it was the first thing she ever did for him. The piece moves between raw grief and philosophical reflection without the transitions feeling forced. What stayed with me is his reason for publishing: “many moms go forgotten.” There’s something defiant about using a blog post as an act of remembrance.
Staring at walls to improve focus and productivity — The practice sounds absurd — sit and stare at a blank wall for 5-10 minutes when you can’t focus. But the reasoning holds: 87 GB of daily information input, constant stimulation, a productivity cycle built on caffeine and scrolling that never actually recharges anything. It’s basically meditation without calling it meditation — unfocused peripheral vision to trigger the parasympathetic nervous system. What I recognize most is the description of the resistance. Your brain fights doing nothing the same way your body fights a cold pool. I’ve tried similar resets and the hardest part is always convincing yourself that doing nothing is doing something.
Ghostty Is Leaving GitHub — The detail that says it all: Hashimoto kept a journal of outages, and “almost every day has an X” for days when GitHub blocked his work. After 18 years of daily use since 2008, this isn’t someone looking for an excuse to leave — it’s someone who ran out of patience. The emotional framing is honest too: “I want to be there but it doesn’t want me to be there.” What makes this more than a rant is who’s saying it. A high-profile open source maintainer leaving is a real test of whether developer loyalty to GitHub is about the platform or just inertia.
How The Heck Does Shazam Work? — The clever bit isn’t the fingerprinting itself — it’s how they pair peaks. A single frequency peak is too common to identify anything. But pairing two nearby peaks with their time offset creates a hash specific enough to match against millions of songs. The constellation map approach — keeping only the loudest peaks, discarding everything else — is what makes it noise-resistant, because ambient sound almost never dominates a frequency region. Built on Avery Wang’s 2003 paper, and the core algorithm hasn’t fundamentally changed since. Sometimes the right abstraction just lasts.
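The peak-pairing idea can be sketched in a few lines of Python. This is a toy version under stated assumptions, not Shazam's implementation: it assumes the constellation peaks have already been extracted as `(time, frequency)` tuples, and the `FAN_OUT` and `MAX_DT` values, the function names, and the song data are all made up for illustration. Each hash combines two peak frequencies with their time offset, and a match is the song whose hashes line up at one consistent time shift.

```python
from collections import Counter, defaultdict

FAN_OUT = 3   # assumed: how many later peaks each anchor peak is paired with
MAX_DT = 50   # assumed: largest time gap (in frames) allowed within a pair


def fingerprints(peaks):
    """Yield ((f1, f2, dt), t1) for pairs of nearby peaks.

    peaks is an iterable of (time, frequency) tuples.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > MAX_DT:
                break  # peaks are time-sorted, so later ones are further away
            if dt > 0:
                yield (f1, f2, dt), t1
                paired += 1
                if paired == FAN_OUT:
                    break


def build_index(songs):
    """Map each hash to the (song_id, anchor_time) pairs where it occurs."""
    index = defaultdict(list)
    for song_id, peaks in songs.items():
        for h, t in fingerprints(peaks):
            index[h].append((song_id, t))
    return index


def best_match(index, query_peaks):
    """Return the song whose hashes agree on a single time offset most often."""
    votes = Counter()
    for h, t_query in fingerprints(query_peaks):
        for song_id, t_song in index.get(h, ()):
            votes[(song_id, t_song - t_query)] += 1
    if not votes:
        return None
    (song_id, _offset), _count = votes.most_common(1)[0]
    return song_id
```

Voting on `(song_id, offset)` rather than on `song_id` alone is what filters out coincidental hash collisions: a real match produces many hits at the same offset, while random hits scatter across different offsets.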
I am building a cloud — David Crawshaw (Tailscale co-founder) makes the case that clouds got worse as they got bigger. The IOPS argument is the most concrete: remote block storage added 10% overhead with hard drives, but with SSDs offering 20-microsecond seeks, that overhead became 10x. We adapted our architectures to hide latency that shouldn’t exist. Same story with egress — 10x markup over bare metal, treated as normal because everyone charges it. The bet with exe.dev is that agents will need a different compute primitive: flexible CPU/memory without fixed VM shapes, local NVMe, built-in TLS. Whether it wins is anyone’s guess, but the diagnosis of what’s broken rings true to anyone who’s wrestled with instance sizing.