Where's the Shovelware? Why AI Coding Claims Don't Add Up --- Mike Judge, Sep 03, 2025
https://substack.com/inbox/post/172538377
With receipts.
"But lo! men have become the tools of their tools." --- Henry David Thoreau, "Walden"
I was an early adopter of AI coding and a fan until maybe two months ago, when I read the METR study (1) and developed serious doubts. In that study, the authors discovered that developers were unreliable narrators of their own productivity. They thought AI was making them 20% faster, but it was actually making them 19% slower. This shocked me because I had just told someone the week before that I thought AI was only making me about 25% faster, and I was bummed it wasn't a higher number. I was only off by 5% from the developers' own incorrect estimates.
This was unsettling. It was impossible not to question whether I, too, was an unreliable narrator of my own experience. Was I hoodwinked by the screens of code flying by, with no way of quantifying whether all that reading and reviewing of code actually took more time than just doing the thing myself?
So, I started testing my own productivity using a modified methodology from that study. I'd take a task and estimate how long it would take to code by hand, then flip a coin: heads I'd use AI, tails I'd do it myself. Then I'd record when I started and when I ended. That gave me a delta per task, which I could use to build AI vs. no-AI charts and look for trends. I ran that for six weeks, recording all that data, and do you know what I discovered?
I discovered that the data isn't statistically significant at any meaningful level. I would need to record new datapoints for another four months just to determine whether AI was speeding me up or slowing me down at all. It's too neck and neck.
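The significance check behind a protocol like this can be sketched as a two-sample comparison of the time deltas from the two coin-flip arms. Everything below is illustrative, not the author's actual data or code: the sample sizes, means, and spreads are made-up assumptions, and the test shown is Welch's t-test (one reasonable choice for unequal-variance samples).

```python
import math
import random
import statistics

# Hypothetical stand-in data: ratio of (actual time / hand-coding estimate)
# for tasks assigned to each arm by coin flip. Values are invented to
# illustrate a small effect buried in large task-to-task variance.
random.seed(42)
ai_ratios = [random.gauss(1.05, 0.35) for _ in range(20)]      # AI-assisted tasks
no_ai_ratios = [random.gauss(1.00, 0.35) for _ in range(22)]   # by-hand tasks

def welch_t(a, b):
    """Welch's t statistic and approximate degrees of freedom
    (Welch-Satterthwaite) for two independent samples."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    va, vb = statistics.variance(a), statistics.variance(b)    # sample variances
    na, nb = len(a), len(b)
    se2 = va / na + vb / nb                                    # squared standard error
    t = (ma - mb) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

t, df = welch_t(ai_ratios, no_ai_ratios)
print(f"t = {t:.2f}, df ~ {df:.1f}")
```

With a handful of tasks per arm and noisy per-task times, |t| typically stays well below the roughly 2.0 threshold needed for significance at the 5% level, which is exactly the "too neck and neck" outcome: the experiment would need many more datapoints before it could distinguish a speedup from a slowdown.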
(1)
https://metr.org/blog/2025-07-10-early-2025-ai-experienced-os-dev-study/