Take performance seriously while it can't cripple you

Premature optimization might be the root of all evil, but performance often means just not being lazy.

Performance doesn’t matter much in the beginning

Just like with security and testing, performance should be a regular part of development.

When platforms have little traffic, they don’t pay much attention to performance. They can solve performance issues by throwing hardware at them.

As a result, many of them go through a teething stage during which they must significantly improve performance to handle traffic of the next order of magnitude. The most infamous examples of this are Reddit and Twitter.

I learned this lesson the hard way, but I learned it well.

Surges of users following marketing activity would occasionally cripple the site. At times, I had to monitor the site continuously and look for bottlenecks to improve performance.

The situation was even worse when coupled with a surge in bots from Russia and China. I always had to have my laptop with me to ban IPs and restart the server.

After about a month of continuous monitoring, I had identified and fixed most of the bottlenecks, which stabilized performance. I also blocked the Russian and Chinese bots.

Going through that made me take performance seriously, allowing Supplybunny to handle a 10× spike in traffic when the lockdowns started.

But making good architecture decisions and sometimes picking performance over development velocity is worth it

To make performance part of the development process, get in the habit of picking the fast option as often as reasonably possible when facing a tradeoff between what is easy and what is fast.

It also means occasionally building features that only serve to improve performance.

But most importantly, it’s about not getting into a situation where fixing performance issues is the no. 1 priority

Ignoring performance during the regular course of development results in expensive technical debt. To resolve it, you need to pore over logs and track down issues, which takes a lot of time and effort. Make sure you never get to that point.

And a way to do that is to keep an eye on it

Finally, you should use tools to measure performance periodically. I’m partial to https://github.com/wvanbergen/request-log-analyzer but have started using New Relic and GoAccess as well.

Based on these measurements, you should calculate the maximum throughput of your application. Compare that with the average load you need to serve, and then over-provision a bit so that your application can handle spikes in traffic.
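
As a rough illustration, that calculation could be sketched in Python as below. It assumes each worker process handles one request at a time and that the measured average response time is a fair stand-in for the cost of a request. The function names, the 50% headroom, and the numbers are hypothetical and chosen only for the example; they are not measurements from a real application.

```python
import math


def max_throughput(workers: int, avg_response_seconds: float) -> float:
    """Rough upper bound on requests per second.

    Assumption: each worker serves one request at a time, so a single
    worker can complete 1 / avg_response_seconds requests per second.
    """
    if workers <= 0 or avg_response_seconds <= 0:
        raise ValueError("workers and avg_response_seconds must be positive")
    return workers / avg_response_seconds


def workers_needed(avg_load_rps: float, avg_response_seconds: float,
                   headroom: float = 0.5) -> int:
    """Workers required to serve the average load plus spare capacity.

    headroom=0.5 means provisioning for 50% more than the average load.
    The right value depends on how spiky your traffic is.
    """
    if avg_load_rps < 0 or avg_response_seconds <= 0 or headroom < 0:
        raise ValueError("inputs must be non-negative, response time positive")
    target_rps = avg_load_rps * (1 + headroom)
    return math.ceil(target_rps * avg_response_seconds)


# Hypothetical measurements: 8 workers, 250 ms average response time,
# and an average load of 20 requests per second.
capacity = max_throughput(workers=8, avg_response_seconds=0.25)
needed = workers_needed(avg_load_rps=20, avg_response_seconds=0.25)
print(f"capacity: {capacity} req/s, workers needed with headroom: {needed}")
```

With these made-up numbers, 8 workers give a ceiling of 32 requests per second, and serving 20 requests per second with 50% headroom also calls for 8 workers. Real traffic is burstier than an average suggests, so treat the result as a starting point and check it against what your monitoring shows.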
