Howdy.
Today, the site experienced severe issues, which resulted in about an hour of downtime and a few more hours of degraded performance.
I apologize for this mishap. We are looking into this situation, and into ways to mitigate it in the future.
What went wrong?
As many of you know, the site is continually developed and improved.
We usually have weekly releases, typically on Wednesdays. That day was chosen because traffic to the site is usually lighter then.
Today, we were deploying a whole slew of changes, including a revamped replacements list UI, tweaks to the upload page, some quality of life improvements, and so on.
Some of these improvements required changes to the database table containing post version data. Those changes were not major – you likely would not have even noticed them if they had gone through.
Needless to say, something went very wrong during the deployment process. I'm going to have to get a bit technical here.
Databases typically use indexes – data structures used to quickly locate entries without having to search through every row in a database table.
They are very useful, considering how much data is required to make the site work. The post versions table specifically contains almost 50 million entries.
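To make that a bit more concrete, here is a minimal sketch of what an index changes. It uses Python's built-in sqlite3 module purely as a stand-in, not the site's actual database, and the column names and the index name are made up for illustration. It asks the database how it plans to run the same lookup before and after an index exists.

```python
import sqlite3


def query_plan(conn, sql, params=()):
    """Return SQLite's description of how it would execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # Each row is (id, parent, notused, detail); detail is the readable part.
    return " | ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")

# Hypothetical, heavily simplified stand-in for a post versions table.
conn.execute(
    "CREATE TABLE post_versions ("
    "id INTEGER PRIMARY KEY, post_id INTEGER, updater_id INTEGER, tags TEXT)"
)
conn.executemany(
    "INSERT INTO post_versions (post_id, updater_id, tags) VALUES (?, ?, ?)",
    [(i % 100, i % 10, "tag") for i in range(1000)],
)

lookup = "SELECT * FROM post_versions WHERE updater_id = ?"

# Without an index, the only option is to walk every row in the table.
plan_without_index = query_plan(conn, lookup, (3,))

# With an index on the filtered column, matching rows can be located directly.
conn.execute(
    "CREATE INDEX idx_post_versions_updater ON post_versions (updater_id)"
)
plan_with_index = query_plan(conn, lookup, (3,))

print(plan_without_index)
print(plan_with_index)
```

The first plan should report a scan of the whole table, and the second a search that uses the index. The exact wording depends on the SQLite version. On a table with tens of millions of rows, that is the difference between a quick lookup and a request that runs long enough to time out.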
Unfortunately, the indexes related to the post versions table ended up corrupted.
We are still looking into what exactly went wrong here. However, the fact remains that at that moment, post version lookups slowed to a crawl.
In fact, database requests to that table were not just slow – they were outright failing, as the requests were taking longer than the maximum timeout period allowed.
This really should not have been this much of a problem, all things considered. Not a lot of people are searching through post versions on a daily basis, after all.
That's where the other issue came in – the "recent tags" function. It's a feature of the post editing form. As the name implies, it suggests to you the tags that you recently added to other posts.
The person who wrote that functionality did not really think things through, though. Instead of only fetching data when needed, it caused every post page to do an extra database lookup searching for recently used tags.
Yes, even if you did not open the post editing form. Or even if you never did any tagging at all.
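For the curious, "only fetching data when needed" could look roughly like the sketch below. This is not the site's code. The PostPage class, its methods, and the fetch_recent_tags callback are all hypothetical names, and the real fix may end up looking quite different. The idea is just that the lookup is deferred until something actually asks for the recent tags.

```python
from functools import cached_property


class PostPage:
    """Hypothetical page model, used only to illustrate lazy loading."""

    def __init__(self, post_id, user_id, fetch_recent_tags):
        self.post_id = post_id
        self.user_id = user_id
        # Assumed callback that performs the expensive database lookup.
        self._fetch_recent_tags = fetch_recent_tags

    @cached_property
    def recent_tags(self):
        # Runs on first access only; the result is then cached on the object.
        return self._fetch_recent_tags(self.user_id)

    def render_view(self):
        # A plain page view never touches recent_tags, so no lookup happens.
        return f"post {self.post_id}"

    def render_edit_form(self):
        # Only the editing form needs the suggestions.
        tags = ", ".join(self.recent_tags)
        return f"edit post {self.post_id}; recent tags: {tags}"
```

With this shape, viewing a post costs nothing extra, and opening the editing form triggers at most one lookup per page object.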
And so, when the post versions requests started to fail, they took post pages with them.
Furthermore, all of these hanging requests were causing an increased load on the servers – and soon enough, the rest of the site started lagging too.
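One common way to keep an optional feature from dragging down a whole page is to give it a small time budget and fall back to "nothing" when it fails. The sketch below shows that general idea in Python. The function names and the budget value are assumptions for illustration, not something taken from the site's code.

```python
import concurrent.futures
import time

# Shared worker pool for optional, non-essential lookups.
_executor = concurrent.futures.ThreadPoolExecutor(max_workers=4)


def optional_lookup(fetch, fallback, budget_seconds):
    """Run fetch(), but return fallback if it is too slow or raises."""
    future = _executor.submit(fetch)
    try:
        return future.result(timeout=budget_seconds)
    except concurrent.futures.TimeoutError:
        # We stop waiting here. A lookup that already started keeps running
        # in its worker thread; cancel() only helps if it had not started.
        future.cancel()
        return fallback
    except Exception:
        # A failing optional feature should not fail the whole page.
        return fallback


def render_post_page(post_id, fetch_recent_tags):
    # Hypothetical page renderer: recent tags are a nice-to-have.
    tags = optional_lookup(fetch_recent_tags, fallback=[], budget_seconds=0.05)
    return {"post_id": post_id, "recent_tags": tags}


def slow_recent_tags():
    # Simulates a lookup that takes longer than the budget allows.
    time.sleep(0.2)
    return ["tag_a"]
```

Calling render_post_page(1, slow_recent_tags) still returns a page, just with an empty list of recent tags. Note that this only stops the page from waiting. The slow query itself keeps consuming resources until it finishes or hits its own timeout, so this limits the damage rather than removing the load.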
What do we do now?
Currently, the "recent tags" functionality is disabled. We will be looking into updating it to avoid hammering the servers with pointless requests.
The underlying issue is still being investigated. The indexes have been rebuilt, and the post versions pages should work normally.
We will also be looking into improving the testing processes in order to catch issues like this before they hit the main site.
Once again, I apologize for the inconvenience.