In a recent responsibility-reshuffle at work, I have inherited from
The other obvious downside is the visibility of any resulting cock-ups. Do something wrong with one server, and my company's external website will disappear. Do something wrong with another, and an important internal website that almost all employees need to get on with their work on a day-to-day basis will disappear.
It is this latter server, the internal one, which concerns us today. A little while ago, it started making plaintive pleas to be upgraded to Ubuntu 16.04.1.[*] That's fine, but upgrading requires restarts, and will make the internal website unavailable for a while. So I should do it outside office hours. And probably outside NYC office hours too.
Today, I had a physio appointment at 10am. So getting up, upgrading the server and then getting dressed and heading off to the physio seemed like a good plan. So I did that.
The server rebooted post-upgrade (always a relief) but the website displayed that most helpful of messages, "Internal Server Error 500". Everyone's favourite.
Never mind, there are some notes from Zandev about how to check for Django[**] errors. So I do that, and find an error, and Google to find out what the problem is, and install a few new packages and reinstall a Python module and fix it. Hurrah! Except our friend the 500 error hasn't actually gone away.
Well, to be honest, it doesn't look like a Django error anyway; I'd start by blaming Apache[***]. I don't really know what I'm looking for, but Apache has an error log and I have Google.
Sadly, Apache's error logs are empty. I figure hey, never mind, I'll restart Apache - because turning something off and back on again always helps. Except in this case, it doesn't. I tried again, but still nothing.
I revert to Google, but Internal Server Error is pretty vague, and a lot of the top hits are either (a) very specific to one web-provider or (b) aimed at people who are wanting to visit a website and want to know why it doesn't work, not the person whose job it is to fix it.
Zandev popped online about half an hour later, and fixed the whole issue in an embarrassingly short amount of time. He pointed out I was looking at the wrong log file (dammit, shouldn't have left my screen session lying around...) and told me where the right one was.
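With hindsight, an empty error log on a server that is cheerfully throwing 500s was a hint that I was staring at the wrong file. Here is a minimal sketch of the check I could have run: list the most recently written log files under a directory, then search one of them for a known error string. The function names, the "*.log" pattern and the directory are all my own assumptions for illustration; where the logs really live depends on how the server is configured.

```python
from pathlib import Path


def newest_logs(log_dir, pattern="*.log", limit=5):
    """Return up to `limit` files matching `pattern` under log_dir, newest first."""
    files = [p for p in Path(log_dir).rglob(pattern) if p.is_file()]
    # Sort by last-modified time so the log actually being written to comes first.
    files.sort(key=lambda p: p.stat().st_mtime, reverse=True)
    return files[:limit]


def lines_containing(path, needle):
    """Return every line of the file at `path` that contains the text `needle`."""
    with open(path, errors="replace") as fh:
        return [line.rstrip("\n") for line in fh if needle in line]


# Hypothetical usage (the directory is an assumption, not the real location):
#   for log in newest_logs("/path/to/logs"):
#       print(log, lines_containing(log, "populate() isn't reentrant"))
```

If the file I was tailing never appears near the top of that list, it is probably not the one the web server is writing to.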
He'd found the error in the logs ("populate() isn't reentrant"), Googled for it, and discovered the solution was to restart Apache. So he restarted it, and suddenly there was the website, all beautifully working.
I pointed out that this was rather unfair. Especially since I had already done that. Twice. Indeed, I checked back through the logs and found the error, plus the evidence of my own restart, all appearing in exactly the same sequence as the ones at the tail of the file that showed him fixing it.
He referred me to the koan of Tom Knight and the Lisp Machine.
Apache does not respect me. Or my dressing gown.
[*] Remember recently when everyone with Windows 8 suddenly found themselves with an option to upgrade to Windows 10? Like that. Only less aggressive marketing. And on Linux.
[**] Django is the web framework which the internal website is built on.
[***] Apache is a ubiquitous piece of software which is the magic glue that makes the files on a server that make up a website be available to someone visiting in a browser.
Date: 2016-11-26 11:38 am (UTC)