At ~6am this morning my cloud server went down and refused to respond to shutdown/reboot requests.
It turns out a disk had become corrupted at some point, so the server couldn't boot. Support tried various fixes but ended up leaving the server in netboot (rescue) mode and asking me to run fsck myself.
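For anyone who hits the same thing, the repair itself boiled down to running fsck against the unmounted, corrupted filesystem from the rescue environment. Here's a minimal sketch of that step; the device path is an assumption (check lsblk first), and the flags are the usual ext-filesystem ones, not anything Bytemark-specific:

```python
#!/usr/bin/env python3
"""Sketch of the fsck repair step, run from a netboot/rescue shell."""
import subprocess
import sys

DEVICE = "/dev/vda1"  # hypothetical device path; substitute the real partition

# -f forces a full check even if the filesystem is marked clean;
# -y auto-answers "yes" to repair prompts (only sensible from a rescue shell,
# with the filesystem unmounted and, ideally, a backup image taken first).
result = subprocess.run(["fsck", "-f", "-y", DEVICE])

# fsck exit status: 0 = clean, 1 = errors corrected, >= 4 = errors remain.
sys.exit(result.returncode)
```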
This was the first time I'd come across this issue; disk corruption, in my experience at least, is relatively rare these days, although obviously it can still happen.
While I now have another tool in my arsenal if this should happen again, I'm a little concerned about my responsibility for sorting this out.
Since it's highly unlikely (impossible?) that anything one could do on a cloud server, even maliciously, would cause physical disk corruption (data corruption is a different issue), I'm not sure why it's my responsibility to bring the system back up.
I suppose I could pay for managed hosting, but that's not in my budget right now, and to be honest I can solve most of the problems that come along (particularly when they're my own stupid fault in the first place).
I don't want to quibble over responsibilities, though; what I'd really like to know is whether there's anything I can put in place to alert me when this happens. If it had happened while I was on holiday or otherwise away from Internet access, I'd have had no way to bring the server back up, or even necessarily to know that something had gone wrong.
A mere ping wouldn't have indicated any issues in this case. I could set up an automated HTTP HEAD request against one of my sites and email an alert to a non-Bytemark account, but that still wouldn't make it easy to do anything about it if I were away from the Net (it does happen!).
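For concreteness, the sort of thing I have in mind is a small check run from cron on a separate machine. A minimal Python sketch follows; the URL, SMTP host, and addresses are all placeholders rather than my real setup:

```python
#!/usr/bin/env python3
"""HEAD-check sketch: probe a site and email an alert if it fails."""
import smtplib
import urllib.request
from email.message import EmailMessage

URL = "https://www.example.com/"    # placeholder: one of my hosted sites
SMTP_HOST = "smtp.example.net"      # placeholder: a relay NOT on the server
ALERT_FROM = "monitor@example.net"  # placeholder sender address
ALERT_TO = "me@example.net"         # placeholder: off-server mailbox

def site_is_up(url: str, timeout: int = 10) -> bool:
    """Issue an HTTP HEAD request; any 2xx/3xx response counts as up."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except Exception:
        # Connection refused, timeout, DNS failure, HTTP error, etc.
        return False

def send_alert(subject: str, body: str) -> None:
    """Email the alert via a mail relay that isn't the monitored server."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = ALERT_FROM
    msg["To"] = ALERT_TO
    msg.set_content(body)
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(msg)

if __name__ == "__main__":
    if not site_is_up(URL):
        send_alert("Server check failed", f"HEAD request to {URL} failed.")
```

The crucial point is that both the check and the outgoing mail have to live somewhere other than the monitored server, or the alert dies with it.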
Any other suggestions? I'm hoping this was a rare, unfortunate occurrence, but I'd like to be prepared for any future failures.