- Posted by bwf
- On November 5, 2013
Stephen K from management comments on uptime reports from October 2013
So what is your Uptime?
Dealing with many of the sales tickets and chats that come in, I am regularly asked: ‘So what is your uptime?’
For a long time now we have had staff on duty 24/7 who monitor our servers and react quickly when they see an issue. Whilst this is good, and it certainly helps keep our server uptime as high as possible, it does not give us any quantifiable figures to point clients and potential clients towards.
If you want to see how we monitor our servers, check here: http://bwf.co/gotyourback
Any web host can put a promise of 99.99% uptime on its website, but what does that actually mean? I have long felt strongly that we needed a third-party uptime monitor, so last month we installed Pingdom, and we now monitor all our servers with it. Although only 30 days of data are available so far, I wanted to raise awareness with this short article on uptime monitoring.
Below you will see what an uptime percentage actually means in terms of downtime for your websites. The calculation is straightforward: there are 525,600 minutes in a year (365 × 24 × 60), and the downtime allowed is simply (100% − uptime %) of that period.
Table: downtime per year, per month and per week for a given uptime percentage*
* Uptime calculations from Hostdime Blog
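The conversion behind figures like these can be sketched in a few lines of Python. This is a minimal illustration, not the Hostdime calculation itself: it assumes a 365-day year (525,600 minutes), treats a month as one twelfth of a year (43,800 minutes) and a week as 10,080 minutes, and the function names are hypothetical.

```python
# Minimal sketch: convert an uptime percentage into the downtime it allows.
# Assumptions (not from the original post): 365-day year, month = year / 12.
MINUTES_PER_YEAR = 365 * 24 * 60           # 525,600 minutes
MINUTES_PER_MONTH = MINUTES_PER_YEAR / 12  # 43,800 minutes (average month)
MINUTES_PER_WEEK = 7 * 24 * 60             # 10,080 minutes


def allowed_downtime_minutes(uptime_percent: float, period_minutes: float) -> float:
    """Return the minutes of downtime permitted by an uptime percentage."""
    return period_minutes * (100.0 - uptime_percent) / 100.0


def format_minutes(minutes: float) -> str:
    """Render minutes as whole hours plus remaining minutes, e.g. '0h 52.6m'."""
    hours, mins = divmod(minutes, 60)
    return f"{int(hours)}h {mins:.1f}m"


if __name__ == "__main__":
    for uptime in (99.0, 99.5, 99.9, 99.95, 99.99):
        per_year = allowed_downtime_minutes(uptime, MINUTES_PER_YEAR)
        per_month = allowed_downtime_minutes(uptime, MINUTES_PER_MONTH)
        per_week = allowed_downtime_minutes(uptime, MINUTES_PER_WEEK)
        print(f"{uptime:>6}%  year: {format_minutes(per_year):>10}  "
              f"month: {format_minutes(per_month):>9}  "
              f"week: {format_minutes(per_week):>8}")
```

For example, a 99.99% promise still allows about 52.6 minutes of downtime a year, while 99.9% allows about 8.76 hours.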
What affects downtime?
Below are three examples of how server uptime can be affected: at hardware level, at server level and at network level.
Hardware Issues on a Single Server
Let’s look at a very recent example. In October 2013 our monitoring system showed that Shared Server 28 had a failed drive in its disk array. A RAID array means a drive can fail while the server stays up, but it is prudent to replace failed drives quickly. This is usually done without any downtime, as the drives can be hot-swapped. We hot-swapped the disk, but the server would not see the new drive. We needed to take the server off the rack, swap the RAID card and SATA cables, and put the server back on the rack. Of course, we carried out this maintenance at 3am on a weekend to minimise client impact, and if you look at our status site you will see the server was down for approximately 2.4 hours because it required a file system check. Before this maintenance the server had many months of 100% uptime, so even though this outage brought its October 2013 uptime down to 99.5%, I know the figure over a 12-month period would have been significantly higher. This is why our new monitoring system will be so effective: in 12 months’ time anyone will be able to look at our documented uptime over the longer term.
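To see why a single outage weighs so heavily on a monthly figure, here is a rough sketch that considers only the 2.4-hour (144-minute) outage. It assumes October's 31 days (44,640 minutes), a 365-day year, and that every other minute was up. The 99.5% figure quoted above comes from the monitoring data, which may include other recorded downtime, so this idealised number will not match it exactly.

```python
# Rough sketch: how one 144-minute outage affects uptime over different periods.
# Assumption: the outage is the only downtime in the period (idealised).
OUTAGE_MINUTES = 2.4 * 60             # 2.4 hours = 144 minutes
OCTOBER_MINUTES = 31 * 24 * 60        # 44,640 minutes
YEAR_MINUTES = 365 * 24 * 60          # 525,600 minutes


def uptime_percent(downtime_minutes: float, period_minutes: float) -> float:
    """Percentage of the period during which the server was up."""
    return 100.0 * (1 - downtime_minutes / period_minutes)


month = uptime_percent(OUTAGE_MINUTES, OCTOBER_MINUTES)
year = uptime_percent(OUTAGE_MINUTES, YEAR_MINUTES)
print(f"October 2013: {month:.2f}%")   # about 99.68%
print(f"12 months:    {year:.2f}%")    # about 99.97%
```

The same outage that costs roughly a third of a percentage point over October costs less than three hundredths of a percentage point over a year, which is why the longer-term record matters.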
Individual Server Issues Not Related to Hardware
We recently received a tweet from a Managed VPS client asking why his server was down and whether we were having issues. On checking, we found that the network was 100% up and the node was 100% up; the issue was simply that the client’s individual server had run out of RAM and needed an upgrade. This was a simple fix, but it shows how issues other than hardware can affect server uptime.
Data Centre Wide Issues
Issues in the data centre can also affect uptime. Although your server is usually not down during such events, it is not publicly accessible, so our external monitoring system treats it as down. Issues such as Denial of Service attacks are unfortunately all too common, and all data centres suffer them from time to time. I personally keep a private log of such issues affecting a number of our competitors so I can judge the reliability of particular facilities. I also privately monitor a number of competitors’ servers (ping only), again so I can compare our performance with others. In the past 6 months our Maidenhead racks and our Maidstone racks have both had virtually 100% network uptime, while our Orlando, Florida racks have been somewhat problematic, with what I would say are too many issues. That said, in the past 4 weeks Orlando seem to have got a grip on things, and we have had close to 100% network uptime there, with just one 4-minute outage recorded.
Bigwetfish Uptime for October 2013:
So what is the verdict? The stats are in, and below you can find our uptime for October 2013.
We do find that Pingdom reports a lot of false positives. We have found most reported outages of 2 minutes or less to be false, and we are not sure why. On a few occasions I have personally been online when Pingdom reported an outage, and I confirmed, both from my location and from our remote technician’s location, that there was no outage. Bear this in mind when looking at the raw stats. It is for this reason that we have decided to report the average uptime for each facility rather than single-server stats. If you exclude every outage of less than 2 minutes, our uptime is higher.
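The effect of discounting very short alerts can be sketched as follows. This is a hypothetical illustration: the outage list is made-up sample data, not real monitoring results, and the 2-minute cut-off (including whether an outage of exactly 2 minutes is kept) is a parameter you would choose yourself. It does not use any Pingdom API; it simply works on a list of outage durations in minutes.

```python
# Hypothetical sketch: recompute monthly uptime after discarding outages
# shorter than a threshold (treated here as likely false positives).
from typing import Iterable


def uptime_percent(outage_minutes: Iterable[float], period_minutes: float,
                   ignore_shorter_than: float = 0.0) -> float:
    """Uptime % for a period, ignoring outages shorter than the threshold."""
    counted = sum(m for m in outage_minutes if m >= ignore_shorter_than)
    return 100.0 * (1 - counted / period_minutes)


OCTOBER_MINUTES = 31 * 24 * 60  # 44,640 minutes

# Made-up sample: three 1-minute blips, one 2-minute blip and a 60-minute outage.
sample_outages = [1, 1, 1, 2, 60]

raw = uptime_percent(sample_outages, OCTOBER_MINUTES)
filtered = uptime_percent(sample_outages, OCTOBER_MINUTES, ignore_shorter_than=2)
print(f"raw: {raw:.3f}%  filtered: {filtered:.3f}%")
```

With `ignore_shorter_than=2`, outages strictly under 2 minutes are dropped and the 2-minute blip is still counted; raising the threshold slightly would drop it too.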
To calculate the facility average, we took the monthly uptime % for each server in a facility and averaged them to give the average uptime across all servers in that facility. You can view the individual figures for each server on our status site if you wish.
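The facility figure described here is a plain (unweighted) mean of per-server monthly percentages. A minimal sketch, with hypothetical server names and made-up percentages rather than real status-site figures; the `excluded` parameter mirrors the way a retired server can be left out of the average.

```python
# Minimal sketch: facility uptime as the simple mean of each server's monthly uptime %.
# Server names and percentages below are made up for illustration.
from statistics import mean


def facility_uptime(server_uptimes: dict[str, float],
                    excluded: frozenset[str] = frozenset()) -> float:
    """Average monthly uptime % across a facility, skipping excluded servers."""
    included = [pct for name, pct in server_uptimes.items() if name not in excluded]
    if not included:
        raise ValueError("no servers left to average")
    return mean(included)


sample = {
    "server-a": 100.0,
    "server-b": 99.9,
    "server-c": 99.5,
    "server-retired": 95.0,  # e.g. a retired server still listed on the status page
}

print(f"{facility_uptime(sample, excluded=frozenset({'server-retired'})):.2f}%")
```

A simple mean gives every server equal weight regardless of how many sites it hosts, and a long outage on one server is diluted across the facility, which is worth keeping in mind when comparing facility averages with single-server figures.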
DATA CENTRE: Maidstone Kent UK (Hostdime UK Facility)
October 2013 Uptime: 99.93%*
16 Servers in total
DATA CENTRE: Orlando USA (Hostdime US Facility)
October 2013 Uptime: 99.94%
6 Servers in total
DATA CENTRE: Maidenhead UK (IOMART)
October 2013 Uptime: 99.98%**
10 Servers in total
*Two incidents contributed to this figure this month. Server 28 had 2.4 hours of downtime because its RAID card needed replacing and the server needed an fsck file system check. Server 22 needed a reboot, which forced a file system check lasting just over an hour.
**One server (server34) has been retired, so we have excluded it from the figures: although it still appears on our status site, it has no active clients.
Why not use http://status.bigwetfish.co.uk to keep an eye on your server?
Although we have a paid commercial Pingdom account, there is also a free version. Why not sign up at pingdom.com and start monitoring the uptime of your own server?