We recently were reviewing proposals from two vendors. One vendor claimed 100% uptime. Another vendor claimed 99.95% uptime. Our SLA to customers is below both of those numbers, but 100% feels better than 99.95% right? So we should go with 100% right?
My experience is that the uptime number in an SLA is purely for marketing purposes. Pure. Marketing. Purposes. If you read 100% and think the service will be online for 100% of the time? Shame on you.
The really important thing is the detail behind the SLA. Here are a few tricks I've seen that make a 99.999% SLA roughly worth nothing.
- What are the exclusions? Most service providers are hosted somewhere (Amazon? Physical space?) that has it's own uptime guarantee. If that provider goes down is your SLA still in effect? Many SLAs exclude acts of nature like a hurricane that can take down a single provider.
- What do you get when the number is broken? Some contracts give you a credit. Some give you cash. Some give you a credit that is worth your monthly cost multiplied by the percent of time they were offline. Is that worth much to you?
- Do you get more if the outage is persistent? If a service dies for an hour that's a problem. If it dies for a day that is horrible. I want to be compensated more if the outage is prolonged.
- Whose monitoring counts? What kind of monitoring? I've had times where my monitoring (Pingdom) showed a site was offline for hours, but internal monitoring showed it was fine. I got no credits.
- What counts as "down" - if the service is online but taking 10 times longer than normal to process requests, is that OK? What if the service is online but network connectivity is degraded?
- How are periods of downtime calculated? An SLA I read only counted a full hour of continuous downtime as real downtime. Many outages are 10 minutes here, 20 minutes there. I want to be compensated for those as well.