What Are Availability, Reliability, and Serviceability?
Availability refers to the overall uptime of a computer system or of its specific features. For example, a personal computer is available for use if its operating system is booted and running. Reliability is related to availability but means something different: it refers to the general likelihood of a failure occurring in a running system. A perfectly reliable system will enjoy 100 percent availability. However, when a failure occurs, it affects availability in different ways, depending on the nature of the problem. Serviceability affects availability as well. You can detect and repair failures more quickly in a serviceable system than in an unserviceable one, meaning you’ll have less downtime per incident on average.
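One common way to put numbers on this relationship is the steady-state formula availability = MTBF / (MTBF + MTTR). The terms mean time between failures (MTBF, a reliability measure) and mean time to repair (MTTR, a serviceability measure) do not appear in the text above; they are introduced here as assumptions. The sketch below is a minimal illustration with made-up figures, not a measurement method.

```python
def steady_state_availability(mtbf_hours, mttr_hours):
    """Return the long-run fraction of time a system is up.

    mtbf_hours: mean time between failures (reliability), assumed input.
    mttr_hours: mean time to repair (serviceability), assumed input.
    """
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical figures chosen only for illustration.
baseline = steady_state_availability(1000, 10)         # fails every 1000 h, 10 h to fix
more_reliable = steady_state_availability(2000, 10)    # fails half as often
more_serviceable = steady_state_availability(1000, 5)  # repaired twice as fast

print(f"baseline:         {baseline:.4%}")
print(f"more reliable:    {more_reliable:.4%}")
print(f"more serviceable: {more_serviceable:.4%}")
```

Under this model, halving the repair time improves availability exactly as much as doubling the time between failures, which is why serviceability matters alongside reliability.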
Availability Levels
The standard way to define levels or classes of availability in a computer network system is a scale of nines. For example, 99 percent uptime translates to two nines of availability, 99.9 percent uptime to three nines, and so on. The table below illustrates the meaning of this scale. It expresses each level in terms of the maximum amount of downtime per (nonleap) year that could be tolerated to meet the uptime requirement. It also lists a few examples of the type of systems that commonly meet these requirements. An availability figure is most meaningful when the time frame over which it was measured (weeks, months, or years) is specified. A product that achieves 99.9 percent uptime over a period of one or more years has proven itself to a greater degree than one whose availability has only been measured for a few weeks.
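The downtime budget for each level of the nines scale follows directly from the arithmetic. The short sketch below computes it for a nonleap year of 8,760 hours; the function name and constant are illustrative choices, not part of any standard.

```python
HOURS_PER_YEAR = 365 * 24  # nonleap year: 8,760 hours


def max_downtime_hours(nines, period_hours=HOURS_PER_YEAR):
    """Return the maximum downtime (in hours) allowed over the period.

    nines=2 means 99 percent uptime, nines=3 means 99.9 percent, and so on.
    """
    if nines < 1:
        raise ValueError("nines must be at least 1")
    unavailability = 10 ** -nines  # e.g. 0.01 for two nines
    return period_hours * unavailability


for n in range(2, 6):
    hours = max_downtime_hours(n)
    uptime_percent = 100 * (1 - 10 ** -n)
    print(f"{n} nines ({uptime_percent:.3f}% uptime): "
          f"{hours:.3f} hours = {hours * 60:.2f} minutes per year")
```

Two nines allows about 87.6 hours (3.65 days) of downtime per year, three nines about 8.76 hours, four nines about 52.6 minutes, and five nines about 5.26 minutes.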
Network Availability: An Example
Availability has always been an important characteristic of systems, but it becomes a critical and complex challenge on networks. Network services are commonly distributed across several computers and can depend on various auxiliary devices. Take the Domain Name System (DNS), for example, which is used on the internet and on private intranets to map computer names to their network addresses. DNS keeps its index of names and addresses on a server called the primary DNS server. When only a single DNS server exists in a system, a crash of that server takes down all DNS capability on the network. DNS, however, offers support for distributed servers. Besides the primary server, an administrator can install secondary and tertiary DNS servers on the network. Now, a failure in any one of the three systems is less likely to cause a complete loss of DNS service. Other types of network outages also affect DNS availability. Link failures, for example, can take down DNS by making it impossible for clients to communicate with a DNS server. It’s not uncommon in these scenarios for some people (depending on their physical location on the network) to lose DNS access while others remain unaffected. Configuring multiple DNS servers helps to deal with these indirect failures that affect availability.
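The benefit of adding secondary and tertiary servers can be estimated with a simple probability model. The sketch below assumes each server fails independently of the others, so the service is down only when every server is down at once. That independence is an assumption made for illustration; shared links, power, or configuration errors can take out several servers together, so real figures are usually lower. The 99 percent per-server availability is a hypothetical value.

```python
def service_availability(server_availabilities):
    """Return the availability of a service that works if ANY server is up.

    Assumes server failures are statistically independent (an idealization).
    """
    probability_all_down = 1.0
    for availability in server_availabilities:
        if not 0.0 <= availability <= 1.0:
            raise ValueError("each availability must be between 0 and 1")
        probability_all_down *= 1.0 - availability
    return 1.0 - probability_all_down


# Hypothetical: every DNS server is individually available 99% of the time.
primary_only = service_availability([0.99])
with_secondary = service_availability([0.99, 0.99])
with_tertiary = service_availability([0.99, 0.99, 0.99])

print(f"primary only:           {primary_only:.6f}")
print(f"primary + secondary:    {with_secondary:.6f}")
print(f"primary + two backups:  {with_tertiary:.6f}")
```

Under these assumptions, each added server contributes roughly two more nines, taking the service from two nines with one server to about six nines with three.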
Perceived Availability vs. High Availability
The timing of failures plays a role in the perceived availability of a network. A business system that suffers frequent weekend outages, for example, may show relatively low availability numbers, yet the regular workforce may never notice this downtime.
The networking industry uses the term high availability to refer to systems and technologies specially engineered for reliability, availability, and serviceability. Such systems typically include redundant hardware, like disks and power supplies, and intelligent software, like load-balancing and failover functionality. The difficulty of achieving high availability increases dramatically at the four-nines and five-nines levels, so vendors charge a premium for these features.
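The gap between measured and perceived availability described above can be illustrated by scoring the same outage against two different time windows. The sketch below models one week as 168 hourly slots starting Monday at midnight, and treats Monday through Friday, 9:00 to 17:00, as business hours. Both the schedule and the outage are hypothetical values chosen for illustration.

```python
HOURS_PER_WEEK = 7 * 24  # hour 0 is Monday 00:00


def is_business_hour(hour_of_week):
    """Return True for Monday-Friday, 09:00-17:00 (an assumed schedule)."""
    day, hour = divmod(hour_of_week, 24)
    return day < 5 and 9 <= hour < 17


def availability(down_hours, observed_hours):
    """Return the fraction of observed hours in which the system was up."""
    down = sum(1 for h in observed_hours if h in down_hours)
    return 1 - down / len(observed_hours)


all_hours = list(range(HOURS_PER_WEEK))
business_hours = [h for h in all_hours if is_business_hour(h)]

# Hypothetical outage: Saturday (day index 5) from 00:00 to 12:00.
weekend_outage = set(range(5 * 24, 5 * 24 + 12))

measured = availability(weekend_outage, all_hours)
perceived = availability(weekend_outage, business_hours)

print(f"measured over the full week:   {measured:.2%}")
print(f"perceived in business hours:   {perceived:.2%}")
```

Here a 12-hour Saturday outage pulls the full-week figure down to about 92.86 percent, while the business-hours figure stays at 100 percent.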