But companies seeking to deliver critical business information over frame relay circuits need a guarantee of network performance for those circuits. Monitoring and controlling the performance and reliability of public-network circuits depends greatly on service-level agreements (SLAs) between companies and service providers, and on monitoring compliance with those agreements.
SLAs are contracts that define the service levels expected from a service provider and the penalties imposed if the service provider does not comply. Their main purpose is to keep conflicts between companies and service providers to a minimum by setting reasonable expectations of service.
SLAs benefit the client by providing effective grading criteria and protection from poor service. They benefit the service provider by offering a way to ensure that expectations are set correctly and will be judged fairly. Although SLAs include some kind of monetary reimbursement for lost or poor service, that’s a last resort. Ask anyone affected by April’s AT&T frame relay outages — they’d rather have good service than compensation for their lost connectivity.
What does an SLA guarantee?
There are four basic items that should be covered in every SLA: availability, reliability, effective throughput and response time (or delay). Other items to consider include the time it takes to respond to problems and the time required to repair or restore service. Availability is a measurement of how much uptime the customer receives, while reliability refers to how often a network goes down or how long it stays down.
Reliability is a better metric for measuring the consistency of a connection. For example, if a network were to be out of commission 1 minute out of every 10, it would have an availability of 90 percent, but it would be labeled unreliable.
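The difference between the two metrics can be sketched in a few lines of Python. This is only an illustration: the function names, the 100-minute observation window and the use of average uptime per outage as a stand-in for reliability are assumptions, not definitions taken from any SLA.

```python
def availability_pct(outage_minutes, period_minutes):
    """Percentage of the period during which the circuit was up."""
    downtime = sum(outage_minutes)
    return 100.0 * (period_minutes - downtime) / period_minutes


def mean_uptime_between_outages(outage_minutes, period_minutes):
    """Average uptime per outage; assumes at least one outage was recorded."""
    uptime = period_minutes - sum(outage_minutes)
    return uptime / len(outage_minutes)


# Hypothetical 100-minute observation window.
flapping = [1] * 10   # down 1 minute out of every 10
single = [10]         # one 10-minute outage

for name, outages in (("flapping", flapping), ("single", single)):
    print(
        name,
        availability_pct(outages, 100),
        mean_uptime_between_outages(outages, 100),
    )
```

Both circuits show 90 percent availability, but the flapping circuit averages only 9 minutes of uptime per outage, against 90 minutes for the circuit that failed once.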
Since the SLA spells out the terms of network performance, both the provider and the customer must accept the same definition for each term. For example, definitions can vary even for a measurement as simple as network availability.
Availability guarantees should include all components of the provider’s network, the local loop to the network and any equipment provided by the service provider (such as a CSU/DSU and router). Service providers may want to exclude the following: a customer-provided CSU/DSU, router or other access device; the local loop when provided by the customer; and network downtime caused by the carrier’s scheduled maintenance, customer-induced outages, dial-in links and natural disasters.
There’s also an important distinction between network-based availability and site-based availability. A 30-day month has 720 hours, so a 99.5 percent availability target allows 0.5 percent of that time, or 3.6 hours, of downtime per site. For a network consisting of 10 sites, a 99.5 percent average network availability would therefore allow 36 total site-hours of downtime in the month, and all of it could fall on a single site. If the SLA is based on site availability, any one site can be down for only 3.6 hours in the month. The distinction can be very important when determining compliance with an SLA.
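The arithmetic behind these figures, and the way the same outage can pass one test and fail the other, can be sketched as follows. The function names and the sample month (one site down for 20 hours, the other nine never failing) are hypothetical and chosen only for illustration.

```python
def downtime_budget_hours(availability_pct, days, sites=1):
    """Hours of downtime allowed by an availability target over a period."""
    total_hours = days * 24 * sites
    return total_hours * (100 - availability_pct) / 100


def compliant(site_downtime_hours, availability_pct, days, per_site):
    """Check measured per-site downtime against the availability target."""
    if per_site:
        # Site-based: every site must stay within its own budget.
        limit = downtime_budget_hours(availability_pct, days)
        return all(hours <= limit for hours in site_downtime_hours)
    # Network-based: only the total across all sites is checked.
    limit = downtime_budget_hours(
        availability_pct, days, sites=len(site_downtime_hours)
    )
    return sum(site_downtime_hours) <= limit


# Hypothetical month: one of 10 sites is down for 20 hours.
month = [20.0] + [0.0] * 9

print(downtime_budget_hours(99.5, 30, sites=10))   # network-wide budget
print(downtime_budget_hours(99.5, 30))             # budget for one site
print(compliant(month, 99.5, 30, per_site=False))  # network-based check
print(compliant(month, 99.5, 30, per_site=True))   # site-based check
```

Under these assumptions the 20-hour outage fits within the 36-hour network-wide budget, so the network-based check passes, while the site-based check fails because the one site exceeded its 3.6-hour budget.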
When defining throughput measurements for an SLA, traffic load and delay should be measured when their impact is greatest, which is at times of peak traffic load. Service providers will often exclude certain data from their measurements, such as traffic during provider maintenance, on dial-up lines or on new circuits added during a contract month. Customers should therefore understand which data has been included in any measurements so that their own cross-check measurements will correspond to those performed by the service provider.
Preparing for an SLA
The main step in preparing for an SLA is baselining the network’s performance. This involves monitoring performance over a period of time, usually a minimum of three months, and reviewing the performance data for any trends that may affect network quality. Without this information in hand, a company cannot realistically determine what WAN performance it requires, nor can it tell whether performance has degraded or why. This baseline data can also tell a company if it needs to negotiate for special conditions in its SLA.
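As a rough sketch of what reviewing baseline data for trends might look like, the following Python summarizes delay samples month by month and reports how the mean changes. The choice of statistics (mean, nearest-rank 95th percentile, maximum), the function names and the sample values are all assumptions made for illustration; a real baseline would draw on far more data and on whichever metrics the SLA actually names.

```python
from statistics import mean


def summarize(samples_ms):
    """Mean, 95th percentile (nearest rank) and maximum of delay samples.

    Assumes samples_ms is a non-empty list of delays in milliseconds.
    """
    ordered = sorted(samples_ms)
    rank = (95 * len(ordered) + 99) // 100  # ceil(0.95 * n) in integer math
    return {"mean": mean(ordered), "p95": ordered[rank - 1], "max": ordered[-1]}


def baseline(monthly_samples):
    """Summarize each month and report the month-over-month change in mean."""
    report = []
    previous = None
    for month, samples in monthly_samples:
        stats = summarize(samples)
        stats["month"] = month
        stats["mean_change_pct"] = (
            None if previous is None
            else 100.0 * (stats["mean"] - previous) / previous
        )
        previous = stats["mean"]
        report.append(stats)
    return report


# Hypothetical round-trip delays in milliseconds for three months.
data = [
    ("month 1", [40, 42, 41, 45, 60]),
    ("month 2", [44, 46, 45, 50, 66]),
    ("month 3", [50, 52, 55, 58, 80]),
]
for row in baseline(data):
    print(row)
```

A steadily rising mean or 95th percentile across the baseline period is the kind of trend that would be worth raising during SLA negotiation.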
While an SLA is couched mainly in technical terms such as availability, throughput and response time, there must be a link between these terms and the company’s business needs.
Although network performance is measured by what happens as traffic passes through the devices comprising the WAN, the most important results for the customer are how applications behave and whether users can do their jobs. Proper baselining, therefore, requires tying lower-level measurements made on the network to the business requirements of the enterprise — not a simple matter.
Many service-level management products focus on monitoring service levels with SNMP and RMON and thus do not provide a sufficiently integrated view to let network managers review end-to-end performance for applications.
Some products, such as Hewlett-Packard Co.’s Netmetrix Reporter, Infovista Corp.’s SLA Conformance Manager, Platinum Technology Inc.’s Wiretap, Compuware Corp.’s Ecoscope and Optimal Networks Corp.’s Application Expert, use response times, often with data gathered with SNMP and RMON, to give a more integrated view of network performance.
Monitoring your SLA
Whether a company’s SLA is a standard one provided by the service provider or a custom-negotiated one, it will mean little without a system for monitoring the specified service levels. A good rule of thumb is to review internal measurements of network performance and reliability on a weekly basis and compare them to the service provider’s results every month.
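The monthly comparison can be as simple as lining up the two sets of figures and flagging any metric on which they disagree by more than an agreed margin. The sketch below assumes both sides report the same metric names; the names, the sample figures and the 5 percent tolerance are hypothetical.

```python
def find_discrepancies(internal, provider, tolerance_pct=5.0):
    """Return metrics whose figures differ by more than tolerance_pct.

    The difference is taken relative to the provider's figure.
    Only metrics reported by both sides are compared.
    """
    flagged = {}
    for metric in sorted(internal.keys() & provider.keys()):
        ours, theirs = internal[metric], provider[metric]
        if theirs == 0:
            differs = ours != 0
        else:
            differs = abs(ours - theirs) / abs(theirs) * 100 > tolerance_pct
        if differs:
            flagged[metric] = (ours, theirs)
    return flagged


# Hypothetical monthly figures from the customer's own probes
# and from the service provider's report.
internal = {"availability_pct": 99.2, "delay_ms": 80.0, "throughput_kbps": 250.0}
provider = {"availability_pct": 99.6, "delay_ms": 60.0, "throughput_kbps": 256.0}

print(find_discrepancies(internal, provider))
```

With these figures only the delay measurement is flagged. A single relative tolerance is a crude test: a gap of a few tenths of a percentage point in availability can decide SLA compliance, so that metric would probably need its own absolute threshold.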
There are also key implementation issues that have a direct impact on the usefulness of SLAs to the network manager. The first issue is where the measurements are taken: end-to-end or just within the service provider’s network (for example, from the provider’s switch at one customer site to another switch at another site). The local loop can have a profound impact on network performance, but it is ignored in a switch-to-switch implementation. Performance should be measured, and troubleshooting carried out, end-to-end.
The second issue is the measurement system — it should be independent of the network being measured and not biased toward switch or router architectures. Also keep in mind that presentation of the information is almost as important as the information itself. As mentioned earlier, integrated display of end-to-end network performance data is still rudimentary.
Many of the service providers offering guaranteed service will locate measurement devices at the customer’s site. For comparison’s sake, users should try to locate their own measuring devices in parallel with those installed by the service provider.