A Beginner's Guide to Redundancy Standards

Industrial Ethernet: How to Keep Your Network Up and Running

For a very good reason, a commercial airliner crossing the ocean must have more than one engine. In the same way, when an industrial network failure creates a safety risk or other possible major loss, backup systems are necessary to reduce the risk.

Ethernet’s ability to deliver reliable control in industrial settings was once a major concern. Now that industrial grade managed switches make it simple to segment and manage industrial networks, the speed of recovery from a typical network fault can be measured in milliseconds (thousandths of a second). With these short recovery times, many applications are now able to use Ethernet and take advantage of the most up-to-date control networks.

This paper reviews how network topology, transport layer protocols, and network management tools affect recovery speed, and compares how these technologies perform in practice.

Topology: The Basis of Security and Speed

Network topology, or the means by which devices are connected, is key to redundancy and recovery time. The topologies reviewed here are mesh networks, link aggregation and redundant rings.

These three topologies deal very well with recovery from link failure, often the weakest part of a network. The evaluation factors described apply to other redundancy plans as well.

While redundant ring topology is the most common of these, a review of mesh and trunked networks helps explain why redundant rings are used most. The ideal choice for an industrial network is based on the best mix of cost, ability to heal and speed of recovery.

In a mesh network, devices or nodes are interconnected:

Mesh network diagram

A mesh configuration is the most flexible and best able to heal. Multiple paths aid in recovery from a broken link or switch. Even a failure of more than one link or switch at the same time can be mended.

Mesh networks are also more costly, as they require more wiring and ports. Additionally, recovery from a mesh network failure is much slower. This is because the system has to re-learn paths and configure ports to work around the broken link.

Link aggregation, or trunking, also uses multiple links between devices but in a simpler way:

Link aggregation diagram

If one link fails, the parallel cable takes over. Recovery is fast due to the simple configuration, and bandwidth is greater because of the combined capacity of two or more lines. But cabling costs are higher, and if a switch goes down, the switches connected through it are cut off as well.

There’s also a higher risk of total link failure. When one link fails the same event may also damage parallel cabling. This increases the chance of a more serious disruption.

Redundant rings are connected in a loop. One link is disabled until it’s needed:

Redundant rings diagram

This solution has lower cabling costs and faster recovery from a failed data pathway. The other switches also remain connected if one switch fails. Together, these are the reasons that the redundant ring is the clear choice for industrial settings.
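The trade-offs described above can be checked with a small model. The following is a minimal sketch, assuming four hypothetical switches named A to D and treating each cable as a simple pair of endpoints. It ignores protocols and timing entirely and only asks whether the surviving switches can still reach one another after a failure.

```python
from itertools import combinations

def is_connected(nodes, links):
    """Return True if every node can reach every other node over the links."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in links:
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for neighbor in adjacency[current]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen == nodes

def survives_link_failures(nodes, links, failures):
    """True if the network stays connected for every choice of `failures` broken links."""
    for broken in combinations(range(len(links)), failures):
        remaining = [link for i, link in enumerate(links) if i not in broken]
        if not is_connected(nodes, remaining):
            return False
    return True

def survives_switch_failure(nodes, links):
    """True if the surviving switches stay connected after any single switch fails."""
    for dead in nodes:
        alive = [n for n in nodes if n != dead]
        remaining = [(a, b) for a, b in links if dead not in (a, b)]
        if not is_connected(alive, remaining):
            return False
    return True

# Hypothetical four-switch layouts (names are illustrative only).
switches = ["A", "B", "C", "D"]
full_mesh = list(combinations(switches, 2))                 # 6 cables
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]     # 4 cables
trunked_chain = [("A", "B"), ("A", "B"),                    # 6 cables,
                 ("B", "C"), ("B", "C"),                    # two in parallel
                 ("C", "D"), ("C", "D")]                    # between neighbors

for name, links in [("mesh", full_mesh), ("ring", ring), ("trunk", trunked_chain)]:
    print(name, len(links),
          survives_link_failures(switches, links, 1),
          survives_link_failures(switches, links, 2),
          survives_switch_failure(switches, links))
```

In this toy layout all three designs ride through any single link failure. Only the mesh survives every pair of link failures, and only the trunked chain strands the remaining switches when one switch is lost. The ring does so with the fewest cables.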

Transport Layer Tools: TCP and UDP

TCP and UDP are transport layer protocols. Each has its advantages, with a tradeoff between speed and assured data delivery.

TCP guarantees data delivery, checks receiving device status, verifies the message is correct and complete, and resends data as needed. UDP doesn’t check receiving device status, doesn’t resend data and doesn’t guarantee delivery. But it uses less processor time and less bandwidth, and requires no response from the receiving device. As a result, it’s able to deliver data faster than TCP.

Industrial control often uses UDP as it improves real time performance.
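The tradeoff can be illustrated with a toy model. This is a sketch only, not an implementation of either protocol: it assumes a hypothetical link that drops frames in a fixed, repeating pattern, and it ignores TCP's connection setup, windowing and lost acknowledgements. The function names and the drop pattern are invented for the illustration.

```python
from itertools import cycle

def send_fire_and_forget(messages, link_drops):
    """UDP-style: each message is sent once; a dropped message is simply lost."""
    delivered, frames_on_wire = [], 0
    for msg in messages:
        frames_on_wire += 1
        if not next(link_drops):
            delivered.append(msg)
    return delivered, frames_on_wire

def send_with_ack(messages, link_drops, max_retries=5):
    """TCP-style (greatly simplified): resend each message until it is acknowledged."""
    delivered, frames_on_wire = [], 0
    for msg in messages:
        for _attempt in range(1 + max_retries):
            frames_on_wire += 1              # the data frame
            if not next(link_drops):
                delivered.append(msg)
                frames_on_wire += 1          # the acknowledgement coming back
                break
    return delivered, frames_on_wire

messages = ["m1", "m2", "m3", "m4"]
drop_pattern = [False, True, False]          # assumed: every second frame of three is lost

udp_like = send_fire_and_forget(messages, cycle(drop_pattern))
tcp_like = send_with_ack(messages, cycle(drop_pattern))
print(udp_like)   # (['m1', 'm3', 'm4'], 4)
print(tcp_like)   # (['m1', 'm2', 'm3', 'm4'], 10)
```

The fire-and-forget sender loses one message but puts only four frames on the wire. The acknowledged sender delivers everything at the cost of ten frames and the waiting that goes with them.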

Network Management Protocols

Some network management protocols are capable of handling a wide variety of configurations. Others are useful for only one type of configuration, but do it well.

The Spanning Tree Protocol, or STP, was adopted as the IEEE 802.1D standard in 1990. The Rapid Spanning Tree Protocol, or RSTP (similar to STP, but faster and more capable), came out as the IEEE 802.1w standard in 2001.

Each of these does substantially the same thing: it allows Ethernet to run over mesh and ring networks. They do this by putting selected links into standby to prevent data loops from overloading the network. Without this, the circular connections that make up the mesh and ring topologies would bog down the network and ultimately lead to a complete communications failure.

Standby links are activated to “heal” the network when a link fails. The difference in recovery time between STP and RSTP is large, but RSTP is still too slow for many industrial applications.
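The idea of standby links can be sketched in a few lines. The example below is an illustration of the principle only, not of STP or RSTP themselves: it assumes a hypothetical four-switch ring, picks switch A as the root, and builds a loop-free tree with a breadth-first search. Real spanning tree protocols elect the root and choose links by exchanging messages between switches, with priorities, path costs and timers that are not modeled here.

```python
from collections import deque

def active_and_standby(root, links):
    """Split links into a loop-free active set (a tree grown from root) and a standby set."""
    adjacency = {}
    for a, b in links:
        adjacency.setdefault(a, []).append(b)
        adjacency.setdefault(b, []).append(a)
    active, seen, queue = set(), {root}, deque([root])
    while queue:
        current = queue.popleft()
        for neighbor in sorted(adjacency.get(current, [])):
            if neighbor not in seen:
                seen.add(neighbor)
                active.add(frozenset((current, neighbor)))
                queue.append(neighbor)
    standby = {frozenset(link) for link in links} - active
    return active, standby

# Hypothetical ring of four switches.
ring = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "A")]

active, standby = active_and_standby("A", ring)
# One link (C-D) is held in standby so the ring has no loop.

# Link A-B fails: recompute over the surviving links.
surviving = [link for link in ring if set(link) != {"A", "B"}]
healed_active, healed_standby = active_and_standby("A", surviving)
# The former standby link C-D is now active and nothing is left in standby.
```

The time a real protocol needs to detect the failure and agree on the new tree is what separates STP, RSTP and the faster ring protocols.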

Proprietary ring protocols address this problem. They expand redundant ring applications to situations that require faster recovery times.

These only deal with redundant ring topologies, not mesh networks. As they are proprietary, they may not allow the mixing of different brands of switches. However, they’ve been well received in industrial settings needing the benefits of quick recovery.

In some cases, manufacturers accomplish this by keeping the simple routines RSTP uses to fix redundant ring failures while removing the time-consuming ones needed to manage mesh networks.

Link aggregation, or IEEE 802.3ad, is the standard that formalizes the management of parallel Ethernet network cables and ports, to expand bandwidth and provide for faster recovery.

With link aggregation, the connection is never totally lost, as it is duplicated by a parallel line. But some data can be lost when a link goes down, and bandwidth is reduced by the loss of a link.
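A small model shows both effects: traffic keeps flowing after a link is lost, but over less capacity. This is a sketch under stated assumptions, not the 802.3ad mechanism itself. The Trunk class, the 100 Mbps link speeds and the use of an integer flow number to pick a link are all hypothetical; real switches typically choose a link from fields in the frame header.

```python
class Trunk:
    """Toy model of parallel links between two switches (illustrative only)."""

    def __init__(self, link_speeds_mbps):
        self.speeds = dict(enumerate(link_speeds_mbps))  # link id -> speed
        self.up = set(self.speeds)                       # ids of working links

    def fail(self, link_id):
        """Mark one member link as down."""
        self.up.discard(link_id)

    def capacity_mbps(self):
        """Combined capacity of the links that are still up."""
        return sum(self.speeds[i] for i in self.up)

    def pick_link(self, flow_number):
        """Spread flows over the working links; raise if none are left."""
        if not self.up:
            raise RuntimeError("all links in the trunk are down")
        working = sorted(self.up)
        return working[flow_number % len(working)]

trunk = Trunk([100, 100])            # two assumed 100 Mbps links
before = (trunk.capacity_mbps(), trunk.pick_link(0), trunk.pick_link(1))

trunk.fail(0)                        # one cable is cut
after = (trunk.capacity_mbps(), trunk.pick_link(0), trunk.pick_link(1))

print(before)   # (200, 0, 1)
print(after)    # (100, 1, 1)
```

Both flows survive the failure by moving to the remaining link, which now carries all of the traffic at half the original capacity. Frames already in flight on the failed link are not modeled and would be lost.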

Redundancy, Recovery and Cost Comparison

While real-life networks may require additional redundancy measures, all are subject to the trade-offs of protection vs. resources.

It’s impossible to fill in the blanks precisely. The combinations and variations of redundant networks could fill volumes. Results for every piece of equipment and network will vary. But every situation will use the same type of analysis.

Here’s a comparison of the topologies and protocols reviewed in this paper:

Network Protocols         | Average Recovery Speed* | Redundancy & Protection* | Cost*
Proprietary Ring / TCP    | 300 ms                  | Medium                   | Medium
Proprietary Ring / UDP    | 200 ms                  | Medium                   | Medium
STP-Redundant Ring / TCP  | 30-90 seconds           | Medium                   | Medium
STP-Redundant Ring / UDP  | 10-50 seconds           | Medium                   | Medium
RSTP-Redundant Ring / TCP | 1-3 seconds             | Medium                   | Medium
RSTP-Redundant Ring / UDP | <1-2 seconds            | Medium                   | Medium
STP-Mesh / TCP            | 60-300 seconds          | Higher                   | Higher
STP-Mesh / UDP            | 40-150 seconds          | Higher                   | Higher
RSTP-Mesh / TCP           | 3-10 seconds            | Higher                   | Higher
RSTP-Mesh / UDP           | 1-2 seconds             | Higher                   | Higher
Trunking / TCP            | 100-200 ms              | Lower                    | Lower
Trunking / UDP            | 0-10 ms                 | Lower                    | Lower

*Approximations only, based on a review of the literature, manufacturer claims and average test results.
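The same type of analysis can be expressed as a simple filter. The sketch below copies the upper end of each recovery range from the comparison above, converted to seconds, and lists the options that fit a given recovery budget. The figures are the approximations given in this paper, not measurements, and the function name and budget values are hypothetical.

```python
# (option, worst-case recovery in seconds, redundancy & protection, cost)
# Values are the upper ends of the approximate ranges in the comparison above.
OPTIONS = [
    ("Proprietary Ring / TCP",    0.3,   "Medium", "Medium"),
    ("Proprietary Ring / UDP",    0.2,   "Medium", "Medium"),
    ("STP-Redundant Ring / TCP",  90.0,  "Medium", "Medium"),
    ("STP-Redundant Ring / UDP",  50.0,  "Medium", "Medium"),
    ("RSTP-Redundant Ring / TCP", 3.0,   "Medium", "Medium"),
    ("RSTP-Redundant Ring / UDP", 2.0,   "Medium", "Medium"),
    ("STP-Mesh / TCP",            300.0, "Higher", "Higher"),
    ("STP-Mesh / UDP",            150.0, "Higher", "Higher"),
    ("RSTP-Mesh / TCP",           10.0,  "Higher", "Higher"),
    ("RSTP-Mesh / UDP",           2.0,   "Higher", "Higher"),
    ("Trunking / TCP",            0.2,   "Lower",  "Lower"),
    ("Trunking / UDP",            0.01,  "Lower",  "Lower"),
]

def options_within(budget_seconds):
    """Return the options whose worst-case recovery fits the budget."""
    return [name for name, worst, _protection, _cost in OPTIONS
            if worst <= budget_seconds]

# An application that can tolerate at most half a second of outage:
print(options_within(0.5))
```

With a half-second budget only the proprietary ring and trunking rows qualify. Relaxing the budget to two seconds adds the RSTP options running over UDP.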

Managed Ethernet Switches – The Hardware That Makes It Happen

All this protection and speed is, of course, dependent on the up-to-date hardware that works with these configurations and protocols. B&B Electronics EIR500 and EIR600 series industrial managed switches deliver redundant, ultra-fast recovery.

With features such as IEEE 802.3x flow control, redundant ring or RSTP capability, recovery in under 300 ms and other industrial grade features, these form the hardware backbone of many demanding industrial networks.

These switches are also capable of greater redundancy for even more security. For instance, dual homing prevents loss of connectivity between the redundant ring and the upper level switch. A coupling ring connects two redundant rings together to create additional redundancy.

As an example, a power company uses Ethernet switches from B&B Electronics with dual homing and coupling ring configuration to remotely manage hard-to-reach substations. Substations use “step down” transformers to convert high voltages from the power grid to lower levels for homes and businesses. A bus splits electricity to distribution lines.

Circuit breakers and switches disconnect from the power grid or lines in a split second to protect from lightning or overloads, then reconnect just as quickly to restore power – and profits.

This is an example of how the additional cost of cabling and equipment is justified by the extra level of security.