Switch fabric setup for iSCSI

.

This post summarizes various elements which, in my experience, serve to optimize the iSCSI switch fabric. While it talks about PowerConnect switches, the elements discussed are relevant to any switch model used in this role.

.

a)      Global:

      1. arp cache-size 1024
      2. mac address-table aging-time 1230
      3. spanning-tree portfast bpdufilter default: discard BPDUs received on PortFast-enabled ports.
      4. spanning-tree bpdu-protection: a BPDU received on an edge port causes the switch to disable the offending port, which must then be re-enabled manually.
      5. Loop Guard should be configured only on non-designated ports, i.e. ports in the alternate or backup role. Root ports and designated ports should not have Loop Guard enabled, so that they can forward traffic.

b)      Global + VLAN Interface:

      1. no ip redirects
      2. no ip unreachables

c)      Access Ports:

      1. spanning-tree tcn-guard: prevent edge ports from propagating TCNs.
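As an illustrative sketch, the settings above could be entered on a PowerConnect-style CLI as follows. The VLAN ID (100) and interface name (1/0/1) are placeholders; exact syntax varies by switch model and firmware, so check your switch's CLI reference.

console(config)#arp cache-size 1024
console(config)#mac address-table aging-time 1230
console(config)#spanning-tree portfast bpdufilter default
console(config)#spanning-tree bpdu-protection
console(config)#interface vlan 100
console(config-if)#no ip redirects
console(config-if)#no ip unreachables
console(config-if)#exit
console(config)#interface gigabitethernet 1/0/1
console(config-if)#spanning-tree tcn-guard
console(config-if)#exit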

iSCSI

1)      STP: RSTP in operation. RSTP has superior port states and topology exchange mechanisms compared to classic STP (802.1D).

 

    • Ports: Storage vs. Server

On EqualLogic/storage egress ports, I normally follow the practice of disabling STP completely. Unlike server ports, there is no risk of a bridging application on the other end of these ports. Thus, not only do I want the port to transition quickly to the forwarding state (which PortFast can accomplish), I also want the port to be removed from the STP tree calculation, and all generation of TCNs (topology change notifications) and BPDUs to be suppressed. In contrast, I usually leave spanning tree running on server ports, with the addition of PortFast.
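As a hedged sketch, the two port profiles might look like this on a PowerConnect-style CLI (interface numbers are placeholders, and the exact per-port STP disable command varies by model):

console(config)#interface gigabitethernet 1/0/2
console(config-if)#spanning-tree disable
console(config-if)#exit
console(config)#interface gigabitethernet 1/0/10
console(config-if)#spanning-tree portfast
console(config-if)#exit

Here 1/0/2 stands in for a storage (EqualLogic) port, removed from STP entirely, while 1/0/10 stands in for a server port, which keeps STP running but transitions quickly to forwarding.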

    • TCNs

Consider an example.

The PC on a switch port is turned off. The switch detects the link status going down and begins sending TCN BPDUs toward the root. The root sends a TCN acknowledgment back and then sends a configuration BPDU with the TCN bit set to all downstream switches. When the TCN flag is received from the root, downstream switches age out entries from their bridge (CAM) tables much sooner than normal (after 15 seconds instead of 300). This causes recently idle entries to be flushed, leaving only the actively transmitting stations in the table. Given enough PCs, the switches could be in a constant state of flushing bridge tables. Also remember that when a switch doesn't have a CAM entry for a destination, the packet must be flooded out all its ports. Flushed tables mean more unknown unicasts, which mean more flooding throughout the network.

On Cisco switches, you can enable the STP PortFast feature on a port with a single attached PC. As a result, TCNs aren't sent when the port changes state. On PowerConnect, the TCNs are supposed to stop as well. However, if you find they are still being generated, disable STP on such a switch port to remove it from the STP calculation completely. This also puts a stop to any TCN generation on the port.

In RSTP, only non-edge ports that move to the forwarding state cause a topology change. This means that a loss of connectivity is no longer considered a topology change, contrary to 802.1D (that is, a port that moves to blocking no longer generates a TC). Also, the initiator of the topology change floods this information throughout the network, as opposed to 802.1D, where only the root did.

    • Cisco Trunks

a)    Trunk default behaviour

The default behaviour of the switch is to transport all VLANs across the link. As a best practice, I start by removing all VLANs from the link, and then selectively add only the ones I want carried across. This streamlines the trunk transport but, importantly, also helps with STP calculations. Cisco maintains a per-VLAN instance of STP; the greater the number of VLANs and the links across which they are carried, the more complex the tree and the more stress the calculations put on the CPU.
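On a Cisco switch, that pruning approach might look like the following (the interface and VLAN IDs are placeholders):

Switch(config)#interface gigabitethernet 0/24
Switch(config-if)#switchport mode trunk
Switch(config-if)#switchport trunk allowed vlan none
Switch(config-if)#switchport trunk allowed vlan add 100,200

Starting from "none" and adding VLANs explicitly keeps the trunk, and the per-VLAN STP instances it carries, as small as possible.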

b)    Encapsulation dot1q

Lastly, though not necessary, I set the encapsulation explicitly to dot1q. Together with nonegotiate, this ensures that resources are not wasted on generating and processing DTP frames across the links.
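On a hypothetical Cisco trunk port (0/24 is a placeholder), that would look like this; note that on platforms supporting only 802.1Q, the encapsulation command may not exist:

Switch(config)#interface gigabitethernet 0/24
Switch(config-if)#switchport trunk encapsulation dot1q
Switch(config-if)#switchport nonegotiate
Switch(config-if)#switchport mode trunk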

2)     VLAN Segmentation:

 a)      Assign iSCSI traffic to its own VLAN.

 b)      Shut down or orphan VLAN 1.

Within the administrative VLAN, the switch behaves as a generic IP host, responding to all broadcast and multicast traffic. Any issues or excess broadcast/multicast within this VLAN will have a direct impact on the switch CPU. You can use a separate management VLAN, which is used only to manage the switch and does not carry general network traffic. As for multicast: multicast and unicast data destined for the CPU both go through the same CoS queue (Queue 1). Unregistered multicast data floods to all ports in the VLAN, the CPU port being one of them; since that CoS queue is rate limited and effectively FIFO, the chance of a unicast packet reaching the CPU in an environment with a lot of multicast data is very small.

Also, do not use VLAN 1 (sometimes called the default VLAN) for iSCSI traffic if jumbo frames will be used, as jumbo frames have been seen to work inconsistently on this VLAN.
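A minimal sketch of the segmentation, assuming a hypothetical iSCSI VLAN 100 and a placeholder interface (VLAN creation syntax differs between PowerConnect series; some use a separate VLAN database mode):

console(config)#vlan 100
console(config)#interface gigabitethernet 1/0/2
console(config-if)#switchport mode access
console(config-if)#switchport access vlan 100
console(config-if)#exit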

 3)    Cut-through mode disabled (use store-and-forward; see the flow control discussion below).

4)     Flow control enabled on all ports, end to end. Verify via show interfaces status and show storm-control all.

To be effective, flow control has to be implemented end to end. Thus, I would recommend enabling it through the entire path, from the ports egressing to storage through to the ports egressing to the servers on the iSCSI VLAN. Also, flow control only works when the port is in full duplex mode, so be sure to enable full duplex on the port before enabling flow control.
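A hedged sketch of enabling and verifying flow control; whether the command is taken globally or per interface differs between switch series, so treat the placement below as illustrative:

console(config)#flowcontrol
console(config)#exit
console#show interfaces status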

Flow control and cut-through are mutually exclusive (for the Dell EqualLogic (EQL) PS arrays, use flow control as the best practice):

Most switches operate in “store-and-forward” mode: when a packet comes in on an ingress port, the switch buffers it completely. In “cut-through” mode, the switch doesn’t buffer the entire packet; as soon as it finds the destination address in the header, it determines the egress port and starts sending the bits to that port immediately. The advantage of cut-through is that almost no latency is introduced by the switch. The big disadvantage, however, is that if you turn on cut-through, you lose flow control: since the switch no longer buffers the packet, it can’t meaningfully do anything about flow control. So, in cut-through mode, most switches simply stop responding to pause frames.

Any latency gains you might get from cut-through would be more than offset by the increased packet retransmission you’d get by not being able to use flow control. The best-practice recommendation is to stick with store-and-forward and use flow control. Note that while cut-through mode may work in a new, very lightly loaded SAN, if the amount of overall traffic increases over time, cut-through may suddenly stop being desirable and switching back to store-and-forward will be needed. Tell-tale signs are performance issues, dropped connections, and an increase in packet retransmits overall on the SAN, including at the array side.

If you still experience dropped packets, and you can find the controls in your OSes, you might try reducing the TCP window size until there are no dropped packets.

5)     Jumbo frames, enabled as MTU 9216 on switch, MTU 9000 on ESXi.

The most important aspect of any SAN is the capability of that SAN infrastructure to store or retrieve as much information as possible in the shortest time possible. Latency incurred due to data storage activities will adversely affect an application’s performance. For an iSCSI-based storage infrastructure, depending on the nature of the workload, use of jumbo frames may be the difference between a storage solution with average data throughput and high latency and one with high performance and low latency.
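As a sketch, with placeholder interface and vSwitch/vmkernel names, the switch side and the ESXi side might be configured like this (on some PowerConnect series the jumbo-frame setting is a single global command rather than a per-interface MTU):

console(config)#interface gigabitethernet 1/0/2
console(config-if)#mtu 9216
console(config-if)#exit

esxcli network vswitch standard set --vswitch-name=vSwitch1 --mtu=9000
esxcli network ip interface set --interface-name=vmk1 --mtu=9000

The switch MTU (9216) is deliberately larger than the host MTU (9000) to leave headroom for encapsulation overhead. Remember that the entire end-to-end path must support jumbo frames; a single standard-MTU hop will cause drops or fragmentation.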

6)     Storm control, disabled for unicast, enabled for broadcast and multicast.

Three types of storms can occur on a network: broadcast, multicast, and unicast. Storm management features typically work by disabling the ports that exhibit this behavior. Because iSCSI hosts and arrays can at times use more than 80 percent of the available network bandwidth, the switch management software will commonly interpret the high utilization as a unicast traffic storm; therefore, if the switch has the capability to manage unicast storm behavior, this feature must be disabled.
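A sketch of the storm-control stance described above, on a placeholder interface (the exact keywords, thresholds, and defaults vary by platform, so verify against your switch's CLI reference):

console(config)#interface gigabitethernet 1/0/10
console(config-if)#storm-control broadcast
console(config-if)#storm-control multicast
console(config-if)#no storm-control unicast
console(config-if)#exit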

7)     The firmware on the switches is up to date.

8)     LAG/Port Channels

LAGs should be enabled on inter-switch links as well, unless the switches are stacked. Do not configure the physical ports used for the LAG directly; any configuration should be put on the port-channel interface, which the member ports will inherit.
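An illustrative LACP sketch, with placeholder interface and port-channel numbers: the switchport configuration goes on the port-channel, and the member ports carry only the channel-group statement.

console(config)#interface port-channel 1
console(config-if)#switchport mode trunk
console(config-if)#exit
console(config)#interface range gigabitethernet 1/0/47-48
console(config-if)#channel-group 1 mode active
console(config-if)#exit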

Auto iSCSI Optimization on Switches:

The stacking PowerConnect 5500, 6200, and 7000 switch series and the non-stacking PowerConnect 8000 switch series are the world’s first switches to automatically optimize themselves for iSCSI storage. After the iSCSI feature is enabled, these switches can automatically detect an EqualLogic array directly connected to any of their switch ports and configure themselves for iSCSI environments, at both the global and port levels. These switch series also provide information about all active iSCSI sessions to allow for easier management and optimization. Some of the key iSCSI optimization features include:

    • EqualLogic iSCSI Auto-Detection: Ability of the switch to detect any active EqualLogic array attached directly to its ports.
    • PowerConnect Auto-Configuration: Automatic configuration of switch ports and global settings for iSCSI arrays, hosts, and switch links or stacking ports.

The iSCSI auto-detection feature is, by default, disabled on Dell PowerConnect series switches (it will be enabled by default in later firmware releases). To enable iSCSI optimization on a switch connected to an EQL array:

console>enable

console#config

console(config)#iscsi enable

console(config)#exit

console#copy running-config startup-config

console#exit

The switch recognizes the appearance and/or withdrawal of Dell EqualLogic array ports via LLDP. Whenever LLDP packets are received by the switch on one of its ports, the Chassis ID attribute is compared with the EqualLogic OUI (00:09:8a). If a match is found, the switch recognizes the existence of an EQL array on that port. LLDP is globally enabled by default on all Dell PowerConnect switches. The “iscsi enable” command configures only the switch; it does not configure any server or host connected to the switch, nor does it create a new VLAN for the iSCSI traffic. The features enabled include:

.

.

.
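To verify the detection afterwards, commands along these lines (names vary by series and firmware, so treat these as assumptions to check against your switch's documentation) show the LLDP neighbors and the iSCSI status:

console#show lldp remote-device all
console#show iscsi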
