The DCI solution from Pluribus is one of the most feature-rich of its kind within the Dell EMC Networking SDN Ecosystem. At the heart of the solution is the Adaptive Cloud Fabric: a distributed control plane that is logically unified. It allows the DCI fabric (multiple sites, all fabric switches) to be configured from a single management interface. There are no centralized controllers, and integration/interoperability in brownfield deployments can be achieved at both L2 & L3, courtesy of support for everything from BGP & OSPF at L3 to xSTP at L2.
As I discussed in the very first part of this series, the need to stretch the same Layer 2 domain across multiple DCs could be driven by VM mobility, storage replication, and so on. Stretching L2 domains over an L3 DC interconnect presents a set of challenges around mobility and adjacency; for example, a host's address allocation might need to be preserved across a mobility event (a VM move), and so does its subnet.
Pluribus has based its DCI solution on VXLAN. Pluribus calls this a “VXLAN-based fabric with native VLAN extensions”, a descriptor for the Adaptive Cloud Fabric. While VXLAN provides the multi-point solution, Pluribus also offers an equally compelling point-to-point pseudo-wire called vLE (Virtual Link Extension).
The end result is a scalable, terabit DCI solution that is standards-based and merchant-silicon driven, as well as interoperable with an existing core and IP transport network. It can provide inter-DC Anycast Gateway distribution and fault isolation between sites. All of this is before we arrive at Pluribus’ traditional strong suit: visibility and analytics for application traffic.
I will start with the VXLAN-based DCI.
VXLAN DCI, Multi-point to Multi-point:
Underlay options:
Any transport:
- Layer 2, Layer 2/3, Layer 3 (BGP, OSPF etc.)
- Recommended: L3, preferably with BGP
Features:
In the following list of features and components, note the High Availability aspects of the solution, from Layer 1 through Layer 3.
Eliminating Single Points of Failure
- Cluster (Virtual Chassis) with vLAG (Multi-Chassis LAG, L2 multi-path)
- VRRP for Active-Active First-hop/Gateway Redundancy
- ECMP: bidirectional multi-path traffic load balancing
- VTEP High Availability: Redundancy at VXLAN fabric Edge
- VTEP HA at both ends of VXLAN Tunnel
- Eliminates [Device + Link] Single Point of Failure
- Tunnel scale optimization with single tunnel to each HA pair
- Active-Active load Distribution of VTEP function in Cluster
- VTEPs share VIP (VRRP based)
- VXLAN tunnels between the VTEPs are provisioned automatically, without any manual intervention.
- Tunnels are built using D_IP = VIP, S_IP = VIP (see the sketch after this list).
- A VTEP pair can therefore act as a single logical VXLAN endpoint, using a single shared VIP as the source/destination address.
- Distributed Anycast Gateways: a manifestation of Distributed VRFs.
- Endpoints use the same virtual MAC + IP gateway addresses on all leaf switches, i.e. the L3 gateway for East-West traffic is stationed directly on the first-hop switch.
- Distributed VRFs optimize traffic flows, because routing can now take place at the first hop instead of hair-pinning traffic to a central/core location. If communication between VRFs is needed, an external router is required (see vRouters).
- Also, unlike box-by-box and EVPN-based solutions, a distributed subnet configuration for Anycast Gateways can be provisioned with a single command in ONVL/ACF.
- For North-South traffic, a “border leaf” or leaf pair is still required to provide routing using the vRouter feature (with routing protocol support), as well as firewall service insertion via vFlow when needed. Each VRF should have two uplinks (for redundancy), or northbound connections, pointing to a next-hop gateway for all traffic external to the DC.
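To make the VTEP HA behavior concrete, here is a minimal Python sketch of the idea (a conceptual model only, not Pluribus code; the class names and addresses are hypothetical): each cluster pair shares a VRRP-based VIP, and tunnels are keyed on VIPs rather than physical VTEP addresses, so each remote HA pair consumes a single tunnel.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VtepPair:
    """Two cluster members acting as one logical VTEP behind a shared VIP."""
    name: str
    member_ips: tuple   # physical IPs of the two cluster switches
    vip: str            # VRRP-based virtual IP shared by the pair

@dataclass(frozen=True)
class VxlanTunnel:
    src_ip: str         # S_IP = local pair's VIP
    dst_ip: str         # D_IP = remote pair's VIP

def build_tunnels(local: VtepPair, remote_pairs: list) -> list:
    """One tunnel per remote HA pair, keyed on VIPs, instead of a full mesh
    between every physical VTEP (the tunnel-scale optimization above)."""
    return [VxlanTunnel(src_ip=local.vip, dst_ip=r.vip) for r in remote_pairs]

dc1 = VtepPair("dc1-leaf-pair", ("10.0.1.1", "10.0.1.2"), "10.0.1.10")
dc2 = VtepPair("dc2-leaf-pair", ("10.0.2.1", "10.0.2.2"), "10.0.2.10")
print(build_tunnels(dc1, [dc2]))   # a single VIP-to-VIP tunnel
```

Because either member of a pair can terminate traffic destined to the VIP, a device or link failure on one side does not tear the tunnel down.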
In addition to the above, ACF uses its vPort DB for address registration and tracking (vs., e.g., MP-BGP EVPN). The behavior is EVPN-like but, as you will see below, it has been beefed up with a number of optimizations and enhancements.
When the fabric is extended between multiple sites, the control plane communication is in-band, taking place over the IP transport network. This network, and the core itself, can be built on top of any 3rd-party switches; the only requirement is that it provides IP reachability between all ACF switches. However, if the underlay is ACF/ONVL based, the benefit of a single point of management, visibility & automation stretches to this layer as well. The vRouter instance dedicated to fabric control plane communication is created with the fabric-comm attribute, to identify its role.
Pluribus recommends BGP for the underlay due to the scale and flexibility it provides, but the end goal is IP reachability. As long as that can be ensured, other options like OSPF and static routing can certainly be used.
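Since the only hard requirement on the underlay is IP reachability between all ACF switches, a quick pre-flight check before extending the fabric could look something like the sketch below (my own illustration; the node names and loopback addresses are placeholders, and the Linux-style ping flags may need adjusting for your platform).

```python
import subprocess

# Hypothetical loopback/VTEP addresses of the fabric switches in both DCs.
FABRIC_NODES = {
    "dc1-leaf1": "10.255.0.1",
    "dc1-leaf2": "10.255.0.2",
    "dc2-leaf1": "10.255.0.3",
    "dc2-leaf2": "10.255.0.4",
}

def reachable(ip: str) -> bool:
    """Single ICMP probe with a 1-second timeout (Linux ping syntax)."""
    result = subprocess.run(["ping", "-c", "1", "-W", "1", ip],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

unreachable = [name for name, ip in FABRIC_NODES.items() if not reachable(ip)]
print("underlay OK" if not unreachable else f"fix reachability to: {unreachable}")
```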
Traffic Handling: Optimizations & Enhancements
ARP Optimization
This refers to the use of the vPort database (instead of flooding) to perform dynamic endpoint discovery, and an ARP proxy to support mobility across DCs. When “Host-A” in DC-1 is discovered through initial traffic exchange by its upstream switch, its details are recorded in the vPort database. If Host-B in DC-2 broadcasts an ARP request for the MAC address of Host-A, an ARP proxy function is executed in DC-2 by the receiving fabric switch. Consequently, the ARP response comes from the fabric, based on the contents of the distributed vPort database, instead of from the original host. This eliminates
- Any direct communication between ARP requester and registered host,
- Any flooding of broadcast ARP packets.
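Conceptually, the proxy behavior boils down to a lookup in the distributed vPort database before any flooding happens. The toy Python model below illustrates this (it is not the actual vPort implementation; the data structure, field names, and addresses are assumptions).

```python
# Toy vPort database: IP -> {MAC, owning switch, port}. All entries hypothetical.
vport_db = {
    "192.168.10.11": {"mac": "00:aa:bb:cc:dd:01", "switch": "dc1-leaf1", "port": 7},
}

def handle_arp_request(target_ip: str, receiving_switch: str):
    """If the target is already registered, the receiving fabric switch answers
    locally from the vPort DB; the broadcast never crosses the DCI and the
    original host never sees it."""
    entry = vport_db.get(target_ip)
    if entry:
        return {"arp_reply_from": receiving_switch, "mac": entry["mac"]}
    return None   # unknown target: fall back to normal ARP handling

# Host-B in DC-2 asks for Host-A's MAC; the DC-2 leaf proxies the reply.
print(handle_arp_request("192.168.10.11", "dc2-leaf1"))
```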
vPort Forwarding
This enhancement also draws on the vPort database. It eliminates “Flood & Learn” for unknown-DA traffic. While similar to the ARP optimization discussed above, vPort forwarding applies to cases where the L2 DA in a frame can be successfully looked up in the vPort DB, so packet flooding can be avoided.
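In the same toy model, the forwarding decision becomes a lookup on the destination MAC before resorting to flooding (again a conceptual sketch, not product code):

```python
# Toy view of the vPort DB keyed by MAC: address -> (owning switch, port).
mac_locations = {"00:aa:bb:cc:dd:01": ("dc1-leaf1", 7)}

def forwarding_decision(dest_mac: str) -> str:
    """A DA that resolves in the vPort DB is unicast to the owning switch/port;
    only a genuinely unknown DA would fall back to flooding."""
    location = mac_locations.get(dest_mac)
    if location:
        return f"unicast to {location[0]}, port {location[1]}"
    return "flood (DA not in vPort DB)"

print(forwarding_decision("00:aa:bb:cc:dd:01"))
print(forwarding_decision("00:aa:bb:cc:dd:99"))
```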
Host Location update/registration, Post-Mobility event
When endpoints or VMs move within or across DCs, the vPort database is automatically updated with the current location information. Any switches holding stale information in their caches are notified to update their tables.
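And the mobility case, in the same spirit: when an endpoint shows up behind a different switch, its vPort entry is rewritten and the other fabric switches are told to refresh their cached copy. The sketch below only models the bookkeeping; the actual signalling happens in the fabric control plane, and all names are illustrative.

```python
# Same toy vPort DB shape as in the ARP sketch above.
vport_db = {"192.168.10.11": {"mac": "00:aa:bb:cc:dd:01", "switch": "dc1-leaf1", "port": 7}}

def register_move(ip: str, new_switch: str, new_port: int, fabric_switches: list) -> list:
    """Update the endpoint's vPort entry and return the switches that may be
    holding the old location and therefore need to refresh."""
    entry = vport_db.setdefault(ip, {})
    entry.update(switch=new_switch, port=new_port)
    return [sw for sw in fabric_switches if sw != new_switch]

print("notify:", register_move("192.168.10.11", "dc2-leaf1", 3,
                               ["dc1-leaf1", "dc1-leaf2", "dc2-leaf1", "dc2-leaf2"]))
```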
—
Pluribus Virtual Link Extension (vLE): Point-to-Point Pseudo-wire Service
This is a compelling solution when just two sites need to be connected. By nature, it is a transparent, Layer-2, point-to-point service: it takes (any) traffic from a physical port in DC-1 and transparently forwards it to a physical port in DC-2. The endpoints appear adjacent, as if directly connected.
Being transparent, it will transport all VLANs and all types of traffic, including protocol/control traffic such as LACP. This has useful implications: by tunneling LACP messages between switches in the two DCs, it is possible to bundle vLEs into a virtual aggregated link and hence achieve redundancy via link aggregation. (vLEs are configured between the VTEPs’ physical IP addresses, which are not redundant; consequently, another mechanism is required to eliminate a single point of failure.) vLE performs constant link state tracking to preserve end-to-end link state consistency, while the use of LACP enables fast link failover and convergence.
In summary, vLE features include:
- HA support (LACP tunneling)
- Virtual Link Extension (vLE) aggregation with LACP
- Link state tracking
- Allows vLE LACP fast convergence
- vLE aggregation without LACP (mode “on”)
When Link State Tracking is enabled, the fabric control plane keeps track of the link state at each vLE access port. The administrative state of the local port is synchronized with the operational link state of the remote port. This avoids blackholing traffic that enters the local vLE access port when the remote end of the vLE is down.
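The tracking rule can be pictured as a simple synchronization: each side’s administrative state mirrors the far end’s operational state, so a host never keeps sending into a dead remote port. Below is a conceptual sketch of that rule, not the fabric’s actual state machine; the port names are made up.

```python
from dataclasses import dataclass

@dataclass
class VlePort:
    name: str
    oper_up: bool = True    # physical/operational link state at this end
    admin_up: bool = True   # administrative state driven by the fabric

def sync_vle_state(local: VlePort, remote: VlePort) -> None:
    """Mirror each remote end's operational state onto the local admin state,
    so a far-end failure is visible locally instead of blackholing traffic."""
    local.admin_up = remote.oper_up
    remote.admin_up = local.oper_up

dc1_port = VlePort("dc1:port49")
dc2_port = VlePort("dc2:port49", oper_up=False)   # remote access link went down
sync_vle_state(dc1_port, dc2_port)
print(dc1_port)   # admin_up is now False: the local host stops using this path
```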
There is no loop prevention mechanism operating on a vLE object. The VLAN that transports the traffic over vLE is dedicated to this function, and cannot be used for other L2 bridging.
—