I had initially intended for this to be a single post covering VXLAN from both Dell EMC and its SDN partners, but the post grew long enough that it made more sense to split it into two separate posts.
As discussed in the previous post, Flood & Learn and MP-BGP EVPN are the commonly found control-plane varieties of VXLAN. Dell EMC's legacy implementation with OS 9 is Static VXLAN; as such, it has no control plane for peer discovery or host redistribution. An additional flavour based on BGP EVPN should become available in the future with Dell EMC's Next Gen OS 10. Within the Dell EMC Networking SDN partner ecosystem, Cumulus also offers Static VXLAN, along with LNV and BGP EVPN. Pluribus has a vPort-based, EVPN-like offering, while Big Switch has the Flood & Learn variety.
I will start with a brief overview of the VTEP function before making a few comparisons. The focus is largely on the VXLAN implementation with Dell EMC Networking.
VTEPs
VXLAN uses VTEPs (VXLAN Tunnel Endpoints) to originate and terminate VXLAN tunnels. VTEPs map end devices (hosts/VMs) to VXLAN segments and perform encapsulation/decapsulation. They typically reside on the hypervisor vSwitch or on the Top of Rack switch. A VTEP has two functional interfaces:
- An interface on the local LAN that provides bridging to local endpoints.
- An IP interface to the transport IP network:
  - Its IP address identifies the VTEP device on the transport IP network.
  - The VTEP uses this IP address as the outer source address when encapsulating frames before transmitting them out of this interface.
  - This IP interface also facilitates the discovery of remote VTEPs for its VXLAN segments, and the learning of remote MAC-to-VTEP mappings.
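To make the encapsulation step concrete, here is a minimal Python sketch of just the 8-byte VXLAN header defined in RFC 7348 (a flags byte with the I bit set, plus a 24-bit VNI). The function names are my own, and the outer Ethernet, IP and UDP (destination port 4789) headers that a real VTEP also adds are deliberately left out.

```python
import struct

FLAG_I = 0x08  # "I" flag from RFC 7348: the VNI field is valid

def vxlan_encap(vni: int, inner_frame: bytes) -> bytes:
    """Prepend an 8-byte VXLAN header to an inner Ethernet frame."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Layout: flags (1 byte), reserved (3 bytes), VNI (3 bytes), reserved (1 byte).
    # Packing the VNI shifted left by 8 into a 32-bit word fills the last 4 bytes.
    header = struct.pack("!B3xI", FLAG_I, vni << 8)
    return header + inner_frame

def vxlan_decap(payload: bytes) -> tuple[int, bytes]:
    """Strip the VXLAN header, returning (VNI, inner frame)."""
    flags, word = struct.unpack("!B3xI", payload[:8])
    if not flags & FLAG_I:
        raise ValueError("I flag not set; VNI is not valid")
    return word >> 8, payload[8:]

if __name__ == "__main__":
    pkt = vxlan_encap(50, b"inner-ethernet-frame")
    print(pkt[:8].hex())     # 0800000000003200
    print(vxlan_decap(pkt))  # (50, b'inner-ethernet-frame')
```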
VTEPs can be classified in a number of ways. In the previous post, we already looked at the classification based on control plane (Flood & Learn, MP-BGP EVPN). Other ways to distinguish solutions include where the VTEP function runs (HW vs. SW), orchestration, etc. I will give a quick overview of these below.
VTEP Function: Hardware vs. Software:
A VTEP can be hardware-based or software/hypervisor-based.
Software based VTEP
A software VTEP performs encapsulation/decapsulation on the CPU. An example would be the NSX software gateway. It makes very modest demands of the underlay:
- IP connectivity
- Jumbo frame support.
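The jumbo frame requirement comes from the encapsulation overhead. A quick back-of-the-envelope sketch in Python, assuming standard header sizes (14-byte inner Ethernet header without FCS, 20-byte IPv4 or 40-byte IPv6 outer header, 8-byte UDP, 8-byte VXLAN); the function name is hypothetical:

```python
# Header sizes in bytes (standard values; no IP options or extension headers).
OUTER_IPV4_HDR = 20
OUTER_IPV6_HDR = 40
OUTER_UDP_HDR = 8
VXLAN_HDR = 8
INNER_ETH_HDR = 14  # untagged inner Ethernet header, FCS not carried
DOT1Q_TAG = 4       # optional 802.1Q tag kept on the inner frame

def min_underlay_ip_mtu(inner_ip_mtu: int = 1500,
                        outer_ipv6: bool = False,
                        inner_tagged: bool = False) -> int:
    """Smallest underlay IP MTU that carries a full-size inner packet unfragmented."""
    inner_frame = INNER_ETH_HDR + (DOT1Q_TAG if inner_tagged else 0) + inner_ip_mtu
    outer_ip = OUTER_IPV6_HDR if outer_ipv6 else OUTER_IPV4_HDR
    return outer_ip + OUTER_UDP_HDR + VXLAN_HDR + inner_frame

if __name__ == "__main__":
    print(min_underlay_ip_mtu())                 # 1550
    print(min_underlay_ip_mtu(outer_ipv6=True))  # 1570
    print(min_underlay_ip_mtu(9000))             # 9050
```

In other words, hosts with a standard 1500-byte MTU need at least 1550 bytes of IP MTU across an IPv4 underlay, which is why jumbo frames are the usual recommendation.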
The underlay (the underlying network fabric) can be:
- A pure L2 underlay using VLT (Virtual Link Trunking, Dell EMC's L2 multipathing technology).
- A hybrid leaf-spine underlay with a mix of L2 and L3.
Since the VTEPs are implemented in software, this model does not need the switch to provide any hardware support for the VTEP function. As such, we are free to use a switch whose chip/silicon has no VXLAN support.
Hardware VTEP
A hardware VTEP performs line-rate encapsulation/decapsulation in the ASIC (as opposed to the CPU). Hardware VTEP support is limited to chipsets that support the function, e.g. Broadcom Trident 2. (I discuss this in more detail further below.)
A switch with a chip older than Trident 2 can still work as the underlay for a software-based VTEP (as mentioned previously), but it remains agnostic to the overlay and VTEP operation.
VTEP Orchestration/Discovery:
VTEP provisioning can be dynamic and controller-based, or static and controller-less.
- Dynamic:
  Remote VTEPs are auto-discovered, and VXLAN tunnels are auto-provisioned. For example, the orchestration platform could be NSX, with OVSDB facilitating Controller <> VTEP communication.
  Software VTEP orchestration is controller-based only.
- Static:
  This is driven by manual mappings. Dell EMC's legacy implementation uses Static VXLAN, which does not use multicast. The architecture creates profiles containing mappings of user-defined VNIs to VLANs. The profiles are then statically mapped to remote VTEPs.
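As a rough illustration of the static model, the Python sketch below captures the three mappings described above (VLAN to VNI, profile to VNIs, remote VTEP to profile) and derives which remote VTEPs a VLAN's BUM traffic would be replicated to. It assumes head-end replication to every statically configured remote VTEP carrying the VNI, since there is no multicast; the class and field names are hypothetical and do not correspond to any Dell EMC API.

```python
from dataclasses import dataclass, field

@dataclass
class StaticVxlanInstance:
    """Hypothetical model of one static (controller-less) VXLAN instance."""
    local_vtep_ip: str
    vlan_to_vni: dict[int, int] = field(default_factory=dict)
    profiles: dict[str, set[int]] = field(default_factory=dict)  # profile -> VNIs
    remote_vteps: dict[str, str] = field(default_factory=dict)   # remote VTEP IP -> profile

    def flood_list(self, vlan: int) -> list[str]:
        """Remote VTEPs that get a head-end-replicated copy of BUM traffic from this VLAN."""
        vni = self.vlan_to_vni[vlan]
        return sorted(
            ip for ip, profile in self.remote_vteps.items()
            if vni in self.profiles.get(profile, set())
        )

if __name__ == "__main__":
    inst = StaticVxlanInstance(
        local_vtep_ip="192.0.2.1",
        vlan_to_vni={50: 50, 60: 60},
        profiles={"profile-a": {50}, "profile-b": {50, 60}},
        remote_vteps={"192.0.2.2": "profile-a", "192.0.2.3": "profile-b"},
    )
    print(inst.flood_list(50))  # ['192.0.2.2', '192.0.2.3']
    print(inst.flood_list(60))  # ['192.0.2.3']
```

Every one of these mappings has to be typed in by hand on each VTEP, which is what the static vs. dynamic comparison hinges on.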
The comparison between dynamic and static can be based on the following criteria:
- Baseline vs. additional components: static is leaner.
- Dynamic vs. manual configuration: dynamic offers more automation.
- Feature maturity: dynamic is more mature.
- Proprietary orchestration vs. simple tunnel: dynamic orchestration can mean proprietary lock-in; a simple static tunnel does not.
- Scale: static does not scale well; dynamic does.
- Flooding: static suffers a greater impact.
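The scale and flooding points follow from simple arithmetic. The sketch below assumes a full mesh of static tunnels in which every VTEP carries every VNI (a worst case), with BUM traffic handled by head-end replication; the function name is hypothetical.

```python
def static_mesh_cost(num_vteps: int) -> dict[str, int]:
    """Rough cost of a full mesh of static VXLAN tunnels across num_vteps VTEPs."""
    if num_vteps < 2:
        raise ValueError("a mesh needs at least two VTEPs")
    per_vtep = num_vteps - 1
    return {
        # Each VTEP needs a remote-VTEP entry for every other VTEP.
        "remote_entries_per_vtep": per_vtep,
        "remote_entries_fabric_wide": num_vteps * per_vtep,
        # With head-end replication, one BUM frame becomes one unicast copy per remote VTEP.
        "bum_copies_per_frame": per_vtep,
        # Adding a VTEP means configuring it plus updating every existing VTEP.
        "switches_touched_to_add_one": num_vteps + 1,
    }

if __name__ == "__main__":
    for n in (4, 16, 64):
        print(n, static_mesh_cost(n))
```

The fabric-wide entry count grows quadratically, and every new VTEP touches every existing switch; a controller-based deployment auto-provisions these tunnels instead.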
Dell EMC Networking’s OS 9 Static VXLAN implementation:
Dell EMC Networking’s legacy OS 9 VXLAN implementation can thus be summarized as follows:
- HW VTEP, controller-based: NSX is the only qualified controller.
- HW VTEP, controller-less: Static. No stacking/VLT support; upcoming releases will enable high availability via VLT for static solutions.
- SW VTEP: VLT supported.
High Availability – VLT support
Until recently, HA (VLT) support was only available for software-based VTEPs with Dell EMC Networking OS 9. However, VLT-based HA (Anycast) for Static VXLAN tunnels should become available in the near future. (VLT stands for Virtual Link Trunking, a Dell EMC Networking L2 multipathing technology comparable to Cisco vPC or MLAG/MC-LAG from other vendors.)
Do be mindful of the fast-approaching availability of MP-BGP EVPN based VXLAN with Dell EMC Networking OS 10, which scales better than all the other flavours.
VTEP L2/L3 Function: Bridge/Gateway Vs. Routing
A quick word on the L2 vs. L3 functionality with VTEPs.
L2
As an L2 gateway, a VTEP provides VXLAN <> VLAN bridging. An L2 VTEP cannot do inter-VXLAN routing and requires an L3 device to do it.
L3/VXLAN Routing
An L3 VTEP can do inter-VXLAN routing. With Dell EMC Networking switches, future enhancements to OS9 (and to OS10, Dell EMC's Linux-based Next Gen DC OS) will enable certain capabilities in this space. This capability is chip-dependent, as explained in more detail below.
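The L2 vs. L3 distinction boils down to one forwarding decision. This Python sketch simplifies it by keying only on the source and destination VNI (a real VTEP also looks at the destination MAC, the gateway address, and so on); all names are hypothetical.

```python
from enum import Enum

class Action(Enum):
    BRIDGE = "bridge within the VNI"
    ROUTE_ON_VTEP = "route between VNIs on this VTEP"
    HAND_TO_L3_DEVICE = "send to an external L3 device for routing"

def forwarding_action(src_vni: int, dst_vni: int, l3_vtep: bool) -> Action:
    """Decide how a VTEP handles traffic between two VXLAN segments."""
    if src_vni == dst_vni:
        # Same segment: L2 and L3 VTEPs both simply bridge.
        return Action.BRIDGE
    # Different segments: only an L3 VTEP can route between VNIs itself.
    return Action.ROUTE_ON_VTEP if l3_vtep else Action.HAND_TO_L3_DEVICE

if __name__ == "__main__":
    print(forwarding_action(50, 50, l3_vtep=False).value)
    print(forwarding_action(50, 60, l3_vtep=False).value)
    print(forwarding_action(50, 60, l3_vtep=True).value)
```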
Chipset Considerations
With regard to VXLAN, be mindful that the following functions are chip/silicon-dependent.
HW VTEP:
Hardware VTEP requires Trident 2 or later chips.
L3/VXLAN Routing
- With Broadcom ASICs, VXLAN routing is supported on Trident 2+ and Tomahawk+ platforms.
- Some vendors can achieve VXLAN routing on the Trident 2 chipset by doing a dual pass in the chip: the first pass is for L2 features, the second pass is for routing.
- Examples with current chips are:
  - Non-native support: Broadcom Trident 2 or Tomahawk using loopback (the dual pass mentioned above), which re-circulates packets to achieve VXLAN routing. This re-circulation or loopback is external on Trident 2 (a physical cable) and internal on Tomahawk.
  - Native support: Broadcom Trident 2+ and Tomahawk+ using a RIOT (Routing In/Out of Tunnels) profile, which enables single-pass (native) VXLAN routing. Look out for Dell EMC Networking platforms such as the S6010 and S4048T.
  - Mellanox Spectrum.
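To contrast the two approaches, here is a Python sketch of the packet walk for each Broadcom chip family listed above. The chip-to-mode table is transcribed from the bullets; how work is split between the passes beyond "L2 features first, routing second" is my simplification, and the names are hypothetical. Mellanox Spectrum is left out because the bullets do not say which mode it uses.

```python
# Chip family -> how VXLAN routing is achieved, per the list above.
ROUTING_MODE = {
    "Trident 2": "external loopback (physical cable)",
    "Tomahawk": "internal loopback",
    "Trident 2+": "native",
    "Tomahawk+": "native",
}

def vxlan_routing_walk(chip: str) -> list[str]:
    """Return the simplified pipeline stages a routed VXLAN packet goes through."""
    mode = ROUTING_MODE.get(chip)
    if mode is None:
        raise ValueError(f"no VXLAN routing path listed for {chip!r}")
    if mode == "native":
        # RIOT profile: routing in/out of tunnels in a single pass.
        return ["single pass: L2 features and routing (RIOT)"]
    return [
        "pass 1: L2 features",
        f"recirculate via {mode}",
        "pass 2: routing",
    ]

if __name__ == "__main__":
    for chip in ROUTING_MODE:
        print(chip, "->", vxlan_routing_walk(chip))
```

The extra pass is the price of non-native routing: each routed packet crosses the pipeline twice, and with external loopback it also consumes the ports joined by the cable.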
Dell EMC OS9 (and OS10) are shortly due to enable both native and non-native VXLAN routing. Drawing from the above points, here is how this translates to Dell EMC Networking platforms. Most of the following is OS9-based, except for the S41xx, which applies to OS10.
Chips that support Native VXLAN routing will achieve the function as follows:
- Native VXLAN routing:
  - Trident 2+ platforms (10G or 40G depending on chip):
    - 10G: S4048T
    - 40G: S6010-ON
  - Tomahawk+ platform (25G silicon):
    - S5048-ON
  - Maverick platform (flexible front-panel port configurations via port profiles):
    - S41xx-ON (on OS10)
Chips that don't support native VXLAN routing will achieve the function via loopback. This capability is due shortly.
- Non-native VXLAN routing (dual pass/loopback):
  - Trident 2 platforms (10G or 40G depending on chip):
    - 10G: S4048-ON
    - 40G: S6000-ON
  - Tomahawk platforms (100G silicon):
    - Z9100-ON, S6100-ON
  - Note that Trident 2+ will support native VXLAN routing in silicon, but it can also support it via dual pass/loopback, so I have included it here as well:
  - Trident 2+ platforms (10G or 40G depending on chip):
    - 10G: S4048T
    - 40G: S6010-ON
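For quick planning lookups, the two platform lists above can be transcribed into a small Python table. This only restates the lists in this post, not an official compatibility matrix, and the helper names are hypothetical.

```python
# Platform -> chip family, transcribed from the lists above.
PLATFORM_CHIP = {
    "S4048T": "Trident 2+",
    "S6010-ON": "Trident 2+",
    "S5048-ON": "Tomahawk+",
    "S41xx-ON": "Maverick",  # OS10
    "S4048-ON": "Trident 2",
    "S6000-ON": "Trident 2",
    "Z9100-ON": "Tomahawk",
    "S6100-ON": "Tomahawk",
}

# Chip family -> VXLAN routing modes listed for it above.
CHIP_MODES = {
    "Trident 2+": {"native", "dual-pass"},  # native in silicon, dual pass also possible
    "Tomahawk+": {"native"},
    "Maverick": {"native"},
    "Trident 2": {"dual-pass"},
    "Tomahawk": {"dual-pass"},
}

def routing_modes(platform: str) -> set[str]:
    """VXLAN routing modes listed for a platform."""
    return CHIP_MODES[PLATFORM_CHIP[platform]]

def platforms_with(mode: str) -> list[str]:
    """Platforms listed as supporting the given routing mode."""
    return sorted(p for p in PLATFORM_CHIP if mode in routing_modes(p))

if __name__ == "__main__":
    print(platforms_with("native"))
    print(platforms_with("dual-pass"))
```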
With OS 9.14, internal re-circulation/loopback will become available for platforms supporting RIOT.
In the next part, I will focus on the VXLAN possibilities with Dell EMC Networking’s SDN Ecosystem.
Thank you very much for this update, Hasan. If I understand it right, there is no support for the S4810 as a VTEP because of its chipset?
Hi Mark,
That is correct.
The S4810 uses the Trident chipset. As such, it cannot support a hardware VTEP gateway. You could use it in the underlay of a software VTEP deployment, where the VTEP function is executed on e.g. NSX.
If you are looking for VTEP on the ToR switches, then we are looking at Trident 2 and up, so the S4048-ON, S4048T, etc.
cheers
Hasan
I’ve been trying to find information on VXLAN support on the S4048-ON platform and I stumbled upon your site. You mention VLT support – any idea when this is coming? Additionally, are you fairly confident that VXLAN routing is coming to the S4048-ON as well? Currently doing VXLAN routing + MLAG with HER on some Arista boxes (7280R), which I believe also rely on recirculation, and the S4048-ON would be an attractive alternative given its price point.
Hi Mike
It has arrived.
With the just-released OS 9.13, we have support for VLT with Static VXLAN.
Do remain mindful that this HA is for access-facing nodes/ports; the northbound connections to the spine are assumed to be L3.
The answer to your second question is in the affirmative too.
With OS 9.13, the Trident 2 platforms (S4048, S6000) can now support VXLAN routing. Note that this is via external loopback, i.e. a cable that connects two ports externally. This is because there is no support for this in the silicon for these platforms.
Native (internal) VXLAN routing for Trident 2+ and Tomahawk+ platforms is arriving soon too.
The Arista 7280, I think, uses the Jericho chipset, which falls under a separate family within Broadcom, StrataDNX, where the focus is more on buffers, TCAM, etc.
Thanks
Hasan
Thanks for your work on the blog, Hasan; it has been very useful for planning.
I'm hoping that you can provide some advice on combining static VxLAN with VLT on Z9100s.
Specifically, how should the VTEPs be configured on each of the VLT members? 'show vlt mismatch' is currently complaining that the VTEPs don't match between the two switches. I did try with both switches set as the same VTEP; however, the issue then was a VLAN to VNI config mismatch between the remote and the peer.
Hi Carl
Thanks a bunch for the appreciation! I am glad you have found the blog useful.
Are you using OS9 on Z9100s, or OS10?
Assuming OS9, there are only a few steps to complete. Note that VLT HA is intended for the client/access side (below the leaf), while the network side (between the leaf and spine layers) is supposed to be L3 (with or without ECMP). VLT LAGs on the network side (above the leaf) are not supported.
Both VLT peers should be configured with the same local VTEP IP address. Complete the usual VLT config on both peers, and configure an identical loopback IP on both peers to serve as the Anycast IP.
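The 'show vlt mismatch' complaint above comes down to the two peers disagreeing. Here is a hypothetical pre-check in Python that compares the two things that must match across the VLT pair: the shared (Anycast) local VTEP IP and the VLAN to VNI mappings. It is only a sketch under those assumptions and says nothing about how OS9 implements 'show vlt mismatch' internally; the names and example IPs are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class PeerVxlanConfig:
    """Hypothetical summary of one VLT peer's static VXLAN settings."""
    local_vtep_ip: str
    vlan_to_vni: dict[int, int] = field(default_factory=dict)

def vlt_vxlan_mismatches(a: PeerVxlanConfig, b: PeerVxlanConfig) -> list[str]:
    """List the differences that would break an Anycast VTEP across a VLT pair."""
    problems = []
    # Both peers must present the same (Anycast) VTEP IP to remote VTEPs.
    if a.local_vtep_ip != b.local_vtep_ip:
        problems.append(f"local VTEP IP differs: {a.local_vtep_ip} vs {b.local_vtep_ip}")
    # Every VLAN must map to the same VNI on both peers.
    for vlan in sorted(a.vlan_to_vni.keys() | b.vlan_to_vni.keys()):
        vni_a, vni_b = a.vlan_to_vni.get(vlan), b.vlan_to_vni.get(vlan)
        if vni_a != vni_b:
            problems.append(f"VLAN {vlan}: VNI {vni_a} vs {vni_b}")
    return problems

if __name__ == "__main__":
    peer1 = PeerVxlanConfig("192.0.2.100", {50: 50})
    peer2 = PeerVxlanConfig("192.0.2.101", {50: 51})
    for line in vlt_vxlan_mismatches(peer1, peer2):
        print(line)
```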
Other than that, there will be the usual VXLAN config:
[
vxlan-instance 77 static
 local-vtep-ip a.b.c.d
 no shut
 vni-profile profilename-abc
  vnid 50
 remote-vtep-ip a.b.c.e vni-profile profilename-abc
]
Then apply the VXLAN instance to the VLTi and to the VLT LAGs facing the clients:
VLT LAG
[
int po-ch 99
 no ip add
 vxlan-instance 77
 vlt-peer-lag port-channel 99
 channel-member tengig 0/10
 no shut
]
VLTi
[
int po-ch 1
 no ip add
 vxlan-instance 77
 channel-member tengig 0/48
 channel-member tengig 0/52
 no shut
]
VLAN to VNID mapping
[
int vlan 50
 vxlan-vnid 50
 no ip add
 tagged po-ch 99
 no shut
]
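One way to avoid peer mismatches in the first place is to generate the VXLAN portion of both peers' configs from a single set of values. The Python sketch below simply templates the commands shown above; the function name, the example IPs (from the 192.0.2.0/24 documentation range) and the choice to template only the VXLAN instance and VLAN mapping are assumptions, and the output should be checked against the OS9 configuration guide before use.

```python
def render_static_vxlan(instance_id: int, anycast_vtep_ip: str, profile: str,
                        vlan_to_vni: dict[int, int], remote_vteps: list[str],
                        client_lag: int) -> str:
    """Render the VXLAN instance and VLAN to VNID mapping for one VLT peer."""
    lines = [f"vxlan-instance {instance_id} static",
             f" local-vtep-ip {anycast_vtep_ip}",
             " no shut",
             f" vni-profile {profile}"]
    lines += [f"  vnid {vni}" for vni in sorted(set(vlan_to_vni.values()))]
    lines += [f" remote-vtep-ip {ip} vni-profile {profile}" for ip in remote_vteps]
    for vlan, vni in sorted(vlan_to_vni.items()):
        lines += [f"int vlan {vlan}",
                  f" vxlan-vnid {vni}",
                  " no ip add",
                  f" tagged po-ch {client_lag}",
                  " no shut"]
    return "\n".join(lines)

if __name__ == "__main__":
    shared = dict(instance_id=77, anycast_vtep_ip="192.0.2.100", profile="profilename-abc",
                  vlan_to_vni={50: 50}, remote_vteps=["192.0.2.200"], client_lag=99)
    # Both VLT peers get identical VXLAN config text from the same inputs.
    peer1_cfg = render_static_vxlan(**shared)
    peer2_cfg = render_static_vxlan(**shared)
    assert peer1_cfg == peer2_cfg
    print(peer1_cfg)
```

The physical member ports (tengig 0/10, 0/48, 0/52) are deliberately left out of the template, since those can legitimately differ between the two peers.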
If these do not work, it is best to give support a quick call, as they should be able to resolve it more quickly, having access to labs and kit.
thanks,
Hasan