VXLAN with BGP EVPN [Part 2] Routing/Bridging | Dell EMC Networking

This is the second post in the “VXLAN with EVPN” series of posts. I expect there to be about 3 more posts in this series. In this post, we will have an overview of Bridging & Routing with EVPN.

Before I start with the post, might I recommend a visit to Mike’s blog? It is chock-full of great content on Dell EMC focused hyper-converged infrastructure, as well as SmartFabric Services, VxRail, and more. Have a look at hcidiver.com.

Back to our post – A quick look at a term which will be relevant to the discussion later.

AnyCast Gateways

The Distributed Anycast Gateway feature in EVPN enables an end host in a VNI to always use its local VTEP as the default gateway when sending traffic outside its subnet. In host mobility scenarios, because the gateway IP and MAC addresses are identical on all VTEPs for a VNI, a host that moves behind a different VTEP keeps the same gateway address. Thus, with Anycast Gateways, all VTEPs:

  • For a particular VNI, have identical Gateway Virtual IP Addresses
  • Across All VNIs, have identical Gateway Virtual MAC addresses.
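
To make the two properties above concrete, here is a minimal Python sketch. It is only a toy model, not switch configuration: the `Vtep` class, the VNI numbers, and the addresses are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass

# Hypothetical addresses, chosen only for illustration.
GATEWAY_MAC = "02:00:00:00:00:01"      # one virtual MAC, shared across all VNIs
GATEWAY_IP_BY_VNI = {                  # one virtual gateway IP per VNI
    10010: "10.1.10.1",
    10020: "10.1.20.1",
}


@dataclass(frozen=True)
class Vtep:
    name: str
    vnis: frozenset

    def gateway_for(self, vni: int) -> tuple:
        """Return the (IP, MAC) pair a host in this VNI uses as default gateway."""
        if vni not in self.vnis:
            raise KeyError(f"{self.name} is not a member of VNI {vni}")
        return GATEWAY_IP_BY_VNI[vni], GATEWAY_MAC


leaf1 = Vtep("leaf1", frozenset({10010, 10020}))
leaf2 = Vtep("leaf2", frozenset({10010, 10020}))

# A host in VNI 10010 moves from leaf1 to leaf2: its gateway is unchanged,
# so the gateway entry in the host's ARP cache stays valid.
before_move = leaf1.gateway_for(10010)
after_move = leaf2.gateway_for(10010)
print(before_move == after_move)
```

Because every VTEP answers with the same pair, the host never needs to re-resolve its gateway after a move.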

Bridging in EVPN

Operation

Bridging refers to the forwarding of packets between hosts, within the same VNI.

When a switch learns a new MAC + IP binding on a particular VLAN, e.g. on receiving a packet such as an ARP request:

  • The MAC address is placed in the local switch’s bridge forwarding table.
  • IP to MAC mapping is installed into the ARP/ND table.

The local MP-BGP process then:

  • learns every new local MAC address, from the local forwarding table, and 
  • learns its corresponding IP route from the ARP/ND table.

MP-BGP then advertises the MAC+IP route to the remote VTEPs, via Type 2 EVPN routes.

On the remote end, the MAC+IP routes that BGP learns are placed into the BGP table. From there:

  • If the route target community sent with the route matches a local VNI route target import, the route will be placed into that switch’s MAC forwarding table, with the appropriate VXLAN tunnel as its destination.
  • The IP address, if included, will be placed in the EVPN ARP cache.

When a VTEP switch originates MP-BGP EVPN routes for its locally learned end hosts, it uses its own VTEP source address as the BGP next hop. The remote VTEP learns the originating VTEP address as the next hop for forwarding to those hosts in the overlay. For this to work, the next-hop value must remain unchanged as the route is distributed across the network.
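
The remote-side handling of a Type 2 route described above can be sketched in a few lines of Python. This is a simplified model under stated assumptions: `Type2Route` and `RemoteVtep` are hypothetical names, the route targets and addresses are made up, and real BGP best-path selection is ignored.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Type2Route:
    mac: str
    ip: Optional[str]      # the IP part of a MAC+IP advertisement may be absent
    vni: int
    route_target: str
    next_hop: str          # VTEP address of the originating switch


@dataclass
class RemoteVtep:
    import_rts: dict                                   # route target -> local VNI
    bgp_table: list = field(default_factory=list)
    mac_table: dict = field(default_factory=dict)      # (vni, mac) -> tunnel destination
    arp_cache: dict = field(default_factory=dict)      # (vni, ip) -> mac

    def receive(self, route: Type2Route) -> bool:
        """Store the route in BGP; install it only if its route target is imported."""
        self.bgp_table.append(route)
        vni = self.import_rts.get(route.route_target)
        if vni is None:
            return False                               # no matching import: BGP table only
        self.mac_table[(vni, route.mac)] = route.next_hop
        if route.ip is not None:
            self.arp_cache[(vni, route.ip)] = route.mac
        return True


vtep = RemoteVtep(import_rts={"65000:10010": 10010})
host_a = Type2Route("aa:bb:cc:00:00:01", "10.1.10.11", 10010, "65000:10010", "192.0.2.1")
host_b = Type2Route("aa:bb:cc:00:00:02", None, 10020, "65000:10020", "192.0.2.1")
installed_a = vtep.receive(host_a)   # route target matches a local import
installed_b = vtep.receive(host_b)   # route target is not imported on this VTEP
```

The tunnel destination stored in `mac_table` is simply the BGP next hop, which is why that value has to be the originating VTEP's address.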

EVPN BUM Handling

Choices

  1. Ingress Replication/Head End Replication (Unicast)

    • The ingress VTEP/NVE sends a separate copy to each interested egress VTEP/NVE. The replication list is built automatically from BGP EVPN Type 3 routes (RT-3), which carry the VNIs of interest to a VTEP.
    • Underlay is transparent to this.
    • Offers config simplicity for the underlay.
    • Consumes replication bandwidth at the ingress VTEP, so it suits fabrics with ideally fewer than 128 NVEs and low BUM traffic.
  2. L3 Underlay Multicast (/Service Node)

    • Single copy to spine, which replicates to all interested VTEPs/NVEs.
    • Multicast support required in underlay.
    • Requires Config in underlay.
    • Can handle high volume of BUM traffic.
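
As a rough illustration of the trade-off between the two choices, the sketch below builds a per-VNI replication list from Type 3 routes and counts how many copies of one BUM frame the ingress VTEP has to send. The function names, mode strings, and addresses are hypothetical; a real implementation keeps far more state per route.

```python
from collections import defaultdict


def build_flood_lists(type3_routes, local_vtep):
    """Map each VNI to the remote VTEPs that advertised interest in it."""
    flood = defaultdict(set)
    for originator, vni in type3_routes:
        if originator != local_vtep:       # never replicate to ourselves
            flood[vni].add(originator)
    return {vni: sorted(peers) for vni, peers in flood.items()}


def copies_from_ingress(flood_lists, vni, mode):
    """Count the copies of one BUM frame that leave the ingress VTEP."""
    peers = flood_lists.get(vni, [])
    if mode == "ingress-replication":
        return len(peers)                  # one unicast copy per interested VTEP
    if mode == "underlay-multicast":
        return 1 if peers else 0           # a single copy; the underlay replicates
    raise ValueError(f"unknown mode: {mode}")


# (originating VTEP, VNI) pairs as they might be learned from Type 3 routes.
routes = [
    ("192.0.2.1", 10010), ("192.0.2.2", 10010), ("192.0.2.3", 10010),
    ("192.0.2.1", 10020), ("192.0.2.3", 10020),
]
flood_lists = build_flood_lists(routes, local_vtep="192.0.2.1")
```

With ingress replication the copy count grows with the number of interested VTEPs, which is where the guidance about keeping NVE counts and BUM volume low comes from.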

ARP (IPv4) / ND (IPv6) Suppression

ARP suppression in EVPN refers to the ability of a VTEP to suppress ARP flooding over VXLAN tunnels. Instead, a local proxy answers ARP requests that locally attached hosts send for remote hosts. ARP suppression is the implementation for IPv4, whereas ND suppression is the implementation for IPv6.

When an end host in the VNI sends an ARP request for another end host’s IP address, its local VTEP intercepts the request and looks up the requested IP address in its ARP suppression cache. If it finds a match, the local VTEP sends an ARP response on behalf of the remote end host. The local host then learns the MAC address of the remote host from that response.

If the local VTEP doesn’t have the requested IP address in its ARP suppression cache, it floods the ARP request to the other VTEPs in the VNI. An example of where this may happen is the initial ARP request to a silent host in the network. After the silent host returns an ARP response, its local VTEP learns its MAC and IP addresses. The MP-BGP EVPN control plane distributes this information to all other VTEPs, so later ARP requests for that host no longer need to be flooded.

  • ARP suppression reduces BUM traffic in the overlay.
  • The kernel does not refresh or age out ARP entries learned via a protocol; the protocol removes them when the advertisements are withdrawn, e.g. after a MAC move (VM move).
  • The IP address portion of the MAC+IP advertisement (Type 2 route) is required for ARP suppression to work.
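
The proxy-or-flood decision described in this section can be sketched as follows. The `ArpSuppressor` class and its method names are hypothetical, and the cache is modelled as a plain dictionary that is filled from Type 2 routes and emptied on withdrawal.

```python
class ArpSuppressor:
    """Toy model of a VTEP's ARP suppression cache."""

    def __init__(self):
        self.cache = {}                      # (vni, ip) -> mac

    def learn(self, vni, ip, mac):
        """Install an entry from a Type 2 route; a MAC-only route adds nothing."""
        if ip is not None:
            self.cache[(vni, ip)] = mac

    def withdraw(self, vni, ip):
        """Remove the entry when the control plane withdraws the route."""
        self.cache.pop((vni, ip), None)

    def handle_request(self, vni, target_ip):
        """Answer locally on a cache hit, otherwise flood to the other VTEPs."""
        mac = self.cache.get((vni, target_ip))
        if mac is None:
            return ("flood", None)
        return ("proxy-reply", mac)


vtep = ArpSuppressor()
first = vtep.handle_request(10010, "10.1.10.12")      # silent host: not known yet
vtep.learn(10010, "10.1.10.12", "aa:bb:cc:00:00:12")  # Type 2 route arrives
second = vtep.handle_request(10010, "10.1.10.12")     # now answered by the proxy
vtep.withdraw(10010, "10.1.10.12")                    # route withdrawn
third = vtep.handle_request(10010, "10.1.10.12")      # back to flooding
```

Note that `learn` ignores routes without an IP address, which mirrors the point that the IP portion of the Type 2 route is what makes suppression possible.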

Typically, the active IP hosts in EVPN are learned by the VTEPs either through local learning or through control-plane-based remote learning. This is because most end stations send GARP or RARP requests to announce themselves to the network immediately after they come online. The local VTEP is therefore able to learn the MAC and IP addresses and distribute this information to other VTEPs through the MP-BGP EVPN control plane.

Routing in EVPN

Routing refers to the forwarding of packets when the source and destination reside in different VNIs.

In EVPN, routing occurs within a VRF Context. This is true, independent of whether the model is symmetric or asymmetric, which we will discuss below. The underlay routing table is assumed to be in the default or global routing table, while the overlay routing table is assumed to be in a VRF-specific routing table. 

Regarding routing, the competing models discussed below try to answer two fundamental questions: one about where routing happens, the other about how. In a nutshell,

  1. Will every VTEP/NVE act as an L3 gateway and do routing, or will only specific VTEPs do routing? (Distributed vs. Centralized)
  2. Will routing be executed only by the VTEP/NVE at the ingress of the VXLAN tunnel, or will it be done at both the ingress and the egress of the VXLAN tunnel? (Asymmetric vs. Symmetric)

Routing – Where Performed

Centralized

  • A specific pair or subset of VTEPs is the first-hop/gateway router for a virtual network.
  • Works well for mostly North-South Traffic.
  • Easier to hang services (ToR Aggregation pair) via these routers.
  • Requires “Default Gateway” extended community support, in BGP.
  • Scale: the gateway routers must handle ARP requests for the whole network.
  • Because symmetric routing expects the egress VTEP to perform routing, the centralized model does not work with symmetric routing; it works only with asymmetric routing.
  • Traffic Flow 

    • Internal Network :
      • Sub-optimal traffic flow, if most traffic is east-west.
        • Traffic may go leaf-A1 > spine1 > (Border Leaves/Centralized Routers) router 1 > Spine1 > leaf-B2
    • External Network :
      • If spine nodes are border nodes, then spine nodes must function as VTEP as well.
      • Traffic travels Leaf <> Spine, to reach the external network.
      • The spine switch runs IPv4/IPv6 unicast routing, via BGP or any IGP, in the tenant VRF instances with the external routing device.
    • Redistributing external routes into EVPN domain :
      • The spine switch learns external routes and advertises them into the EVPN domain as EVPN routes, so that other leaf VTEPs can learn and use them for sending outbound traffic.
      • The spine switch can also be configured to send EVPN routes learned in the L2VPN/EVPN address family, to the IPv4/6 Unicast address family, and advertise them to the external routing device.
  • Protocol Support:

    • Leaf ToRs do not need RIOT support.
    • Only the nodes functioning as Gateways, need to support RIOT capability, along with BGP-EVPN control plane. e.g. The spine switch runs MP-BGP EVPN with the other VTEPs, and exchanges EVPN routes with them.

EVPN VXLAN Centralized - hasanmansur.com

Distributed

  • Each VTEP is the first-hop router for its local endpoints on a virtual network.
  • Widely deployed, no additional protocol support needed.
  • Requires careful planning when deploying services such as firewalls and load balancers.
    • If needed only for traffic in and out of the data center, deploying in the border leaf block still works.
    • If needed east-west, the best way to deploy the services is on each host itself, i.e. as virtualized firewall services and load balancers.
    • Alternatively, use VRFs.
  • Traffic Flow 

    • Internal Network :
      • The border leaf switch runs MP-BGP EVPN with the other VTEPs, and exchanges EVPN routes with them.
    • External Network :
      • Traffic travels from VTEP <> spine <> border leaf, to reach the external network.
      • The border leaf switch runs IPv4/IPv6 unicast routing, via BGP or any IGP, in the tenant VRF instances with the external routing device.
    • Redistributing external routes into EVPN domain:
      • The border leaf switch learns external routes and advertises them into the EVPN domain as EVPN routes, so that other leaf VTEPs can learn and use them for sending outbound traffic.
      • The border leaf switch can also be configured to send EVPN routes learned in the L2VPN/EVPN address family, to the IPv4/6 Unicast address family, and advertise them to the external routing device.
  • Protocol Support 
    • Spine switches only run the BGP-EVPN control plane and IP routing; they do not need to support the VTEP function.
    • All Leaf ToRs must support RIOT/VTEP capability.

EVPN VXLAN Distributed - hasanmansur.com

Centralized Vs. Distributed – EVPN Routes

  • A centralized architecture does not use EVPN for L3 routing. All the routing happens on a centralized switch, but EVPN routes are leveraged for VXLAN bridging, and features such as ARP suppression and mobility.
  • A distributed architecture involves distributing EVPN routes, which are used for both routing and bridging over VXLAN tunnels, to each ToR leaf switch. A distributed architecture provides routing closest to the hosts.
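
To visualise why the centralized model is sub-optimal for east-west traffic, here is a small sketch that returns the node sequence a routed packet takes between two leaves. The node names follow the traffic-flow example in the Centralized section; the function name and the default gateway name are hypothetical, and a single spine is assumed for simplicity.

```python
def east_west_path(model, src_leaf, dst_leaf, spine="spine1", gateway="router1"):
    """Return the nodes a routed east-west packet visits under each model."""
    if model == "centralized":
        # Routing happens on the central gateway, so traffic detours through it.
        return [src_leaf, spine, gateway, spine, dst_leaf]
    if model == "distributed":
        # The ingress leaf routes locally; only one trip across the fabric.
        return [src_leaf, spine, dst_leaf]
    raise ValueError(f"unknown model: {model}")


centralized = east_west_path("centralized", "leaf-A1", "leaf-B2")
distributed = east_west_path("distributed", "leaf-A1", "leaf-B2")
print(" > ".join(centralized))
print(" > ".join(distributed))
```

The extra trip through the gateway is the cost of keeping routing in one place; for mostly north-south traffic the detour rarely matters.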

Routing – How Performed

Routing in EVPN can follow either the asymmetric or the symmetric model.

Asymmetric 

Assume an end host in one VNI intends to send a packet to a host in a different VNI (i.e. hosts in different VLANs, requiring inter-VLAN routing). When the packet arrives at the ingress VTEP, the VTEP does a FIB lookup for the destination IP address and finds it in the destination VNI, which is locally attached. It then looks up the ARP table (and, through the resolved MAC, the MAC forwarding table) to find out which VTEP the host sits behind, and sends the packet to that egress VTEP for delivery to the destination host. In the asymmetric model, the ingress VTEP puts the destination VNI in the packet, so the packet is bridged directly after being routed into the target VNI. Therefore,

  • Ingress VTEP routes, whereas the egress VTEP only bridges
  • Each VTEP must be provisioned with all VLANs/VNIs, even if there are no locally-attached hosts for a particular VLAN.
  • As an extension of the above point, asymmetric routing works only if the destination subnet is locally attached to the first-hop router. If it is not local, the solution is symmetric routing.
  • Each VTEP has to learn & maintain ARP and MAC info, for all VNIs in the overlay, regardless of whether it has locally attached hosts in a VNI or not.
  • Supported by Dell EMC, Cumulus, Juniper etc.
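
The ingress-side lookups in the asymmetric model can be sketched as below. This is a simplified model, assuming the VTEP's state is held in plain dictionaries; the function name, table names, VNIs, and addresses are hypothetical.

```python
import ipaddress


def asymmetric_ingress(vtep, dst_ip):
    """Return (destination VNI, destination MAC, egress VTEP) for a routed packet."""
    addr = ipaddress.ip_address(dst_ip)
    # Step 1: route. Every subnet is a connected route here, because every
    # VNI is provisioned on every VTEP.
    dst_vni = next(
        (vni for net, vni in vtep["connected"].items()
         if addr in ipaddress.ip_network(net)),
        None,
    )
    if dst_vni is None:
        raise LookupError(f"{dst_ip}: destination subnet is not local")
    # Step 2: the ARP/ND table gives the destination host's MAC address.
    dst_mac = vtep["arp"][(dst_vni, dst_ip)]
    # Step 3: the MAC forwarding table gives the VTEP the host sits behind.
    egress_vtep = vtep["mac"][(dst_vni, dst_mac)]
    return dst_vni, dst_mac, egress_vtep


leaf1 = {
    "connected": {"10.1.10.0/24": 10010, "10.1.20.0/24": 10020},
    "arp": {(10020, "10.1.20.22"): "aa:bb:cc:00:00:22"},
    "mac": {(10020, "aa:bb:cc:00:00:22"): "192.0.2.2"},
}
result = asymmetric_ingress(leaf1, "10.1.20.22")
```

The packet leaves the ingress VTEP already tagged with the destination VNI, so the egress VTEP only has to bridge it to the host.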

Symmetric 

In this model, there is also a transit VNI/VRF involved, present on both the ingress and egress VTEPs. In contrast, think of the regular VNIs, which map to VLANs, as L2 VNIs. The ingress VTEP routes the packet from the source VNI to the transit VNI. The egress VTEP takes the packet from the transit VNI and routes it to the destination VNI. With symmetric routing,

  • Both the ingress VTEP and egress VTEP route the packets, unlike Asymmetric model.
  • On ingress VTEP, the source VNI determines the VRF which also provides the L3/Transit VNI to be used. On the egress VTEP, the L3 VNI in the packet determines the VRF to be used in the route table lookup.
  • Because symmetric routing expects the egress VTEP to perform routing, the centralized model does not work with symmetric routing; it works only with asymmetric routing.
  • Symmetric requires non-default VRF support.
  • Each VTEP has to learn & maintain ARP and MAC info, only for VNIs in which it has membership.
  • Supported by Cumulus, Cisco, Arista. On the roadmap for Dell EMC OS10, in release 10.5.x.
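
The two routing steps of the symmetric model can be sketched as a pair of functions, one per VTEP. As before this is a toy model: the dictionary layout, the VRF name `tenant-a`, the L3 VNI 50001, and the addresses are hypothetical, and only IPv4 host routes are handled.

```python
import ipaddress


def ingress_route(vtep, src_vni, dst_ip):
    """Ingress VTEP: source VNI -> VRF -> L3/transit VNI and egress VTEP."""
    vrf = vtep["vrf_of_vni"][src_vni]
    l3_vni = vtep["l3_vni_of_vrf"][vrf]
    egress_vtep = vtep["host_routes"][vrf][f"{dst_ip}/32"]   # remote host route
    return l3_vni, egress_vtep


def egress_route(vtep, l3_vni, dst_ip):
    """Egress VTEP: L3 VNI -> VRF -> destination L2 VNI and host MAC."""
    vrf = vtep["vrf_of_l3_vni"][l3_vni]
    addr = ipaddress.ip_address(dst_ip)
    for net, vni in vtep["connected"][vrf].items():
        if addr in ipaddress.ip_network(net):
            return vni, vtep["arp"][(vni, dst_ip)]
    raise LookupError(f"{dst_ip}: no connected subnet in VRF {vrf}")


leaf1 = {   # has only VNI 10010 locally
    "vrf_of_vni": {10010: "tenant-a"},
    "l3_vni_of_vrf": {"tenant-a": 50001},
    "host_routes": {"tenant-a": {"10.1.20.22/32": "192.0.2.2"}},
}
leaf2 = {   # has only VNI 10020 locally
    "vrf_of_l3_vni": {50001: "tenant-a"},
    "connected": {"tenant-a": {"10.1.20.0/24": 10020}},
    "arp": {(10020, "10.1.20.22"): "aa:bb:cc:00:00:22"},
}

l3_vni, egress_vtep = ingress_route(leaf1, 10010, "10.1.20.22")
delivery = egress_route(leaf2, l3_vni, "10.1.20.22")
```

Notice that `leaf1` carries no state for VNI 10020 at all; it only needs the host route and the transit VNI for the VRF.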

Differences Between Asymmetric & Symmetric

  • VNI

    • Symmetric routing makes use of a transit VNI, which is different from the L2 VNIs where the source and destination endpoints are stationed. In the asymmetric model, the ingress VTEP routes directly into the destination VNI at the first hop. In the symmetric model, the ingress VTEP routes into the L3/transit VNI, and the egress VTEP routes from the transit VNI into the L2 VNI of the destination.
    • The symmetric model scales much better, since each VTEP only deals with its local VNIs. There is no need to configure every VNI in the network on the local VTEP, as in the asymmetric model.
  • Route Table Vs. ARP/ND Table

    • In asymmetric routing, because the packet is bridged after routing (i.e., the destination network is local to the ingress VTEP), all the remote endpoints are present in the ARP/ND table and MAC forwarding table, while the routing table remains mostly empty. VTEPs have to learn and maintain ARP and MAC entries for all VNIs.
    • In symmetric routing, if the destination network is locally connected, the behavior is the same as in asymmetric routing. For destination networks that are not local, the routing table contains all the remote endpoints as /32 routes, with the respective egress VTEP as the next hop. For those remote hosts, the egress VTEP is the only entry in the ARP/ND table and the MAC forwarding table. VTEPs have to learn and maintain ARP and MAC entries only for the VNIs they participate in.
  • VRF

    • Asymmetric routing can function in the same VRF as the underlay network, i.e. in the default VRF, if the user does not have a requirement for multiple VRFs.
    • Symmetric routing requires the explicit configuration of an L3 or VRF VNI. It cannot function in the default VRF.
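
The difference in table contents can be illustrated with a short sketch that derives, for one VTEP, which remote hosts end up in the ARP/ND table and which end up as /32 routes under each model. The `Host` tuple, function names, and sample hosts are hypothetical, and the MAC forwarding table is left out to keep the example short.

```python
from collections import namedtuple

Host = namedtuple("Host", "vni ip mac vtep")


def asymmetric_state(hosts, local_vtep):
    """All remote hosts in all VNIs sit in the ARP/ND table; no host routes."""
    remote = [h for h in hosts if h.vtep != local_vtep]
    arp = {(h.vni, h.ip): h.mac for h in remote}
    routes = {}
    return arp, routes


def symmetric_state(hosts, local_vtep, local_vnis):
    """ARP/ND entries only for local VNIs; other remote hosts become /32 routes."""
    remote = [h for h in hosts if h.vtep != local_vtep]
    arp = {(h.vni, h.ip): h.mac for h in remote if h.vni in local_vnis}
    routes = {f"{h.ip}/32": h.vtep for h in remote if h.vni not in local_vnis}
    return arp, routes


hosts = [
    Host(10010, "10.1.10.11", "aa:bb:cc:00:00:11", "192.0.2.1"),   # local host
    Host(10010, "10.1.10.12", "aa:bb:cc:00:00:12", "192.0.2.2"),
    Host(10020, "10.1.20.22", "aa:bb:cc:00:00:22", "192.0.2.2"),
    Host(10030, "10.1.30.33", "aa:bb:cc:00:00:33", "192.0.2.3"),
]
asym_arp, asym_routes = asymmetric_state(hosts, "192.0.2.1")
sym_arp, sym_routes = symmetric_state(hosts, "192.0.2.1", local_vnis={10010})
```

In this sample, the asymmetric VTEP holds three ARP entries and no host routes, while the symmetric VTEP holds one ARP entry and two /32 routes.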

—–

In the next post, I will review Feature Availability for VXLAN with BGP EVPN, in Dell EMC OS10. To make it more useful, I also intend to include parity comparisons with Cumulus VXLAN with BGP EVPN, Dell EMC OS9 Static VXLAN, as well as Pluribus’ Fabric Control plane based VXLAN implementation. That post should hopefully land next week.
