Pluribus NetVisor Linux + Adaptive Cloud Fabric: Detailed | Dell EMC Networking

Pluribus became a part of the Dell EMC SDN ecosystem in 2016. It has significant wins globally in the Cloud/hosting, Service Provider Core and Enterprise spaces.

The OS, called Netvisor, is based on Canonical’s Ubuntu. Netvisor drives the abstraction and virtualisation of the switch hardware, allowing the creation of independent, virtual networks on top. This, in turn, is key to the constructs offered by the solution: multi-tenancy, NFV, network containers and their mobility, and so on. Pluribus sees Netvisor’s depth of virtualisation as a key differentiator against the competition.

Beyond the OS, the solution truly shines when its fabric technology, Adaptive Cloud Fabric (ACF), is brought into the picture. There are no centralised controllers with ACF (as in Big Switch Cloud Fabric). It presents a distributed architecture in which each switch can manage and control the entire fabric, so the fabric itself appears as one programmable switch. The CLI on any switch can, via fabric-wide scope, execute changes across every switch in the fabric.
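
To make the fabric-wide scope idea a bit more concrete, here is a toy Python sketch of the concept; the class and method names are mine, not Netvisor’s. The point is simply that a fabric-scope change issued on any one member lands on every member, while a local-scope change stays put.

```python
# Hypothetical sketch of local vs. fabric scope -- not Netvisor code.

class FabricNode:
    def __init__(self, name):
        self.name = name
        self.config = {}           # this node's local configuration
        self.peers = []            # every other member of the fabric

    def apply_local(self, key, value):
        """Apply a change to this switch only (local scope)."""
        self.config[key] = value

    def apply_fabric(self, key, value):
        """Apply a change everywhere: any node can drive the whole fabric."""
        for node in [self] + self.peers:
            node.apply_local(key, value)


# Build a three-node fabric and push a fabric-scope change from leaf1.
leaf1, leaf2, spine1 = FabricNode("leaf1"), FabricNode("leaf2"), FabricNode("spine1")
for node in (leaf1, leaf2, spine1):
    node.peers = [n for n in (leaf1, leaf2, spine1) if n is not node]

leaf1.apply_fabric("vlan-100", "created")
assert all(n.config["vlan-100"] == "created" for n in (leaf1, leaf2, spine1))
```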

Netvisor appears to be built more along the principles of cluster computing than those of a traditional NOS. It is a peer-to-peer fabric; the switches have no master/slave relationship. All of them perform local switching (rather than deferring to a master), and all execute their control planes independently.

As a VXLAN-based DCI (which I will look into in detail in a separate post), ACF still presents a single, distributed fabric with centralised provisioning across geographically dispersed DCs. You can scale the solution to very healthy numbers (28 VTEPs on Dell EMC 40G platforms) and manage the entire fabric from any CLI/console, at any location. This ability to manage a DCI solution from a single point is not seen with EVPN, box-by-box multicast or static solutions, nor with Cisco OTV, Big Switch VXLAN, etc. (Cisco stretched ACI can facilitate this).

One of the components that lends ONVL its power and versatility is vPorts. This is a software-based, distributed table of endpoint information (contrast this with Big Switch Cloud Fabric’s information base, known as VFT, which is also software based but centralised: VFT is held on the controller, which syncs it down and programs it into the switch forwarding tables. I will cover Cloud Fabric operations in a separate post). Think of it as an L2 table, but with rich metadata (including vSphere metadata) and additional L3 information included. It is a principal means of distributing intelligence across the fabric.
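
As a rough mental model of a vPort entry (the field names below are my own assumptions, not Netvisor’s actual schema), think of an L2 entry carrying location, L3 and hypervisor metadata:

```python
# Illustrative vPort entry; field names are assumptions, not Netvisor's schema.
from dataclasses import dataclass, field
from typing import List

@dataclass
class VPortEntry:
    mac: str                                        # the classic L2 key
    vlan: int
    switch: str                                     # fabric member where the endpoint was learned
    port: str                                       # physical/logical port on that switch
    ips: List[str] = field(default_factory=list)    # additional L3 info
    vm_name: str = ""                               # rich metadata, e.g. from vSphere
    hypervisor: str = ""
    state: str = "active"                           # lifecycle / mobility state

entry = VPortEntry(mac="00:50:56:ab:cd:ef", vlan=100,
                   switch="leaf1", port="eth17",
                   ips=["10.1.100.21"], vm_name="web-01",
                   hypervisor="esxi-03")
```

Because every switch holds the full table, any member can answer “where is this endpoint, and what is it?” without asking a controller.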

The responsibility for maintaining the vPort database, as well as programming the forwarding tables in the ASIC based on input from peer-to-peer protocols like xSTP and MAC learning, lies with the nvOSd control plane. It is important to recognise that we are talking about two distinct planes: a software-based fabric control plane and a traditional underlay data/control plane. The two are distinct, but co-exist.

The vPort database is fabric-wide, with each switch TCP-connected to every other switch so that nvOSd can keep the vPorts consistent and maintained. Configuration changes are executed via a database-style three-phase commit, and fabric-wide rollback is available (a minimal sketch of this commit flow follows the list below). vPorts, in a sense, replace the hardware forwarding table on switches. Ergo,

  • vPorts essentially virtualise the L2/L3 forwarding tables
  • Those hardware tables now behave like a cache.
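
Here is the minimal commit sketch promised above. It is purely illustrative; Netvisor’s actual phases and messages are not documented here. The idea is simply that a change is accepted by every member, staged, then applied, with anything still staged being discarded on failure.

```python
# Illustrative three-phase commit across fabric members (not Netvisor's actual protocol).

class Member:
    def __init__(self, name):
        self.name = name
        self.config = {}
        self.staged = None

    def can_commit(self, change):
        # Phase 1: would this member accept the change? (always yes in this toy)
        return True

    def pre_commit(self, change):
        # Phase 2: stage the change so it is ready to apply
        self.staged = change

    def do_commit(self):
        # Phase 3: make the staged change live
        key, value = self.staged
        self.config[key] = value
        self.staged = None

    def abort(self):
        # Rollback path: throw away anything still staged
        self.staged = None


def fabric_commit(members, change):
    """Apply a change on every member, or on none of them."""
    if not all(m.can_commit(change) for m in members):      # phase 1
        return False
    try:
        for m in members:                                    # phase 2
            m.pre_commit(change)
        for m in members:                                    # phase 3
            m.do_commit()
        return True
    except Exception:
        for m in members:                                    # fabric-wide rollback of staged state
            m.abort()
        return False


members = [Member("leaf1"), Member("leaf2"), Member("spine1")]
fabric_commit(members, ("vlan-200", "created"))
```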

The “cache” behaviour of the hardware tables is conversational. The cache is populated only when there is an active conversation between end stations, and the entry has to be present in the vPort database already. It can age out, and if it needs re-programming due to a new conversation, it can be re-activated. This is in contrast to the traditional Layer 2 control plane, which relies on flooding to learn and re-learn.
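
A toy model of that conversational behaviour (again, not actual nvOSd logic) would look something like this: the hardware table is populated from the vPort database only when a conversation needs the entry, and an aged-out entry is simply re-installed from the database rather than re-learnt by flooding.

```python
# Toy model of conversational hardware-table programming (not actual nvOSd logic).

class HardwareTable:
    """The ASIC forwarding table, behaving as a cache over the vPort database."""
    def __init__(self, vport_db):
        self.vport_db = vport_db   # fabric-wide software state, always complete
        self.cache = {}            # hardware entries, populated on demand

    def lookup(self, dst_mac):
        # Hit: the conversation is already programmed in hardware.
        if dst_mac in self.cache:
            return self.cache[dst_mac]
        # Miss: install from the vPort database -- no flooding to (re)learn.
        entry = self.vport_db.get(dst_mac)
        if entry is not None:
            self.cache[dst_mac] = entry
        return entry

    def age_out(self, dst_mac):
        # Idle entries can be evicted; the vPort database still has them.
        self.cache.pop(dst_mac, None)


vports = {"00:50:56:ab:cd:ef": {"switch": "leaf1", "port": "eth17", "vlan": 100}}
table = HardwareTable(vports)
table.lookup("00:50:56:ab:cd:ef")   # programmed on the first conversation
table.age_out("00:50:56:ab:cd:ef")  # idle timeout
table.lookup("00:50:56:ab:cd:ef")   # re-activated from the vPort DB, not re-flooded
```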

Pluribus ONVL ACF DCI - hasanmansur.com

The L3 control plane (OSPF, BGP, etc.) is executed within separate, dedicated containers, which in turn enables multi-tenancy. vRouters are implemented within a tenant’s respective container, and can be applied to the entire fabric or to individual vNets. For now, Dell EMC Networking switches support up to two vRouters.
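
A hypothetical sketch of that tenant-to-container mapping (the names and structure are mine; only the two-vRouter ceiling comes from the paragraph above):

```python
# Hypothetical model of per-tenant vRouter containers; illustrative only.

MAX_VROUTERS_PER_SWITCH = 2   # current Dell EMC Networking limit mentioned above

class SwitchVRouters:
    def __init__(self, name):
        self.name = name
        self.vrouters = []    # each entry represents a dedicated routing container

    def create_vrouter(self, tenant, scope="fabric"):
        """Spin up an isolated routing container for a tenant (fabric or vNet scope)."""
        if len(self.vrouters) >= MAX_VROUTERS_PER_SWITCH:
            raise RuntimeError(f"{self.name}: vRouter limit reached")
        self.vrouters.append({"tenant": tenant, "scope": scope})


sw = SwitchVRouters("leaf1")
sw.create_vrouter("tenant-a", scope="fabric")
sw.create_vrouter("tenant-b", scope="vnet-blue")
```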

This is complemented by isolated management interfaces for vPods, via vNet Manager, giving tenants visibility into only their respective resources.

The presence of traditional L2 and L3 control plane protocols allows integration in brownfield environments via familiar constructs (xSTP, OSPF, BGP, etc.). From an architecture perspective, this makes it very easy for me to insert these switches as a ToR pair in a server/VM aggregation layer, or as a DCI ToR pair in a dedicated DCI edge block, while connecting them into the core network via traditional protocols. I still get a single fabric and a single point of management despite the blocks being scattered over multiple sites, and there is no topology lock-in. In contrast, Big Switch Cloud Fabric requires me to plan things differently in several respects, one of the foremost being BCF’s controller footprint/architecture, which grows linearly with each site.

Back to Pluribus and ACF. Automation is achieved via Ansible and Fabric Manager (among other options), and requires the Fabric license (as opposed to the Enterprise license; I will cover the difference in a separate post, as long as I remember to).

UNUM is the management + automation + analytics platform, and includes Insight Analytics as a component. Analytics adds another axis to ONVL. Each switch has the built-in capability to operate as a flow broker (sensor) and extract flow metadata. This does require an external analytics server to which the data is exported and recorded.
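
As a rough illustration of the flow broker role (the record fields and the JSON-over-TCP export are my assumptions, not the Insight Analytics schema), each switch can be pictured as emitting per-connection metadata to the external collector:

```python
# Illustrative flow-metadata export to an external analytics server
# (field names and export format are assumptions, not the Insight Analytics schema).
import json
import socket
from dataclasses import dataclass, asdict

@dataclass
class FlowRecord:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: str
    switch: str          # which fabric member observed the flow
    bytes_seen: int
    state: str           # e.g. setup / established / closed

def export_flow(record: FlowRecord, collector=("analytics.example.com", 9999)):
    """Send one flow record to the external analytics server as a JSON line."""
    payload = (json.dumps(asdict(record)) + "\n").encode()
    with socket.create_connection(collector, timeout=2) as sock:
        sock.sendall(payload)

# Example record as a leaf switch might report it.
flow = FlowRecord("10.1.100.21", "10.2.200.15", 51514, 443, "tcp",
                  switch="leaf1", bytes_seen=48213, state="established")
# export_flow(flow)   # would require a reachable collector
```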

Every switch has fabric-wide visibility into

  • Every VM/endpoint, along with its lifecycle and mobility events.
  • Every application flow and its telemetry, regardless of whether that flow passes through that particular switch.

The endpoint location, mobility and lifecycle information is recorded in the vPort database I referenced earlier, of which each switch has a copy.

UNUM is also the vehicle for setting up the fabric. It takes care of discovery and presents various topology and protocol options to choose from. Fabric setup includes automation for setting up clusters, IP addresses, BGP, VRRP, etc., and it verifies the end result too.

I will cover DCI, VMware integration, and fabric operations in separate, subsequent posts.
