VXLAN on VMware NSX: VTEP, proxy, Unicast/Multicast/Hybrid mode

Abstract

Virtual Extensible LAN (VXLAN) is an encapsulation protocol used to build overlay networks, and it is the foundation of network virtualization. In simple terms, VXLAN encapsulates Ethernet frames in routable UDP packets, so a single L2 segment can span L3 boundaries. VXLAN also overcomes the VLAN limit: the 802.1Q standard allows a maximum of 4094 VLANs, while VXLAN allows up to 2^24 (about 16 million) VNIs (VXLAN Network Identifiers).
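To make the encapsulation idea concrete, here is a minimal Python sketch of the 8-byte VXLAN header described in RFC 7348 (one flags byte with the "I" bit set, 24 reserved bits, the 24-bit VNI, and 8 reserved bits). The function names are hypothetical and chosen only for illustration; the outer Ethernet, IP and UDP headers that a real VTEP adds are left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08   # the "I" flag from RFC 7348
MAX_VNI = 2**24 - 1           # a VNI is a 24-bit value


def pack_vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for the given VNI."""
    if not 0 <= vni <= MAX_VNI:
        raise ValueError("VNI must fit in 24 bits")
    # Word 1: flags in the top byte, 24 reserved bits set to zero.
    # Word 2: VNI in the top 24 bits, 8 reserved bits set to zero.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def unpack_vni(header: bytes) -> int:
    """Extract the VNI from the first 8 bytes of a VXLAN payload."""
    flags_word, vni_word = struct.unpack("!II", header[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return vni_word >> 8


def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prepend the VXLAN header to an inner Ethernet frame (outer headers omitted)."""
    return pack_vxlan_header(vni) + inner_frame


if __name__ == "__main__":
    print(pack_vxlan_header(5002).hex())  # 0800000000138a00
    print(MAX_VNI + 1)                    # 16777216 possible VNI values
```

The 24-bit VNI field is where the 2^24 limit comes from, compared with the 12-bit VLAN ID of 802.1Q.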

Introduction

This post focuses on VXLAN as implemented in VMware NSX; it is not a general technical discussion of VXLAN. For more details on VXLAN itself, please read RFC 7348.

Because VXLAN encapsulates Ethernet frames, it can be considered a “tunnel” or, more accurately, a “multipoint tunnel”. Every ESXi host enabled for NSX is configured with an additional VMkernel port used as a VTEP (VXLAN Tunnel End Point). In other words, a VTEP is a host interface that forwards Ethernet frames from a virtual network via VXLAN, and vice versa.

All hosts with the same VNI configured must be able to retrieve and synchronize data (ARP and MAC tables for example). Let’s discuss the following diagram:

VXLAN lab

Each VNI can be configured as Unicast, Multicast or Hybrid. In Unicast and Hybrid mode a VTEP Proxy is elected for each (physical) network segment. A VTEP Proxy is responsible for local replication of VXLAN frames. In Unicast mode the VTEP Proxy is called UTEP (Unicast Tunnel End Point); in Hybrid mode it is called MTEP (Multicast Tunnel End Point). UTEPs and MTEPs are still VTEPs, with the added responsibility of local replication. Every VXLAN packet with the REPLICATE_LOCALLY bit set is replicated by the VTEP Proxy to each local VTEP.

Moreover, in Unicast and Hybrid mode an NSX Controller is mandatory: it acts as a cache server for ARP and MAC tables, and it is responsible for UTEP and MTEP elections.

Unicast Mode

If VNI 5002 is configured in Unicast mode, each VTEP replicates encapsulated frames using unicast to every local VTEP and to every remote UTEP. In the following example, suppose ESXi1 and ESXi3 are elected as UTEPs and ESXi2 is a plain VTEP.

If VM1 wants to send data to VM2, it has to know VM2's MAC address. VM1 sends a broadcast ARP request to query all VMs within the same network (VNI 5002). The ESXi1 host queries the NSX Controller for the VM2 ARP entry. If the NSX Controller lacks the information, VM2 sends the ARP reply, and both ESXi1 and the NSX Controller update the ARP entry for VM2. VXLAN encapsulation does not occur in this case.
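The controller lookup described above can be pictured as a simple cache keyed by VNI and IP address. The following Python sketch is only an illustration of that idea: the class and function names are hypothetical and do not correspond to any NSX API, and the addresses are made-up examples.

```python
from typing import Dict, Optional, Tuple

ArpKey = Tuple[int, str]  # (VNI, IP address)


class ControllerArpCache:
    """Hypothetical stand-in for the ARP table kept by the NSX Controller."""

    def __init__(self) -> None:
        self._entries: Dict[ArpKey, str] = {}

    def lookup(self, vni: int, ip: str) -> Optional[str]:
        """Return the cached MAC address, or None on a cache miss."""
        return self._entries.get((vni, ip))

    def update(self, vni: int, ip: str, mac: str) -> None:
        """Record (or refresh) the MAC address learned for an IP on a VNI."""
        self._entries[(vni, ip)] = mac


def handle_arp_request(cache: ControllerArpCache, vni: int, target_ip: str) -> str:
    """Decide what the host does with a VM's broadcast ARP request."""
    mac = cache.lookup(vni, target_ip)
    if mac is not None:
        # Cache hit: the request can be answered without flooding it.
        return f"answer with {mac}"
    # Cache miss: the broadcast has to be delivered to the other VMs.
    return "flood"


if __name__ == "__main__":
    cache = ControllerArpCache()
    print(handle_arp_request(cache, 5002, "10.0.0.2"))  # flood
    cache.update(5002, "10.0.0.2", "00:50:56:00:00:02")  # learned from the ARP reply
    print(handle_arp_request(cache, 5002, "10.0.0.2"))  # answer with 00:50:56:00:00:02
```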

If VM1 wants to send data to VM3, and neither VM1 nor the NSX Controller knows the MAC address, the ESXi1 host forwards the broadcast ARP request via unicast VXLAN to every local VTEP (ESXi2) and to every remote UTEP (ESXi3). VM3 replies to the ARP request, and ESXi1, ESXi2 and the NSX Controller update the ARP entry for VM3.
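As a rough sketch of the Unicast mode replication just described, the Python snippet below computes where a source VTEP sends copies of a broadcast frame, and what a UTEP does when it receives one. The segment names "A" and "B" and the function names are assumptions made for illustration; the host names follow the example in this post.

```python
from typing import Dict, List, Tuple

# Example topology: ESXi1 and ESXi2 share a segment, ESXi3 sits on a remote one.
SEGMENTS: Dict[str, List[str]] = {"A": ["ESXi1", "ESXi2"], "B": ["ESXi3"]}
UTEPS: Dict[str, str] = {"A": "ESXi1", "B": "ESXi3"}  # elected proxy per segment


def unicast_targets(
    source: str, segments: Dict[str, List[str]], uteps: Dict[str, str]
) -> List[Tuple[str, bool]]:
    """Return (destination VTEP, REPLICATE_LOCALLY flag) pairs for one frame."""
    local_segment = next(name for name, vteps in segments.items() if source in vteps)
    targets: List[Tuple[str, bool]] = []
    for name, vteps in segments.items():
        if name == local_segment:
            # One unicast copy per local VTEP, no further replication needed.
            targets += [(vtep, False) for vtep in vteps if vtep != source]
        else:
            # One unicast copy per remote segment, sent to its UTEP.
            targets.append((uteps[name], True))
    return targets


def proxy_relay(proxy: str, segments: Dict[str, List[str]]) -> List[str]:
    """VTEPs a proxy must copy a REPLICATE_LOCALLY packet to."""
    local_segment = next(name for name, vteps in segments.items() if proxy in vteps)
    return [vtep for vtep in segments[local_segment] if vtep != proxy]


if __name__ == "__main__":
    print(unicast_targets("ESXi1", SEGMENTS, UTEPS))
    # [('ESXi2', False), ('ESXi3', True)]
```

In this small lab the UTEP on segment B has nothing left to relay, because ESXi3 is the only VTEP there.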

Hybrid Mode

If VNI 5002 is configured in Hybrid mode, each VTEP replicates encapsulated frames using multicast to every local VTEP and using unicast to every remote MTEP. In the following example, suppose ESXi1 and ESXi3 are elected as MTEPs and ESXi2 is a plain VTEP.

If VM1 wants to send data to VM2, it has to know VM2's MAC address. VM1 sends a broadcast ARP request to query all VMs within the same network (VNI 5002). The ESXi1 host queries the NSX Controller for the VM2 ARP entry. If the NSX Controller lacks the information, VM2 sends the ARP reply, and both ESXi1 and the NSX Controller update the ARP entry for VM2. VXLAN encapsulation does not occur in this case. Each MTEP (ESXi3) must replicate incoming VXLAN packets to every local VTEP.

If VM1 wants to send data to VM3, and neither VM1 nor the NSX Controller knows the MAC address, the ESXi1 host forwards the broadcast ARP request via multicast VXLAN to every local VTEP (ESXi2) and via unicast VXLAN to every remote MTEP (ESXi3). VM3 replies to the ARP request, and ESXi1, ESXi2 and the NSX Controller update the ARP entry for VM3. Each MTEP (ESXi3) must replicate incoming VXLAN packets to every local VTEP.
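The Hybrid mode behaviour can be sketched as a short send plan: one multicast packet for the local segment, plus one unicast packet per remote MTEP. This is an illustrative Python sketch only; the `Send` type, the segment names and the multicast group address 239.1.1.1 are assumptions, not values taken from NSX.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Send:
    kind: str                        # "multicast" or "unicast"
    destination: str                 # multicast group or MTEP name
    replicate_locally: bool = False  # set on packets a proxy must relay


def hybrid_sends(source_segment: str, local_group: str, mteps: Dict[str, str]) -> List[Send]:
    """Packets a source VTEP emits for one broadcast frame in Hybrid mode."""
    # A single multicast packet reaches every VTEP on the local segment.
    sends = [Send("multicast", local_group)]
    for segment, mtep in mteps.items():
        if segment != source_segment:
            # Remote segments get one unicast copy, relayed locally by their MTEP.
            sends.append(Send("unicast", mtep, replicate_locally=True))
    return sends


if __name__ == "__main__":
    # ESXi1 (segment "A") floods a frame; ESXi3 is the MTEP of remote segment "B".
    for send in hybrid_sends("A", "239.1.1.1", {"A": "ESXi1", "B": "ESXi3"}):
        print(send)
```

Compared with Unicast mode, the number of packets sent on the local segment no longer grows with the number of local VTEPs.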

Multicast Mode

If VNI 5002 is configured in Multicast mode, each VTEP sends data using multicast, and every local and remote VTEP receives a copy of it. The VTEP Proxy role does not exist in Multicast mode.

If VM1 wants to send data to VM2, it has to know VM2's MAC address. VM1 sends a broadcast ARP request to query all VMs within the same network (VNI 5002). The ESXi1 host encapsulates the broadcast frame in a VXLAN multicast packet, and every other VTEP receives that VXLAN packet.

If VM1 wants to send data to VM3, and VM1 does not know the MAC address, the ESXi1 host forwards the broadcast ARP request via multicast VXLAN to every VTEP (ESXi2 and ESXi3). VM3 replies to the ARP request, and ESXi1 and ESXi2 update the ARP entry for VM3.
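Without a controller, VTEPs have to learn addresses from the traffic they receive. The sketch below illustrates the flood-and-learn idea from RFC 7348, where a VTEP maps the inner source MAC of a received packet to the outer source IP of the sending VTEP. It assumes NSX hosts behave this way in Multicast mode; the class name and all addresses are hypothetical.

```python
from typing import Dict, Optional, Tuple


class VtepLearningTable:
    """Per-VTEP table mapping (VNI, VM MAC) to the IP of the remote VTEP."""

    def __init__(self) -> None:
        self._table: Dict[Tuple[int, str], str] = {}

    def learn(self, vni: int, inner_src_mac: str, outer_src_ip: str) -> None:
        """Record which VTEP a VM's MAC address was seen behind."""
        self._table[(vni, inner_src_mac.lower())] = outer_src_ip

    def next_hop(self, vni: int, dst_mac: str) -> Optional[str]:
        """Return the VTEP to unicast to, or None if the frame must be flooded."""
        return self._table.get((vni, dst_mac.lower()))


if __name__ == "__main__":
    table = VtepLearningTable()
    # Unknown destination: the frame goes to the multicast group.
    print(table.next_hop(5002, "00:50:56:00:00:03"))  # None
    # A packet from VM3 arrives from the VTEP at 192.0.2.3 (documentation address).
    table.learn(5002, "00:50:56:00:00:03", "192.0.2.3")
    print(table.next_hop(5002, "00:50:56:00:00:03"))  # 192.0.2.3
```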

Design considerations

Unicast mode is the default configuration when adding a VNI. It is the easiest option and it does not require any multicast support from the physical network. The drawbacks are:

  • high utilization of the local physical network (each remote segment still receives a single VXLAN packet, which its proxy then replicates);
  • high ESXi CPU usage, because the same VXLAN packet has to be sent to every single local VTEP and remote UTEP.
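To put rough numbers on these drawbacks, the following Python sketch counts how many VXLAN packets the source host transmits for a single broadcast frame in each mode, following the replication rules described in this post. The function name and the example sizes (20 local VTEPs, 4 remote segments) are assumptions chosen for illustration.

```python
def packets_sent_by_source(mode: str, local_vteps: int, remote_segments: int) -> int:
    """Packets the source VTEP transmits for one broadcast frame.

    local_vteps counts every VTEP on the source segment, including the source.
    """
    other_local = max(local_vteps - 1, 0)
    if mode == "unicast":
        # One unicast copy per local VTEP plus one per remote UTEP.
        return other_local + remote_segments
    if mode == "hybrid":
        # One local multicast packet (if anyone is listening) plus one per remote MTEP.
        return (1 if other_local else 0) + remote_segments
    if mode == "multicast":
        # A single multicast packet reaches every VTEP.
        return 1
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    for mode in ("unicast", "hybrid", "multicast"):
        print(mode, packets_sent_by_source(mode, local_vteps=20, remote_segments=4))
    # unicast 23
    # hybrid 5
    # multicast 1
```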

Hybrid mode is a good compromise. It requires multicast support only on the local network (which should work by default on every switch) and uses unicast and the VTEP Proxy for remote replication.

Multicast mode is the most scalable configuration, but multicast support across L3 networks may not be easy to achieve. Multicast mode could be the best choice for customers who already have vCloud Networking and Security and are willing to migrate to VMware NSX.

Posted on 10 Feb 2015 by Andrea.