Security policies on vSwitch/dvSwitch

As described in previous posts, both vSwitch and dvSwitch can enforce network security through three policies:

Option                 Default on vSwitch    Default on dvSwitch PortGroup
Promiscuous mode       Reject                Reject
MAC address changes    Accept                Reject
Forged transmits       Accept                Reject
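On a standard vSwitch, the three policies can also be set from the ESXi shell with esxcli (dvSwitch policies are managed through vCenter instead). The sketch below wraps the command in a small POSIX sh function. The function names and the default vSwitch name vSwitch0 are assumptions for illustration. To allow a review before anything touches a host, it only prints the commands unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch: reject promiscuous mode, MAC address changes and forged transmits
# on a standard vSwitch. harden_vswitch is a hypothetical helper name.
# By default the commands are only printed; set APPLY=1 to run them on ESXi.

run() {
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

harden_vswitch() {
    vswitch=${1:-vSwitch0}   # assumed name of the first standard vSwitch
    run esxcli network vswitch standard policy security set \
        --vswitch-name="$vswitch" \
        --allow-promiscuous=false \
        --allow-mac-change=false \
        --allow-forged-transmits=false
    # Display the resulting policy so it can be checked
    run esxcli network vswitch standard policy security get \
        --vswitch-name="$vswitch"
}
```

For example, "harden_vswitch vSwitch0" prints the two esxcli commands, and "APPLY=1 harden_vswitch vSwitch0" runs them. Port groups inherit these values unless they override them.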

Let’s describe what each policy can and cannot prevent.

Promiscuous mode

The Promiscuous mode option allows a VM to put its vNIC into promiscuous mode and receive traffic destined to other VMs. Setting Promiscuous mode to “Accept” makes the vSwitch/dvSwitch act like a hub: every VM in promiscuous mode can receive all traffic. This can be useful if a SPAN port is required and a dvSwitch is not available. It should be set to “Reject” unless a valid reason exists.

Dumping traffic with promiscuous mode on (excluding traffic to and from the local host):

# tcpdump -nn ! host 172.31.30.13
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
14:05:47.491738 IP 172.31.30.4.443 > 172.31.30.8.42164: Flags [P.], seq 2763936949:2763939151, ack 3368237223, win 510, options [nop,nop,TS val 208082165 ecr 67886826], length 2202
14:05:47.491907 IP 172.31.30.4.443 > 172.31.30.8.42164: Flags [F.], seq 2202, ack 1, win 510, options [nop,nop,TS val 208082165 ecr 67886826], length 0
14:05:47.492012 IP 172.31.30.8.42164 > 172.31.30.4.443: Flags [.], ack 2203, win 43, options [nop,nop,TS val 67931292 ecr 208082165], length 0
14:05:47.496139 IP 172.31.30.8.42164 > 172.31.30.4.443: Flags [P.], seq 1:38, ack 2203, win 43, options [nop,nop,TS val 67931293 ecr 208082165], length 37

Traffic exchanged by other VMs located on the same host has been captured.

Dumping traffic with promiscuous mode off (excluding traffic to and from the local host):

# tcpdump -nn ! host 172.31.30.13
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes
14:05:30.152436 ARP, Request who-has 172.31.30.6 tell 172.31.30.4, length 46
14:05:30.909774 ARP, Reverse Request who-is 00:1a:4b:be:f6:c2 tell 00:1a:4b:be:f6:c2, length 46
14:05:31.320636 ARP, Request who-has 172.31.30.6 tell 172.31.30.4, length 46
14:05:31.909751 ARP, Reverse Request who-is 00:50:56:af:49:3a tell 00:50:56:af:49:3a, length 46

Only broadcast packets can be captured.
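A simplified model of the receive-side decision the vSwitch makes for each port summarises the difference between the two captures. The function below is a hypothetical illustration, not VMware code. It assumes the guest has already put its vNIC in promiscuous mode (tcpdump does so by default), and it ignores multicast and VLANs.

```shell
#!/bin/sh
# Simplified model (not VMware code): does the vSwitch deliver a frame with
# destination MAC $1 to a vNIC whose MAC is $2, when the port group's
# Promiscuous mode policy is $3 ("accept" or "reject")?
# Assumes the guest NIC is already in promiscuous mode; multicast is ignored.
vswitch_delivers() {
    dst=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    port=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    if [ "$dst" = "ff:ff:ff:ff:ff:ff" ]; then
        echo yes    # broadcast frames reach every port (the ARP/RARP above)
    elif [ "$dst" = "$port" ]; then
        echo yes    # unicast frame addressed to this vNIC
    elif [ "$3" = "accept" ]; then
        echo yes    # hub-like behaviour: other VMs' unicast is copied too
    else
        echo no     # unicast for another VM is not delivered
    fi
}
```

For example, "vswitch_delivers 00:50:56:af:49:3a 00:50:56:af:32:9b reject" prints no, while the same call with accept prints yes.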

MAC address changes

The “MAC address changes” option allows the VM to change the MAC address of the vNIC. If set to “Reject”, the VM won’t be able to use the vNIC with a custom MAC address (no frames will be sent or received):

# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:50:56:af:32:9b
          inet addr:172.31.30.13  Bcast:172.31.30.31  Mask:255.255.255.224
          inet6 addr: fe80::250:56ff:feaf:329b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:32147 errors:0 dropped:0 overruns:0 frame:0
          TX packets:18764 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:31133273 (31.1 MB)  TX bytes:2198415 (2.1 MB)

# ifconfig eth0 hw ether 00:50:56:af:32:b9

If “MAC address changes” is set to “Accept”, the VM will still become unreachable unless “Forged transmits” is set to “Accept” too. The reason is simple:

  • MAC address changes: allows or prevents the VM from changing its MAC address;
  • Forged transmits: allows or prevents the VM from sending frames with a different source MAC address.

Both options must be set to “Accept” if a VM needs to change its own MAC address. This is required, for example, by Microsoft Network Load Balancing in unicast mode or by nested virtualization.
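On a standard vSwitch, the two options can be relaxed for a single port group (for example the one used by the NLB cluster or by nested hypervisors) instead of the whole vSwitch. The sketch below is a hypothetical helper: "NLB-PG" is an assumed port group name. It only prints the esxcli command unless APPLY=1 is set.

```shell
#!/bin/sh
# Sketch: accept MAC address changes and forged transmits on one port group
# only. "NLB-PG" is a hypothetical port group name used as the default.
# Prints the command by default; set APPLY=1 to run it on ESXi.
allow_mac_override() {
    portgroup=${1:-NLB-PG}
    # Build the command as this function's positional parameters
    set -- esxcli network vswitch standard portgroup policy security set \
        --portgroup-name="$portgroup" \
        --allow-mac-change=true \
        --allow-forged-transmits=true
    if [ "${APPLY:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}
```

For example, "allow_mac_override Nested-ESXi" prints the command for a port group named Nested-ESXi. The vSwitch-level policy itself is left untouched.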

Forged transmits

The “Forged transmits” option allows the VM to generate (send or forward) frames with a different source MAC address. In the following example, a Linux software bridge is configured inside the VM and the IP address is moved onto it:

# ifdown eth0
# ifconfig eth0 0.0.0.0 up
# brctl addbr br0
# brctl addif br0 eth0
# ifconfig br0 172.31.30.13 netmask 255.255.255.224 up
# ifconfig br0
br0       Link encap:Ethernet  HWaddr 00:50:56:af:32:9b
          inet addr:172.31.30.13  Bcast:172.31.30.31  Mask:255.255.255.224
          inet6 addr: fe80::250:56ff:feaf:329b/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:362 errors:0 dropped:52 overruns:0 frame:0
          TX packets:208 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:25694 (25.6 KB)  TX bytes:25193 (25.1 KB)

# ping -c3 172.31.30.1
PING 172.31.30.1 (172.31.30.1) 56(84) bytes of data.
64 bytes from 172.31.30.1: icmp_seq=1 ttl=64 time=0.497 ms
64 bytes from 172.31.30.1: icmp_seq=2 ttl=64 time=0.355 ms
64 bytes from 172.31.30.1: icmp_seq=3 ttl=64 time=0.317 ms

--- 172.31.30.1 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2004ms
rtt min/avg/max/mdev = 0.317/0.389/0.497/0.080 ms

Clearly, the “Forged transmits” option allows a VM to act as a switch if multiple vNICs are configured. In most environments this is NOT desired. Moreover, this design can lead to a denial-of-service attack that can make an entire VMware cluster unreachable.

Please note that “MAC address changes” is still set to “Accept”: the bridge interface uses the same MAC address as the physical interface, but other VMs attached to the Linux bridge won’t be able to send and receive frames.
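The outbound side of this example can be modelled the same way. The function below is a simplified, hypothetical illustration rather than VMware code: it compares a frame's source MAC with the MAC of the vNIC the frame leaves through. Because br0 reuses eth0's address, its own frames pass the check. Frames from a guest behind the bridge (the MAC 52:54:00:12:34:56 used below is made up) are dropped unless Forged transmits is set to “Accept”.

```shell
#!/bin/sh
# Simplified model (not VMware code): does the vSwitch forward a frame with
# source MAC $1 leaving a vNIC whose MAC is $2, when the Forged transmits
# policy is $3 ("accept" or "reject")?
vswitch_forwards() {
    src=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    vnic=$(printf '%s' "$2" | tr 'A-F' 'a-f')
    if [ "$src" = "$vnic" ]; then
        echo yes    # source matches the vNIC, as for br0 reusing eth0's MAC
    elif [ "$3" = "accept" ]; then
        echo yes    # forged source MACs are allowed
    else
        echo no     # e.g. a guest behind the Linux bridge
    fi
}
```

Even with “Accept”, replies to such a guest still need Promiscuous mode. The standard vSwitch delivers unicast frames only to the vNIC that owns the destination MAC, so guests behind the bridge can neither send nor receive with the defaults.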

BPDU Filter

Any user with administrative rights inside a VM can potentially send BPDUs out of it:

# brctl stp br0 on
# brctl show
bridge name     bridge id               STP enabled     interfaces
br0             8000.005056af329b       yes             eth0
# brctl showstp br0
br0
 bridge id              8000.005056af329b
 designated root        8000.005056af329b
 root port                 0                    path cost                  0
 max age                  20.00                 bridge max age            20.00
 hello time                2.00                 bridge hello time          2.00
 forward delay            15.00                 bridge forward delay      15.00
 ageing time             300.00
 hello timer               0.62                 tcn timer                  0.00
 topology change timer     0.00                 gc timer                 137.64
 flags

eth0 (1)
 port id                8001                    state                forwarding
 designated root        8000.005056af329b       path cost                  2
 designated bridge      8000.005056af329b       message age timer          0.00
 designated port        8001                    forward delay timer        0.00
 designated cost           0                    hold timer                 0.00
 flags

The VM has become the Root Bridge, and the physical switch is now a leaf of the STP topology:

#show spanning-tree vlan 1

VLAN0001
  Spanning tree enabled protocol ieee
  Root ID    Priority    32768
             Address     0050.56af.329b
             Cost        4
             Port        1 (GigabitEthernet1/0/1)
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec

  Bridge ID  Priority    32769  (priority 32768 sys-id-ext 1)
             Address     0022.be9e.a200
             Hello Time   2 sec  Max Age 20 sec  Forward Delay 15 sec
             Aging Time 300

Interface        Role Sts Cost      Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
Gi1/0/1          Root FWD 4         128.1    P2p
Gi1/0/2          Desg FWD 4         128.2    Edge P2p
Gi1/0/3          Desg FWD 4         128.3    Edge P2p
Gi1/0/4          Desg FWD 4         128.4    Edge P2p
Gi1/0/48         Desg FWD 4         128.48   Edge P2p
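The 802.1D election rule explains why the VM wins: the bridge with the lowest bridge ID (priority first, then MAC address) becomes root. The function below is a minimal sketch of that comparison. Its name and argument order are assumptions for illustration.

```shell
#!/bin/sh
# Sketch of the 802.1D root bridge comparison: the lowest bridge ID wins,
# comparing the priority first and the MAC address only on a tie.
# Arguments: priority and MAC of bridge A, then of bridge B. Prints A or B.
stp_root() {
    # Remove ":" or "." separators so each MAC can be read as a hex number
    mac_a=$(printf '%s' "$2" | tr -d ':.')
    mac_b=$(printf '%s' "$4" | tr -d ':.')
    if [ "$1" -lt "$3" ]; then
        echo A
    elif [ "$1" -gt "$3" ]; then
        echo B
    elif [ $((0x$mac_a)) -lt $((0x$mac_b)) ]; then
        echo A
    else
        echo B
    fi
}
```

With the values above, "stp_root 32768 00:50:56:af:32:9b 32769 0022.be9e.a200" prints A. The switch loses on priority alone. With PVST+ and the extended system ID it advertises 32768 + 1 (VLAN 1) = 32769, while the Linux bridge advertises the plain default of 32768 (bridge id 8000.005056af329b). The MAC addresses, where the switch would actually win, are never compared.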

If the physical switch is configured with BPDU guard (as best practices suggest), the situation is even worse:

#show int status

Port      Name               Status       Vlan       Duplex  Speed Type
Gi1/0/1   esxi1:vmnic0       err-disabled unassigned   auto   auto 10/100/1000BaseTX
Gi1/0/2   esxi1:vmnic1       err-disabled unassigned   auto   auto 10/100/1000BaseTX
[...]
# show log
[...]
Aug 11 15:19:14.441: %SPANTREE-2-BLOCK_BPDUGUARD: Received BPDU on port GigabitEthernet1/0/1 with BPDU Guard enabled. Disabling port.
Aug 11 15:19:14.441: %PM-4-ERR_DISABLE: bpduguard error detected on Gi1/0/1, putting Gi1/0/1 in err-disable state
Aug 11 15:19:15.447: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/1, changed state to down
Aug 11 15:19:16.236: %SPANTREE-2-BLOCK_BPDUGUARD: Received BPDU on port GigabitEthernet1/0/2 with BPDU Guard enabled. Disabling port.
Aug 11 15:19:16.236: %PM-4-ERR_DISABLE: bpduguard error detected on Gi1/0/2, putting Gi1/0/2 in err-disable state
Aug 11 15:19:16.462: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/1, changed state to down
Aug 11 15:19:17.242: %LINEPROTO-5-UPDOWN: Line protocol on Interface GigabitEthernet1/0/2, changed state to down
Aug 11 15:19:18.258: %LINK-3-UPDOWN: Interface GigabitEthernet1/0/2, changed state to down

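Now the ESXi host has become unreachable in about two seconds: both uplinks were err-disabled less than two seconds apart. All VMs will then be restarted on another host of the same cluster, where the offending VM can disable that host's uplinks as well, and so on…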

To overcome this, the guest BPDU filter should be enabled on each ESXi host.


The advanced option is Net.BlockGuestBPDU, which must be set to 1. Using the CLI:

# esxcli system settings advanced set -o /Net/BlockGuestBPDU --int-value=1

Now BPDUs sent by VMs will be filtered out: keep in mind that loops involving VMs won’t be detected anymore.
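The value can be read back on each host with esxcli system settings advanced list -o /Net/BlockGuestBPDU. The helper below is a hypothetical sketch. It assumes that command prints an "Int Value: N" line (alongside fields such as "Default Int Value"), and it succeeds only when that value is 1.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) only if the esxcli output read from stdin
# reports "Int Value: 1". The exact output layout is an assumption; a line
# such as "   Int Value: 1" is expected, distinct from "Default Int Value".
bpdu_filter_enabled() {
    awk -F': *' '
        $1 ~ /^ *Int Value$/ { found = 1; value = $2 }
        END { exit !(found && value == 1) }
    '
}
```

For example: esxcli system settings advanced list -o /Net/BlockGuestBPDU | bpdu_filter_enabled && echo "guest BPDU filter enabled"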


Posted on 11 Aug 2014 by Andrea.