Friday, February 7, 2014

VXLAN Basics

VXLAN is an acronym for Virtual eXtensible Local Area Network. It is a joint effort by a few companies, Cisco being one of them.

VXLAN is yet another overlay solution, one that facilitates sending Layer-2 traffic over a Layer-3 network.

Also, one major reason for VXLAN [in my opinion, at least] is the importance virtualization and cloud computing have gained over recent years. Isolating a huge number of customers in a cloud environment becomes extremely challenging when we have just 4,096 VLANs available to us [of which, again, some cannot be used].

Even though an 802.1Q VLAN tag adds a 4-byte header to a frame, only 12 bits of it are used for the VLAN ID. VXLAN doubles this to 24 bits of tagging information, which immediately scales the number of available segments from 4,096 to roughly 16 million [16,777,216, to be exact].
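The scaling jump is plain bit arithmetic, which is easy to sanity-check:

```python
# An 802.1Q tag reserves 12 bits for the VLAN ID, while the VXLAN
# header reserves 24 bits for the VNI.
vlan_id_bits = 12
vni_bits = 24

vlan_ids = 2 ** vlan_id_bits  # number of possible VLAN IDs
vnis = 2 ** vni_bits          # number of possible VNIs

print(vlan_ids)  # 4096
print(vnis)      # 16777216 (~16 million)
```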

So what VXLAN essentially does is very simple: it encapsulates the original frame inside a UDP packet [Cisco uses port '8472' by default]. Since this is an encapsulation solution, it adds an overhead of close to 50 bytes. The first packet sent out from the router performing VXLAN encapsulation [called the VTEP router] is a multicast packet. Subsequent messages become unicast packets. It's worth mentioning here that only unknown unicasts are flooded using multicast; once the MAC addresses are learned, they are not flooded anymore.
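To see where the roughly 50 bytes of overhead come from, here is a small sketch that builds the 8-byte VXLAN header as laid out in RFC 7348 and adds up the outer headers that wrap the original Layer-2 frame:

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header (RFC 7348): one flags byte with the
    I bit set (0x08), 3 reserved bytes, the 24-bit VNI, 1 reserved byte."""
    assert 0 <= vni < 2 ** 24
    return struct.pack("!B3s3sB", 0x08, b"\x00" * 3, vni.to_bytes(3, "big"), 0)

# The oft-quoted ~50-byte overhead is the sum of the outer headers
# prepended to the original frame:
overhead = 14 + 20 + 8 + len(vxlan_header(4096))  # outer Ethernet + IPv4 + UDP + VXLAN
print(overhead)  # 50
```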

Before we try out basic configurations related to VXLAN, the terminologies below are best kept in mind:

VTEP [VXLAN Tunnel End Point] - In our case this will be the CSR1000V [Cisco's Cloud Services Router]. This router encapsulates the Layer-2 frames at the source side and strips the encapsulation at the receiver side.

VNI [VXLAN Network Identifier] - Each of the 16 million VXLAN IDs available at our disposal is known as a VNI. VNIs work very similarly to our legacy VLANs and provide isolation between traffic in different VNIs.

The multicast mode used is PIM-BIDIR, a variant of PIM sparse-mode.

Let's now move forward and try out our very own VXLAN configuration.
With the current implementation [XE-3.11], VXLAN on Cisco [the CSR1000V, to be specific] supports only multicast mode.

Topology:


  1. We have Router1 and Router2 [both CSR1000V] and a core router [ASR1K]
  2. It is worth keeping in mind that Router1 and Router2 are virtual routers spawned on a UCS server [I am using ESXi 5.5]
  3. The VMs are also part of the same UCS server
  4. Yet another point of interest: to configure VXLAN-related commands, your CSR1000V should have the premium license
Configurations:

Before we start VXLAN configuration, let us first ensure the basic routing and multicast related configurations are in place:

Router1:

ip multicast-routing distributed
router ospf 100
 router-id 1.1.1.1
interface GigabitEthernet2
 description "connected to core"
 ip address 10.1.1.1 255.255.255.0
 ip pim sparse-mode
 no shutdown
 ip ospf 100 area 100
ip pim bidir-enable
ip pim rp-address 100.100.100.100 bidir

CORE:

ip multicast-routing distributed
router ospf 100
 router-id 2.2.2.2
interface GigabitEthernet0/0/0
 description "connected to Router1"
 ip address 10.1.1.2 255.255.255.0
 ip pim sparse-mode
 no shutdown
 ip ospf 100 area 100
interface GigabitEthernet0/0/1
 description "connected to Router2"
 ip address 11.1.1.2 255.255.255.0
 ip pim sparse-mode
 no shutdown
 ip ospf 100 area 100
interface Loopback100
 ip address 100.100.100.100 255.255.255.255
 ip pim sparse-mode
 ip ospf 100 area 100

ip pim bidir-enable
ip pim rp-address 100.100.100.100 bidir

Router2:

ip multicast-routing distributed
router ospf 100
 router-id 3.3.3.3
interface GigabitEthernet2
 description "connected to core"
 ip address 11.1.1.1 255.255.255.0
 ip pim sparse-mode
 ip ospf 100 area 100
 no shutdown
ip pim bidir-enable
ip pim rp-address 100.100.100.100 bidir

With the above configurations, we should be able to reach all the routers in our topology. Now let us move on to the next phase of configuration, that is, VXLAN:

Router1:

interface Loopback100
 ip address 10.10.10.10 255.255.255.255
 ip pim sparse-mode
 ip ospf 100 area 100
interface nve1
 no shutdown
 source-interface Loopback100
interface GigabitEthernet3
 description "connected to VM1"
 no shutdown
 service instance 10 ethernet
  encapsulation dot1q 10
  rewrite ingress tag pop 1 symmetric
bridge-domain 10
 member GigabitEthernet3 service-instance 10
  1. Interface NVE [Network Virtualization Endpoint] is the one on which we configure the VNI and multicast mapping
  2. We cannot assign an IP address to this interface, hence we use a Loopback interface as its source-interface [which, once assigned, immediately creates a tunnel interface, the VTEP]
  3. The Loopback IP address should be reachable
  4. Finally, we have the Ethernet Virtual Circuit [EVC] configuration on GigabitEthernet3 to make it a Layer-2 interface 
    1. As per the above configuration, we expect to receive traffic tagged with VLAN 10. With "rewrite ingress tag pop 1 symmetric" we remove the tag from ingress traffic and, since the rewrite is symmetric, re-add the tag to traffic going out of this interface
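The effect of the symmetric rewrite can be pictured as a 4-byte 802.1Q tag pop on ingress and the matching push on egress. A toy illustration in Python [this models the behaviour only; it is not how the EVC dataplane is implemented]:

```python
DOT1Q_ETHERTYPE = b"\x81\x00"

def pop_dot1q(frame: bytes) -> tuple[bytes, bytes]:
    """Remove one 802.1Q tag: bytes 12-15 (TPID + TCI) of the frame."""
    assert frame[12:14] == DOT1Q_ETHERTYPE, "expected a tagged frame"
    tag = frame[12:16]
    return frame[:12] + frame[16:], tag

def push_dot1q(frame: bytes, tag: bytes) -> bytes:
    """Symmetric rewrite: re-insert the saved tag on egress."""
    return frame[:12] + tag + frame[12:]

# Example: dst MAC + src MAC + VLAN 10 tag + EtherType + payload
frame = b"\xaa" * 6 + b"\xbb" * 6 + b"\x81\x00\x00\x0a" + b"\x08\x00" + b"payload"
untagged, tag = pop_dot1q(frame)        # tag stripped before bridging
assert push_dot1q(untagged, tag) == frame  # tag restored on the way out
```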
The next configuration is the binding configuration, shown here exactly as observed on the router:

Router1(config)#interface nve 1
Router1(config-if)#member vni ?
  WORD  VNI range or instance between 4096-16777215 example: 6010-6030 or 7115

Router1(config-if)#member vni 4096 ?
  mcast-group  Configure multicast group for vni(s)

Router1(config-if)#member vni 4096 mcast-group ?
  A.B.C.D  Starting Multicast Group IPv4 Address

Router1(config-if)#member vni 4096 mcast-group 225.1.1.1
Router1(config-if)#exit

The range of VNIs is from 4096 to 16777215. After the above configuration, the final step is binding this VNI to our bridge-domain:

bridge-domain 10
 member vni 4096

A very similar configuration is applied on Router2:

Router2:

interface Loopback100
 ip address 11.11.11.11 255.255.255.255
 ip pim sparse-mode
 ip ospf 100 area 100
interface nve1
 no shutdown
 source-interface Loopback100
 member vni 4096 mcast-group 225.1.1.1
interface GigabitEthernet3
 description "connected to VM2"
 no shutdown
 service instance 10 ethernet
  encapsulation dot1q 10
  rewrite ingress tag pop 1 symmetric
exit                ! in case you are pasting this configuration
exit                ! in case you are pasting this configuration
bridge-domain 10
 member vni 4096
 member GigabitEthernet3 service-instance 10

That ends our configuration. Now, let us send our favorite ping traffic from VM1 to VM2.
But before we do this, let me set up a quick EPC [Embedded Packet Capture] configuration to see what kind of packets go out of Router1.

The configuration details [a very basic level of EPC]:

Router1#monitor capture vxlan interface gigabitEthernet 2 both match any buffer size 200
Router1#monitor capture vxlan start

Now, ping from VM1 to VM2:

[root@localhost ~]# ping 172.16.11.111 -c 5
PING 172.16.11.111 (172.16.11.111) 56(84) bytes of data.
64 bytes from 172.16.11.111: icmp_seq=1 ttl=64 time=4.33 ms
64 bytes from 172.16.11.111: icmp_seq=2 ttl=64 time=1.52 ms
64 bytes from 172.16.11.111: icmp_seq=3 ttl=64 time=1.75 ms
64 bytes from 172.16.11.111: icmp_seq=4 ttl=64 time=2.05 ms
64 bytes from 172.16.11.111: icmp_seq=5 ttl=64 time=1.79 ms

--- 172.16.11.111 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4001ms
rtt min/avg/max/mdev = 1.526/2.293/4.337/1.035 ms
[root@localhost ~]#


Then, stop the EPC capture:

Router1#monitor capture vxlan stop  

Check the capture contents, or export them to a machine if you prefer to view them via Wireshark.
To export, use this command:

Router1#monitor capture vxlan export tftp://<IP_address>/<location>/<file_name>.pcap
!!
Exported Successfully

Router1#


You will see the following if viewed on Router1:

Router1#show monitor capture vxlan buffer brief
 -------------------------------------------------------------
 #   size   timestamp     source           destination   protocol
 -------------------------------------------------------------
   2  110    4.497975   10.10.10.10      ->  225.1.1.1        UDP
   3  110    4.498982   11.11.11.11      ->  10.10.10.10      UDP

   4  148    4.498982   10.10.10.10      ->  11.11.11.11      UDP
   5  148    4.499974   11.11.11.11      ->  10.10.10.10      UDP
   6  148    5.496984   10.10.10.10      ->  11.11.11.11      UDP
   7  148    5.497975   11.11.11.11      ->  10.10.10.10      UDP
   8  148    6.496984   10.10.10.10      ->  11.11.11.11      UDP
   9  148    6.498982   11.11.11.11      ->  10.10.10.10      UDP
  10  148    7.497975   10.10.10.10      ->  11.11.11.11      UDP
  11  148    7.499974   11.11.11.11      ->  10.10.10.10      UDP
  13  148    8.496984   10.10.10.10      ->  11.11.11.11      UDP
  14  148    8.498982   11.11.11.11      ->  10.10.10.10      UDP
  17  110    9.498982   11.11.11.11      ->  10.10.10.10      UDP
  18  110    9.499974   10.10.10.10      ->  11.11.11.11      UDP

       
Router1#


I have removed the OSPF and PIM packets from the capture. Apart from those, the ping traffic from VM1 to VM2 triggers our first UDP packet, with source 10.10.10.10 and destination 225.1.1.1.

As mentioned earlier, this is the first message sent out from Router1 [our VTEP]. The source and destination ports of this packet are 8472, the default VXLAN port.

However, the subsequent UDP messages are unicast packets, sent with arbitrary source ports while the destination port stays 8472. The source and destination addresses we observe are Router1's and Router2's Loopback addresses.
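This flood-then-learn behaviour boils down to a tiny per-VNI forwarding table: unknown destination MACs go to the VNI's multicast group, and once traffic arrives from a peer, the inner source MAC is mapped to that peer VTEP's unicast address. A simplified model [the class and names here are illustrative, not the CSR1000V implementation]:

```python
class VtepFib:
    """Per-VNI MAC-to-remote-VTEP table for a flood-and-learn VXLAN VTEP."""

    def __init__(self, mcast_group: str):
        self.mcast_group = mcast_group       # e.g. "225.1.1.1"
        self.mac_to_vtep: dict[str, str] = {}

    def learn(self, src_mac: str, remote_vtep_ip: str) -> None:
        # Learned from the outer source IP of a received VXLAN packet.
        self.mac_to_vtep[src_mac] = remote_vtep_ip

    def next_hop(self, dst_mac: str) -> str:
        # Known unicast -> peer VTEP; unknown unicast/BUM -> multicast group.
        return self.mac_to_vtep.get(dst_mac, self.mcast_group)

fib = VtepFib("225.1.1.1")
print(fib.next_hop("aa:bb:cc:dd:ee:ff"))  # 225.1.1.1 (flooded via multicast)
fib.learn("aa:bb:cc:dd:ee:ff", "11.11.11.11")
print(fib.next_hop("aa:bb:cc:dd:ee:ff"))  # 11.11.11.11 (unicast from now on)
```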

The below command gives the number of packets sent and received via the NVE interface:

Router1#show nve interface nve 1 detail
Interface: nve1, State: Admin Up, Oper Up Encapsulation: Vxlan
source-interface: Loopback100 (primary:10.10.10.10 vrf:0)
   Pkts In   Bytes In   Pkts Out  Bytes Out
         7        666          7        666
Router1#


From the above, 5 are ping packets and the remaining 2 are related to VXLAN; the same can be seen in the EPC capture.

Router1#show nve peers
Interface  Peer-IP          VNI        Up Time     
   nve1    11.11.11.11      4096       -        
Router1#


The above is the binding data for the remote VTEP. Note, however, that this entry expires after some time if the traffic flow stays inactive.

Here is one small piece of VXLAN UDP tweaking: the port, which defaults to 8472, can be changed with the "vxlan udp port" command in global configuration mode.

The usage is very simple, but the same port has to be configured on both VTEP routers.

The port being used by VXLAN can be viewed using the show command "show platform software vxlan F0 udp-port". The same is displayed below:

Router1#show platform software vxlan F0 udp-port
VXLAN UDP Port: 8472

Router1#


Now, let's go ahead and make the port change:

Router1(config)#vxlan udp port ?
  <1024-65535>  Port number

Router1(config)#vxlan udp port 1025 ?
  <cr>

Router1(config)#

Let's run the show command again to see the change in the VXLAN UDP port:

Router1#show platform software vxlan F0 udp-port
VXLAN UDP Port: 1025

Router1#


If we fail to configure Router2 with the same port, we end up seeing 'OverlayBadPkt' drops on Router2. This can be verified after sending traffic from VM1 to VM2.
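The reason for those drops can be modeled in a few lines: a VTEP only hands packets arriving on its configured UDP port to the VXLAN decapsulation path, so a mismatch means every encapsulated packet is discarded. A toy sketch [an illustration of the behaviour, not the platform code]:

```python
def receive(udp_dst_port: int, listening_port: int) -> str:
    """Toy model of the VTEP receive path: decapsulate only packets that
    arrive on the configured VXLAN UDP port; anything else is dropped
    (which Router2 counts as 'OverlayBadPkt')."""
    return "decapsulate" if udp_dst_port == listening_port else "drop"

# Router1 now encapsulates toward port 1025, but Router2 still listens
# on the default 8472, so the overlay traffic is dropped:
print(receive(1025, 8472))  # drop
print(receive(1025, 1025))  # decapsulate
```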

So, let us configure the same port on Router2 as well:

Router2(config)#vxlan udp port 1025

Now, let me send the ping traffic again:

[root@localhost ~]# ping 172.16.11.111 -c 5
PING 172.16.11.111 (172.16.11.111) 56(84) bytes of data.
64 bytes from 172.16.11.111: icmp_seq=1 ttl=64 time=4.48 ms
64 bytes from 172.16.11.111: icmp_seq=2 ttl=64 time=1.73 ms
64 bytes from 172.16.11.111: icmp_seq=3 ttl=64 time=1.69 ms
64 bytes from 172.16.11.111: icmp_seq=4 ttl=64 time=1.81 ms
64 bytes from 172.16.11.111: icmp_seq=5 ttl=64 time=1.56 ms

--- 172.16.11.111 ping statistics ---
5 packets transmitted, 5 received, 0% packet loss, time 4003ms
rtt min/avg/max/mdev = 1.566/2.258/4.480/1.114 ms
[root@localhost ~]#


The details can again be captured via EPC and viewed using Wireshark.

Hope you found this post informative.

3 comments:

  1. Had the same lab for testing in my college; this helps a lot. But for my solution I had to change the tagging from "encapsulation dot1q 10" to "encapsulation untagged"...

    hope it helps ;-)
  2. What type of port is facing the service instance port? Is it a trunk or an access port in VLAN 10?

    Reply:
    1. You can make this face an access port in VLAN 10, but it would be best to have it linked to a trunk port