Capturing packets of VMware machines, part 1

I have always been the guy in our network analysis team responsible for the actual capture of network packets. I bought all the recording hardware we used, acquired network TAPs of all sorts and speeds, and did most of the planning of where to put which engine.

One of the most complicated analysis jobs took two weeks to plan, and involved major headaches like SSL encrypted links, a load balancer, NAT devices and a huge VMware infrastructure. The VMware part was the biggest challenge of all, because we had to find a place where we could capture the traffic of three virtual machines running inside a DRS cluster, and we had to make sure we really didn’t miss anything coming from or going to these servers.

Later, when I was teaching Wireshark courses at Fast Lane, the topic of capturing the traffic of virtual machines came up every once in a while when I spoke about data capturing methodology in class. Since I’m also a certified VMware instructor it happened more than once that another instructor teaching the Wireshark class asked me how to do this, and sometimes even pulled me into his own class to speak about capturing virtual machines for a few minutes. And since that topic seems to become more and more popular I thought it would be a good idea to write a little how-to about it.

Traditional capture setup

Usually, the first thing I do when trying to capture packets to solve a problem is determine the best location to set up the sniffer. It can be put close to the client, close to the server, or somewhere in the network path between the two nodes. Sometimes I use more than one capture location at the same time, for example at the client and at the server.

With purely physical networks the chances of selecting a good spot for the capture are pretty good – unless it is a very complex network with lots of redundancy and high speeds in the backbone. All you have to do is to find out which path the packets travel along and pick them up somewhere you like. Well, you need to determine if you can afford to use a SPAN port, or if you need to go by a TAP, but that’s usually it.

The virtual environment

In virtual environments things get a lot more complicated since there often is no physical spot where you can easily pick up packets from a single virtual machine. Consider the following example (and let’s assume all physical links are gigabit links):

Fig. 1 – A simple virtual server setup

Let’s say we want to take a look at anything the Mail server sends or receives. How do we do that? Well, we face a couple of problems here:

  1. We could do a SPAN port on the physical switch, but we don’t know what physical link the mail server will actually use since there are multiple network cards connecting the virtual switch to the outside world. We’d have to capture both at the same time, and not all switches allow us to do that. But let’s say ours does. We still have the problem that we copy frames from two full duplex links to an “output only” link, which means in worst case situations there would be 2 times 2 Gbit/s copied to one link with just 1 Gbit/s output capacity. Guess what? The switch will drop frames left, right and center on the SPAN session – and our capture box will not even notice. Note that capture filters won’t help either, because the frames do not get that far in the first place.
  2. The second problem is that not all packets the mail server might receive or send travel through a physical link. It could communicate with the Web Server, and none of those packets would ever have to leave the virtual environment, so you’d never capture them. They would simply travel from one server via the virtual switch to the other server.
  3. Let’s make things worse. Let’s assume we have an enterprise level virtualization platform running, which in my case would mean VMware vSphere. With vSphere, most big environments run clusters of virtualization hosts, and that cluster is usually DRS enabled. DRS stands for “Distributed Resource Scheduler”, which is kind of a virtual machine load balancer: it can move machines from one physical box to another on its own, at least if it is running in fully automated mode. Guess what happens to our SPAN session from problem #1? As I demonstrated live in two of my Sharkfest talks, the capture will just not see any more packets as soon as the mail server is moved away from the physical host. In the capture, it looks like the communication had been cut in a single instant. At the same time, the mail server keeps sending and receiving as if nothing ever happened, but on a different physical link. Ouch.
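
The arithmetic behind problem #1 can be written down as a small Python sketch. It only models the worst case described above (every monitored link saturated in both directions); the function name and its parameters are made up for illustration, and real switches may buffer or drop differently.

```python
def span_oversubscription(monitored_links, link_gbps=1.0, span_port_gbps=1.0):
    """Worst-case load offered to a SPAN destination port.

    Each monitored full duplex link can carry link_gbps in each direction,
    so it contributes up to 2 * link_gbps to the mirror session.
    """
    offered = monitored_links * 2 * link_gbps
    # Everything above the output capacity of the SPAN port has to be dropped.
    excess = max(0.0, offered - span_port_gbps)
    drop_fraction = excess / offered if offered else 0.0
    return offered, drop_fraction


# Two gigabit uplinks mirrored to one gigabit SPAN port, as in Fig. 1.
offered, dropped = span_oversubscription(monitored_links=2)
print(f"worst case: {offered:.0f} Gbit/s offered, {dropped:.0%} dropped")
```

With two gigabit uplinks the sketch reports 4 Gbit/s offered to a 1 Gbit/s port, so in the worst case three out of four frames never reach the capture box.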

It’s time for a little bet: I bet that you’ve thought about simply installing Wireshark on the mail server at least once so far. Easy, right? No fuss with SPAN, TAP, virtual environments and whatnot. Well, you’re right. But that “easy” way of capturing the mail server’s packets has serious flaws. I won’t go into much detail here, but capturing packets on the node in question is a pretty bad idea: depending on the server setup you’ll see ghosts like tons of CRC errors and huge over-sized frames. And that does not even include the fact that you’ve drastically changed your problem environment and risk system stability/performance while capturing. Virus scanners and personal firewalls may add further strange results, so – unless you really know what you’re doing – let’s forget about capturing packets directly on a node once and for all. It leads to the dark side! ;-) You can read more about why local captures are a bad idea here.

Virtual capture setups

Okay, by now we know that capturing packets coming from or going to a virtual system most likely requires a new strategy. Somehow, we need to get our SPAN/TAP into the virtual environment, and if you’re searching the internet for this kind of thing you’ll probably find a couple of commercial products that will help you with that. And while they’re not bad I’d like to show you how to perform captures with what we already have, at least when running a VMware vSphere setup. I guess other virtual environments offer similar options but I haven’t worked with them yet, so I’ll stick to VMware here.

VMware vSphere offers two kinds of virtual switches: standard and distributed. The standard vSwitch is what every vSphere installation has, no matter what license it is running on. Distributed vSwitches are only available for those who have an Enterprise Plus license, so I’ll focus on the standard vSwitch in this post, which might look like this in a very simple environment:

Fig. 2 – vSwitch Example

You’ll notice that there are virtual machines on the left, and physical network cards on the right. There is also a so-called “Port Group” named “Production”. All virtual machines have to be connected to a port group, and while some administrators think of port groups as simple VLAN groups, they are more than just that. They are ports grouped together with a specific set of properties that we will take a closer look at soon.

Now, let’s assume that Server1 and Server2 are talking to each other and we need to capture what is going on, for example by capturing all packets of Server1. Fortunately there is a vSwitch feature called “Promiscuous Mode”, which should sound familiar if you’ve already captured data before. But unlike the promiscuous mode we know from Ethernet NICs (which means that the card accepts all frames that arrive instead of filtering on its MAC address), the promiscuous mode of a vSwitch basically means that it will become a hub (well, sort of): it will forward all packets to all ports, which means that all virtual machines will see all packets of all other machines as well. It should be obvious why vSwitches are not in promiscuous mode by default – it floods VMs with traffic they don’t care about, and it allows rogue VMs to sniff packets they should not be able to sniff. Of course sniffing packets is what we want to do, so promiscuous mode seems to be the way to get them. The only problem is that when you just do that on the vSwitch your production VMs could all drown in packets, especially if you have not just a few VMs like in the test setup above, but maybe a couple of hundred. And that’s where port groups come in.

The Port Group “Trick”

Port groups are used to partition a vSwitch, and as I already said most VMware administrators I talked to about port groups just think about them as a tool to create different VLAN groups. But they also have their own security settings, including a toggle for promiscuous mode, which means that you can enable promiscuous mode just for some VMs and avoid the huge packet flood. And port groups have a feature that often comes as a surprise: you can create multiple port groups with identical settings. So if you need to capture the traffic of a VM like “Server1” in the example setup you can do what I do:

  1. Create a temporary port group with settings identical to the one Server1 is connected to. This means that you’ll have to make sure that the VLAN setting is exactly the same.
  2. Move the Server1 VM to the temporary port group. Whenever I did this in the past I did not lose any connection the server had at that moment – but of course I can’t guarantee that it won’t happen in your case.
  3. Create a capture VM running e.g. Wireshark and connect it to the same temporary port group:
    vSwitch Capture Setup
  4. Enable promiscuous mode on the temporary port group by setting the override checkmark for “Promiscuous Mode” and choosing “Accept” instead of “Reject”:
    Port Group Setting
  5. Log into your capture VM and capture packets. When capturing with a Windows machine I usually disable all protocol bindings on the network card to force it to become completely passive:
    Capture Card Setup
  6. Analyze ;-)
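
For those who prefer the ESXi shell over the vSphere Client, steps 1 and 4 can probably be scripted with `esxcli`. The following Python sketch only builds and prints the command lines; it does not execute anything. The `esxcli network vswitch standard portgroup` sub-commands and options are written from memory of the esxcli reference and should be verified with `--help` on your ESXi version. The names `vSwitch0` and `Capture-Temp` and the VLAN ID are placeholders, and moving the VMs between port groups (steps 2 and 3) is not covered here.

```python
import shlex


def capture_portgroup_commands(vswitch, portgroup, vlan_id):
    """Build esxcli command lines for a temporary promiscuous port group.

    Returns (setup, cleanup) as lists of argv lists. Nothing is executed.
    """
    base = ["esxcli", "network", "vswitch", "standard", "portgroup"]
    pg = f"--portgroup-name={portgroup}"
    vs = f"--vswitch-name={vswitch}"
    setup = [
        # Step 1: create the port group and give it the same VLAN ID
        # as the port group the VM is currently connected to.
        base + ["add", pg, vs],
        base + ["set", pg, f"--vlan-id={vlan_id}"],
        # Step 4: override the security policy for this port group only.
        base + ["policy", "security", "set", pg, "--allow-promiscuous=true"],
    ]
    # Afterwards: remove the temporary port group again.
    cleanup = [base + ["remove", pg, vs]]
    return setup, cleanup


# Placeholder names; adjust them to your environment before use.
setup, cleanup = capture_portgroup_commands("vSwitch0", "Capture-Temp", 0)
for argv in setup + cleanup:
    print(shlex.join(argv))
```

Printing the commands instead of running them keeps the sketch harmless: the output can be reviewed first and then pasted into an ESXi shell session.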

Well, of course you’ll have to move the VM back to the original port group when you’re finished and remove the temporary port group. There are two things you need to keep in mind:

  1. You’ll have to consider the amount of disk I/O that your capture VM will generate by writing packets to its VMDK, so please make sure that you don’t get into trouble with your storage administrators by putting additional load on their storage system.
  2. Very important: the capture of traffic using port group promiscuous mode only works if the capture VM is on the same ESXi host as the VM that you want to capture the traffic of. Otherwise you’ll only see broadcast/multicast packets. So you need to make sure that you move all VMs to the same ESXi host before you start the capture.
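
Both caveats can be turned into a quick pre-flight check. The sketch below is only an illustration: the VM and host names are hypothetical, the VM-to-host mapping has to come from your own inventory, and the size estimate assumes the classic pcap file format (24 byte file header, 16 byte record header per packet) with an assumed average frame size of 800 bytes. pcapng files carry slightly different overhead.

```python
PCAP_GLOBAL_HEADER = 24  # bytes, classic pcap file header
PCAP_RECORD_HEADER = 16  # bytes, per-packet record header in classic pcap


def estimate_capture_bytes(avg_mbit_per_s, seconds, avg_frame_bytes=800):
    """Rough size of a classic pcap file for a given average traffic rate."""
    payload = avg_mbit_per_s * 1_000_000 / 8 * seconds
    packets = payload / avg_frame_bytes
    return int(PCAP_GLOBAL_HEADER + payload + packets * PCAP_RECORD_HEADER)


def vms_off_capture_host(vm_hosts, capture_vm):
    """Return the VMs that do not run on the same ESXi host as the capture VM."""
    capture_host = vm_hosts[capture_vm]
    return sorted(vm for vm, host in vm_hosts.items() if host != capture_host)


# Hypothetical placement: Server2 would have to be moved to esx01 first.
placement = {"Server1": "esx01", "Server2": "esx02", "CaptureVM": "esx01"}
print("move before capturing:", vms_off_capture_host(placement, "CaptureVM"))

size = estimate_capture_bytes(avg_mbit_per_s=100, seconds=3600)
print(f"about {size / 1e9:.1f} GB written to the VMDK per hour")
```

If DRS runs in fully automated mode, the placement can change again after the check, so it is worth repeating it right before the capture starts.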

Discussions — 34 Responses

  • Robert Bullen April 9, 2013 on 9:38 pm

    The organization I just left is struggling with this very subject. And it will be a problem that we see more and more. So it was nice to see the problem and a workaround succinctly described in one place. Nicely done.

  • Gary Burdell June 14, 2013 on 10:20 pm

    Thank you for this. We really needed a way to do wireshark capture in the VM environment. Great job and good instructions!

  • Mike King May 7, 2014 on 7:07 pm

    Did you ever expand on this:
    “I won’t go into much detail here (maybe I’ll do it in another post, another time), but capturing packets on the node in question is a pretty bad idea: depending on the server setup you’ll see ghosts like tons of CRC errors and huge over-sized frames.”

    I’d very much like to see some details on this. Is this in general, or just VMWare/Virtual specific?

  • Pete Austin March 24, 2015 on 10:35 am

    Thanks for this. Was having a nightmare getting Wireshark on a VM to actually capture any traffic coming in on the mirrored port I had set up. I already had it in its own Port Group and just needed to activate Promiscuous Mode.

    Disaster averted.

  • FreeSteFF April 17, 2015 on 9:29 am

    Hello, I am using VMware with the latest version of Kali Linux.
    When I run Wireshark, the interface tells me that I must be superuser to use the tool.
    Must the virtual machine be mounted with administrator rights?

  • Sandeep May 5, 2015 on 7:37 am

    Hi, I would like to know how to install Wireshark on Server 2003 on an ESXi host. I am unable to get network connectivity in my Server 2003 VM on ESXi. I have 10 CSR 1000v instances on which I want to capture packets, and the 10 machines along with the Server 2003 VM are in the same port group.

    • Jasper Bongertz Sandeep May 6, 2015 on 6:41 pm

      Not sure about the 1000v, but as far as I know you should be able to configure SPAN ports like on a physical Cisco switch. So you’d create a monitor session with source being all the ports you want to monitor and destination the port of the monitoring VM. Unfortunately I have no 1000v so I can’t test this.

  • Varina May 13, 2015 on 7:52 pm

    Hi Jasper,

    Great article…. would you be able to advise on how to accomplish this:

    We have RSPAN setup to a destination port on Switch 1 port 2. Our ESX hosts are plugged into the same switch. We need the SPAN traffic to go to a virtual NIC on a VM that resides on the ESX host(s).

    Thanks so much,
    Varina

    • Jasper Bongertz Varina May 13, 2015 on 8:52 pm

      In that case you could try to have that VM run on a dedicated vSwitch. Add a single physical NIC to that vSwitch and mirror the traffic to the port that NIC is connected to on the physical switch. Then put the vSwitch into promiscuous mode. That way all the packets coming in from the physical switch should be visible to the VM if you run Wireshark inside (or any other tool that enables promiscuous mode on the VM NIC).

  • Shamim Ahamed May 20, 2015 on 9:44 am

    I had a problem capturing XML packets. I followed the steps and it’s working fine. Great article.

  • Scott May 24, 2015 on 5:17 pm

    Terrific article! Now I can have an informed discussion with our VMWare admins, instead of having to blindly accept their objections!

  • Abdul Mohsin August 7, 2015 on 1:41 am

    Thanks a lot... successful!!!

  • Buddy Edwards January 20, 2016 on 8:50 pm

    Is there any reason not to just leave the new Port Group on the VMware host for future use?

    • Jasper Bongertz Buddy Edwards January 20, 2016 on 9:02 pm

      You could of course do that, but it should be documented then. Otherwise the reason for the existence of that port group may be lost in time :-)

  • MN March 17, 2016 on 6:19 pm

    I tried what you explained in your article but I cannot see all the traffic on the monitoring VM.
    I have created a separate port group with promiscuous mode set to Accept. I moved the 2 VMs in there but I’m not seeing all the traffic of the VM I want to analyze. Should Wireshark be configured in a specific way?

  • MN March 17, 2016 on 6:24 pm

    Never mind. Somehow the portgroup changed back to its default settings after moving the second VM into the portgroup. So it does work.

  • SR August 4, 2016 on 1:52 pm

    I’m implementing MS ATA right now and it requires forwarding the DC traffic to the ATA server. Both virtual. This setup would be permanent. Will this work in this situation?

    • Jasper Bongertz SR August 4, 2016 on 1:58 pm

      It should work, without really having an idea of what your setup looks like. Just make sure that the packets pass where you put your capture VM.

  • Kris Springer September 2, 2016 on 7:10 pm

    I’ve got 2 ESXi physical servers, box 1 and box 2. I’ve got Security Onion installed on box 1 with promiscuous mode enabled. It sees and captures the traffic on the interfaces in its Virtual Port Group just fine. I’ve got a second ESXi box connected by the physical NICs and I have its Virtual Port Groups also enabled for promiscuous mode. Traffic does flow between the physical boxes over the NICs, but my sniffing VM on box 1 does NOT capture any traffic on box 2. Is there a workaround for this? I want my sniffer VM to live on box 1 so it isn’t using box 2 resources.

    • Jasper Bongertz Kris Springer September 3, 2016 on 8:36 am

      Unfortunately, no. Sniffing almost always requires the sniffing VM and the traffic to be on the same box. A rare exception is when you can use distributed switches (Enterprise Plus feature set) which can do ERSPAN, which forwards packets via a special VLAN. Port group sniffing requires the sniffer to be on the same box as the traffic.

  • Bill Wade January 13, 2017 on 9:59 pm

    I have this setup as described but all I see on wireshark is the broadcast/multicast traffic. Looks like the wireshark port doesn’t see the unicast traffic between the 2 VMs that I want to capture.

  • Bill Wade January 13, 2017 on 10:03 pm

    And I set the portgroup and wireshark port to promiscuous mode. Thanks for any insight!

    • Jasper Bill Wade January 14, 2017 on 12:37 am

      Are the two VMs on the same ESXi host as the capture VM? Otherwise it won’t work, not even with a distributed switch via SPAN port.

  • Bill Wade January 16, 2017 on 3:46 pm

    Good point, I’ll move them to the same host. Thanks!

    • Jasper Bill Wade January 17, 2017 on 11:33 am

      You’re welcome, and I’ll update the post accordingly since that requirement wasn’t made clear so far.
