XenServer - In-depth investigation series

Background

Jumbo frames are always tricky because there is no standard for them.

In a XenServer environment, jumbo frames are often used on the storage network, which carries IP-based storage traffic. However, not all NIC drivers support jumbo frames, and unfortunately the NIC driver (kernel module) documentation doesn’t normally mention whether they are supported.

Symptom

Recently I discovered, to my surprise, that jumbo frames are NOT supported by enic, the Cisco VIC Ethernet NIC driver. I would have thought that all Cisco NICs support jumbo frames simply because they carry the Cisco badge, BUT that’s not the case.

If one insists on enabling jumbo frames for a storage network over enic-driven NICs or bonds, kernel panics and random host reboots are to be expected.

NOTE: A kernel crash dump generated by XenServer dom0 is different from a Linux kernel crash dump generated by kdump (kexec-tools) running on bare metal. After all, dom0 is the privileged first PV guest on a host, not a bare-metal kernel.

Crash Dump Analysis

In the kernel crash dump generated in /var/crash, the call trace in xen.log shows the following sequence of events.

ip_fragment (defined in net/ipv4/ip_output.c) called ip_do_fragment when IPv4 tried to fragment a large datagram (packet) because it could not be sent in one piece. This indicates that the packet size exceeded 1500 bytes; in other words, jumbo frames were enabled.
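To make the fragmentation step more concrete, here is a rough sketch of the arithmetic involved when a datagram sized for a jumbo MTU has to cross a 1500-byte path. It is not the kernel's ip_do_fragment code; the function name ipv4_fragment_sizes is made up for illustration, and it assumes a plain 20-byte IPv4 header with no options.

```python
def ipv4_fragment_sizes(total_length, mtu, header_len=20):
    """Return the on-wire size (header + payload) of each IPv4 fragment.

    Illustrative only: assumes a fixed header of header_len bytes
    (20 bytes means no IP options) copied into every fragment.
    """
    if total_length <= mtu:
        # Fits in one piece, so no fragmentation is needed.
        return [total_length]

    payload = total_length - header_len
    # Every fragment except the last must carry a multiple of 8 payload
    # bytes, because the fragment offset field counts 8-byte units.
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")

    sizes = []
    while payload > 0:
        chunk = min(payload, max_payload)
        sizes.append(chunk + header_len)
        payload -= chunk
    return sizes


if __name__ == "__main__":
    # A 9000-byte datagram crossing a 1500-byte MTU path.
    print(ipv4_fragment_sizes(9000, 1500))
```

With these assumptions, a 9000-byte datagram becomes six full 1500-byte fragments plus one small trailing fragment, whereas the same datagram on a 9000-byte MTU path goes out in one piece.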

ip_do_fragment then called skb_copy_bits to copy data from the skb (socket buffer) into a kernel buffer. During that copy, memcpy faulted on an invalid memory access, and the kernel mm code called do_page_fault to handle the page fault (determine the address and the problem, then pass it off to the appropriate routine), BUT unfortunately that failed.

Judging by bad_area_nosemaphore and __bad_area_nosemaphore (defined in arch/x86/mm/fault.c), the fault seems to have happened in an interrupt, with no user context (or in a region with page faults disabled), and as a result the page fault could not be handled.

Looking deeper into no_context (defined in arch/x86/mm/fault.c), it seems that the kernel tried to access a bad page, which triggered oops_begin and oops_end (defined in arch/x86/kernel/dumpstack.c), after which do_exit (kernel/exit.c) was called.

In dom0.log we saw a similar call trace, along with more information about the Oops.

Looking into kernel/exit.c, we can see that BUG() was called. The kernel was able neither to handle the paging request error nor to recover, so the running kernel finally gave up and panicked ;-D

Conclusion

The conclusion of the investigation is that enic does NOT support jumbo frames. DO NOT use them for storage networks on top of Cisco VIC NICs in XenServer.

I ended up changing the MTU for the storage network back to 1500 to fix the problem. The easy way is to remove the storage IP, change the storage network MTU (if you don’t remove the IP, the MTU field is greyed out), and then reconfigure the storage IP on each host in the pool. Alternatively, use the xe command line (xe network-param-set uuid= MTU=1500) to change the MTU for the network, and then unplug and re-plug the corresponding underlying PIFs. That is obviously a more complicated process. Your choice.
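For the xe route, a small helper can lay out the sequence of commands before anything is run. This is only a sketch: the function name mtu_change_commands and the placeholder UUIDs are hypothetical, the helper just builds the command lines and does not execute them, and the assumption that each PIF on the network is unplugged and then plugged again right after the MTU change should be checked against your own XenServer version.

```python
def mtu_change_commands(network_uuid, pif_uuids, mtu=1500):
    """Build (but do not run) the xe commands for resetting a network MTU.

    network_uuid and pif_uuids are supplied by the caller; this sketch
    does not look them up.
    """
    commands = [
        # Set the MTU on the network object itself.
        ["xe", "network-param-set", f"uuid={network_uuid}", f"MTU={mtu}"],
    ]
    for pif_uuid in pif_uuids:
        # Assumption: each underlying PIF is unplugged and re-plugged so
        # that it picks up the new MTU.
        commands.append(["xe", "pif-unplug", f"uuid={pif_uuid}"])
        commands.append(["xe", "pif-plug", f"uuid={pif_uuid}"])
    return commands


if __name__ == "__main__":
    # Placeholder values, not real UUIDs.
    for argv in mtu_change_commands("<network-uuid>", ["<pif-uuid-1>", "<pif-uuid-2>"]):
        print(" ".join(argv))
```

Printing the commands first makes it easy to review them, host by host, before pasting them into a dom0 shell.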

IMPORTANT: A word on bnx2x, the Broadcom NetXtreme II driver. You may know that, back in XenServer 6.2 SP1, jumbo frames could be enabled for bnx2x with GRO on, as per [CTX200270](http://support.citrix.com/article/CTX200270) (Yes, I wrote it...). That is NOT the case any more. This has changed, probably due to the fact that the bnx2x driver keeps evolving.

The following Linux NIC drivers are known to support jumbo frames (some with conditions):

  • igb

  • ixgbe

  • e1000 (some cards may be affected due to errata)

  • e1000e (cards older than 82571 are affected)

  • bnx2 (not bnx2x)

  • be2net

  • bna

  • cxgb4
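To check which driver a given NIC is actually bound to, one option is to read the driver symlink that Linux exposes under sysfs and compare it against the list above. The sketch below assumes the usual /sys/class/net/<iface>/device/driver layout; the helper names nic_driver and on_jumbo_list are made up for illustration. Being on the list is only a first filter, since e1000 and e1000e come with the conditions noted above.

```python
import os

# Drivers from the list above; some of them only work with conditions.
JUMBO_LISTED_DRIVERS = {
    "igb", "ixgbe", "e1000", "e1000e", "bnx2", "be2net", "bna", "cxgb4",
}


def nic_driver(iface, sysfs_root="/sys/class/net"):
    """Return the kernel driver name bound to iface, or None if unknown.

    Virtual interfaces such as bonds have no device/driver link, so
    they yield None here.
    """
    link = os.path.join(sysfs_root, iface, "device", "driver")
    if not os.path.islink(link):
        return None
    # The symlink target ends in the driver (kernel module) name.
    return os.path.basename(os.readlink(link))


def on_jumbo_list(iface, sysfs_root="/sys/class/net"):
    """True if the interface's driver appears in JUMBO_LISTED_DRIVERS."""
    return nic_driver(iface, sysfs_root) in JUMBO_LISTED_DRIVERS


if __name__ == "__main__":
    for name in sorted(os.listdir("/sys/class/net")):
        print(name, nic_driver(name), on_jumbo_list(name))
```

An interface driven by enic would report False here, which matches the conclusion of this investigation.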