Troubleshooting MPLS WAN services, like VPLS, pseudowires and Layer-3 VPNs, can be trickier than troubleshooting traditional provider offerings. In this tip, learn how network engineers can fix MPLS WAN service problems.
Enterprise wide area network (WAN) engineers are well versed in troubleshooting traditional service provider offerings, from leased lines and other TDM services to frame relay and ATM. After all, these services have very clear demarcation points: the service provider is responsible for the lower physical and data-link layers, and the enterprise WAN engineer takes care of all the higher layers of the OSI model, from the network layer up to the application layer.
Some MPLS-based services closely resemble traditional service provider offerings. For example, pseudowires emulate point-to-point links, although they are usually delivered in a Carrier Ethernet package. Virtual private LAN service (VPLS), commonly marketed under names like "Enterprise Private LAN," is a LAN emulation service that also keeps a clear separation between the service provider’s responsibilities (delivering Layer-2 MAC frames between all endpoints) and the enterprise’s routed network. When using these types of MPLS WAN services, you can follow traditional troubleshooting procedures, using the steps below.
Troubleshooting pseudowires
An MPLS-based pseudowire should look like a simple point-to-point link. As long as you can ping the other end of the link (assuming you’re running IP across it), the link is usually operational. However, even pseudowires present some challenges, highlighted in the following scenarios:
- MTU mismatch. When troubleshooting an MPLS-based pseudowire, check whether the maximum packet size you need, the maximum transmission unit (MTU), could be larger than Ethernet’s default of 1,500 bytes, either because of the jumbo frames used in typical data center environments or because of additional header fields imposed by your private MPLS-based solutions. If you suspect MTU problems, use a tool like mturoute (for Windows) or tracepath (for Linux) to measure the actual end-to-end MTU.
- The pseudowire might not be totally transparent. Verify that your edge devices can see each other using a Layer-2 protocol like Cisco Discovery Protocol (CDP) or Link Layer Discovery Protocol (LLDP, defined in IEEE 802.1AB). Non-transparent pseudowires might not be a showstopper for routed Layer-3 connections, but they can totally wreck your Layer-2 data center interconnect (DCI).
- The pseudowire might not provide end-to-end state signaling. When the link is lost at one end of the pseudowire (or broken somewhere in the service provider cloud), the other end may still appear operational.
End-to-end state signaling across Carrier Ethernet uses Connectivity Fault Management (Ethernet CFM, defined in IEEE 802.1ag) and Ethernet Operations, Administration, and Maintenance (Ethernet OAM, defined in IEEE 802.3ah). If your edge devices support these standards, use them. I also recommend choosing service provider offerings that support these standards.
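The way CFM detects a dead far end is simple enough to model. Each maintenance endpoint (MEP) sends periodic continuity check messages (CCMs), and 802.1ag declares loss of continuity when no CCM has arrived from a peer for 3.5 CCM intervals. The sketch below models only that timer logic, assuming a single CCM interval shared by all MEPs and timestamps in seconds; `LocalMep` and its methods are hypothetical names, not a vendor API, and a real MEP also works from a configured list of expected peers.

```python
from dataclasses import dataclass, field


@dataclass
class LocalMep:
    """Simplified model of a MEP's continuity check; not a full 802.1ag MEP."""
    ccm_interval: float  # seconds between CCMs, assumed identical on all MEPs
    # remote MEP ID -> timestamp of the last CCM received from it
    last_seen: dict[int, float] = field(default_factory=dict)

    def receive_ccm(self, remote_mep_id: int, now: float) -> None:
        """Record the arrival of a CCM from a remote MEP."""
        self.last_seen[remote_mep_id] = now

    def lost_peers(self, now: float) -> list[int]:
        """Remote MEPs silent for more than 3.5 CCM intervals (loss of continuity)."""
        timeout = 3.5 * self.ccm_interval
        return sorted(mep for mep, t in self.last_seen.items() if now - t > timeout)


# Example: two remote MEPs with a 1-second CCM interval
mep = LocalMep(ccm_interval=1.0)
mep.receive_ccm(2, now=0.0)
mep.receive_ccm(3, now=0.0)
mep.receive_ccm(3, now=3.0)
print(mep.lost_peers(now=4.0))  # [2]
```

MEP 2 has been silent for 4 seconds, longer than the 3.5-second threshold, so it is reported; MEP 3 was heard 1 second ago. This is exactly the signal a plain pseudowire lacks: without it, the local end keeps forwarding into a dead path.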
Troubleshooting VPLS services
Virtual Private LAN Service (VPLS) should resemble a switched Layer-2 domain. Because VPLS is built from a full mesh of pseudowires, you might experience some of the problems described in the previous section.
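A full mesh means the number of pseudowires grows quadratically with the number of VPLS endpoints. Assuming each site attaches to a different PE router (sites sharing a PE don’t need a pseudowire between them), a VPLS with n sites needs:

```latex
P(n) = \frac{n(n-1)}{2}, \qquad P(10) = \frac{10 \cdot 9}{2} = 45
```

Each of those pseudowires can fail or be misconfigured independently, which is why partial connectivity within the VPLS cloud is such a realistic failure mode.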
When troubleshooting MPLS-based WAN services like VPLS, you can’t expect total transparency; after all, devices in the service provider network appear as bridges to the VPLS edge devices. You also can’t expect end-to-end state signaling, since more than two devices are connected to the VPLS cloud. VPLS services can still be affected by MTU issues, but these should be easy enough to detect with the tools I mentioned above: mturoute and tracepath.
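Tools in the mturoute family find the largest packet that gets through by sending don’t-fragment probes of different sizes and narrowing the range. The sketch below shows that search idea, assuming a hypothetical `probe(size)` callback that sends one DF-bit packet of `size` bytes and reports whether it arrived; it is a simplification, not the actual implementation of mturoute or tracepath.

```python
from typing import Callable


def find_path_mtu(probe: Callable[[int], bool], low: int = 576, high: int = 9216) -> int:
    """Largest packet size in [low, high] for which probe(size) succeeds.

    Assumes probe(low) is known to succeed and that success is monotonic:
    if a size gets through, every smaller size does too.
    """
    if probe(high):
        return high
    # Invariant: probe(low) succeeds and probe(high) fails
    while high - low > 1:
        mid = (low + high) // 2
        if probe(mid):
            low = mid
        else:
            high = mid
    return low


# Simulated path that silently drops anything above 1492 bytes
print(find_path_mtu(lambda size: size <= 1492))  # 1492
```

The search needs only success or failure per probe, which matters on pseudowires and VPLS: a Layer-2 service that drops oversized frames does not send back an ICMP "fragmentation needed" message, so the only symptom is silence.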
The worst VPLS troubleshooting scenario is undoubtedly partial connectivity within the VPLS cloud. Due to broken or misconfigured pseudowires, some edge devices might be able to communicate while others have only limited connectivity. Depending on the Layer-3 routing protocol you use, the VPLS LAN might seem to be operational even though some devices cannot communicate with each other.
To do a quick check of VPLS health under these circumstances, check the routing protocol neighbors on all routers attached to the VPLS service. Routing protocol hello messages (for example, OSPF or EIGRP hellos) are multicast packets, which the VPLS service should propagate to all attached devices. Each router should thus see every other router attached to the same VPLS cloud as a routing protocol neighbor.
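The quick check above boils down to comparing each router’s neighbor list with the full set of routers attached to the VPLS. The sketch below assumes you have already collected those neighbor lists into a dictionary, by hand or with whatever tooling you use; the function name and router names are hypothetical.

```python
def missing_adjacencies(neighbors: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (router, peer) pairs where router does not list peer as a neighbor.

    `neighbors` maps each router attached to the VPLS to the set of routers
    it reports as routing protocol neighbors.
    """
    routers = set(neighbors)
    missing = []
    for router in sorted(routers):
        # Every other router on the VPLS should appear in this router's list
        for peer in sorted(routers - {router}):
            if peer not in neighbors[router]:
                missing.append((router, peer))
    return missing


tables = {
    "ce-a": {"ce-b", "ce-c"},
    "ce-b": {"ce-a"},
    "ce-c": {"ce-a"},
}
print(missing_adjacencies(tables))  # [('ce-b', 'ce-c'), ('ce-c', 'ce-b')]
```

In this example, ce-b and ce-c each miss the other while both still see ce-a: the partial-connectivity pattern described above, pointing at the pseudowire path between the PE routers serving ce-b and ce-c.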
To troubleshoot MPLS WAN services with partial connectivity issues, you could also use this procedure:
- Identify the endpoints that cannot communicate.
- Check the routing tables on the first-hop routers. If they don’t have routes to the destination, you have to perform traditional routing protocol troubleshooting.
- Do a traceroute between the endpoints. If the trace stops at the edge of the VPLS service, you might be experiencing VPLS connectivity issues.
- To verify your diagnosis, perform pings between the routers directly attached to the VPLS service. If the initial pings succeed, don’t forget to repeat the tests with the largest packet size (MTU) you expect to be able to transport across the VPLS service.
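When repeating the pings at full size, note that the number you type is not always the packet size. Linux `ping -s` and Windows `ping -l` set the ICMP payload, so for IPv4 you subtract the 20-byte IP header and the 8-byte ICMP header; both also need the don’t-fragment bit (`-M do` on Linux, `-f` on Windows), or the sender may fragment the probe and hide the problem. Cisco IOS `size` counts the whole datagram. The helper below is a hypothetical sketch that assumes IPv4 without IP options; verify the syntax on your own platform.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo header


def full_size_ping_commands(target: str, mtu: int) -> dict[str, str]:
    """Ping commands sending IPv4 packets of exactly `mtu` bytes with DF set."""
    payload = mtu - IPV4_HEADER - ICMP_HEADER
    return {
        "linux": f"ping -M do -s {payload} -c 5 {target}",
        "windows": f"ping -f -l {payload} -n 5 {target}",
        # Cisco IOS 'size' is the whole IP datagram, so no subtraction is needed
        "cisco-ios": f"ping {target} size {mtu} df-bit",
    }


# For a 1500-byte MTU the host payload is 1472 bytes
for platform, cmd in full_size_ping_commands("192.0.2.2", 1500).items():
    print(f"{platform:10} {cmd}")
```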
Troubleshooting MPLS Layer-3 VPNs
When you use Layer-3 MPLS VPN services, the service provider operates the routed core of your network, so there’s not much you can do to fix problems there. (If you’ve ever used managed LAN services, you know how MPLS VPN customers feel.) When troubleshooting MPLS Layer-3 VPNs, there are only a few things you can do before opening a ticket with your service provider:
- Check the WAN link status on your customer edge (CE) routers. If the WAN link is down, then that's the source of your problem.
- Check the routing protocol status on the CE routers. If you can’t reach the provider edge (PE) router, either the PE router has failed or the local link has failed in a way that’s not reflected in the Layer-1/Layer-2 link status (see the Troubleshooting pseudowires section above).
- If the CE routers communicate with the PE routers but you still can’t get routes across the MPLS VPN network, the service provider most likely has routing issues that you can do nothing about (apart from buying a backup service from another service provider).
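The checklist above is an ordered decision sequence: each check only makes sense once the previous one has passed. The sketch below writes that order down; the three boolean inputs are assumptions standing in for whatever you actually inspect on the CE router (interface status, routing protocol neighbor state, routes learned from the PE), and the returned strings paraphrase the conclusions above.

```python
def diagnose_ce(wan_link_up: bool, pe_adjacency_up: bool,
                vpn_routes_received: bool) -> str:
    """Return the likely fault domain, following the CE-side checks in order."""
    # Check 1: physical/data-link status of the CE's WAN interface
    if not wan_link_up:
        return "local WAN link is down"
    # Check 2: routing protocol session between CE and PE
    if not pe_adjacency_up:
        return "PE router failed, or the access link is broken without dropping link status"
    # Check 3: routes to remote sites learned from the PE
    if not vpn_routes_received:
        return "routing problem inside the provider network: open a ticket"
    return "CE-side checks pass: test MTU and LAN-to-LAN connectivity next"


print(diagnose_ce(wan_link_up=True, pe_adjacency_up=True, vpn_routes_received=False))
```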
Don’t forget: even if you can successfully ping across the MPLS VPN network, it might still have hidden MTU issues, so make sure you perform an MTU check (see the Troubleshooting pseudowires section for more on MTU checks).
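Hidden MTU issues usually come from the encapsulation the provider adds. As a rough worked example, assuming the common stack of one transport label plus one service label (4 bytes each) and ignoring extras such as traffic engineering or fast-reroute labels: a Layer-3 VPN carries your IP packet behind two labels, while an Ethernet pseudowire also carries your original Ethernet header (without FCS) and often a 4-byte control word. The figures are MPLS packet sizes, excluding the core links’ own Layer-2 headers; the provider’s actual label stack is what counts.

```python
LABEL = 4          # bytes per MPLS label
CONTROL_WORD = 4   # optional pseudowire control word
ETH_HEADER = 14    # customer Ethernet header, FCS not carried
DOT1Q_TAG = 4      # customer 802.1Q tag, if present


def l3vpn_core_mtu(customer_ip_mtu: int, labels: int = 2) -> int:
    """MPLS MTU the core needs to carry customer IP packets unfragmented."""
    return customer_ip_mtu + labels * LABEL


def pseudowire_core_mtu(customer_ip_mtu: int, vlan_tagged: bool = False,
                        control_word: bool = True, labels: int = 2) -> int:
    """MPLS MTU the core needs to carry an Ethernet pseudowire unfragmented."""
    frame = customer_ip_mtu + ETH_HEADER + (DOT1Q_TAG if vlan_tagged else 0)
    return frame + (CONTROL_WORD if control_word else 0) + labels * LABEL


print(l3vpn_core_mtu(1500))                         # 1508
print(pseudowire_core_mtu(1500))                    # 1526
print(pseudowire_core_mtu(1500, vlan_tagged=True))  # 1530
print(pseudowire_core_mtu(9000))                    # 9026
```

If the provider core only supports 1,500-byte MPLS packets, full-size customer packets cannot get through unfragmented even though small pings succeed, which is exactly the hidden failure described above.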
Last but not least, remember that pinging between CE routers is not equivalent to testing end-to-end host connectivity. When you perform ping tests between the routers, the packets are usually sent between their WAN interfaces. If you can’t access the end hosts, at least make sure you perform router-to-router pings sourced from and sent to their LAN interfaces.
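WAN-to-WAN pings only prove that the PE-CE link subnets are reachable; LAN-sourced pings also prove that each site’s LAN prefix is advertised across the VPN in both directions. The sketch below builds such a test plan for every ordered pair of CE routers, assuming Cisco IOS-style `ping <destination> source <address>` syntax (other platforms differ); the function name, router names and addresses are hypothetical.

```python
from itertools import permutations


def lan_ping_plan(lan_addresses: dict[str, str]) -> list[tuple[str, str]]:
    """(router to run on, ping command) for every ordered pair of CE routers.

    lan_addresses maps a CE router name to the IP address of its LAN interface.
    """
    plan = []
    # Ordered pairs: each direction is tested separately, since return paths can differ
    for src, dst in permutations(sorted(lan_addresses), 2):
        plan.append((src, f"ping {lan_addresses[dst]} source {lan_addresses[src]}"))
    return plan


sites = {"ce-hq": "10.1.0.1", "ce-branch1": "10.2.0.1", "ce-branch2": "10.3.0.1"}
for router, cmd in lan_ping_plan(sites):
    print(f"{router}: {cmd}")
```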
About the author: Ivan Pepelnjak, CCIE No. 1354, is a 25-year veteran of the networking industry. He has more than 10 years of experience in designing, installing, troubleshooting and operating large service provider and enterprise WAN and LAN networks. Pepelnjak is currently chief technology advisor at NIL Data Communications, focusing on advanced IP-based networks and Web technologies. His books include MPLS and VPN Architectures and EIGRP Network Design. Check out Pepelnjak's IOS Hints blog.