After I had successfully gotten the Cisco SD-WAN home lab set up and the CA infrastructure configured, it was time to start configuring the controllers and vEdges. Again, there are plenty of resources out there to help with this process, so go search through them and use what makes sense to you.
With the transport overlay (VPN0) configured for all controllers and vEdges and control connections up and running, I noticed after a few hours that my vEdges would lose connectivity to my vManage. I had to reboot them each time to get everything back in sync. As you can imagine, doing this multiple times a day became tiring.
One troubleshooting tip I came across was to check the route table to make sure the vEdges could get to the vManage. When I did, I noticed that the default route the vEdges use to reach the controllers over the Public-Internet transport was missing. After a reboot it would magically appear again, but a few hours later it was gone like the wind. After some more searching I came across a Cisco bug report for my software version (18.4.5) that matched the issue exactly.
When vEdges cannot reach vManage or any other controller, check the route table to see whether the static default route is listed, even if it is present in the config. Rebooting injects the static default route back into the route table.
Well, we know the reboot works, but I’m tired of that process. The report listed a workaround for this annoying issue.
– Add “no track-default-gateway” to system config (https://bst.cloudapps.cisco.com/bugsearch/bug/CSCvp46172)
With gateway tracking enabled, the software sends ARP messages every 10 seconds to the next hop of a static route. If the software receives an ARP response, it places the static route into the local route table. After 10 consecutive ARP responses are missed, the static route is removed from the route table. The software continues to periodically send ARP messages, and as soon as it once again receives an ARP response, the static route is added back to the route table.
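To make that behaviour easier to picture, here is a small Python sketch of the tracking logic as described above. It is only a toy model built from that description, not Cisco's implementation; the class name GatewayTracker, the simulate helper, and the assumption that the route is not installed until the first ARP reply arrives are all mine.

```python
from dataclasses import dataclass


@dataclass
class GatewayTracker:
    """Toy model of static-route gateway tracking (hypothetical, not Cisco code)."""

    miss_limit: int = 10         # consecutive missed ARP replies before removal
    probe_interval_s: int = 10   # seconds between ARP probes
    route_installed: bool = False  # assumption: no route until the first reply
    missed: int = 0

    def probe(self, got_reply: bool) -> bool:
        """Record one ARP probe result and return whether the route is installed."""
        if got_reply:
            # Any reply resets the miss counter and (re)installs the route.
            self.missed = 0
            self.route_installed = True
        else:
            self.missed += 1
            # Only a full run of consecutive misses removes the route.
            if self.missed >= self.miss_limit:
                self.route_installed = False
        return self.route_installed


def simulate(replies):
    """Feed a sequence of probe results (True = reply) through a fresh tracker."""
    tracker = GatewayTracker()
    return [tracker.probe(reply) for reply in replies]


if __name__ == "__main__":
    # One reply, ten misses in a row, then a reply again.
    states = simulate([True] + [False] * 10 + [True])
    print(states)
```

With probes every 10 seconds and a limit of 10 misses, the next hop only has to stay silent to ARP for roughly 100 seconds before the default route is pulled, and nothing puts it back until a reply is seen again.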
Sounds like vManage is struggling with ARP, which might make sense, since I periodically lose connectivity to the vManage UI and have to clear the ARP cache on the network switch my server is connected to for the mgmt network.