WireGuard connectivity issues

Fabio Natali, 29 March 2024

Intro


Figure 1: Astronaut Edward H. White II floats in zero gravity, connected to the Gemini IV spacecraft via WireGuard. Wait, what? Image: NASA.

tl;dr: If you use WireGuard, make sure NTP (or a similar time synchronisation mechanism) is set up on all VPN endpoints as the WireGuard protocol is sensitive to time-sync issues. Also, if you connect to a WireGuard server in an IPv6 network, make sure the server's firewall has sufficiently permissive ICMPv6 rules. Read on to find out why this is important.

I have a newly installed Guix server in my home office network. The network is dual-stack (it supports both IPv6 and IPv4), and the server is configured to be reachable over both.

I can connect to the server from the LAN, using its IPv4 address. I can also connect to it from the global internet using a WireGuard VPN that's configured with the server's IPv6 address.

After the initial setup, I test the VPN, which works just fine. I can ping the server and SSH into it from my laptop. Brilliant, job done.

Ground Control to Major Tom

Except…

A few hours after the initial setup, I'm no longer able to SSH into the server via the VPN. The connection times out. I can still connect via the local IPv4 address though. Weird.

I'm mildly annoyed, but I assume it's a minor glitch and restart the server's network service without giving it much thought. That fixes the problem.

Can you hear me Major Tom?

Except…

After a few hours the problem shows up again. And again.

When I finally decide to look into this properly, I fortuitously notice that the server's clock is running a few minutes behind. It turns out the Network Time Protocol daemon (ntpd) isn't running. In fact, it isn't even included in my minimalist Guix configuration, i.e. there's no ntp-service-type service. I add it and redeploy.

The time issue is definitely something that needed to be fixed. A quick web search (e.g. here and here) confirms that the WireGuard protocol is sensitive to time-sync issues.
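For the curious, here's a minimal Python sketch of why clocks matter, based on my reading of how WireGuard protects handshakes against replay: each handshake initiation carries a 12-byte TAI64N timestamp, and the responder remembers the greatest timestamp it has accepted from each peer, dropping any initiation that isn't strictly newer. So if a peer's clock gets stepped backwards, its handshakes are rejected until its clock catches up again. The `Responder` class and the peer name are hypothetical, and the encoding is simplified: it ignores the TAI-UTC offset, which shifts every timestamp equally and so doesn't change the ordering.

```python
import struct
import time

# 2**62 is the TAI64 label for the Unix epoch. Simplification: the TAI-UTC
# offset is ignored, since it shifts all labels equally and ordering is unchanged.
TAI64_EPOCH = 2**62


def tai64n(unix_ns: int) -> bytes:
    """Encode a Unix time in nanoseconds as a 12-byte TAI64N label (simplified)."""
    seconds, nanos = divmod(unix_ns, 1_000_000_000)
    # 8-byte big-endian seconds followed by 4-byte big-endian nanoseconds.
    return struct.pack(">QI", TAI64_EPOCH + seconds, nanos)


class Responder:
    """Hypothetical model of the per-peer replay check on handshake initiations."""

    def __init__(self) -> None:
        # Greatest timestamp accepted so far, per peer.
        self.latest: dict[str, bytes] = {}

    def accept_initiation(self, peer: str, timestamp: bytes) -> bool:
        last = self.latest.get(peer)
        # Big-endian encoding means byte-wise comparison matches time order.
        if last is not None and timestamp <= last:
            return False  # treated as a replay: the handshake is dropped
        self.latest[peer] = timestamp
        return True


if __name__ == "__main__":
    responder = Responder()
    now = time.time_ns()
    print(responder.accept_initiation("peer", tai64n(now)))
    print(responder.accept_initiation("peer", tai64n(now + 5 * 10**9)))
    # The peer's clock is stepped back by three minutes.
    print(responder.accept_initiation("peer", tai64n(now - 180 * 10**9)))
```

Running it prints True, True, False: the third initiation is dropped because its timestamp is older than the last one the responder accepted from that peer.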

However, in my case it turns out this wasn't the only issue, nor the main one. A few hours after enabling the NTP service, to my disappointment, connectivity is lost again.

Far above the world

The way the problem manifests (everything works for a few hours, then it stops) makes me think it could be a router-server discoverability issue. Perhaps it's something that shows up when the router's IPv6 neighbour discovery cache gets cleaned up? Perhaps the server's firewall is too strict in the way it handles ICMPv6 traffic?

Originally, the firewall only accepted the ICMPv6 types echo-request, nd-neighbor-solicit, nd-router-advert and nd-neighbor-advert. I adjust the server's nftables rules as follows:

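table inet firewall {
    chain inbound {
        # ...
        # Rate-limited ping over IPv4 and IPv6
        icmp type echo-request limit rate 5/second accept
        icmpv6 type echo-request limit rate 5/second accept
        # Neighbour Discovery, MLD queries and ICMPv6 error messages
        # (packet-too-big is needed for IPv6 path MTU discovery)
        icmpv6 type {nd-neighbor-solicit, nd-router-advert, nd-neighbor-advert,
                     nd-router-solicit, mld-listener-query, time-exceeded,
                     packet-too-big} accept
    }
}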

With this in place, connectivity is restored, and it's been working perfectly for a couple of weeks, so I think I can say the problem is now fixed.
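As a quick sanity check on the change, here's a small Python sketch that lists the ICMPv6 types the new rule accepts but the old one didn't, together with their numeric type codes (RFC 4443 for error and echo messages, RFC 2710 for MLD, RFC 4861 for Neighbour Discovery), which can be handy when matching against a packet capture. The `BEFORE` set is my reading of the original rule from the description above, not a copy of the actual ruleset, and `newly_allowed` is a hypothetical helper.

```python
# Numeric ICMPv6 type codes for the nftables type names used in the rules
# (RFC 4443: error and echo messages; RFC 2710: MLD; RFC 4861: Neighbour Discovery).
ICMPV6_TYPES = {
    "packet-too-big": 2,
    "time-exceeded": 3,
    "echo-request": 128,
    "mld-listener-query": 130,
    "nd-router-solicit": 133,
    "nd-router-advert": 134,
    "nd-neighbor-solicit": 135,
    "nd-neighbor-advert": 136,
}

# Types accepted before the fix, as described in the text (assumed).
BEFORE = {"echo-request", "nd-neighbor-solicit", "nd-router-advert", "nd-neighbor-advert"}

# Types accepted by the adjusted nftables rules.
AFTER = BEFORE | {"nd-router-solicit", "mld-listener-query", "time-exceeded", "packet-too-big"}


def newly_allowed(before: set[str], after: set[str]) -> list[tuple[int, str]]:
    """Return (type code, name) pairs accepted by `after` but not `before`, sorted by code."""
    return sorted((ICMPV6_TYPES[name], name) for name in after - before)


if __name__ == "__main__":
    for code, name in newly_allowed(BEFORE, AFTER):
        print(code, name)
```

Running it prints the four additions ordered by type code: 2 (packet-too-big), 3 (time-exceeded), 130 (mld-listener-query) and 133 (nd-router-solicit).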

Outro

In hindsight, the problem seems to have been primarily the firewall misconfiguration. I'm still glad I had the chance to fix the time issue too, as I know it would also have given me headaches down the line.

I hope you find this useful if you use WireGuard, especially in an IPv6-only or dual-stack network.

Revision 41cb4cd.