debugging ipv6 problems

ipv6dbg · February 17, 2012, 11:43:31 AM

I'm trying to debug a problem when accessing a specific German hosting company through HE tunnels, and I'm not sure if it's related to my HE tunnel or to the hosting company itself.
The problem has existed for quite some time now (several months). If it were caused by the hosting company rather than the tunnel, I would expect somebody else to have noticed and the problem to have been fixed long ago.

The major problem is sending/receiving large packets to and from this hosting company, so ping works perfectly fine unless a large payload is specified.

This company has several different products and the problem only exists with their "shared webhosting" product:
Sending 1480-byte ping packets (1432 bytes of payload) works fine; 1481 bytes does not work.

Some example domains which are affected:
tcm24.de 2a01:238:20a:202:1091::145
berlin-bookmarks.de 2a01:238:20a:202:1086::86

This hosting company also has a "dedicated server" product which seems to work perfectly fine,
example domain which is not affected by this problem:

otterweb.de 2a01:238:4395:d500:6e05:ef69:a431:7be1

traceroute6 shows different routing within their network, which makes me believe it's not a tunnel issue but a problem within the hosting company. I would like to ask for your opinion on this problem.

kasperd · February 18, 2012, 01:44:07 PM

I was able to reproduce the problem.

# ping6 -c 1 -n -s 1432 he.net
PING he.net(2001:470:0:76::2) 1432 data bytes
1440 bytes from 2001:470:0:76::2: icmp_seq=1 ttl=56 time=187 ms

--- he.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 187.309/187.309/187.309/0.000 ms
# ping6 -c 1 -n -s 1433 he.net
PING he.net(2001:470:0:76::2) 1433 data bytes
1441 bytes from 2001:470:0:76::2: icmp_seq=1 ttl=56 time=187 ms

--- he.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 187.386/187.386/187.386/0.000 ms
# ping6 -c 1 -n -s 1432 2a01:238:20a:202:1091::145
PING 2a01:238:20a:202:1091::145(2a01:238:20a:202:1091::145) 1432 data bytes
1440 bytes from 2a01:238:20a:202:1091::145: icmp_seq=1 ttl=60 time=68.7 ms

--- 2a01:238:20a:202:1091::145 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 68.792/68.792/68.792/0.000 ms
# ping6 -c 1 -n -s 1433 2a01:238:20a:202:1091::145
PING 2a01:238:20a:202:1091::145(2a01:238:20a:202:1091::145) 1433 data bytes

--- 2a01:238:20a:202:1091::145 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

As you can see I can ping he.net with 1432 and 1433 data bytes. But I can only ping 2a01:238:20a:202:1091::145 with 1432 data bytes, and it fails with 1433 data bytes.

Adding the IPv6 and ICMP headers means the total packet sizes are 1480 and 1481 bytes. I got similar symptoms with traceroute6 -I -n 2a01:238:20a:202:1091::145 1480 and traceroute6 -I -n 2a01:238:20a:202:1091::145 1481.
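
The header arithmetic can be sketched as follows. This is only an illustration, assuming a plain 40-byte IPv6 header with no extension headers and the 8-byte ICMPv6 echo header; the function name is made up for the example.

```python
IPV6_HEADER = 40    # fixed IPv6 header in bytes (assumes no extension headers)
ICMPV6_HEADER = 8   # ICMPv6 echo header in bytes (type, code, checksum, id, sequence)


def echo_packet_size(data_bytes: int) -> int:
    """Total IPv6 packet size produced by `ping6 -s <data_bytes>`."""
    return IPV6_HEADER + ICMPV6_HEADER + data_bytes


for data_bytes in (1432, 1433):
    print(data_bytes, "data bytes ->", echo_packet_size(data_bytes), "byte packet")
```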

My local network is configured such that even the packets with a total of 1480 bytes are going to get fragmented before they leave my network. So I know that 2a01:238:20a:202:1091::145 is able to receive a fragmented echo request and respond to that. That also makes me almost certain that it is receiving the 1481 byte packet as well. And it most likely is correctly reassembling that and sending a reply. It is the reply that gets lost on the way.

The traceroute with 1481 byte packets worked all the way until the last hop before the destination. But that tells me nothing, since the problem is on the return path, and the time exceeded messages are truncated to 1280 bytes anyway, so they won't experience any problems. In other words, a traceroute from my end will show me nothing about the location of the problem.

The 1480 byte threshold is the MTU of the tunnel link from HE to me. A qualified guess at what happens is that the packets make it to that point, and the tunnel server sends an ICMPv6 message back to the source, which most likely doesn't make it or isn't handled correctly by 2a01:238:20a:202:1091::145.
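
For reference, here is a small sketch of where a 1480 byte tunnel MTU comes from. It assumes the tunnel is plain 6in4 (each IPv6 packet wrapped in a 20-byte IPv4 header without options) over an IPv4 path with a 1500 byte MTU. The 1500 figure and the function names are assumptions for illustration, not something measured here.

```python
IPV4_HEADER = 20    # outer IPv4 header added by 6in4, assuming no IPv4 options
IPV6_HEADER = 40    # inner IPv6 header
ICMPV6_HEADER = 8   # ICMPv6 echo header


def tunnel_mtu(ipv4_path_mtu: int) -> int:
    """Largest IPv6 packet that fits in one IPv4 packet of the given size."""
    return ipv4_path_mtu - IPV4_HEADER


def largest_unfragmented_ping(ipv6_mtu: int) -> int:
    """Largest `ping6 -s` value whose packet still fits in the given IPv6 MTU."""
    return ipv6_mtu - IPV6_HEADER - ICMPV6_HEADER


mtu = tunnel_mtu(1500)  # 1500 is an assumed IPv4 path MTU
print("tunnel MTU:", mtu)
print("largest ping payload without fragmentation:", largest_unfragmented_ping(mtu))
```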

The next steps in debugging would be to ping the hops seen with traceroute with similar packet sizes to get an idea about where the problem is. It is also an option to create a link with an MTU less than 1480 bytes within your own network such that you can see for yourself if it keeps sending 1480 byte packets if it is told to use an even smaller packet size.
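
Finding the exact size at which a hop stops answering is a simple binary search. The sketch below assumes that once a size fails, every larger size fails too, which a fragment filter may violate. `probe` is a hypothetical callable; in practice it would wrap ping6 or traceroute6 and retry a few times so that ordinary packet loss is not mistaken for a size limit. `fake_probe` just stands in for the network so the example runs offline.

```python
from typing import Callable


def largest_working_size(probe: Callable[[int], bool], low: int, high: int) -> int:
    """Return the largest size in [low, high] for which probe(size) is True.

    Assumes probe(low) succeeds and that failures are monotonic in size.
    """
    if not probe(low):
        raise ValueError("even the smallest size fails; nothing to search")
    while low < high:
        mid = (low + high + 1) // 2   # round up so the loop always makes progress
        if probe(mid):
            low = mid                 # mid works, the threshold is at or above it
        else:
            high = mid - 1            # mid fails, the threshold is below it
    return low


def fake_probe(total_packet_size: int) -> bool:
    """Stand-in for a real ping: pretends anything above 1480 bytes is lost."""
    return total_packet_size <= 1480


print(largest_working_size(fake_probe, 1280, 1500))
```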

kasperd · February 18, 2012, 02:04:47 PM

Quote from: kasperd on February 18, 2012, 01:44:07 PM
The next steps in debugging would be to ping the hops seen with traceroute with similar packet sizes to get an idea about where the problem is.

I tried this out and got some really weird results.

# traceroute6 -I -n -m8 2a01:238:20a:202:1091::145 1480
traceroute to 2a01:238:20a:202:1091::145 (2a01:238:20a:202:1091::145), 8 hops max, 1480 byte packets
 1  2001:470:1f0a:1da2::1  71.579 ms  84.884 ms  99.060 ms
 2  2001:470:0:69::1  109.552 ms  109.614 ms  109.840 ms
 3  2001:7f8::1a44:0:1  109.842 ms  110.708 ms  110.756 ms
 4  2a01:238:0:a3ad::2  174.790 ms  174.842 ms  174.873 ms
 5  2a01:238:0:abad::1  118.913 ms  119.366 ms  119.396 ms
 6  2a01:238:20a:202:1091::145  111.315 ms  41.366 ms  39.989 ms
# traceroute6 -I -n -m8 2a01:238:20a:202:1091::145 1481
traceroute to 2a01:238:20a:202:1091::145 (2a01:238:20a:202:1091::145), 8 hops max, 1481 byte packets
 1  2001:470:1f0a:1da2::1  96.218 ms  103.659 ms  116.716 ms
 2  2001:470:0:69::1  126.899 ms  126.977 ms  127.057 ms
 3  2001:7f8::1a44:0:1  125.636 ms  125.699 ms  125.798 ms
 4  2a01:238:0:a3ad::2  125.823 ms  125.934 ms  126.619 ms
 5  2a01:238:0:abad::1  128.735 ms  133.339 ms  133.395 ms
 6  * * *
 7  * * *
 8  * * *
# traceroute6 -I -n -m8 2a01:238:0:abad::1 1472
traceroute to 2a01:238:0:abad::1 (2a01:238:0:abad::1), 8 hops max, 1472 byte packets
 1  2001:470:1f0a:1da2::1  63.454 ms  70.686 ms  77.362 ms
 2  2001:470:0:69::1  91.451 ms  91.466 ms  91.479 ms
 3  2001:7f8::1a44:0:1  91.016 ms  91.056 ms  91.314 ms
 4  2a01:238:0:a3ad::2  91.454 ms  91.768 ms  91.794 ms
 5  2a01:238:0:abad::1  91.823 ms  91.840 ms  94.473 ms
# traceroute6 -I -n -m8 2a01:238:0:abad::1 1473
traceroute to 2a01:238:0:abad::1 (2a01:238:0:abad::1), 8 hops max, 1473 byte packets
 1  2001:470:1f0a:1da2::1  45.976 ms  53.717 ms  67.135 ms
 2  2001:470:0:69::1  85.758 ms  85.871 ms  83.667 ms
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *

A traceroute from my computer to 2a01:238:20a:202:1091::145 shows a route of six hops. With 1481 bytes I only see the first five hops (for the reasons explained in my previous comment). I then tried to do a traceroute of the fifth hop with 1472 bytes. I see that it takes the same route (which is to be expected but not guaranteed). But if I try a traceroute with 1473 bytes, it fails already after the second hop.

So I get an MTU problem already at 1473 bytes.

But why do I see an MTU problem already with 1473 bytes on the hop from 2001:470:0:69::1 to 2001:7f8::1a44:0:1, when I am able to send larger packets over that same hop when I am using a different destination IP? I simply cannot think of a logical explanation for that symptom. Maybe a good night's sleep will help me understand it.

kasperd · February 26, 2012, 07:29:03 AM

I got one step closer to tracking down the problem.

The problem in running a traceroute from my computer towards 2a01:238:0:abad::1 is not directly caused by the MTU of any of the hops, but rather caused by something dropping fragmented packets.

When I try to do a traceroute towards 2a01:238:0:abad::1 I get responses from hop #2 2001:470:0:69::1 regardless of which packet size I use and regardless of whether the packets are fragmented when they leave my computer.

However I only get responses from hop #3 2001:7f8::1a44:0:1 if the packets leave my network without fragmentation. It is not the MTU of the link between hop #2 and hop #3 that causes the problem. If I reduce the MTU of the link from my own computer and cause smaller packets to get fragmented, then those smaller packets will have a problem as well.

This leads me to conclude that either 2001:470:0:69::1 or 2001:7f8::1a44:0:1 is filtering fragmented packets. However that filtering is only applied to some destination IPs and not to other destination IPs. In particular, the filtering is applied when the destination is 2a01:238:0:abad::1 but not if the destination is 2a01:238:20a:202:1091::145.

I would assume that since the route from hop #2 actually uses the same next hop for both destinations, and that next hop is not itself part of 2a01:238::/38, most likely 2001:470:0:69::1 has a single route covering both of those two destinations, and probably all of 2a01:238::/32 is covered by that routing table entry. This makes it unlikely that hop #2 is treating the two destinations differently.

OTOH hop #3 does use a next hop within 2a01:238::/38, which makes it more likely that hop #3 treats packets to different destinations within 2a01:238::/38 differently. So I am fairly confident in saying that 2001:7f8::1a44:0:1 sometimes filters fragmented packets, even if those packets after reassembly would be less than 1500 bytes. That means 2001:7f8::1a44:0:1 breaks communication between fully compliant hosts in some cases.

However, it is also clear that 2001:7f8::1a44:0:1 is not filtering fragmented packets targeted for 2a01:238:20a:202:1091::145. I know for a fact that I managed to get a fragmented packet through all the way to 2a01:238:20a:202:1091::145, and I got a reply back. However, the reply did not need to get fragmented. Even though my configuration didn't allow my echo request to leave my network without fragmentation, the configuration did allow the echo reply to go back into my network without fragmentation.

That means that right now the most plausible explanation for the MTU issue I was able to reproduce between my computer and 2a01:238:20a:202:1091::145 is that the echo reply gets fragmented, and some router is filtering fragmented packets. This working theory can be verified by changing the configuration on my network such that the echo replies get fragmented at a smaller size, and checking whether doing so reduces the packet size I can ping 2a01:238:20a:202:1091::145 with.

It is not unlikely that filtering of fragmented packets is only applied to ICMPv6 packets, and that fragmented UDP packets would be let through. The thing is that certain filters against spoofed router advertisements can be bypassed by fragmenting the spoofed packets. For that reason there may be people who consider it appropriate to filter all fragmented ICMPv6 packets, and in this case do so selectively depending on the destination address.

kasperd · February 26, 2012, 02:01:19 PM

Quote from: kasperd on February 26, 2012, 07:29:03 AM
This working theory can be verified by changing the configuration on my network such that the echo replies get fragmented at a smaller size, and check if doing so reduces the packet size I can ping 2a01:238:20a:202:1091::145 with.

I tested this. It did not work.

As you can see, even after I send packet too big messages, 2a01:238:20a:202:1091::145 keeps sending packets of the same size:

22:47:48.705251 IP 203.0.113.7 > 216.66.80.30: IP6 2001:470:1f0b:1da2::db8 > 2a01:238:20a:202:1091::145: ICMP6, echo request, seq 1, length 1432
22:47:48.752954 IP 216.66.80.30 > 203.0.113.7: IP6 2a01:238:20a:202:1091::145 > 2001:470:1f0b:1da2::db8: ICMP6, echo reply, seq 1, length 1432
22:47:48.753673 IP 203.0.113.7 > 216.66.80.30: IP6 2001:470:1f0b:1da2::db8 > 2a01:238:20a:202:1091::145: ICMP6, packet too big, mtu 1280, length 1240
22:47:49.710899 IP 203.0.113.7 > 216.66.80.30: IP6 2001:470:1f0b:1da2::db8 > 2a01:238:20a:202:1091::145: ICMP6, echo request, seq 2, length 1432
22:47:49.751440 IP 216.66.80.30 > 203.0.113.7: IP6 2a01:238:20a:202:1091::145 > 2001:470:1f0b:1da2::db8: ICMP6, echo reply, seq 2, length 1432
22:47:49.752424 IP 203.0.113.7 > 216.66.80.30: IP6 2001:470:1f0b:1da2::db8 > 2a01:238:20a:202:1091::145: ICMP6, packet too big, mtu 1280, length 1240

Maybe something is dropping the packet too big messages.

Since an ICMP error cannot itself result in an ICMP error, it is not possible to do a traceroute using packet too big messages in order to find out how far the packet too big messages get. However, I am going to repeat the above experiment with each IP address on the IPv6 path between my computer and 2a01:238:20a:202:1091::145.

I did a ping of he.net as well to see how it works on a well-behaved network:

22:53:40.196953 IP 203.0.113.7 > 216.66.80.30: IP6 2001:470:1f0b:1da2::db8 > 2001:470:0:76::2: ICMP6, echo request, seq 1, length 1432
22:53:40.383345 IP 216.66.80.30 > 203.0.113.7: IP6 2001:470:0:76::2 > 2001:470:1f0b:1da2::db8: ICMP6, echo reply, seq 1, length 1432
22:53:40.384223 IP 203.0.113.7 > 216.66.80.30: IP6 2001:470:1f0b:1da2::db8 > 2001:470:0:76::2: ICMP6, packet too big, mtu 1280, length 1240
22:53:41.197041 IP 203.0.113.7 > 216.66.80.30: IP6 2001:470:1f0b:1da2::db8 > 2001:470:0:76::2: ICMP6, echo request, seq 2, length 1432
22:53:41.383856 IP 216.66.80.30 > 203.0.113.7: IP6 2001:470:0:76::2 > 2001:470:1f0b:1da2::db8: frag (0|1232) ICMP6, echo reply, seq 2, length 1232
22:53:41.383945 IP 216.66.80.30 > 203.0.113.7: IP6 2001:470:0:76::2 > 2001:470:1f0b:1da2::db8: frag (1232|200)

Notice how, after the first echo reply is bounced, later echo replies are fragmented into smaller packets.
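
The numbers in that capture can be reproduced with a bit of arithmetic. This is a sketch under a few assumptions: the "length 1432" that tcpdump prints is taken to be the size of the whole ICMPv6 message (the part that gets fragmented), there are no extension headers other than the 8-byte fragment header, and `frag (offset|length)` is read as byte offset and fragment payload length. The function names are made up for the example.

```python
from typing import List, Tuple

IPV6_HEADER = 40      # fixed IPv6 header in bytes
FRAGMENT_HEADER = 8   # IPv6 fragment extension header in bytes
IPV6_MIN_MTU = 1280   # minimum link MTU every IPv6 path must support


def fragment_layout(fragmentable_bytes: int, mtu: int) -> List[Tuple[int, int]]:
    """Return (offset, length) for each fragment of a payload sent over `mtu`.

    Assumes fragmentation is actually needed; every fragment except the last
    must carry a multiple of 8 bytes.
    """
    per_fragment = (mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    layout = []
    offset = 0
    while offset < fragmentable_bytes:
        length = min(per_fragment, fragmentable_bytes - offset)
        layout.append((offset, length))
        offset += length
    return layout


def max_icmpv6_error_length(mtu: int = IPV6_MIN_MTU) -> int:
    """Largest ICMPv6 error message that keeps the whole packet within `mtu`."""
    return mtu - IPV6_HEADER


print(fragment_layout(1432, 1280))   # compare with the two frag lines in the capture
print(max_icmpv6_error_length())     # compare with the packet too big length
```

If those assumptions hold, the first call yields the same two (offset|length) pairs as the he.net capture, and the second yields the 1240 byte length shown for the packet too big messages.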

kasperd · February 26, 2012, 02:34:45 PM

Quote from: kasperd on February 26, 2012, 02:01:19 PM
However I am going to repeat the above experiment with each IP address on the IPv6 path between my computer and 2a01:238:20a:202:1091::145.

I did this and nothing appears to be blocking the packet too big messages. I can ping 2a01:238:0:abad::1 hop #5 with a packet that is too big for the return path. The first reply gets lost, but the second reply gets fragmented and returned to me.

That means 2001:7f8::1a44:0:1 is neither blocking packet too big messages from me, nor blocking fragmented packets from being sent back to me. But I have already established that blocking of fragmented packets is dependent on destination IP. If blocking of packet too big messages was dependent on destination address as well, that router could still be responsible. It does however seem strange that somebody would make opposite choices of which destinations get packet too big messages filtered and which destinations get fragmented packets filtered.

It is also possible that there is a filtering of fragmented packets, which depends both on source and destination IP. But a hypothetical misconfiguration of 2001:7f8::1a44:0:1 that would explain the behaviour I am seeing would look too weird for me to think that is a likely cause. More likely there is a second problem between 2001:7f8::1a44:0:1 and 2a01:238:20a:202:1091::145 where either packet too big messages are filtered or fragmented packets are filtered.

In addition I noticed something strange happening on 2001:470:1f0a:1da2::1. After I had pinged 2001:470:1f0a:1da2::1 and bounced the echo reply with a packet too big message, I did not receive a single packet larger than 1280 bytes. I still saw the first echo reply being lost, and the second echo reply being fragmented. That I can only explain with 2001:470:1f0a:1da2::1 actually sending the packet too big messages on my behalf even though it isn't directly connected to the link with the lower MTU. I don't think a router is expected to behave that way.

If the only problem on the path was the filtering of fragmented packets, things could mostly work. Most traffic goes over TCP anyway, and TCP can segment data into small enough segments that fragmentation is not going to be necessary. However, some TCP stacks will not reduce the size of the segment that triggered the first packet too big message. The segment, which had already been sent, is instead fragmented, and the smaller MTU is only used at the TCP layer for later segments. A TCP stack with that behaviour combined with filtering of fragmented packets will result in the first TCP connection running into a problem and timing out, but later TCP connections will work for as long as the IPv6 stack remembers the PMTU. Once the cached PMTU expires, another TCP connection will time out before the problem goes away again.

I don't think there is much more I can find out from here. If 2001:7f8::1a44:0:1 (or some other router on the same network) is also on your path towards the destination IPs, then that is a likely culprit for at least one of the MTU problems. And you'd need somebody with access to that router to investigate. Though I saw strangeness within the HE network, I didn't find evidence to think any of the MTU problems are within the HE network.

ipv6dbg · February 27, 2012, 02:25:25 PM

Thank you for your extensive help in debugging this issue and confirming my thoughts that it's a problem on their side and not within HE or my network.

It still seems strange that they have had this problem for some time now (I think several months). They have IPv6 enabled for quite a lot of websites, which should then pose a problem for many IPv6 users unless the majority of users are getting different routing. I currently do not believe that is the case after checking with different IPv6 looking glasses.

kasperd · February 27, 2012, 11:27:46 PM

Quote from: ipv6dbg on February 27, 2012, 02:25:25 PM
they have IPv6 enabled for quite a lot of websites which should then impose a problem for many ipv6 users unless the majority of users is getting different routing

For anything that runs over TCP you can avoid most MTU problems by tweaking the MSS advertisement. However 2a01:238:20a:202:1091::145 is advertising an MSS of 1440 bytes. When adding 60 bytes of IP and TCP headers to that, you get 1500, which suggests the first hop from the server has an MTU of 1500 bytes. So the MSS advertised by the server won't solve many MTU problems.
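
As a quick check of that arithmetic, assuming a 40-byte IPv6 header with no extension headers and a 20-byte TCP header without options (the function name is just for illustration):

```python
IPV6_HEADER = 40   # fixed IPv6 header in bytes
TCP_HEADER = 20    # TCP header in bytes, assuming no options


def mtu_implied_by_mss(mss: int) -> int:
    """Link MTU a host most likely has if it advertises the given MSS."""
    return mss + IPV6_HEADER + TCP_HEADER


print(mtu_implied_by_mss(1440))
```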

If the first hop from the client is the one with the smallest MTU, then the MSS advertised by the client will keep the server from sending packets that would exceed the path MTU. As long as the smallest MTU is on the first or the last hop, TCP does not need PMTU discovery. However, if the smallest MTU is somewhere in between, then PMTU discovery is required. A typical HE user is likely to have an MTU of 1500 bytes between their router and their LAN, but only a 1480 byte MTU between their router and the Internet because of the tunnel. Such a user depends on PMTU discovery and could run into a problem.
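
That rule can be written down as a small model. It is a simplification: it assumes each end derives its advertised MSS from the MTU of its own first link, and that segments are sized by the smaller of the two advertisements. The function name and the example paths are hypothetical.

```python
from typing import Sequence


def tcp_needs_pmtud(link_mtus: Sequence[int]) -> bool:
    """True if some link in the middle of the path is smaller than both end links.

    `link_mtus` lists the MTU of every link from client to server; the first
    entry is the client's own link and the last entry is the server's own link.
    """
    if not link_mtus:
        raise ValueError("path must contain at least one link")
    covered_by_mss = min(link_mtus[0], link_mtus[-1])  # what the MSS exchange protects
    return min(link_mtus) < covered_by_mss


# LAN at 1500, tunnel at 1480 behind the router, 1500 the rest of the way
print(tcp_needs_pmtud([1500, 1480, 1500, 1500]))
# Host sitting directly on the tunnel link
print(tcp_needs_pmtud([1480, 1500, 1500]))
```

The first path models a host on a LAN behind a tunnel router, where the MSS exchange alone does not protect the connection. The second models a host that terminates the tunnel itself.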

I am clamping the MSS at the edge of my network to get rid of most MTU problems, so I wouldn't see a problem with that site. But I don't think everybody else is clamping their MSS as well, so there may still be some users who have a problem. I am just saying that there may be a significant number of users who do not experience a problem for one reason or another.
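
What clamping does to the numbers can be sketched like this. It assumes the router rewrites the MSS option in SYN packets crossing the tunnel so that it never exceeds what fits in the tunnel MTU, with 40 bytes of IPv6 header and 20 bytes of TCP header and no options. The function name is made up, and real clamping is done by the router's packet filter rather than by code like this.

```python
TCP_IPV6_OVERHEAD = 40 + 20   # IPv6 header plus TCP header, assuming no options


def clamp_mss(advertised_mss: int, tunnel_mtu: int) -> int:
    """MSS value left in the SYN after the router clamps it to the tunnel MTU."""
    return min(advertised_mss, tunnel_mtu - TCP_IPV6_OVERHEAD)


clamped = clamp_mss(1440, 1480)   # a 1500-MTU host behind a 1480 byte tunnel
print("clamped MSS:", clamped)
print("largest resulting packet:", clamped + TCP_IPV6_OVERHEAD)
```

With the MSS clamped in both directions, neither end sends a segment larger than the tunnel can carry, so the connection no longer depends on packet too big messages or fragments crossing the tunnel.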
