• Welcome to Hurricane Electric's IPv6 Tunnel Broker Forums.

MTU and ICMP "packet too big" problem

Started by igwanv6, June 17, 2012, 06:56:41 PM


igwanv6

Hi,

While debugging an issue, I noticed that the tunnel server doesn't send "Packet too big" to the sender of an oversized packet sent to my tunnel.
The tunnel is set to MTU 1472 in the advanced options.

I've tried the following from 3 different servers on separate networks (with native IPv6 connectivity), with the same results.

# ping6 2001:470:1f06:1c0::2 -s 1424 -M do
PING 2001:470:1f06:1c0::2(2001:470:1f06:1c0::2) 1424 data bytes
1432 bytes from 2001:470:1f06:1c0::2: icmp_seq=1 ttl=59 time=177 ms
1432 bytes from 2001:470:1f06:1c0::2: icmp_seq=2 ttl=59 time=173 ms
1432 bytes from 2001:470:1f06:1c0::2: icmp_seq=3 ttl=59 time=191 ms
^C
--- 2001:470:1f06:1c0::2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 173.375/180.731/191.250/7.647 ms
# ping6 2001:470:1f06:1c0::2 -s 1425 -M do
PING 2001:470:1f06:1c0::2(2001:470:1f06:1c0::2) 1425 data bytes
^C
--- 2001:470:1f06:1c0::2 ping statistics ---
8 packets transmitted, 0 received, 100% packet loss, time 7003ms
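The cut-off between 1424 and 1425 data bytes lines up with the 1472-byte tunnel MTU once the headers are counted. Here is a small Python sketch of that arithmetic. It assumes an 8-byte ICMPv6 echo header and a plain 40-byte IPv6 header with no extension headers, and the helper names are made up for illustration:

```python
# Assumed sizes: fixed IPv6 header (no extension headers) plus the ICMPv6 echo header.
IPV6_HEADER = 40
ICMPV6_ECHO_HEADER = 8


def ipv6_packet_size(ping_data_bytes: int) -> int:
    """Total IPv6 packet size produced by `ping6 -s <ping_data_bytes>`."""
    return IPV6_HEADER + ICMPV6_ECHO_HEADER + ping_data_bytes


def fits(ping_data_bytes: int, mtu: int) -> bool:
    """True if the echo request fits a link of the given MTU without fragmentation."""
    return ipv6_packet_size(ping_data_bytes) <= mtu


if __name__ == "__main__":
    for size in (1424, 1425):
        print(size, ipv6_packet_size(size), fits(size, 1472))
```

The same arithmetic explains the reply lines. ping6 reports 1432 bytes, which is the ICMPv6 message (1424 + 8) without the IPv6 header.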



kasperd

I can't even ping your IP.

traceroute to 2001:470:1f06:1c0::2 (2001:470:1f06:1c0::2), 30 hops max, 80 byte packets
1  2001:470:1f0b:1da2:635a:c32:ae34:df91  0.559 ms  0.187 ms  0.195 ms
2  2001:470:1f0a:1da2::1  42.833 ms  48.560 ms  54.246 ms
3  2001:470:0:69::1  54.363 ms  39.479 ms  39.518 ms
4  2001:470:0:1d2::1  52.212 ms  52.269 ms  52.203 ms
5  2001:470:0:128::1  124.441 ms  124.452 ms  124.513 ms
6  2001:470:20::2  136.985 ms  139.459 ms  141.907 ms
7  *  *  *
8  *  *  *
9  *  *  *
10  *  *  *

kasperd

The IP was responding for a little bit, so I can confirm that sending packets of 1472 bytes gets me a reply, while packets of 1473 bytes are silently discarded somewhere. With an appropriate TTL (hop limit), packets of 1473 bytes do get a time exceeded from the tunnel server, so the problem appears to exist between the tunnel server and your endpoint.

However, the lack of too big messages is not the only problem. If I set my own MTU lower, it still doesn't work. I tried setting my own MTU to 1280 bytes and then pinging with packets larger than 1280 bytes, but still well below your 1472-byte limit. I never got a reply from your IP. Instead I got fragment reassembly time exceeded messages. There appeared to be no packet drops: every packet I sent resulted in one reassembly time exceeded message for the first fragment.

So something is consistently letting the first fragment through and dropping the second one. Again, the problem appears to exist between the tunnel server and your endpoint. If I ping the tunnel server with fragmented echo requests, I do get replies.
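To see why each echo request turns into exactly one first fragment plus one trailing fragment at MTU 1280, here is a rough Python sketch of how a sender splits an oversized IPv6 packet. It assumes the plain 40-byte IPv6 header is the only unfragmentable part and that every fragment carries an 8-byte Fragment header. IPv6 requires the data in every fragment except the last to be a multiple of 8 bytes. The 1300-byte ping payload is just an illustrative value:

```python
IPV6_HEADER = 40
FRAGMENT_HEADER = 8


def fragment_payload_sizes(fragmentable_len: int, mtu: int) -> list[int]:
    """Split the fragmentable part of an IPv6 packet into per-fragment data sizes.

    Assumes the unfragmentable part is only the 40-byte IPv6 header.
    Every fragment except the last carries a multiple of 8 bytes.
    """
    max_data = (mtu - IPV6_HEADER - FRAGMENT_HEADER) // 8 * 8
    sizes = []
    remaining = fragmentable_len
    while remaining > max_data:
        sizes.append(max_data)
        remaining -= max_data
    sizes.append(remaining)
    return sizes


if __name__ == "__main__":
    # Echo request with 1300 data bytes: 8-byte ICMPv6 header + 1300 bytes of data.
    print(fragment_payload_sizes(8 + 1300, 1280))
```

Each fragment on the wire is 48 bytes larger than its data (1280 and 124 bytes in this case), and only the first one contains the ICMPv6 header that identifies the packet as an echo request.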

Since I cannot run a tcpdump on the network connection between your endpoint and the tunnel server, I have no way of debugging any further.

igwanv6

Thanks for taking the time to investigate this.


Well, I finally got the 'Packet Too Big', but with mtu=1480, not the mtu=1472 I set in the tunnel options (on the HE side). But I only manage to get it every ten minutes or so... Probably rate limited to hell on the HE side.

# ping6 2001:470:1f06:1c0::2 -s 1433
PING 2001:470:1f06:1c0::2(2001:470:1f06:1c0::2) 1433 data bytes
From 2001:470:20::2 icmp_seq=1 Packet too big: mtu=1480
^C


I ran some more traces.

I noticed the proto 41 packets sent by the tunnel server have the DF flag set. My ISP rejects those packets > 1492 with the correct error code (fragmentation needed but DF set).

But the tunnel server doesn't care and sends IPv4 packets/fragments of up to 1500 bytes anyway. Since those big packets/fragments have DF set, my ISP simply drops them and I never see them.
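Whether a captured 6in4 packet carries DF can be read straight from its IPv4 header. Protocol number 41 marks IPv6-in-IPv4, and DF is the 0x4000 bit of the 16-bit flags/fragment-offset field. Here is a minimal Python sketch that decodes those fields from raw header bytes (for instance taken from a capture). The function name and the sample header are made up for illustration:

```python
import struct


def inspect_ipv4(header: bytes) -> dict:
    """Decode the IPv4 header fields that matter for 6in4 and fragmentation."""
    version_ihl, _tos, total_len, _ident, flags_frag, _ttl, proto = struct.unpack(
        "!BBHHHBB", header[:10]
    )
    return {
        "version": version_ihl >> 4,
        "total_length": total_len,
        "df": bool(flags_frag & 0x4000),           # Don't Fragment
        "mf": bool(flags_frag & 0x2000),           # More Fragments
        "frag_offset": (flags_frag & 0x1FFF) * 8,  # converted to bytes
        "is_6in4": proto == 41,                    # protocol 41 = IPv6 encapsulation
    }


if __name__ == "__main__":
    # Made-up header of a 1500-byte 6in4 packet with DF set (documentation addresses).
    sample = struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0, 1500, 0, 0x4000, 64, 41, 0,
        bytes([192, 0, 2, 1]), bytes([198, 51, 100, 2]),
    )
    print(inspect_ipv4(sample))  # df=True, is_6in4=True, total_length=1500
```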

The problem would be solved completely if the MTU option in the tunnel configuration (HE side) worked as intended: sending "packet too big" would force senders to split their packets at the 1472-byte (IPv6) boundary, producing IPv4 proto 41 packets of at most 1492 bytes.
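The 1472/1492 pairing comes from the 20-byte IPv4 header that 6in4 adds, assuming no IPv4 options and no further encapsulation. Here is a short Python sketch of the conversion in both directions. The names are illustrative only:

```python
IPV4_HEADER = 20  # 6in4 (protocol 41) adds only a plain IPv4 header


def tunnel_mtu(ipv4_path_mtu: int) -> int:
    """Largest IPv6 packet that fits through a 6in4 tunnel without IPv4 fragmentation."""
    return ipv4_path_mtu - IPV4_HEADER


def encapsulated_size(ipv6_packet_len: int) -> int:
    """Size of the IPv4 packet that carries an IPv6 packet over 6in4."""
    return ipv6_packet_len + IPV4_HEADER


if __name__ == "__main__":
    print(tunnel_mtu(1492))         # 1472
    print(encapsulated_size(1480))  # 1500
```

With the tunnel left at 1480, full-size IPv6 packets become 1500-byte IPv4 packets, which is exactly what the ISP's 1492-byte limit rejects.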

Quote from: kasperd
I tried to set my own MTU to 1280 bytes and then ping with packets larger than 1280 bytes, but still way below your 1472 byte limit. I never got a reply from your IP.

I just tested that. I _do_ get the two fragments on my side, but my router doesn't answer. Probably the firewall on my router is blocking fragments.



kcochran

Your MTU change wasn't sticking. Now it will.

kasperd

Quote from: igwanv6 on June 19, 2012, 06:56:39 PM
Well I finally got the 'Packet Too Big', but with mtu=1480, not mtu=1472 which I set in the options of the tunnel (on HE side). But I only manage to get it every ten minutes or so... Probably rate limited to hell on HE side.
That seems pretty extreme if it takes ten minutes before you even see the first error. Rate limiting is acceptable, but if the error is not triggered by the first large packet in a stream, the result will be poor performance for the user.
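One common way to rate limit ICMP errors without starving the start of a flow is a token bucket that starts full. That way, the first oversized packet always gets its "packet too big" and only sustained bursts are throttled. This is a generic Python sketch, not a description of how HE's tunnel servers actually limit their ICMPv6 output. The rate and burst values are arbitrary assumptions:

```python
import time


class TokenBucket:
    """Token bucket: refills at `rate` tokens per second, stores at most `burst`."""

    def __init__(self, rate: float, burst: int, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = float(burst)  # start full so the first error is never suppressed
        self.last = clock()

    def allow(self) -> bool:
        """Return True if one more ICMP error may be sent now."""
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

A sender would call `allow()` before emitting each error, for example `TokenBucket(rate=1.0, burst=5)` for at most five back-to-back errors and one per second after that. Keeping one bucket per destination would stop a single noisy sender from using up everybody's budget.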

Quote from: igwanv6 on June 19, 2012, 06:56:39 PM
I noticed the proto 41 packets sent by the tunnel server have the DF flag set. My ISP rejects those packets > 1492 with the correct error code (fragmentation needed but DF set).
That part sounds like it is working as intended. One question, though: how much of the initial payload is included in the ICMP error message? Some routers only include 8 bytes of the IPv4 payload, which in the case of 6in4 means the first 8 bytes of the IPv6 header. In such cases the error message doesn't even contain the IPv6 addresses of the packet. Other IPv4 routers include somewhere between 500 and 600 bytes of the IPv4 payload (I don't recall the exact number). In such cases it contains enough information from the IPv6 packet to act upon.
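What a tunnel server can learn from such an error depends on how much was quoted. For reference, RFC 1812 asks routers to quote as much of the original datagram as fits in a 576-byte ICMP datagram. Here is a rough Python sketch. It assumes the ICMPv4 message starts at its type byte and follows the RFC 1191 layout for "fragmentation needed" (type 3, code 4, next-hop MTU in bytes 6-7). The function name is hypothetical:

```python
import ipaddress
import struct


def parse_frag_needed(icmp: bytes) -> dict:
    """Extract what a 6in4 endpoint could learn from an ICMPv4 'fragmentation needed'."""
    icmp_type, code, _csum, _unused, next_hop_mtu = struct.unpack("!BBHHH", icmp[:8])
    if (icmp_type, code) != (3, 4):
        raise ValueError("not a fragmentation-needed message")
    quoted = icmp[8:]                 # the quoted original IPv4 header + payload
    ihl = (quoted[0] & 0x0F) * 4      # quoted IPv4 header length in bytes
    proto = quoted[9]
    payload = quoted[ihl:]
    result = {
        "next_hop_mtu": next_hop_mtu,
        "ipv4_src": ipaddress.IPv4Address(quoted[12:16]),
        "ipv4_dst": ipaddress.IPv4Address(quoted[16:20]),
        "quoted_payload_len": len(payload),
        "ipv6_src": None,
        "ipv6_dst": None,
    }
    # IPv6 source/destination sit at offsets 8-23 and 24-39 of the IPv6 header,
    # so both are recoverable only when at least 40 payload bytes were quoted.
    if proto == 41 and len(payload) >= 40:
        result["ipv6_src"] = ipaddress.IPv6Address(payload[8:24])
        result["ipv6_dst"] = ipaddress.IPv6Address(payload[24:40])
    return result
```

With only 8 quoted bytes, the IPv4 address pair and the next-hop MTU are all that is left to act on.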

Quote from: igwanv6 on June 19, 2012, 06:56:39 PM
But the tunnel server doesn't care and sends up to 1500 bytes IPv4 packets/fragments anyway.
It may have been changed. In the past, I know that at least some of the tunnel servers did care. They would make use of those ICMP error messages, even if they did not contain enough information to identify the triggering IPv6 packet.

The only information the tunnel server could use from such an ICMP error was the source and destination IPv4 addresses. If that combination of addresses matched a tunnel, it would lower the MTU for that tunnel accordingly for a few minutes. So you'd lose one packet every few minutes: an oversized IPv6 packet would be forwarded into the tunnel without triggering an ICMPv6 error message and get dropped further along the path, and only once the resulting ICMP error message made it to the tunnel server would the MTU of the tunnel be lowered for another few minutes.
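The behaviour described here amounts to a small cache of reduced MTUs keyed only on the IPv4 endpoint pair. Here is a hypothetical Python sketch of that logic. The class, the 600-second lifetime and the 1280-byte floor (the IPv6 minimum MTU) are illustrative assumptions, not HE's actual implementation:

```python
import time


class TunnelMtuCache:
    """Temporarily lowered tunnel MTUs, keyed on the (local, remote) IPv4 pair."""

    def __init__(self, default_mtu: int = 1480, lifetime: float = 600.0, clock=time.monotonic):
        self.default_mtu = default_mtu
        self.lifetime = lifetime
        self.clock = clock
        self._entries: dict[tuple[str, str], tuple[int, float]] = {}

    def on_frag_needed(self, src: str, dst: str, next_hop_mtu: int) -> None:
        # Only the IPv4 address pair is used to find the tunnel.
        # Subtract the 20-byte 6in4 header, but never go below IPv6's 1280-byte minimum.
        tunnel_mtu = max(1280, next_hop_mtu - 20)
        self._entries[(src, dst)] = (tunnel_mtu, self.clock() + self.lifetime)

    def mtu(self, src: str, dst: str) -> int:
        entry = self._entries.get((src, dst))
        if entry is None:
            return self.default_mtu
        value, expires = entry
        if self.clock() >= expires:
            # Expired: fall back to the default, so the next big packet is lost again.
            del self._entries[(src, dst)]
            return self.default_mtu
        return min(value, self.default_mtu)
```

Because nothing beyond the address pair is checked, any sender that knows both endpoints can trigger `on_frag_needed` for a tunnel.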

That could actually be used to lower the MTU of anybody's tunnel, as long as you knew the IPv4 addresses of both endpoints of the tunnel. You didn't even have to spoof the source IP of the ICMP errors you were sending, as the tunnel server obviously doesn't know the IPv4 address of whichever router on the path would legitimately be triggering such ICMP errors in the first place.

Quote from: igwanv6 on June 19, 2012, 06:56:39 PM
I just tested that. I _do_ get the two fragments on my side but my router doesn't answer. Probably the firewall on my router blocking the fragment.
You'll need to fix that as well. Otherwise it is probably never going to work. Though TCP can segment the data stream and thus doesn't have to send fragmented packets, there are some TCP stacks that will not change segmentation on retransmission. Thus if the TCP stack sends a segment and gets a too big message back, then it will make later segments smaller, but the segment that triggered the message in the first place will just be retransmitted at the same segment size, but using fragmentation.

igwanv6

Quote from: kcochran on June 19, 2012, 08:47:32 PM
Your MTU change wasn't sticking.  Now will.

Thanks, it works perfectly now.

Quote from: kasperd on June 20, 2012, 03:21:35 AM
You'll need to fix that as well. Otherwise it is probably never going to work. Though TCP can segment the data stream and thus doesn't have to send fragmented packets, there are some TCP stacks that will not change segmentation on retransmission. Thus if the TCP stack sends a segment and gets a too big message back, then it will make later segments smaller, but the segment that triggered the message in the first place will just be retransmitted at the same segment size, but using fragmentation.

The firewall was blocking incoming icmp fragments in its input chain (no problem with forwarding). Thanks for the useful tips.

kasperd

Quote from: igwanv6 on June 21, 2012, 05:01:54 AM
The firewall was blocking incoming icmp fragments in its input chain (no problem with forwarding).
That sort of filtering is almost impossible to get right. IPv6 permits an arbitrary number of extension headers (limited only by the reassembled packet size). The only way to know for sure what is inside the packet is to reassemble it. No individual fragment is guaranteed to tell you what protocol is inside the packet.

If there is a reasonable number of extension headers, then the first fragment will contain enough information to tell what the transport protocol is. As long as the last extension header starts inside the first fragment, you can actually tell what the transport protocol is.
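As a rough illustration, here is a Python sketch that follows the Next Header chain through the extension headers a fragment is likely to carry (Hop-by-Hop, Routing, Destination Options, Fragment and AH). It gives up when the chain leaves the bytes it has. It is deliberately incomplete: ESP and any unrecognised header are simply reported as the upper layer, and nothing beyond bounds checks is validated. The function name is hypothetical:

```python
import struct

GENERIC_EXT = {0, 43, 60}  # Hop-by-Hop, Routing, Destination Options (length in 8-byte units)
FRAGMENT = 44
AH = 51                    # Authentication Header (length in 4-byte units, minus 2)


def find_upper_layer(packet: bytes):
    """Walk the IPv6 header chain of a (first) fragment.

    Returns (protocol, offset) of the upper-layer header, or None when the chain
    cannot be followed with the bytes available or this is not a first fragment.
    """
    if len(packet) < 40:
        return None
    next_header = packet[6]
    offset = 40
    while True:
        if next_header in GENERIC_EXT:
            if offset + 2 > len(packet):
                return None  # this extension header starts beyond the available bytes
            next_header, offset = packet[offset], offset + (packet[offset + 1] + 1) * 8
        elif next_header == FRAGMENT:
            if offset + 8 > len(packet):
                return None
            frag_offset = struct.unpack("!H", packet[offset + 2:offset + 4])[0] >> 3
            if frag_offset != 0:
                return None  # a later fragment: no upper-layer header in here
            next_header, offset = packet[offset], offset + 8
        elif next_header == AH:
            if offset + 2 > len(packet):
                return None
            next_header, offset = packet[offset], offset + (packet[offset + 1] + 2) * 4
        else:
            # The returned offset may point past the end of this fragment; the protocol
            # is still known because the previous header's Next Header field was read.
            return next_header, offset
```

The returned offset makes it easy to check whether the first 8 bytes of the transport header (ports or ICMP type) are also present in the fragment.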

It would make sense for a firewall to block the first fragment if it can't tell what the protocol is from that fragment alone. For many protocols it would even make sense to require the first 8 bytes of the protocol payload to be present in the first fragment for filtering. Otherwise the firewall wouldn't be able to filter on ICMP types or port numbers.

For later fragments the firewall can't do any meaningful filtering. The best you can really do is allow the later fragments through and only filter the first fragment. That means if a fragmented packet isn't accepted by the policy, the first fragment is dropped and the rest are allowed through, but they can never be reassembled.
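The policy described in the last three paragraphs fits in a few lines. Here is a minimal Python sketch. It assumes the caller has already worked out whether the fragment is the first one, which upper-layer protocol (if any) it could identify, and how many bytes of that protocol's header are present. The function and parameter names are hypothetical:

```python
from typing import Optional


def filter_fragment(is_first: bool, protocol: Optional[int], l4_bytes: int,
                    allowed: set[int]) -> bool:
    """Return True to accept a fragment, False to drop it.

    Only the first fragment is filtered. Later fragments are always passed on,
    and they simply fail to reassemble if the first fragment was dropped.
    """
    if not is_first:
        return True
    if protocol is None or l4_bytes < 8:
        # Protocol unknown, or ports / ICMP type not visible: drop the first fragment.
        return False
    return protocol in allowed
```

Dropping only the first fragment is enough to stop the packet: the receiver never gets the piece that reassembly starts from, so the remaining fragments eventually time out.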

I don't know what sort of thinking leads to letting the first fragment through only to drop later fragments.

colonelf74

I know this represents a challenge to high-bandwidth environments, but I wanted to just ask.

A while ago, I took the plunge with my Mac and changed the MTU of its Ethernet interface to 9000.
The only particular reason is that I've managed to get my entire LAN running Category 6
and 1000Base-TX, so I figured, why not?

Is there any chance Hurricane Electric will support larger MTUs like this in the future
via tunnelbroker?  I'd set mine to 9000 tomorrow if I could.

broquea

Unless you have a pure path to the tunnel server with 9000+ configured on all the interfaces involved, it won't ever happen. Also, if you DO have a pure 9000+ path to the tserv, why the hell don't you have native IPv6?

colonelf74

*chuckle*  Well, first off I have no earthly idea whether Apple's Airport Extreme would support such a thing either way.

And yes, I know hiding behind a NAT isn't real security, but it always makes me feel better.
And yes, I'm being cheap.  Possibly lazy, too.  :-)

Heck, on the upside, the lone Mac I have on the network for now doesn't even listen on any ports
on its Ethernet interface.

And why am I using HE's tunnel? For the moment, Comcast doesn't support anything more than one PC running IPv6 plugged directly into their cable modem. And their mondo NAT I've read about online... yeah, it sucks. My Halo: Reach is so much happier on HE's network.

I'll be giving those people a call sometime in the near future.  Maybe what's floating around the Internet publicity-wise isn't true.  Otherwise I'm on hold with them until 2013.  *grumble*

broquea

I'm on Comcast with native IPv6 and use a D-Link DIR-825, which gets a /64 via DHCPv6-PD and advertises it on the LAN, and then I secure each host individually. Also, I think the above reply was meant for your other thread?

colonelf74

Even funnier, while tinkering around with my Airport Extreme I discovered a way to "get back to Comcast".

I'm now running an automatically configured tunnel from my home network out to the wondrous world of Comcast IPv6.

As for why I can't have native IPv6, the dang Airport Extreme refuses to route it.  I think.
I can say I've tried it and it doesn't work.

colonelf74

Well, thanks a bunch to HE, but I'm on to Comcast now, via multiple IPv6 tunnels.

Now I've got a firewalling question to ask.  Time for a new thread.