The problem with the hypothesis that buffer bloat is the cause is that it fails to explain why 6in4 traffic is treated differently from the rest of the traffic.
The delay happens so early in the trace that it can more or less only be explained by buffering. If there is a measurable difference between the protocols, it most likely means that the device which is buffering uses multiple queues. So what you want to find out next is how many queues there are, what criterion decides which queue a packet goes into, and perhaps whether buffer memory is allocated statically to each queue or reallocated dynamically between them.
I recently noticed on my own connection that queues appeared to be split by IPv4 address. My protocol 41 traffic was using a different IPv4 address than the rest of my traffic, and since I was mostly pushing 6in4 traffic, it was only the IPv6 traffic that got slowed down while IPv4 traffic went through unaffected.
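If you want to check whether the same thing happens on your connection, one rough test (just a sketch; 192.0.2.10 and 192.0.2.11 stand in for two IPv4 addresses configured on your uplink, so substitute your own) is to load the link from one address while probing latency from the other:
# flood from the first address to fill whatever queue it maps to
ping -q -s 1400 -i 0.02 -c 2000 -I 192.0.2.10 184.105.253.10 > /dev/null &
# light latency probe from the second address while the flood runs
ping -q -c 50 -I 192.0.2.11 184.105.253.10
If the probe from the second address stays close to the idle RTT while the first address is saturated, the queues are most likely keyed on the source IPv4 address.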
For instance, in the following test, which intentionally pushes more traffic than the connection can handle, the IPv6 ping reports far worse statistics than the native IPv4 one. The Internet connection was otherwise unloaded (save for the occasional traffic from NTP, etc.). Note that the two commands were run in parallel.
root@mario-laptop:/home/mario# ping6 -q -s 1400 -i 0.02 -c 400 2001:470:1f0e:a81::1 & ping -q -s 1400 -i 0.02 -c 400 184.105.253.10 &
[1] 5167
[2] 5168
root@mario-laptop:/home/mario# PING 184.105.253.10 (184.105.253.10) 1400(1428) bytes of data.
PING 2001:470:1f0e:a81::1(2001:470:1f0e:a81::1) 1400 data bytes
--- 184.105.253.10 ping statistics ---
400 packets transmitted, 290 received, 27% packet loss, time 9946ms
rtt min/avg/max/mdev = 120.460/263.358/522.778/91.864 ms, pipe 22
--- 2001:470:1f0e:a81::1 ping statistics ---
400 packets transmitted, 44 received, 89% packet loss, time 11033ms
rtt min/avg/max/mdev = 154.642/1583.399/6152.360/1307.172 ms, pipe 220
That does look strange. There is no obvious explanation for those numbers.
Also, it seems that sometimes running a BitTorrent client, even with marginal bandwidth usage (but with several native IPv4 TCP connections and connection-like UDP usage via µTP), will make the IPv6 performance drop.
Possibly connection tracking plays a role in this picture, but this is just a random guess.
I can only conjecture about the reason for this weird behavior, but I suspect that my ISP may be throttling protocol-41 traffic when it notices congestion. However, as I said, when almost all traffic goes through the IPv6 tunnel, no performance degradation happens (a few extra ms of ping, but no packet loss or anything like that). The tunnel doesn't seem to be the bottleneck either; the traffic takes up all the contracted bandwidth.
The traffic may be put into different priority bands, which would explain why one queue sees more latency and packet loss than another: the queues might simply not have the same priority. The queue used for your protocol 41 packets does exhibit symptoms of buffer bloat, and when there are multiple queues, it is entirely possible that not all of them are affected.
For bulk traffic like BitTorrent transfers, a bit of buffer bloat is not a problem, as long as interactive traffic doesn't have to go through the same queue. So the problem may be misclassification of traffic rather than bloat per se.
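One thing worth checking (just a sketch, assuming your uplink interface is eth0) is what TOS/DSCP value the outer protocol-41 packets carry when they leave your machine, since that is one of the fields an ISP device might classify on:
tcpdump -v -n -i eth0 -c 10 'ip proto 41'
If the tunnel packets turn out to carry a bulk-looking mark, you could experiment with re-marking them in the mangle table, e.g. iptables -t mangle -A OUTPUT -p 41 -j DSCP --set-dscp 0, though whether your ISP pays any attention to DSCP is pure speculation on my part.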
If your protocol 41 traffic gets classified as bulk traffic while other customers' traffic gets classified as interactive and adversely affects your protocol 41 traffic, then you should complain loudly. OTOH, if the problem only kicks in once you reach the bandwidth limit specified in your subscription, and you can avoid problems by staying within that limit, then it would be more productive to stay within that limit. If that is the case, then my previous suggestion should still work just the same. By controlling where the bottleneck is and how traffic is prioritized at the bottleneck, you can get a better result. Once you have moved the bottleneck to where you can control it, it no longer matters how the previous bottleneck would treat traffic.
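For completeness, the shaping I had in mind looks roughly like this (a sketch only; eth0 and the 9mbit figure are placeholders, to be replaced with your WAN interface and a rate slightly below your contracted upstream, so the queue builds on your side where fq_codel can keep it short):
tc qdisc replace dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 9mbit ceil 9mbit
tc qdisc add dev eth0 parent 1:10 fq_codel
With the bottleneck on your own router, the 6in4 packets never pile up in the ISP's queue in the first place, so it no longer matters how that queue would have treated them.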
Do you know how I may tunnel the 6in4 tunnel itself through an OpenVPN tunnel, so as to avoid protocol-specific interference from my ISP?
I do not have that much experience with VPNs. In principle what you ask for should be easy; in practice it may very well depend on the VPN implementation. If you have both the VPN and the 6in4 tunnel able to pass packets, I can try to come up with suggestions on how to get traffic routed the way you want. The VPN and 6in4 tunnel could end up next to each other, with the VPN inside the 6in4 tunnel, or with 6in4 inside the VPN. The routing table has a lot of say in which of those will be the case.
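As a starting point for the 6in4-inside-VPN case, the only routing change that should be needed is a host route sending the 6in4 remote endpoint into the VPN, so the sit tunnel's protocol-41 packets get wrapped in the VPN before they reach the ISP. A sketch with placeholder values (tun0 for the OpenVPN interface, 10.8.0.1 for the VPN-side gateway, 192.0.2.1 for the remote IPv4 endpoint of your 6in4 tunnel):
ip route add 192.0.2.1 via 10.8.0.1 dev tun0
The route to the OpenVPN server itself has to keep going over the physical uplink (OpenVPN normally installs such a host route automatically), otherwise you would create a routing loop.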
I think doing this and then repeating the test may give further insights into the roots of this problem.
Sure, the information you could find by experimenting with 6in4 inside VPN could potentially tell us something more about the root of the problem.