Diagnostic advice - can ping, can't http

Started by arkanesystems, June 19, 2013, 09:59:48 AM

arkanesystems

I'm suddenly having trouble accessing many (most?) IPv6 web sites over my tunnel, and could use some diagnostic advice.

Suddenly, in this case, means "started yesterday, with no local configuration changes, after weeks of working just fine".  Various IPv6-enabled web sites (google.com, facebook.com, tunnelbroker.net, www.bing.com, etc., etc.) either won't load at all, or load very, very slowly.  Rather frustratingly, I've been unable to find anything to suggest why: all the sites in question respond to ping just fine, and indeed also to ping with a buffer size of 1280.  Nothing suspicious - at least to my eyes - is showing up in the router logs or other diagnostics.  Those IPv6 test sites that load don't find any problems with my configuration.

In short, I'm stumped.  Any thoughts on where I should look next, and what for?

Thanks in advance,

Alistair

kasperd

I tried pinging arkanesystems-1-pt.tunnel.tserv8.dal1.ipv6.he.net from a host with native IPv6 to see if I could spot any obvious problems.
$ ping6 -n -c 200 -i 0.2 arkanesystems-1-pt.tunnel.tserv8.dal1.ipv6.he.net
PING arkanesystems-1-pt.tunnel.tserv8.dal1.ipv6.he.net(2001:470:1f0e:43e::2) 56 data bytes
64 bytes from 2001:470:1f0e:43e::2: icmp_seq=1 ttl=54 time=154 ms
64 bytes from 2001:470:1f0e:43e::2: icmp_seq=2 ttl=54 time=155 ms

64 bytes from 2001:470:1f0e:43e::2: icmp_seq=199 ttl=54 time=155 ms
64 bytes from 2001:470:1f0e:43e::2: icmp_seq=200 ttl=54 time=165 ms

--- arkanesystems-1-pt.tunnel.tserv8.dal1.ipv6.he.net ping statistics ---
200 packets transmitted, 200 received, 0% packet loss, time 39895ms
rtt min/avg/max/mdev = 153.899/157.066/181.787/4.574 ms
$ ping6 -s $[1280-48] -n -c 1 arkanesystems-1-pt.tunnel.tserv8.dal1.ipv6.he.net
PING arkanesystems-1-pt.tunnel.tserv8.dal1.ipv6.he.net(2001:470:1f0e:43e::2) 1232 data bytes
1240 bytes from 2001:470:1f0e:43e::2: icmp_seq=1 ttl=54 time=158 ms

--- arkanesystems-1-pt.tunnel.tserv8.dal1.ipv6.he.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 158.938/158.938/158.938/0.000 ms
$ ping6 -s $[1480-48] -n -c 1 arkanesystems-1-pt.tunnel.tserv8.dal1.ipv6.he.net
PING arkanesystems-1-pt.tunnel.tserv8.dal1.ipv6.he.net(2001:470:1f0e:43e::2) 1432 data bytes
1440 bytes from 2001:470:1f0e:43e::2: icmp_seq=1 ttl=54 time=158 ms

--- arkanesystems-1-pt.tunnel.tserv8.dal1.ipv6.he.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 158.053/158.053/158.053/0.000 ms
$ ping6 -s $[1500-48] -n -c 1 arkanesystems-1-pt.tunnel.tserv8.dal1.ipv6.he.net
PING arkanesystems-1-pt.tunnel.tserv8.dal1.ipv6.he.net(2001:470:1f0e:43e::2) 1452 data bytes
1460 bytes from 2001:470:1f0e:43e::2: icmp_seq=1 ttl=54 time=159 ms

--- arkanesystems-1-pt.tunnel.tserv8.dal1.ipv6.he.net ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 159.550/159.550/159.550/0.000 ms
$
Nothing obviously wrong. No measurable packet loss, no symptoms of MTU problems.
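
For readers following along: the 48 subtracted in the ping6 commands above is the 40-byte fixed IPv6 header plus the 8-byte ICMPv6 echo header, so each -s value produces a packet of exactly the intended total size. A minimal sketch of that arithmetic follows; the function and constant names are illustrative, not from any tool.

```python
# Header sizes used in the "-s $[SIZE-48]" arithmetic.
IPV6_HEADER_BYTES = 40         # fixed IPv6 header
ICMPV6_ECHO_HEADER_BYTES = 8   # ICMPv6 echo request/reply header


def ping_payload_size(packet_size: int) -> int:
    """Return the ping payload size that yields an IPv6 packet of packet_size bytes."""
    payload = packet_size - IPV6_HEADER_BYTES - ICMPV6_ECHO_HEADER_BYTES
    if payload < 0:
        raise ValueError("packet size is smaller than the headers alone")
    return payload


# The three total packet sizes probed in the transcript above.
for size in (1280, 1480, 1500):
    print(size, ping_payload_size(size))
```

The results (1232, 1432 and 1452) match the "data bytes" figures that ping6 reports in the transcript.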

What you should do from your end is get a packet dump of the traffic, so that you can see the exact sequence of packets you send and receive on one of the TCP connections where you experience poor performance. You should also check the round-trip time to the servers you get slow responses from. Possibly they are doing geolocation incorrectly and sending you to a server on the other side of the globe.

arkanesystems

Thanks for that; that confirms what I think I was seeing.

Roundtrip times look okay to me:

PS Z:\> ping www.google.com

Pinging www.google.com [2607:f8b0:4000:800::1010] with 32 bytes of data:
Reply from 2607:f8b0:4000:800::1010: time=72ms
Reply from 2607:f8b0:4000:800::1010: time=72ms
Reply from 2607:f8b0:4000:800::1010: time=71ms
Reply from 2607:f8b0:4000:800::1010: time=70ms

Ping statistics for 2607:f8b0:4000:800::1010:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 70ms, Maximum = 72ms, Average = 71ms


I've got a Wireshark capture of the traffic going to and from the server(s) during one extremely-slow page load, but I must confess that I've very little idea what I might be looking for in it...

Alistair

kasperd

Quote from: arkanesystems on June 21, 2013, 09:04:22 AM
I've got a Wireshark capture of the traffic going to and from the server(s) during one extremely-slow page load, but I must confess that I've very little idea what I might be looking for in it...
Intimate knowledge of TCP certainly helps in diagnosing such traces. And sometimes what you need to look for is the packets that are missing from the trace.

In general, when debugging such performance issues, you want to find the time interval where the connection stalls. In the trace, you may be looking for a long interval between two packets during which nothing was sent. But sometimes you may instead see one or both ends sending the same packets multiple times, making no progress.

If you are able to identify the period where the connection stalls, you need to look at the last few packets before the stall, the packets sent during the stall (if any), and the first packets after the stall. One possible problem to look for is a situation where one party is trying to send a packet during the stall that doesn't make it to the other end. Depending on where such a packet is lost, it may not even make it to the place where you are dumping traffic, which is why you sometimes need to look for what is missing from the trace.
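
The two patterns described above (a long silence, or the same segment sent repeatedly) can be searched for mechanically once the trace is exported. The following is only a sketch under stated assumptions: it assumes you have already extracted packet timestamps and (sender, TCP sequence number, payload length) tuples from the capture yourself, and the function names and sample values are made up for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical record layout: (sender, TCP sequence number, payload length)
Segment = Tuple[str, int, int]


def find_longest_gap(timestamps: List[float]) -> Tuple[float, float, float]:
    """Return (gap, start, end) for the longest silence between consecutive packets."""
    if len(timestamps) < 2:
        raise ValueError("need at least two packets to measure a gap")
    ordered = sorted(timestamps)
    start, end = max(zip(ordered, ordered[1:]), key=lambda pair: pair[1] - pair[0])
    return end - start, start, end


def repeated_segments(packets: Iterable[Segment]) -> Dict[Segment, int]:
    """Return data-carrying segments seen more than once (likely retransmissions)."""
    # Pure ACKs carry no payload and legitimately repeat, so they are skipped.
    counts = Counter(p for p in packets if p[2] > 0)
    return {segment: n for segment, n in counts.items() if n > 1}


# Made-up sample data, not taken from the capture discussed in this thread.
sample_times = [0.0, 0.25, 0.5, 3.5, 3.75]
sample_packets = [
    ("client", 1, 0),
    ("server", 1, 1400),
    ("server", 1401, 1400),
    ("server", 1401, 1400),
    ("server", 1401, 1400),
]
print(find_longest_gap(sample_times))
print(repeated_segments(sample_packets))
```

Once the gap or the repeated segment is located, the packets just before and after it are the ones worth reading by hand.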

Also remember that ICMPv6 packets related to the TCP connection can be of critical importance (Packet Too Big messages in particular).
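
To illustrate what a Packet Too Big message carries: it is ICMPv6 type 2, and its first 8 bytes are type, code, checksum and a 32-bit MTU field giving the size the sender should drop to. The sketch below decodes that field from raw ICMPv6 bytes; it assumes you have already isolated the ICMPv6 message from the capture, and the function name is illustrative.

```python
import struct
from typing import Optional

ICMPV6_PACKET_TOO_BIG = 2  # ICMPv6 message type for Packet Too Big


def packet_too_big_mtu(icmpv6_message: bytes) -> Optional[int]:
    """Return the advertised MTU if this is a Packet Too Big message, else None."""
    if len(icmpv6_message) < 8:
        return None  # too short to hold type, code, checksum and MTU
    msg_type, _code, _checksum, mtu = struct.unpack("!BBHI", icmpv6_message[:8])
    if msg_type != ICMPV6_PACKET_TOO_BIG:
        return None
    return mtu


# Hand-built example message (checksum left as zero for simplicity).
example = struct.pack("!BBHI", ICMPV6_PACKET_TOO_BIG, 0, 0, 1480)
print(packet_too_big_mtu(example))
```

If large segments keep being retransmitted and no such message ever shows up in the trace, that absence is itself a hint that path MTU discovery may be failing somewhere.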

If you need help understanding the trace, I could take a look and tell you what I think may be happening.

arkanesystems

Thanks very much for that!

As it happens, serendipitously, while I was doing some googling to figure out what the trace was telling me, I stumbled across a reference to a bug in IOS 12.3 that produced symptoms like the ones I've been having, and sure enough, upgrading my router to IOS 12.4 fixed the problem.  So... not quite as intentionally fixed as planned, but at least I don't have the issue any more.  :)

Thanks,

Alistair