Welcome to Hurricane Electric's IPv6 Tunnel Broker Forums.

IPv4 packet loss to ASH1 tunnel server via GBLX

Started by Napsterbater, January 02, 2014, 10:56:05 PM

Napsterbater

Here is a pathping, Tunnel ID is 96243, the IPv4 address for it is the source if you need it.

C:\Users\Terry Jackson>pathping 216.66.22.2

Tracing route to tserv2.ash1.he.net [216.66.22.2]
over a maximum of 30 hops:
  0  bos.napshome.local [10.0.1.39]
  1  car1.napshome.local [10.0.1.3]
  2  gpon-local.xceleratebroadband.com [10.40.0.1]
  3  gpon-local.xceleratebroadband.com [172.16.1.2]
  4  38.122.47.121
  5  te0-7-0-5.ccr21.atl04.atlas.cogentco.com [154.54.85.234]
  6  gblx.atl04.atlas.cogentco.com [154.54.11.154]
  7  te2-1-10g.ar3.dca3.gblx.net [67.17.108.138]
  8  hurricane-electric-llc-ashburn.tengigabitethernet4-4.ar3.dca3.gblx.net [64.214.121.170]
  9  tserv2.ash1.he.net [216.66.22.2]

Computing statistics for 225 seconds...
            Source to Here   This Node/Link
Hop  RTT    Lost/Sent = Pct  Lost/Sent = Pct  Address
  0                                           bos.napshome.local [10.0.1.39]
                                0/ 100 =  0%   |
  1    0ms     0/ 100 =  0%     0/ 100 =  0%  car1.napshome.local [10.0.1.3]
                                0/ 100 =  0%   |
  2    2ms     0/ 100 =  0%     0/ 100 =  0%  gpon-local.xceleratebroadband.com[10.40.0.1]
                                0/ 100 =  0%   |
  3    1ms     0/ 100 =  0%     0/ 100 =  0%  gpon-local.xceleratebroadband.com [172.16.1.2]
                                0/ 100 =  0%   |
  4   17ms     0/ 100 =  0%     0/ 100 =  0%  38.122.47.121
                                0/ 100 =  0%   |
  5   11ms     0/ 100 =  0%     0/ 100 =  0%  te0-7-0-5.ccr21.atl04.atlas.cogentco.com [154.54.85.234]
                                0/ 100 =  0%   |
  6   12ms     0/ 100 =  0%     0/ 100 =  0%  gblx.atl04.atlas.cogentco.com [154.54.11.154]
                                0/ 100 =  0%   |
  7   39ms     0/ 100 =  0%     0/ 100 =  0%  te2-1-10g.ar3.dca3.gblx.net [67.17.108.138]
                               15/ 100 = 15%   |
  8   59ms    15/ 100 = 15%     0/ 100 =  0%  hurricane-electric-llc-ashburn.tengigabitethernet4-4.ar3.dca3.gblx.net [64.214.121.170]
                                2/ 100 =  2%   |
  9   36ms    17/ 100 = 17%     0/ 100 =  0%  tserv2.ash1.he.net [216.66.22.2]

Trace complete.

broquea

Odd loss happening right there, might be asymmetric and loss on the return path seeing as HE's LG uses Telia to get back to you.

core1.ash1.he.net> traceroute 38.122.47.121 source-ip 216.218.252.169 numeric

Tracing the route to IP node (38.122.47.121) from 1 to 30 hops

  1    75 ms   75 ms   74 ms 213.248.67.117
  2    66 ms   74 ms  111 ms 154.54.11.93
  3    74 ms   75 ms  118 ms 154.54.41.54
  4    70 ms   86 ms   74 ms 154.54.31.93
  5    74 ms   61 ms   75 ms 154.54.3.170
  6    75 ms   75 ms   88 ms 38.122.47.121
# Entry cached for another 56 seconds.

kasperd

Quote from: broquea on January 03, 2014, 01:05:32 AM
Odd loss happening right there, might be asymmetric and loss on the return path seeing as HE's LG uses Telia to get back to you.
That is a possible explanation. Are there some good tools to debug that sort of scenario? I was wondering if it would be possible to create a traceroute-like tool, which could work around that problem. But anything I could come up with would break due to ingress filtering.

Napsterbater

At this time it seems to be fixed. During the problem, though, I was having issues with all IPv6 sites. After trying a different system on the same tunnel server but on a different ISP, where everything worked fine, I did multiple pings to the tunnel server and saw some loss, so I did the pathping.

Here is the current pathping:
C:\Users\Napsterbater>pathping 216.66.22.2

Tracing route to tserv2.ash1.he.net [216.66.22.2]
over a maximum of 30 hops:
  0  Bos.napshome.local [10.0.1.130]
  1  car1.napshome.local [10.0.1.3]
  2  gpon-local.xceleratebroadband.com [10.40.0.1]
  3  gpon-local.xceleratebroadband.com [172.16.1.2]
  4  38.122.47.121
  5  te0-7-0-5.ccr21.atl04.atlas.cogentco.com [154.54.85.234]
  6  gblx.atl04.atlas.cogentco.com [154.54.11.154]
  7  te7-2-10g.ar3.dca3.gblx.net [67.17.107.194]
  8  hurricane-electric-llc-ashburn.tengigabitethernet4-4.ar3.dca3.gblx.net [64.214.121.170]
  9  tserv2.ash1.he.net [216.66.22.2]

Computing statistics for 225 seconds...
            Source to Here   This Node/Link
Hop  RTT    Lost/Sent = Pct  Lost/Sent = Pct  Address
  0                                           Bos.napshome.local [10.0.1.130]
                                0/ 100 =  0%   |
  1    4ms     0/ 100 =  0%     0/ 100 =  0%  car1.napshome.local [10.0.1.3]
                                0/ 100 =  0%   |
  2    2ms     0/ 100 =  0%     0/ 100 =  0%  gpon-local.xceleratebroadband.com[10.40.0.1]
                                0/ 100 =  0%   |
  3    1ms     0/ 100 =  0%     0/ 100 =  0%  gpon-local.xceleratebroadband.com[172.16.1.2]
                                0/ 100 =  0%   |
  4   18ms     0/ 100 =  0%     0/ 100 =  0%  38.122.47.121
                                0/ 100 =  0%   |
  5   11ms     0/ 100 =  0%     0/ 100 =  0%  te0-7-0-5.ccr21.atl04.atlas.cogentco.com [154.54.85.234]
                                0/ 100 =  0%   |
  6   12ms     0/ 100 =  0%     0/ 100 =  0%  gblx.atl04.atlas.cogentco.com [154.54.11.154]
                                0/ 100 =  0%   |
  7   41ms     0/ 100 =  0%     0/ 100 =  0%  te7-2-10g.ar3.dca3.gblx.net [67.17.107.194]
                                0/ 100 =  0%   |
  8   50ms     0/ 100 =  0%     0/ 100 =  0%  hurricane-electric-llc-ashburn.tengigabitethernet4-4.ar3.dca3.gblx.net [64.214.121.170]
                                0/ 100 =  0%   |
  9   25ms     0/ 100 =  0%     0/ 100 =  0%  tserv2.ash1.he.net [216.66.22.2]

Trace complete.

broquea

Quote from: kasperd on January 03, 2014, 02:56:46 AM
Quote from: broquea on January 03, 2014, 01:05:32 AM
Odd loss happening right there, might be asymmetric and loss on the return path seeing as HE's LG uses Telia to get back to you.
That is a possible explanation. Are there some good tools to debug that sort of scenario? I was wondering if it would be possible to create a traceroute-like tool, which could work around that problem. But anything I could come up with would break due to ingress filtering.

tracepath can show you that at some point it is asym, but really only being able to source a trace along that reverse path would show the alternate path, which is why I went right to the LG. Also, from my colo in FMT2 to ASH1 there wasn't any loss to the tserv when OP posted, so probably something on that return path wasn't happy. The tserv there used to get attacked a lot (by way of people going after tunnels sourced there or the machine itself), maybe that still happens?

kasperd

Quote from: broquea on January 03, 2014, 08:11:44 AM
tracepath can show you that at some point it is asym
As far as I can tell, all it does is make an assumption about the initial TTL of time exceeded messages and print a message if the received TTL deviates from that assumption. That algorithm is going to have plenty of false positives and false negatives. It is a useful hint, but not much more than that. My first attempt at applying it showed me a route that was reported as asymmetrical in one direction but not in the other.
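A minimal sketch of the kind of heuristic described above, not tracepath's actual source. It assumes the router sent its time exceeded message with one of a few common initial TTL values (64, 128, 255); the function names and the list of initial values are illustrative assumptions.

```python
# Common initial TTL / hop limit values. This list is an assumption:
# a router using any other initial value produces a wrong guess.
COMMON_INITIAL_TTLS = (64, 128, 255)


def guess_return_hops(received_ttl: int) -> int:
    """Guess how many hops a reply travelled back to us.

    Assumes the sender used the smallest common initial TTL that is
    greater than or equal to the TTL we observed on arrival.
    """
    for initial in COMMON_INITIAL_TTLS:
        if received_ttl <= initial:
            # A directly connected router (1 hop) arrives with the
            # initial TTL unchanged, hence the +1.
            return initial - received_ttl + 1
    raise ValueError("TTL out of range")


def asymmetry_hint(forward_hops: int, received_ttl: int) -> str:
    """Compare the forward hop count with the guessed return hop count."""
    back = guess_return_hops(received_ttl)
    if back == forward_hops:
        return "looks symmetric"
    return f"asymm {back}"
```

The weakness is visible in the code: the result depends entirely on the guessed initial TTL, and two different paths with the same hop count look identical, so both false positives and false negatives are possible.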

When doing a traceroute you rarely care about the path from intermediate routers to both endpoints. What you do care about is the route from each endpoint to the other. This is a lot easier to figure out on a symmetrical path, which is essentially the only reason a symmetrical path is desirable. On an asymmetrical path you care about the forward route at each hop, but not about the return path from those routers, which are only used for your packets in one direction.

The return path from some of those routers only comes into play when they need to send an ICMP packet back to you. And that's why sometimes that difference affects the output of traceroute more than whatever you were looking for in the first place. There is probably no silver bullet, which is why piecing together a picture by looking at output from various LG-like tools is often the most effective approach.

kasperd

This thread got me thinking about which methods could be applied to get more accurate measurements of where in the network packet loss and latency increases happen. So far I have come up with a simple approach, which could be used to measure packet loss and jitter on the forward and return paths independently, with only a minor addition on the server side.

The idea would be to add a new echo request ICMPv6 type, to which the server replies in a slightly different way.

Instead of sending just one reply, the server would send exactly three copies of the packet. The client can measure what percentage of requests get 0, 1, 2, and 3 replies. From those four percentages, it is then possible to construct a set of equations, which will compute the packet loss on the forward and return paths independently. Since this in itself would obviously be problematic due to possible amplification attacks, the server drops 75% of the received requests at random, so on average it sends only 0.75 replies per request received. That of course changes the packet loss ratio computed for the forward path, but the packet loss ratio computed for the return path would be unaffected.

Knowing the return path packet loss, and using an ordinary ping to measure the roundtrip packet loss, it is possible to compute the forward packet loss as well.
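The thread does not spell out the equations, so the following is only one possible way to set them up. It assumes each of the three reply copies is lost independently with the same probability, and that forward and return losses are independent of each other. Under that assumption the number of copies arriving for an answered request is binomial, and the expected counts satisfy (n2 + 3·n3) / (n1 + 2·n2 + 3·n3) = r, where r is the return path delivery probability and n1, n2, n3 are the numbers of requests that got 1, 2, and 3 replies. All names below are illustrative.

```python
def return_delivery(n1: int, n2: int, n3: int) -> float:
    """Estimate the per-packet delivery probability on the return path.

    n1, n2, n3 are the numbers of requests that received exactly 1, 2
    and 3 replies. Requests with 0 replies are not needed here, which
    is why the server's random 75% drop does not disturb the estimate.
    Assumes the three copies are lost independently.
    """
    replies = n1 + 2 * n2 + 3 * n3  # total reply packets received
    if replies == 0:
        raise ValueError("no replies received")
    return (n2 + 3 * n3) / replies


def forward_loss(roundtrip_loss: float, return_loss: float) -> float:
    """Derive forward loss from roundtrip loss and return loss.

    Uses (1 - roundtrip) = (1 - forward) * (1 - return), which assumes
    losses in the two directions are independent.
    """
    if return_loss >= 1.0:
        raise ValueError("return path delivers nothing")
    return 1.0 - (1.0 - roundtrip_loss) / (1.0 - return_loss)
```

For example, counts of 192, 768 and 1024 requests with 1, 2 and 3 replies give a return delivery estimate of 0.8, that is, 20% loss on the return path. Real losses are often bursty rather than independent, in which case this estimate will be biased.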

Moreover the request contains a 16 byte reply address, such that it is possible to do triangular measurements should there be such a need. The server, when handling the request, uses that address as the destination address for the reply, and in the reply data it replaces the 16 byte reply address with an 8 byte time stamp (which grows at a constant rate). This means the reply packet will always be 8 bytes shorter than the request.

The client knows at what time each request was sent and at what time the reply was received. By computing the correlation between those times and the time stamp from the reply packet, it can see whether the sending time or the receiving time is most correlated with the time stamps, which will tell whether any variation in latency happens on the forward path or the return path.
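One possible reading of that comparison, as a sketch rather than the author's method: fit a straight line from the client's send times to the server time stamps, and another from the receive times to the time stamps, then compare how far the stamps scatter around each line. Both scatters are in server time stamp units, so the unknown tick rate and clock offset do not matter. The function names are illustrative.

```python
def residual_spread(xs, ys):
    """RMS residual of ys around the least-squares line fitted on xs."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equally long series of >= 2 samples")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return (sum(e * e for e in residuals) / n) ** 0.5


def jitter_side(send_times, server_stamps, recv_times):
    """Guess which direction carries most of the latency variation.

    If the stamps follow the send times closely, the forward delay is
    steady and the variation is on the return path, and vice versa.
    """
    scatter_vs_send = residual_spread(send_times, server_stamps)
    scatter_vs_recv = residual_spread(recv_times, server_stamps)
    return "forward" if scatter_vs_send > scatter_vs_recv else "return"
```

This only says which side dominates. When both directions vary by similar amounts the answer is not meaningful, and a real tool would want to report the two scatter values rather than a single label.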

If anybody wants to take a look at how it works, there is a proof of concept server on 2a01:4f8:d16:701:9c65:a13e:5a91:4dd, and there is an example packet trace at http://share.kasperd.net/ping.pcap

The client code is not yet complete. All it does so far is send a steady stream of requests; it does not yet do any calculations of roundtrip time or packet loss.

The ICMPv6 packet formats are as follows:

Request:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     Type      |     Code      |          Checksum             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           Identifier          |        Sequence Number        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                                                               +
      |                                                               |
      +                          Reply Address                        +
      |                                                               |
      +                                                               +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     Data ...
      +-+-+-+-+-


Reply:

       0                   1                   2                   3
       0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     Type      |     Code      |          Checksum             |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |           Identifier          |        Sequence Number        |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |                                                               |
      +                       Time stamp                              +
      |                                                               |
      +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      |     Data ...
      +-+-+-+-+-
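The two layouts above could be built and parsed roughly as follows. This is a sketch under stated assumptions, not the attached proof of concept: the thread does not give the ICMPv6 type numbers, so 200 and 201 (set aside for private experimentation in RFC 4443) are used as placeholders, and the time stamp is assumed to be an unsigned 64 bit integer in network byte order. The checksum is left at zero on the assumption that the operating system fills it in when sending on a raw ICMPv6 socket.

```python
import ipaddress
import struct

# Placeholder type numbers (assumption, not taken from the thread).
REQUEST_TYPE = 200
REPLY_TYPE = 201

# Type, Code, Checksum, Identifier, Sequence Number: 8 bytes in total.
HEADER = struct.Struct("!BBHHH")


def build_request(ident: int, seq: int, reply_addr: str, data: bytes = b"") -> bytes:
    """Build a request: header, 16 byte reply address, then data."""
    addr = ipaddress.IPv6Address(reply_addr).packed  # always 16 bytes
    return HEADER.pack(REQUEST_TYPE, 0, 0, ident, seq) + addr + data


def parse_reply(packet: bytes) -> dict:
    """Split a reply into header fields, 8 byte time stamp, and data."""
    if len(packet) < HEADER.size + 8:
        raise ValueError("reply too short")
    icmp_type, code, checksum, ident, seq = HEADER.unpack_from(packet)
    # Assumed encoding: unsigned 64 bit, big-endian.
    (timestamp,) = struct.unpack_from("!Q", packet, HEADER.size)
    return {
        "type": icmp_type,
        "code": code,
        "checksum": checksum,
        "identifier": ident,
        "sequence": seq,
        "timestamp": timestamp,
        "data": packet[HEADER.size + 8:],
    }
```

Since the 16 byte address is swapped for an 8 byte time stamp, a client can sanity-check a reply by confirming that it is exactly 8 bytes shorter than the request it sent.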

kasperd

Quote from: kasperd on January 03, 2014, 03:18:19 PM
The client code is not yet complete.
Attached are a proof of concept client and server. I still have a server running on 2a01:4f8:d16:701:9c65:a13e:5a91:4dd, which you can try to ping if you want to test the client code. Does this look like a useful tool in its current shape?