• Welcome to Hurricane Electric's IPv6 Tunnel Broker Forums.

Yahoo/l.yimg.com failing to load

Started by hawk82, July 16, 2013, 06:22:15 PM


hawk82

I have a pfSense firewall with a tunnelbroker IPv6 tunnel going to the Ashburn endpoint. For the past few weeks, any Yahoo page that pulls content from l.yimg.com fails to load. I get no text or images, and the browser just sits there spinning. Refreshing the page doesn't help, and multiple browsers show the same issue. It is very similar to the Facebook problem from a few weeks ago, as noted in the forum here.


tracert l.yimg.com

Tracing route to ds-fo-anyycs-l.ay1.b.yahoodns.net [2001:4998:4:1::2001]
over a maximum of 30 hops:

 1    <1 ms    <1 ms    <1 ms  2001:470:8:xxxx::1
 2    42 ms    59 ms    42 ms  xxxxxxxxxxx.tunnel.tserv13.ash1.ipv6.he.net [2001:470:7:xxxx::1]
 3    39 ms    41 ms    47 ms  gige-g4-12.core1.ash1.he.net [2001:470:0:90::1]
 4    80 ms    80 ms    80 ms  eqnx.pat1.dc6.yahoo.com [2001:504:0:2:0:1:310:1]
 5    87 ms    81 ms    79 ms  2001:4998:4:1f9::1
 6    78 ms    82 ms    80 ms  l3.ycs.vip.dcb.yahoo.com [2001:4998:4:1::2001]



tracert www.yahoo.com

Tracing route to ds-any-fp3-real.wa1.b.yahoo.com [2001:4998:f00b:1fe::3001]
over a maximum of 30 hops:

 1    <1 ms    <1 ms    <1 ms  2001:470:8:xxxxxxxxx::1
 2    43 ms    49 ms    43 ms  xxxxxxxxxxxxxxx.tunnel.tserv13.ash1.ipv6.he.net [2001:470:7:xxxxxxxxxxx::1]
 3    42 ms    47 ms    52 ms  gige-g4-12.core1.ash1.he.net [2001:470:0:90::1]
 4    79 ms    80 ms    86 ms  eqnx.pat1.dc6.yahoo.com [2001:504:0:2:0:1:310:1]
 5    93 ms   220 ms    93 ms  2001:4998:f003:13::1
 6   106 ms    90 ms    97 ms  r2.ycpi.vip.nyc.yahoo.net [2001:4998:f00b:1fe::3001]


It also seems to happen on another PC on my home network, which loads yahoo.com almost all the way but gets stuck on l6.yimg.com.

I will try to get a Wireshark capture and post back. Anyone else having this problem?

kasperd

Quote from: hawk82 on July 16, 2013, 06:22:15 PM
Anyone else having this problem?

There is clearly an MTU issue in communication between HE and 2001:4998:4:1::2001. I could reproduce it from a different tunnel server with the following:

$ telnet 2001:4998:4:1::2001 80
Trying 2001:4998:4:1::2001...
Connected to 2001:4998:4:1::2001.
Escape character is '^]'.
GET /dh/ap/default/130215/y_200_a.png HTTP/1.0
Host: l.yimg.com

After typing the empty line, I do not see a response. Looking at the network traffic, I saw a TCP segment with relative sequence numbers 2857:2952. The previous 2856 bytes never arrived. That number of bytes exactly matches two full-sized segments of the size negotiated through the MSS options. Using a lower MTU, or just reducing the MSS, eliminated the problem.
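The byte count can be checked with a little arithmetic. This sketch assumes a 1500-byte MTU on the server's first link and the TCP timestamp option (10 bytes plus 2 bytes of NOP padding) on every data segment. Neither assumption is stated in the post, but together they make the numbers match exactly:

```python
IPV6_HEADER = 40   # fixed IPv6 header, no extension headers assumed
TCP_HEADER = 20    # TCP header without options

def advertised_mss(link_mtu: int) -> int:
    """MSS an IPv6 host advertises in its SYN, derived from its link MTU."""
    return link_mtu - IPV6_HEADER - TCP_HEADER

def payload_per_segment(mss: int, tcp_option_bytes: int) -> int:
    """Data bytes per full-sized segment when options ride on every segment."""
    return mss - tcp_option_bytes

mss = advertised_mss(1500)                  # 1440
per_segment = payload_per_segment(mss, 12)  # 1428, assuming timestamps
missing = 2 * per_segment                   # two lost full-sized segments
print(mss, per_segment, missing)            # 1440 1428 2856
# Wireshark's relative sequence numbers start at 1, so the first byte after
# 2856 missing bytes is byte 2857, which is where the visible segment starts.
```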

For me l.yimg.com resolves to a different IP, which is not affected.

hawk82

Okay. It appears MSS clamping isn't supported on pfSense yet:
http://redmine.pfsense.org/issues/2129

I tried playing around with the MTU settings on the tunnel options page in pfSense, but that didn't seem to help. I tried 1492 and 1480. I didn't reboot my pfSense box, though; I'll try that later tonight.

kasperd

Quote from: hawk82 on July 18, 2013, 06:15:23 AM
I tried playing around with the MTU settings on the tunnel options page in pfSense but that didn't seem to help.

I can explain why that is never going to solve the problem. First of all, the MSS is by default computed from the MTU of the very first link in each direction. Since the tunnel is not the first hop in either direction, it does not affect the MSS.

Packets sent from your end will presumably go over a native IPv6 link for the first hop, from the computer to the firewall/router/tunnel endpoint. The second link on the path is then the tunnel. Packets in the other direction have to go through many hops before reaching the tunnel.
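Here is a toy model of the asymmetry described above. The link MTUs are made up for illustration: a 1500-byte LAN, the 1480-byte tunnel, and a hypothetical 1480-byte link further along. The point is that the MSS comes only from each host's first link, while the largest packet that can actually cross the path is the minimum over all links:

```python
IPV6_TCP_HEADERS = 60  # 40-byte IPv6 header + 20-byte TCP header, no options

def advertised_mss(first_link_mtu: int) -> int:
    """MSS derived from the MTU of the host's own first link only."""
    return first_link_mtu - IPV6_TCP_HEADERS

def path_mtu(link_mtus: list[int]) -> int:
    """Largest packet that crosses every link on the path."""
    return min(link_mtus)

# Hypothetical link MTUs (bytes) from client to server: LAN, tunnel, HE core,
# an unknown smaller link, the server's own link.
client_to_server = [1500, 1480, 1500, 1480, 1500]
server_to_client = list(reversed(client_to_server))

# The server sends segments no larger than the client's advertised MSS,
# and no larger than its own first link allows.
server_send_mss = min(advertised_mss(server_to_client[0]),
                      advertised_mss(client_to_server[0]))
largest_packet = server_send_mss + IPV6_TCP_HEADERS

print(largest_packet, path_mtu(server_to_client))  # 1500 1480
```

Neither MSS value "sees" the 1480-byte links. So the server starts with 1500-byte packets, and PMTU discovery has to correct it.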

When the MSS results in a segment size that is too large, as it does in your case, PMTU discovery must take over. There are multiple reasons why the setting you are trying to adjust won't help there either. The first reason is that the problem shows up before the packets even reach your tunnel. Packets are being sent from l.yimg.com with a total length of 1500 bytes. At some point on the path before the tunnel, they hit a link with an MTU smaller than 1500 bytes. At that point, an ICMPv6 Packet Too Big error is supposed to be sent back to the server.
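The following sketch shows what the router with the smaller MTU is expected to do. The message layout follows RFC 4443: Type 2, Code 0, a checksum, the next-hop MTU, and then as much of the offending packet as fits without the error exceeding the 1280-byte IPv6 minimum MTU. `forward_or_ptb` is a hypothetical helper. The checksum is left at zero because computing it requires the IPv6 pseudo-header, and the 1500-byte "packet" is just dummy bytes:

```python
import struct

ICMPV6_PACKET_TOO_BIG = 2   # RFC 4443, section 3.2
IPV6_MIN_MTU = 1280
IPV6_HEADER = 40

def forward_or_ptb(packet: bytes, next_hop_mtu: int):
    """Return ("forward", packet) or ("ptb", icmpv6_message)."""
    if len(packet) <= next_hop_mtu:
        return ("forward", packet)
    # Type, Code, Checksum (placeholder 0 here), MTU of the next-hop link.
    header = struct.pack("!BBHI", ICMPV6_PACKET_TOO_BIG, 0, 0, next_hop_mtu)
    # Quote as much of the invoking packet as fits in a minimum-MTU packet.
    room = IPV6_MIN_MTU - IPV6_HEADER - len(header)
    return ("ptb", header + packet[:room])

action, body = forward_or_ptb(bytes(1500), next_hop_mtu=1480)
print(action, len(body), struct.unpack("!I", body[4:8])[0])  # ptb 1240 1480
```

If this message is never generated, or is dropped on its way back, the server keeps sending 1500-byte packets that never arrive.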

Whatever happens, that ICMPv6 message never makes it back to the server. It may be that the router never sends the ICMPv6 error in the first place. In the Facebook case you mentioned, that was apparently what happened. It looked like Facebook didn't cache MTU sizes and thus kept sending so many large packets to that router that it was rate limiting the ICMPv6 errors. It may also be that a misconfigured filter somewhere drops ICMPv6 packets before they reach the server. Such a filter might be on the router or the server, but it could also be anywhere between them.
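The Facebook explanation hinges on ICMPv6 error rate limiting. RFC 4443 requires nodes to limit the rate of the ICMPv6 errors they originate and recommends a token bucket for this. The toy simulation below uses made-up numbers (10 errors per second, burst of 10). It shows how a sender that never remembers the path MTU, and therefore keeps sending oversized packets, gets only a small fraction of its Packet Too Big messages back:

```python
class TokenBucket:
    """Token-bucket limiter of the kind RFC 4443 suggests for ICMPv6 errors."""

    def __init__(self, rate_per_s: float, burst: int):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = float(burst)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # Refill for the time elapsed since the last call, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Hypothetical: the server never caches the PMTU and sends 1000 oversized
# packets over one second, each of which should trigger a Packet Too Big.
bucket = TokenBucket(rate_per_s=10, burst=10)
errors_sent = sum(bucket.allow(now=i / 1000) for i in range(1000))
print(errors_sent, "of 1000 oversized packets produced an error")
```

With these numbers, at most about 20 of the 1000 oversized packets produce an error. A sender that cached the reported MTU would only need one.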

We may be able to figure out which link on the path has the smaller MTU. We may also be able to find out whether the problem is caused by that router not sending an ICMPv6 error or by a filter somewhere else dropping it. But in either case, knowing this probably won't help us solve the problem.

The second reason your MTU setting won't help is that you are adjusting the wrong end of the tunnel. Each end of the tunnel has its own MTU setting, which applies to packets entering the tunnel from that end. (There might be broken implementations that also enforce the MTU setting on packets they receive through the tunnel. That doesn't solve any problems, but it can introduce new ones if the MTU settings on the two ends of the tunnel differ.)

The MTU setting on the tunnel is there to avoid relying on PMTU discovery on the IPv4 path between the tunnel endpoints. Since the MTU problem you are facing isn't on the IPv4 path, tweaking the tunnel's MTU settings is not going to help.
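6in4 tunnels like HE's wrap each IPv6 packet in a 20-byte IPv4 header (IP protocol 41). That is where the common 1480-byte tunnel MTU comes from. The sketch below uses example IPv4 path MTUs (plain Ethernet and PPPoE). It shows the only thing the tunnel MTU setting protects: whether the encapsulated packet fits the IPv4 path between the tunnel endpoints. It says nothing about links on the IPv6 side:

```python
IPV4_HEADER = 20  # 6in4 encapsulation (IP protocol 41), no IPv4 options

def tunnel_mtu(ipv4_path_mtu: int) -> int:
    """Largest IPv6 packet that fits through a 6in4 tunnel unfragmented."""
    return ipv4_path_mtu - IPV4_HEADER

def fits_ipv4_path(ipv6_packet_len: int, ipv4_path_mtu: int) -> bool:
    """True if the encapsulated packet fits the IPv4 path between endpoints."""
    return ipv6_packet_len + IPV4_HEADER <= ipv4_path_mtu

print(tunnel_mtu(1500), tunnel_mtu(1492))  # 1480 1472
```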

I believe the most reliable solution is MSS clamping. Implementing MSS clamping on my gateway significantly reduced the number of problems I have experienced. (I usually have to tweak my settings to reproduce the MTU problems others report.)

MSS clamping works where the MTU setting doesn't because the packets still go through the tunnel, even though the MSS options depend only on the first link in each direction. At the tunnel, either endpoint can clamp the MSS, which simply means modifying the TCP SYN packets in transit. The receiving end sees a different MSS value from the one that was sent, and it works out just fine. Each TCP endpoint believes it can handle 1500-byte packets itself, but it has to use smaller packets because the other end appears unable to handle packets that large.
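The in-transit rewrite can be sketched on raw TCP header bytes. This is a minimal illustration, not how pfSense, Linux, or HE's tunnel servers actually implement it. `clamp_mss` is a hypothetical helper. The sketch assumes IPv6, so only the TCP checksum needs fixing. It uses the incremental checksum update from RFC 1624, so the IPv6 pseudo-header is never needed:

```python
import struct

TCP_FLAG_SYN = 0x02
TCPOPT_EOL, TCPOPT_NOP, TCPOPT_MSS = 0, 1, 2

def _ones_add(a: int, b: int) -> int:
    """16-bit one's complement addition with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)

def _swap16(v: int) -> int:
    return ((v & 0xFF) << 8) | (v >> 8)

def _update_checksum(csum: int, old: int, new: int) -> int:
    """RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m')."""
    s = _ones_add(~csum & 0xFFFF, ~old & 0xFFFF)
    return ~_ones_add(s, new) & 0xFFFF

def clamp_mss(tcp: bytearray, max_mss: int) -> bool:
    """Lower the MSS option of a TCP SYN in place; return True if it changed."""
    if not tcp[13] & TCP_FLAG_SYN:
        return False                      # MSS is only carried on SYN segments
    header_len = (tcp[12] >> 4) * 4       # data offset, in 32-bit words
    i = 20                                # options follow the fixed header
    while i < header_len:
        kind = tcp[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= header_len:
            break                         # truncated option
        length = tcp[i + 1]
        if length < 2 or i + length > header_len:
            break                         # malformed option
        if kind == TCPOPT_MSS and length == 4:
            (mss,) = struct.unpack_from("!H", tcp, i + 2)
            if mss <= max_mss:
                return False              # already small enough
            (csum,) = struct.unpack_from("!H", tcp, 16)
            old, new = mss, max_mss
            if (i + 2) % 2:               # value straddles two checksum words
                old, new = _swap16(old), _swap16(new)
            struct.pack_into("!H", tcp, i + 2, max_mss)
            struct.pack_into("!H", tcp, 16, _update_checksum(csum, old, new))
            return True
        i += length
    return False
```

Only SYN segments are touched, and only downwards. Each side still computes its own MSS as usual, and the rewrite happens in the middle, exactly where the tunnel sits.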

MSS clamping is a workaround; it is not supposed to be needed. But broken setups with buggy MTU handling are widespread, and MSS clamping is a very efficient workaround.

If your tunnel endpoint does not support MSS clamping, you do have other options. I think you can include an MTU option in the router advertisements sent on the LAN. If pfSense supports that, you will be reducing the MTU of the first link on the path, which will then be used to compute a smaller MSS. If pfSense cannot do that either, I am running out of ideas. You can still configure a lower MTU on each device on the LAN, but that is not desirable.
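On the wire, the MTU option in a router advertisement is a small fixed-format option defined in RFC 4861 (section 4.6.4). This sketch only shows its encoding and the MSS a LAN host would then advertise. It says nothing about how pfSense exposes the setting, and 1480 is just an example value:

```python
import struct

ND_OPT_MTU = 5  # RFC 4861, section 4.6.4

def ra_mtu_option(mtu: int) -> bytes:
    """Encode the MTU option carried in an IPv6 Router Advertisement."""
    if mtu < 1280:
        raise ValueError("IPv6 links must support at least 1280 bytes")
    # Type, Length (in units of 8 octets), Reserved, MTU
    return struct.pack("!BBHI", ND_OPT_MTU, 1, 0, mtu)

def lan_host_mss(advertised_mtu: int) -> int:
    """MSS a LAN host would advertise after adopting the RA's MTU."""
    return advertised_mtu - 40 - 20   # IPv6 header + TCP header

opt = ra_mtu_option(1480)
print(opt.hex(), lan_host_mss(1480))  # 05010000000005c8 1420
```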

It might be possible for HE to implement MSS clamping on the tunnel servers. It would be neat if you could configure the tunnel with, say, an MTU of 1480 bytes and MSS clamping to 1220 bytes. But of course users shouldn't be forced to have MSS clamping on the tunnel server side. Some users may have reasons to rely on PMTU discovery, or to do the MSS clamping themselves on their own end of the tunnel.
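The example numbers can be reproduced as follows. The 1480-byte MTU is 1500 minus the 20-byte IPv4 header that 6in4 adds. The value 1220 is presumably the 1280-byte IPv6 minimum MTU minus 40 bytes of IPv6 header and 20 bytes of TCP header, which makes clamped segments fit on any IPv6 path:

```python
IPV6_MIN_MTU = 1280   # every IPv6 link must carry packets this large
IPV6_HEADER = 40
TCP_HEADER = 20
IPV4_HEADER = 20      # 6in4 encapsulation overhead

tunnel_mtu = 1500 - IPV4_HEADER                       # 1480
tunnel_mss = tunnel_mtu - IPV6_HEADER - TCP_HEADER    # 1420
safe_mss = IPV6_MIN_MTU - IPV6_HEADER - TCP_HEADER    # 1220
print(tunnel_mtu, tunnel_mss, safe_mss)               # 1480 1420 1220
```

The cost is about 14% less payload per full-sized segment (200 of 1420 bytes) than the tunnel itself could carry.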

passport123

>> I have a pfsense firewall with a tunnelbroker IPv6 tunnel going to the ashburn endpoint. In the past few weeks, trying to load any Yahoo page that has l.yimg.com on it, fails to load.

I ran into this same problem when I was using OpenBSD as my firewall/router. I'm not saying this is a pfSense or OpenBSD issue; I think Yahoo has some issues in the DNS for that particular server.

I resolved the issue by putting the following two lines in the hosts file of the box that ran my http proxy:

# prevent yahoo price quotes hang
216.115.104.242      l.yimg.com


Not the most elegant solution out there, but now at least I can access yahoo quotes. ::)

(btw, I know the problem documented above doesn't involve DNS. But when I was having the problem, Yahoo's DNS for l.yimg.com had CNAMEs three levels deep, and about 75% of the time the second-level CNAME did not resolve. That is the problem I wanted to solve, and it is why I chose this solution: I just bypassed Yahoo's DNS for l.yimg.com with an IP address that I knew worked.)

kasperd

Quote from: passport123 on July 18, 2013, 12:51:22 PM
when I was having the problem, yahoo's DNS for l.yimg.com had CNAMEs three levels deep, and about 75% of the time the second level CNAME did not resolve.

Then that is a completely different problem from the MTU issue I saw.

What you saw probably wasn't an IPv6-related problem, since it appears Yahoo still doesn't support DNS over IPv6. Also, the last level of CNAME shouldn't have any impact: the AAAA records are sent along with the CNAME record, so no extra round trips are needed.

Did you see lookup failures when trying to look up an already cached record, or only when the previously cached record had expired?

passport123

Yes, it did look like a different problem, that's why I edited my comment to give more details.  I didn't want to mislead anyone.

Yahoo may not support DNS over IPv6, but they do give out IPv6 addresses for their hosts (once you wind your way through the layers of CNAMEs).

When I mentioned that the second level CNAME did not resolve, I was traversing the DNS records manually, one level at a time, trying to see where a problem might be.   

Regarding your question about caching - I don't know.    When the CNAME refused to resolve I just moved on, instead of digging into my DNS server's cache to see what was there.  When I saw the CNAME not resolving over the period of a few days (probably close to two weeks), that's when I decided to put the entry into the hosts file, and get on with my life.....    :)

Jim Whitby

Quote from: passport123 on July 19, 2013, 11:35:46 AM
Yes, it did look like a different problem, that's why I edited my comment to give more details.  I didn't want to mislead anyone.

Yahoo may not support DNS over IPv6, but they do give out IPv6 addresses for their hosts (once you wind you way through the layers of CNAMEs).

When I mentioned that the second level CNAME did not resolve, I was traversing the DNS records manually, one level at a time, trying to see where a problem might be.   

Regarding your question about caching - I don't know.    When the CNAME refused to resolve I just moved on, instead of digging into my DNS server's cache to see what was there.  When I saw the CNAME not resolving over the period of a few days (probably close to two weeks), that's when I decided to put the entry into the hosts file, and get on with my life.....    :)

Not to be butting in where I don't know what's *really* going on, but...
CNAME traversal seems to be simple enough.

As for Yahoo's DNS, I did a dig on l.yimg.com and got this:

; <<>> DiG 9.9.2-P2 <<>> l.yimg.com aaaa
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10720
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 2, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;l.yimg.com.                    IN      AAAA

;; ANSWER SECTION:
l.yimg.com.             7187    IN      CNAME   fd-geoycs-l.gy1.b.yahoodns.net.
fd-geoycs-l.gy1.b.yahoodns.net. 287 IN  CNAME   ds-fo-anyycs-l.ay1.b.yahoodns.net.
ds-fo-anyycs-l.ay1.b.yahoodns.net. 300 IN AAAA  2001:4998:f00b:1fb::c:1103
ds-fo-anyycs-l.ay1.b.yahoodns.net. 300 IN AAAA  2001:4998:f00b:1fb::c:1101
ds-fo-anyycs-l.ay1.b.yahoodns.net. 300 IN AAAA  2001:4998:4:1::2001
ds-fo-anyycs-l.ay1.b.yahoodns.net. 300 IN AAAA  2001:4998:4:1::2000

;; AUTHORITY SECTION:
ay1.b.yahoodns.net.     98938   IN      NS      yf1.yahoo.com.
ay1.b.yahoodns.net.     98938   IN      NS      yf2.yahoo.com.

;; Query time: 57 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sat Jul 20 18:03:42 2013
;; MSG SIZE  rcvd: 270

A dig for the IPv4 (A) records shows the same, except with IPv4 addresses.
All CNAMEs were resolved.
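The server returned the whole CNAME chain and the final AAAA records in a single response. A client can therefore follow the chain locally without further queries, which is why the extra CNAME levels need no extra round trips. Below is a toy sketch over the records from the dig output above. It is not a real resolver, and `follow_cnames` is a hypothetical helper:

```python
# Answer section of the dig output above, as (owner, type, data) tuples.
ANSWER = [
    ("l.yimg.com.", "CNAME", "fd-geoycs-l.gy1.b.yahoodns.net."),
    ("fd-geoycs-l.gy1.b.yahoodns.net.", "CNAME", "ds-fo-anyycs-l.ay1.b.yahoodns.net."),
    ("ds-fo-anyycs-l.ay1.b.yahoodns.net.", "AAAA", "2001:4998:f00b:1fb::c:1103"),
    ("ds-fo-anyycs-l.ay1.b.yahoodns.net.", "AAAA", "2001:4998:f00b:1fb::c:1101"),
    ("ds-fo-anyycs-l.ay1.b.yahoodns.net.", "AAAA", "2001:4998:4:1::2001"),
    ("ds-fo-anyycs-l.ay1.b.yahoodns.net.", "AAAA", "2001:4998:4:1::2000"),
]

def follow_cnames(name: str, records: list, max_hops: int = 8):
    """Follow CNAMEs within one answer; return the chain and final AAAA data."""
    chain = [name]
    while True:
        targets = [d for o, t, d in records if o == name and t == "CNAME"]
        if not targets:
            break
        if len(chain) > max_hops:
            raise ValueError("CNAME chain too long (possible loop)")
        name = targets[0]
        chain.append(name)
    addresses = [d for o, t, d in records if o == name and t == "AAAA"]
    return chain, addresses

chain, addresses = follow_cnames("l.yimg.com.", ANSWER)
print(" -> ".join(chain))
print(addresses)
```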

From a functional point of view, l.yimg.com seems to be either an ad site or an image site for ads on Yahoo.

What am I not understanding?

passport123

>> What am I not understanding?

The time that elapsed between the time that I did the CNAME traversal (weeks ago) and now.

Perhaps the problem I saw has been resolved in the meantime.

dfc

I have experienced this from time to time as well. I usually encounter the problem when I am visiting a flickr.com URL. The issue is sporadic, and I can usually fix it by hitting reload enough times. I just browsed around Flickr and got one of the timeouts with this URL:

http://l.yimg.com/zz/combo?kx/yucs/uh3/uh/js/1/uh-min.js&kx/yucs/uh3/uh/js/607/menu_utils_v3-min.js&kx/yucs/uh3/uh3_top_bar/js/242/top_bar_v3-min.js

When I use wget -6, I cannot get the timeout to occur, no matter how many times I try. I am curious if there is something in the request headers that is triggering the problem. I will try to do some more digging tonight and post pcaps if I can.

kasperd

Quote from: dfc on July 31, 2013, 05:05:02 PM
I am curious if there is something in the request headers that is triggering the problem.

That is possible. For example, there could be an MTU issue in the other direction.

It could be that your browser is sending a request with cookies that push the request size above the PMTU. If PMTU discovery is broken, your computer will just keep retransmitting that TCP packet and never get an acknowledgement. If you request the exact same URL with wget, there might be no cookies, and the request may be smaller than the PMTU.
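Here is a rough model of the cookie explanation. The request sizes (300 bytes for a bare wget request, 1900 bytes with browser cookies) and the 1480-byte path MTU are invented for illustration. A small request fits in a packet that crosses the path. A larger request produces a full-sized segment that, with PMTU discovery broken, is retransmitted forever and never acknowledged:

```python
IPV6_TCP_HEADERS = 60  # IPv6 + TCP headers, no options

def segment_sizes(request_len: int, mss: int) -> list[int]:
    """Split a request into TCP payload sizes no larger than the MSS."""
    sizes = []
    while request_len > 0:
        sizes.append(min(mss, request_len))
        request_len -= sizes[-1]
    return sizes

def stuck_segments(request_len: int, mss: int, path_mtu: int) -> list[int]:
    """Payloads whose packets exceed the path MTU and never get through."""
    return [s for s in segment_sizes(request_len, mss)
            if s + IPV6_TCP_HEADERS > path_mtu]

# Hypothetical sizes: a bare wget request vs. a browser request with cookies.
print(stuck_segments(300, mss=1440, path_mtu=1480))   # []
print(stuck_segments(1900, mss=1440, path_mtu=1480))  # [1440]
```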

There are many other ways differences in the request could affect the outcome. A packet dump of each case would be needed to point out exactly why one is working and the other is not.