
Facebook failing... why?

Started by Buroa, June 03, 2013, 05:27:05 AM


kasperd

Quote from: kasperd on June 07, 2013, 12:54:35 PM
But it is worth trying a lower MSS setting to see if that affects the connectivity.
With an MSS setting of 1220 it worked every time so far. With an MSS setting of 1208 it fails most of the time. What I am seeing is definitely an MTU issue. Where exactly the MTU issue is located is not yet clear, but it appears that something involved in the communication cannot handle packets larger than 1280 bytes.
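A rough way to repeat this kind of test (an illustration, not necessarily how the experiment above was run) is to clamp the advertised MSS with the TCP_MAXSEG socket option and see whether a TLS handshake over IPv6 completes. The kernel may adjust the requested value, so treat it as an approximate probe only:

import socket, ssl

def try_handshake(host, mss, timeout=10):
    sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    # TCP_MAXSEG must be set before connect() so the SYN advertises it.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_MAXSEG, mss)
    try:
        sock.connect((host, 443))
        ctx = ssl.create_default_context()
        with ctx.wrap_socket(sock, server_hostname=host):
            return True        # server hello arrived, handshake completed
    except (socket.timeout, ssl.SSLError, OSError):
        return False           # stalled or failed, e.g. lost server hello
    finally:
        sock.close()

for mss in (1220, 1228):
    result = "ok" if try_handshake("www.facebook.com", mss) else "stalled"
    print(mss, result)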

cconn

Quote from: kasperd on June 07, 2013, 01:03:53 PM
Quote from: kasperd on June 07, 2013, 12:54:35 PM
But it is worth trying a lower MSS setting to see if that affects the connectivity.
With an MSS setting of 1220 it worked every time so far. With an MSS setting of 1208 it fails most of the time. What I am seeing is definitely an MTU issue. Where exactly the MTU issue is located is not yet clear, but it appears that something involved in the communication cannot handle packets larger than 1280 bytes.

Isn't the minimum MTU on IPv6 supposed to be 1280 for compliance?

kasperd

Quote from: kasperd on June 07, 2013, 01:03:53 PM
With an MSS setting of 1220 it worked every time so far. With an MSS setting of 1228 it fails most of the time. What I am seeing is definitely an MTU issue. Where exactly the MTU issue is located is not yet clear, but it appears that something involved in the communication cannot handle packets larger than 1280 bytes.
I managed to grab a packet trace of one of the few cases where it worked as well as one where it did not work. The first packets in both cases looked the same:

C->S: SYN
S->C: SYN+ACK
C->S: ACK
C->S: Client hello
S->C: ACK of client hello
S->C: Last packet of server hello
C->S: ACK of SYN+ACK packet

The last packet, where the client ACKs the SYN+ACK again, signals to the server that the client has received a packet but that there is a gap before it, so the most recently received packet cannot be acknowledged yet. Presumably the server retransmits the packet covering the gap, and it is lost again. At this point the connection stalls.

In the one case where it did work, the repeated ACK of the SYN+ACK packet was followed by more packets from the server. The server sent the first part of the server hello again, this time split into two segments. With the MSS setting I tested (1228), the server sent one segment with a total length of 1280 bytes and another segment with the last 8 bytes of payload.

The extra ACK indicating the loss of a packet is sent in both cases, so presumably Facebook does realize a packet has been lost and retransmits it. But sometimes the retransmission does not arrive. I guess that is because the retransmitted packet is the same size as before, which is still too large. Most likely that is because the server did not receive an ICMPv6 message indicating that the packet was too big.

The reason Facebook does not receive that ICMPv6 message may very well be that the router with the 1280-byte link MTU is rate limiting the ICMPv6 messages. In that case the number of successful connections is capped by the ICMPv6 rate limit on that router. Possibly Facebook is sending far too many packets above 1280 bytes.
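To see why a rate limit would cap the number of working connections, here is a toy model (the rates are made-up assumptions, not measurements of any real router): a router that emits at most one ICMPv6 Packet Too Big per interval leaves most senders of oversized packets uninformed of the path MTU.

import random

RATE_LIMIT_S = 1.0       # assumed: at most one ICMPv6 error per second
OVERSIZED_PER_S = 50     # assumed arrival rate of >1280-byte packets

def simulate(seconds=10):
    informed = silent_drops = 0
    next_allowed = 0.0
    t = 0.0
    while t < seconds:
        t += random.expovariate(OVERSIZED_PER_S)  # next oversized packet
        if t >= next_allowed:
            informed += 1                 # ICMPv6 Packet Too Big goes out
            next_allowed = t + RATE_LIMIT_S
        else:
            silent_drops += 1             # packet dropped, no error sent
    return informed, silent_drops

print(simulate())  # roughly (10, 490): only a small fraction learn the PMTU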

I can see three ways Facebook could improve the situation.

  • They can cache the discovered PMTU more aggressively, to limit the number of too-big packets being sent. That way they won't be hitting that rate limit all the time.
  • They can tweak the TCP stack to behave a bit more intelligently in case of packet loss: instead of retransmitting a lost packet at the same size, split it in half, even if no ICMPv6 error was received, and only retransmit packets larger than 1280 bytes if previous ACKs have confirmed the PMTU is large enough for that size (see the sketch after this list).
  • They can lower the MSS value on their own end to 1220.
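
A minimal sketch of the second idea (illustrative only, not Facebook's actual stack): choose retransmission sizes without waiting for an ICMPv6 error, never exceeding the IPv6 minimum MTU unless earlier ACKs proved the path can carry more.

IPV6_MIN_MTU = 1280
HEADERS = 60                              # IPv6 + TCP headers, as above

def retransmit_sizes(lost_payload, confirmed_pmtu):
    """Payload sizes to use when retransmitting a lost segment (toy model)."""
    safe_payload = IPV6_MIN_MTU - HEADERS  # 1220 always gets forwarded
    if lost_payload + HEADERS <= confirmed_pmtu:
        return [lost_payload]     # earlier ACKs proved this size gets through
    if lost_payload <= safe_payload:
        return [lost_payload]     # already within the IPv6 minimum MTU
    half = lost_payload // 2      # no ICMPv6 error seen: halve and retry
    return [half, lost_payload - half]

# A lost 1228-byte payload when only the 1280-byte minimum has been confirmed:
print(retransmit_sizes(1228, IPV6_MIN_MTU))        # -> [614, 614]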

The PMTU problem I noticed is definitely real. But that by no means proves that the problem other people are experiencing is an MTU issue as well.

kasperd

Quote from: cconn on June 07, 2013, 02:17:44 PM
Quote from: kasperd on June 07, 2013, 01:03:53 PM
Quote from: kasperd on June 07, 2013, 12:54:35 PM
But it is worth trying a lower MSS setting to see if that affects the connectivity.
With an MSS setting of 1220 it worked every time so far. With an MSS setting of 1208 it fails most of the time. What I am seeing is definitely an MTU issue. Where exactly the MTU issue is located is not yet clear, but it appears that something involved in the communication cannot handle packets larger than 1280 bytes.
Isn't the minimum MTU on IPv6 supposed to be 1280 for compliance?
The 1208 in my post was a typo. It should have read 1228. With 60 bytes used for IPv6 and TCP headers, those numbers correspond to MTUs of 1280 and 1288 respectively.
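The arithmetic behind the 60-byte figure: a plain IPv6 header is 40 bytes and a TCP header without options is 20 bytes, so the on-wire packet size is the MSS plus 60 bytes.

IPV6_HEADER = 40
TCP_HEADER = 20          # no TCP options on the data segments

for mss in (1220, 1228):
    print(f"MSS {mss} -> {mss + IPV6_HEADER + TCP_HEADER}-byte packets")
# MSS 1220 -> 1280-byte packets  (fits the IPv6 minimum link MTU)
# MSS 1228 -> 1288-byte packets  (may be dropped by a 1280-byte link)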

Routers are not required to forward packets larger than 1280 bytes, so not forwarding the 1288-byte packets is standards-compliant. But in that case an ICMPv6 error has to be sent back to the sender of the packet, and that sender has to perform PMTU discovery and retransmit the data in smaller packets.

For some connections, the retransmission as smaller packets never happens. I guess rate limiting of the ICMPv6 packets combined with a lack of caching of PMTU results is to blame.
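
A per-destination PMTU cache is the missing piece hinted at here. A minimal sketch (class name, TTL, and addresses are illustrative assumptions) so that one ICMPv6 Packet Too Big benefits every later connection to the same address:

import time

class PMTUCache:
    def __init__(self, default_mtu=1500, ttl=600):
        self.default_mtu, self.ttl = default_mtu, ttl
        self.entries = {}                 # destination address -> (pmtu, expiry)

    def record_too_big(self, dst, reported_mtu):
        """Call when an ICMPv6 Packet Too Big arrives for dst."""
        self.entries[dst] = (reported_mtu, time.monotonic() + self.ttl)

    def pmtu(self, dst):
        """Largest packet that should be sent to dst right now."""
        entry = self.entries.get(dst)
        if entry and entry[1] > time.monotonic():
            return entry[0]
        return self.default_mtu

cache = PMTUCache()
cache.record_too_big("2001:db8::1", 1280)
print(cache.pmtu("2001:db8::1"))          # 1280 until the entry expires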

passport123

It appears that this is widespread:
http://mailman.nanog.org/pipermail/nanog/2013-June/058805.html


Facebook broken over v6?


On 6/7/13 12:21 PM, Jeroen Massar wrote:
> On 2013-06-07 09:12, Aaron Hughes wrote:
>>
>> Anyone else getting connection hangs and closes to Facebook?
>
> Yes, and from a lot of vantage points, thus it is not your local network
> that is at fault, seems that there are some IP addresses which are being
> returned by some Facebook DNS servers that are actually not properly
> provisioned and thus do not respond.
>
> More to it at:
http://www.sixxs.net/forum/?msg=general-9511818
>
> I've informed the Facebook peoples (and bcc'd them on this email), and
> apparently they are digging into it from the response I've received.
>
> Note that using www.v6.facebook.com apparently works fine as that IP is
> not affected and is not geo-balanced or something like that thus is
> always (afaik) the same. Thus if you like sharing your life and
> everything you do, that is the thing to use.

It's affecting anyone running dual stack, as the server responds, hangs, times
out and then it tries again on v6.  At least in the latest FF and Safari
browsers, I've not tried chrome.

I've cc'd this over to Nanog, as I've not seen anything about it there, and
I'm sure others are seeing it.

www.v6.facebook.com works fine as a workaround for the time being.


passport123

From the nanog thread:

http://mailman.nanog.org/pipermail/nanog/2013-June/058819.html

Doug Porter dsp at fb.com
Sat Jun 8 20:29:43 UTC 2013

We're actively investigating the v6 issues.  We need more data
though.  If you're experiencing problems, please email me a
tcpdump/pcap or any other debug data you think will help.

Thanks,
--
dsp


PigLover

I updated my tunnel to use MTU 1480 and things got better, but not right. Instead of hanging completely, some Facebook activities just don't finish correctly (e.g., it will display a user's FB home page, but if you scroll to the bottom it won't load older messages). It's very odd.

I've actually had to disable IPv6 router advertisements on the subnet most of the family uses. With only v4, FB works perfectly.

Will be glad when they get it fixed so I can turn it back on.

PaulosV

Actually, I'm not getting the hangups anymore, and Wireshark does not report anything unusual. It seems that they have finally resolved the issue.

Will wait a few more days before celebrating, though. The odds of this happening twice are slim, but certainly not non-existent.

kasperd

Quote from: PigLover on June 09, 2013, 04:00:48 PM
I updated my tunnel to use MTU 1480 and things got better, but not right.
Have you tried reducing MSS to 1220 on all packets passing through your tunnel endpoint?

passport123

Quote from: PigLover on June 09, 2013, 04:00:48 PM
...Will be glad when they get it fixed so I can turn it back on.

I'm not seeing any IPv6 issues anymore.  Hopefully the problem has been fixed.