It emits fragments in order. The code to both send and receive them is quite clean, in my opinion.
Sending in order is a bit simpler than sending in reverse order. Code for receiving needs to be prepared to receive fragments in any order. So for simplicity of the code, sending in order is preferable.
But you may want to optimize the reassembly for the case where packets are not reordered by the network, so that the reassembly code uses less CPU and/or RAM. You could then have two alternate code paths: a fast path, which is used as long as fragments are received in reverse order, and a slow path, which is used when fragments are received out of order. This would require more code on the receiving end, but with a performance improvement in the common case.
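To make the two-path idea concrete, here is a minimal sketch in Python. It is not taken from any real stack: the Reassembler class and its (offset, more, data) interface are made up for illustration, overlapping and duplicate fragments are ignored, and it treats the expected arrival order as "in order". A receiver expecting reverse order would flip the test in the fast path.

```python
class Reassembler:
    """Hypothetical reassembler; fragments arrive as (offset, more, data)."""

    def __init__(self):
        self.buffer = bytearray()  # contiguous prefix built by the fast path
        self.stash = {}            # offset -> data, only used by the slow path
        self.total_len = None      # known once the last fragment is seen

    def add(self, offset, more, data):
        """Add one fragment; return the full payload once complete, else None."""
        if not more:
            # The fragment without the "more fragments" flag fixes the length.
            self.total_len = offset + len(data)
        if offset == len(self.buffer):
            # Fast path: the fragment extends the contiguous prefix.
            self.buffer += data
            # Pull in any stashed fragments that have become contiguous.
            while len(self.buffer) in self.stash:
                self.buffer += self.stash.pop(len(self.buffer))
        else:
            # Slow path: keep the fragment until the gap before it is filled.
            self.stash[offset] = data
        if self.total_len is not None and len(self.buffer) == self.total_len:
            return bytes(self.buffer)
        return None
```

As long as nothing is reordered, the stash dictionary is never touched, which is where the CPU and RAM saving would come from.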
Something similar is found in TCP options parsing. Linux has optimized code to handle the case where the TCP options are exactly 12 bytes long, and the first four bytes are exactly 0x0101080A. This doesn't simplify the code, because it still needs code to handle arbitrary ordering of options. But the code will be faster on the large number of packets with that exact sequence of bytes. This optimization is even recommended in RFC 1323.
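A rough Python illustration of that pattern follows. This is not the Linux code, just a sketch of the same idea; the function name parse_timestamps is made up. The bytes 0x0101080A are NOP, NOP, option kind 8 (Timestamps), option length 10.

```python
import struct

# NOP, NOP, kind=8 (Timestamps), length=10
TS_FAST_PREFIX = 0x0101080A

def parse_timestamps(options: bytes):
    """Return (tsval, tsecr), or None if no Timestamps option is found."""
    # Fast path: exactly 12 bytes starting with the well-known prefix.
    if len(options) == 12:
        prefix, tsval, tsecr = struct.unpack("!III", options)
        if prefix == TS_FAST_PREFIX:
            return tsval, tsecr
    # Slow path: walk the options one by one, in whatever order they come.
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == 0:        # End of Option List
            break
        if kind == 1:        # NOP is a single byte with no length field
            i += 1
            continue
        if i + 1 >= len(options):
            break            # truncated option, give up
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break            # malformed length, give up
        if kind == 8 and length == 10:
            return struct.unpack("!II", options[i + 2:i + 10])
        i += length
    return None
```

The slow path alone would give the same answer for every input. The fast path only exists to skip the loop for the byte sequence that most packets carry.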
It doesn't occur every time for me, perhaps for 1 in 10 queries.
It depends on the PMTU information being cached on the sender. If the cache has no PMTU information for your IP, the first fragment will be 1500 bytes. That fragment is bounced by the tunnel server. At that point your IP address will be put in the cache with a PMTU of 1480 bytes. But rather than retransmitting the packet on receipt of the ICMPv6 error, the sender sends an invalid response.
From that point on, it will work until the PMTU cache entry expires.
The FreeBSD/BIND combo I used emitted 1280 byte maximum length UDP.
That actually sounds like a very sensible default behaviour. I think more systems should do that.
The result is still one packet and one fragment, just of slightly different sizes.
I assume you mean one UDP packet fragmented in two IPv6 fragments.
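For anybody who wants to check the sizes involved, here is a small Python sketch of the arithmetic. The function name fragment_sizes is made up, and it assumes there are no extension headers other than the Fragment header. The IPv6 header is 40 bytes, the Fragment header is 8 bytes, and every fragment except the last must carry a multiple of 8 bytes of data.

```python
IPV6_HEADER = 40
FRAGMENT_HEADER = 8

def fragment_sizes(upper_len, mtu):
    """Return the on-wire size of each IPv6 packet needed to carry
    upper_len bytes of upper-layer data (UDP header plus payload)."""
    if IPV6_HEADER + upper_len <= mtu:
        # Fits in one packet, so no Fragment header is needed.
        return [IPV6_HEADER + upper_len]
    overhead = IPV6_HEADER + FRAGMENT_HEADER
    # Data per non-final fragment, rounded down to a multiple of 8.
    per_fragment = (mtu - overhead) // 8 * 8
    sizes = []
    remaining = upper_len
    while remaining > per_fragment:
        sizes.append(overhead + per_fragment)
        remaining -= per_fragment
    sizes.append(overhead + remaining)
    return sizes
```

With 1600 bytes of UDP header plus payload, fragment_sizes(1600, 1280) gives [1280, 416] and fragment_sizes(1600, 1480) gives [1480, 216], so two fragments either way, only the split differs.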
I also pointed out to the Netalyzr folks it would be very cool if they could devise a test to send the fragments both in order and out of order, and see if it makes any difference. While it doesn't in this case, I suspect with some firewalls and NAT implementations it will.
It's quite likely that it will make a difference in some cases.