The problem with the hypothesis that buffer bloat is the cause is that it fails to explain why 6in4 traffic is treated differently from the rest of the traffic.
The delay happens so early in the trace that it can more or less only be explained by buffering. If there is a measurable difference between the protocols, it most likely means that the device which is buffering uses multiple queues. So what you want to find out next is how many queues there are, what criterion decides which queue a packet goes into, and perhaps whether buffer memory is allocated statically to each queue or reallocated dynamically between them.
I recently noticed on my own connection that queues appeared to be split by IPv4 address. My protocol 41 traffic was using a different IPv4 address than the rest of my traffic, and since I was mostly pushing 6in4 traffic, it was only the IPv6 traffic that got slowed down while IPv4 traffic went through unaffected.
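If you want to check whether the same thing happens on your connection, one rough test (just a sketch; 192.0.2.10 and 192.0.2.11 stand in for two IPv4 addresses configured on your uplink, so substitute your own) is to load the link from one address while probing latency from the other:
# flood from the first address to fill whatever queue it maps to
ping -q -s 1400 -i 0.02 -c 2000 -I 192.0.2.10 184.105.253.10 > /dev/null &
# light latency probe from the second address while the flood runs
ping -q -c 50 -I 192.0.2.11 184.105.253.10
If the probe from the second address stays close to the idle RTT while the first address is saturated, the queues are most likely keyed on the source IPv4 address.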
For instance, in the following test, which intentionally pushes more traffic than the connection can handle, the IPv6 ping reports far worse statistics than the native IPv4 one. The Internet connection was otherwise unloaded (save for the occasional traffic from NTP, etc.). Note that the two commands were run in parallel.
root@mario-laptop:/home/mario# ping6 -q -s 1400 -i 0.02 -c 400 2001:470:1f0e:a81::1 & ping -q -s 1400 -i 0.02 -c 400 184.105.253.10 &
[1] 5167
[2] 5168
root@mario-laptop:/home/mario# PING 184.105.253.10 (184.105.253.10) 1400(1428) bytes of data.
PING 2001:470:1f0e:a81::1(2001:470:1f0e:a81::1) 1400 data bytes
--- 184.105.253.10 ping statistics ---
400 packets transmitted, 290 received, 27% packet loss, time 9946ms
rtt min/avg/max/mdev = 120.460/263.358/522.778/91.864 ms, pipe 22
--- 2001:470:1f0e:a81::1 ping statistics ---
400 packets transmitted, 44 received, 89% packet loss, time 11033ms
rtt min/avg/max/mdev = 154.642/1583.399/6152.360/1307.172 ms, pipe 220
That does look strange. There is no obvious explanation for those numbers.
Also, it seems that sometimes running a BitTorrent client, even with marginal bandwidth usage (but with several native IPv4 TCP connections and connection-like UDP usage via µTP), will make the IPv6 performance drop.
Possibly connection tracking plays a role in this picture, but this is just a random guess.
I can only conjecture about the reason for this weird behavior, but I suspect that my ISP may be throttling protocol-41 traffic when it notices congestion. However, as I said, when almost all traffic goes through the IPv6 tunnel, no performance degradation happens (a few extra ms of ping, but no packet loss or anything like that). The tunnel doesn't seem to be the bottleneck either; the traffic takes up all the contracted bandwidth.
The traffic may be put into different priority bands, which would explain why one queue sees more latency and packet loss than another: the queues might simply not have the same priority. The queue used for your protocol 41 packets does exhibit symptoms of buffer bloat, and when there are multiple queues, it is entirely possible that not all of them are affected.
For bulk traffic like BitTorrent transfers, a bit of buffer bloat is not a problem, as long as interactive traffic doesn't have to go through the same queue. So the problem may be misclassification of traffic rather than bloat per se.
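One thing worth checking (just a sketch, assuming your uplink interface is eth0) is what TOS/DSCP value the outer protocol-41 packets carry when they leave your machine, since that is one of the fields an ISP device might classify on:
tcpdump -v -n -i eth0 -c 10 'ip proto 41'
If the tunnel packets turn out to carry a bulk-looking mark, you could experiment with re-marking them in the mangle table, e.g. iptables -t mangle -A OUTPUT -p 41 -j DSCP --set-dscp 0, though whether your ISP pays any attention to DSCP is pure speculation on my part.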
If your protocol 41 traffic gets classified as bulk traffic while other customers' traffic gets classified as interactive and adversely affects your protocol 41 traffic, then you should complain loudly. OTOH, if the problem only kicks in once you reach the bandwidth limit specified in your subscription, and you can avoid problems by staying within that limit, then it would be more productive to stay within that limit. If that is the case, then my previous suggestion should still work just the same. By controlling where the bottleneck is and how traffic is prioritized at the bottleneck, you can get a better result. Once you have moved the bottleneck to where you can control it, it no longer matters how the previous bottleneck would treat traffic.
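For completeness, the shaping I had in mind looks roughly like this (a sketch only; eth0 and the 9mbit figure are placeholders, to be replaced with your WAN interface and a rate slightly below your contracted upstream, so the queue builds on your side where fq_codel can keep it short):
tc qdisc replace dev eth0 root handle 1: htb default 10
tc class add dev eth0 parent 1: classid 1:10 htb rate 9mbit ceil 9mbit
tc qdisc add dev eth0 parent 1:10 fq_codel
With the bottleneck on your own router, the 6in4 packets never pile up in the ISP's queue in the first place, so it no longer matters how that queue would have treated them.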
Do you know how I may tunnel the 6in4 tunnel itself through an OpenVPN tunnel, so as to avoid protocol-specific interference from my ISP?
I do not have that much experience with VPNs. In principle what you ask for should be easy; in practice it may very well depend on the VPN implementation. If you have both the VPN and the 6in4 tunnel able to pass packets, I can try to come up with suggestions on how to get traffic routed the way you want. The VPN and 6in4 tunnel could end up next to each other, with the VPN inside the 6in4 tunnel, or with 6in4 inside the VPN. The routing table has a lot of say in which of those will be the case.
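As a starting point for the 6in4-inside-VPN case, the only routing change that should be needed is a host route sending the 6in4 remote endpoint into the VPN, so the sit tunnel's protocol-41 packets get wrapped in the VPN before they reach the ISP. A sketch with placeholder values (tun0 for the OpenVPN interface, 10.8.0.1 for the VPN-side gateway, 192.0.2.1 for the remote IPv4 endpoint of your 6in4 tunnel):
ip route add 192.0.2.1 via 10.8.0.1 dev tun0
The route to the OpenVPN server itself has to keep going over the physical uplink (OpenVPN normally installs such a host route automatically), otherwise you would create a routing loop.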
I think doing this and then repeating the test may give further insights into the roots of this problem.
Sure, the information you could find by experimenting with 6in4 inside VPN could potentially tell us something more about the root of the problem.