no IPv4 negative caching for ALIAS records pointing at IPv6-only targets

Started by benaryorg, November 16, 2023, 09:07:16 PM

benaryorg

The title kinda points out the issue, but it's a little complicated so let me explain a hypothetical setup (using dns.he.net):

; several records for servers' FQDNs:
alpha.fqdn.example.com  3600    IN      A       192.0.2.1
bravo.fqdn.example.com  3600    IN      A       192.0.2.2
; except charlie, charlie doesn't have IPv4, good for charlie!
;charl.fqdn.example.com  3600    IN      A
alpha.fqdn.example.com  3600    IN      AAAA    2001:db8::1
bravo.fqdn.example.com  3600    IN      AAAA    2001:db8::2
charl.fqdn.example.com  3600    IN      AAAA    2001:db8::3

; now the records for prod
www.example.com  3600    IN      ALIAS   alpha.fqdn.example.com
www.example.com  3600    IN      ALIAS   bravo.fqdn.example.com
www.example.com  3600    IN      ALIAS   charl.fqdn.example.com
www.example.com  3600    IN      MX      ; imagine an MX entry here

; for illustrative purposes below:
www4.example.com  3600    IN      ALIAS   alpha.fqdn.example.com
www4.example.com  3600    IN      ALIAS   bravo.fqdn.example.com

; this is the problem though:
www6.example.com  3600    IN      ALIAS   charl.fqdn.example.com



A small disclaimer as to why I use this structure: over 90% of my infrastructure is IPv6-only and very few servers *actually* have IPv4, which means virtually every workload has its own IPv6 address, published directly under its FQDN.
Then the actual domains in production just link over to the FQDN entries.
However, in some instances, such as a domain that hosts both a website and a receiving mail server, I can't use a CNAME (it cannot coexist with the MX record), so I'm very grateful for the ALIAS records and depend on them.



Okay now that that's out of the way, the tech stuff regarding the issue.

Running dig alpha.fqdn.example.com A works fine: you get your record and that's that. Likewise, dig charl.fqdn.example.com AAAA works, as do dig www.example.com AAAA and dig www4.example.com A. I haven't tested dig www.example.com A, but I think it'll be fine too.
Now the problem is dig www6.example.com A vs. dig charl.fqdn.example.com A.
The second one is easy: there is no A record, so dns.he.net returns the SOA record, whose TTL allows negative caching. Good.
The first one, however, returns nothing at all (no answer and no SOA), which seems to throw off my (more or less default) unbound.
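
To illustrate why the missing SOA matters, here is a minimal Python sketch of how a caching resolver could decide how long to remember an empty answer. It follows the rule from RFC 2308 that the negative-caching TTL is the smaller of the SOA record's own TTL and its MINIMUM field, and that a response without an SOA gives the resolver nothing to cache by. The names Soa and negative_cache_ttl are hypothetical, this is not how unbound is implemented, and the SOA numbers are made up.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Soa:
    """The two SOA values that matter for negative caching (RFC 2308)."""
    ttl: int      # TTL of the SOA record as sent in the authority section
    minimum: int  # MINIMUM field inside the SOA record data


def negative_cache_ttl(answer_count: int, authority: Sequence[Soa]) -> Optional[int]:
    """Return how long an empty NOERROR answer may be cached, or None.

    None means there is no TTL to cache the "no such record" result with,
    so every client query has to travel upstream again.
    """
    if answer_count > 0:
        return None  # a positive answer; negative caching does not apply
    if not authority:
        return None  # empty answer without SOA: nothing to base a TTL on
    soa = authority[0]
    return min(soa.ttl, soa.minimum)


# Hypothetical values standing in for "dig charl.fqdn.example.com A" (SOA present)
with_soa = negative_cache_ttl(0, [Soa(ttl=3600, minimum=1800)])
# ... and for "dig www6.example.com A" (completely empty response)
without_soa = negative_cache_ttl(0, [])
print(with_soa, without_soa)
```

In a forwarder chain like the one described in this post, the second case would mean every hop has to ask again for each A lookup.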

Since my setup uses DNS64, I set up a forwarder chain to have both local caching and a recursor with IPv4 (otherwise some TLDs won't resolve very well). So the device I'm using asks the local forwarder (unbound), which asks the remote recursor (unbound), which queries the HE servers, and that whole chain really exacerbates the lack of negative caching.
Since glibc resolves both AAAA and A on every request (as some RFC recommends, if memory serves), everything hangs for about 2 seconds while it tries to resolve the A record of an IPv6-only domain (like www6.example.com in the example). Meanwhile, if I instruct curl to use only IPv6 it works instantly, and using the FQDN directly also works instantly, both thanks to caching.
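
For anyone who wants to reproduce the delay without curl, the following Python sketch times a lookup through the system resolver, once restricted to IPv6 and once unrestricted. It assumes that getaddrinfo with AF_INET6 only asks for AAAA while AF_UNSPEC asks for both A and AAAA, which is what I'd expect from glibc. The function names are hypothetical, and the resolver argument exists only so the code can be exercised without network access.

```python
import socket
import time


def time_lookup(host, family, resolver=socket.getaddrinfo):
    """Return (seconds, address_count) for one name lookup.

    `resolver` defaults to the system resolver and is injectable for testing.
    """
    start = time.monotonic()
    try:
        results = resolver(host, 443, family=family, type=socket.SOCK_STREAM)
    except socket.gaierror:
        results = []  # a failed lookup still counts as time spent waiting
    return time.monotonic() - start, len(results)


def compare_families(host, resolver=socket.getaddrinfo):
    """Time an IPv6-only lookup against an unrestricted (A + AAAA) lookup."""
    return {
        "AAAA only": time_lookup(host, socket.AF_INET6, resolver),
        "A + AAAA": time_lookup(host, socket.AF_UNSPEC, resolver),
    }
```

Calling compare_families with a real IPv6-only ALIAS name (www6.example.com above is only a placeholder) should show the unrestricted lookup taking noticeably longer if the A query is not being negatively cached.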

I am honestly not too deep into DNS, so I don't know whether the SOA should be returned in this instance or not. In any case the effect is measurable: my local S3 storage has only one radosgw instance, so I could simply replace the ALIAS record with a CNAME, and the 2 s time to first byte was gone right away.



So my questions here are:

- is this intended behaviour?
- does anyone know any fancy tricks to get unbound to cache this, or otherwise speed that up? (systemd-resolved seems happier with it, so I imagine there are ways to tweak this)
- any other ideas/solutions that would work with both the example and the disclaimer?



If you need any real records for testing: dig benary.org A (ALIAS) is the one returning no SOA, while dig bgp.cloud.bsocat.net A does return one; both also return AAAA records if queried.

snarked

ALIAS is a non-standard RRtype which is not universally supported.  If what you are attempting to do is load balancing, SRV and NAPTR records can do that.  The only problem is that web clients typically don't fetch such records.

For what definition of ALIAS that exists, it is supposed to "take the place" of a CNAME at a zone's apex where CNAME is not permitted.  Therefore, it should be a "Singleton" - so I don't see how multiple ALIAS records can exist for the same label, nor exist at other than a zone apex.

Another way to do load-balancing is to have multiple address records per label, and rely on DNS randomization as to the order the addresses are presented.  However, there is no control over how often a particular record will be selected, thus it may tend to prefer the quickest responders over time.

benaryorg

Quote from: snarked on November 19, 2023, 08:52:56 AM
ALIAS is a non-standard RRtype which is not universally supported.  If what you are attempting to do is load balancing, SRV and NAPTR records can do that.  The only problem is that web clients typically don't fetch such records.

For what definition of ALIAS that exists, it is supposed to "take the place" of a CNAME at a zone's apex where CNAME is not permitted.  Therefore, it should be a "Singleton" - so I don't see how multiple ALIAS records can exist for the same label, nor exist at other than a zone apex.

Another way to do load-balancing is to have multiple address records per label, and rely on DNS randomization as to the order the addresses are presented.  However, there is no control over how often a particular record will be selected, thus it may tend to prefer the quickest responders over time.

To clarify, I do not care about load-balancing at all. I care about fault tolerance, which means a query against such a record should return multiple AAAA records (as is the case), since this lets clients fail over or use other application-level mechanisms to make meaningful use of the resulting address list.
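
As a rough illustration of what I mean by client-side failover, here is a minimal Python sketch that walks a list of addresses and uses the first one that accepts a connection. The function names are hypothetical, the addresses in the comment are the documentation-prefix ones from my example zone, and the connect argument is injectable only so the logic can be tried without real servers.

```python
import socket


def tcp_connect(address, port, timeout=3.0):
    """Open a TCP connection to a single address."""
    return socket.create_connection((address, port), timeout=timeout)


def connect_first_reachable(addresses, port, connect=tcp_connect):
    """Try each address in order; return (address, connection) for the first success.

    Raises OSError listing every failure if none of the addresses work.
    """
    errors = []
    for address in addresses:
        try:
            return address, connect(address, port)
        except OSError as exc:
            errors.append(f"{address}: {exc}")
    raise OSError("all addresses failed: " + "; ".join(errors))


# Example call (addresses as in the zone above):
# connect_first_reachable(["2001:db8::1", "2001:db8::2", "2001:db8::3"], 443)
```

Python's own socket.create_connection already does something similar when given a hostname, which is exactly why a record set with several AAAA entries is useful to me.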

The problem at hand, however, is entirely unrelated to having multiple ALIAS records at all, in fact. The problem arises with a single ALIAS record pointed at a single domain serving a single AAAA record, resulting in no SOA being presented when querying for an A record instead, while the original domain when queried returns exactly that SOA.

Take the following setup:

alpha.fqdn.example.com  3600    IN      AAAA    2001:db8::1
example.org             3600    IN      ALIAS   alpha.fqdn.example.com

A request for an A record to alpha.fqdn.example.com will yield a different result than a query for an A record against example.org.