• Welcome to Hurricane Electric's IPv6 Tunnel Broker Forums.

ns1.he.net. 216.218.130.2 issue? (or transport, or ???)

Started by MichaelPaoli, October 27, 2016, 12:50:14 AM


MichaelPaoli

I'm seeing some problematic, asymmetric behavior with ns1.he.net. [216.218.130.2] that I'm not seeing on the other he.net. DNS servers.
I'm not 100% sure whether the issue is with the server or with the transport between server and client, but from the same client
I definitely see the issue only on ns1.he.net. [216.218.130.2].

Here's a set of examples where I'm seeing the issue.  I bump the SOA serial number on digitalwitness.org. and see the notifies and transfer(s) logged.
I then check the SOA serial numbers.  The command filters out every line carrying the new serial, so empty output means all matched (no exceptions):
$ (d=digitalwitness.org.; for NS in $(dig +nocdflag +short -t NS "$d"); do for IP in $(dig +nocdflag +short "$NS" A "$NS" AAAA); do echo "[$IP $NS]": $(dig +nocdflag +noall +answer @"$IP" digitalwitness.org. SOA); done; done) 2>&1 | fgrep -v 1477550116
$
I then look at the RRSIG and DNSKEY record counts (as some DNSSEC (pre-)checks have failed):
$ (d=digitalwitness.org.; for NS in $(dig +nocdflag +short -t NS "$d"); do for IP in $(dig +nocdflag +short "$NS" A "$NS" AAAA); do echo $(for rrtype in RRSIG DNSKEY; do echo "$rrtype:"$(dig +nocdflag +noall +answer @"$IP" digitalwitness.org. "$rrtype" | wc -l); done) "[$IP $NS]"; done; done) | sort
RRSIG:0 DNSKEY:2 [216.218.130.2 ns1.he.net.]
RRSIG:9 DNSKEY:2 [198.144.194.235 ns0.digitalwitness.org.]
RRSIG:9 DNSKEY:2 [198.144.194.238 ns1.digitalwitness.org.]
RRSIG:9 DNSKEY:2 [2001:470:1f04:19e::2 ns1.digitalwitness.org.]
RRSIG:9 DNSKEY:2 [2001:470:200::2 ns2.he.net.]
RRSIG:9 DNSKEY:2 [2001:470:300::2 ns3.he.net.]
RRSIG:9 DNSKEY:2 [2001:470:400::2 ns4.he.net.]
RRSIG:9 DNSKEY:2 [2001:470:500::2 ns5.he.net.]
RRSIG:9 DNSKEY:2 [2001:470:66:76f::2 ns0.digitalwitness.org.]
RRSIG:9 DNSKEY:2 [216.218.131.2 ns2.he.net.]
RRSIG:9 DNSKEY:2 [216.218.132.2 ns3.he.net.]
RRSIG:9 DNSKEY:2 [216.66.1.2 ns4.he.net.]
RRSIG:9 DNSKEY:2 [216.66.80.18 ns5.he.net.]
$
Note the RRSIG count of 0 for the first server in that list, ns1.he.net. [216.218.130.2].  (Intermittently I see it come through as 1, or possibly 2.)
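The comparison above can be automated. The following is a minimal sketch, assuming input lines in exactly the "RRSIG:<n> DNSKEY:<m> [<ip> <ns>]" format shown in the listing; `flag_outliers` is a hypothetical helper name, not part of dig or any other tool. It prints only the servers whose RRSIG count differs from the most common count. If two counts are equally common, which one is treated as the majority is arbitrary.

```shell
# flag_outliers: read "RRSIG:<n> DNSKEY:<m> [<ip> <ns>]" lines on stdin and
# print the lines whose RRSIG count differs from the most common count.
# (Hypothetical helper; the input format is assumed from the listing above.)
flag_outliers() {
    awk '
        {
            split($1, kv, ":")      # kv[2] holds the RRSIG count
            count[NR] = kv[2]
            line[NR] = $0
            seen[kv[2]]++           # how many servers reported this count
        }
        END {
            best = ""; max = 0
            for (c in seen)
                if (seen[c] > max) { max = seen[c]; best = c }
            for (i = 1; i <= NR; i++)
                if (count[i] != best) print line[i]
        }
    '
}
```

Piping the sorted output of the counting loop into `flag_outliers` would leave only the ns1.he.net. line for the listing above.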
When I start checking a bit closer with dig(1), etc., things get slightly stranger:
$ dig +norecurse @216.218.130.2 digitalwitness.org. RRSIG

; <<>> DiG 9.9.5-9+deb8u7-Debian <<>> +norecurse @216.218.130.2 digitalwitness.org. RRSIG
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOTIMP, id: 13089
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1680
;; QUESTION SECTION:
;digitalwitness.org.            IN      RRSIG

;; Query time: 14 msec
;; SERVER: 216.218.130.2#53(216.218.130.2)
;; WHEN: Thu Oct 27 00:25:47 PDT 2016
;; MSG SIZE  rcvd: 47

$
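The fields that matter in the output above are the response status (NOTIMP, the "Not Implemented" RCODE), the header flags (no tc flag here), and the ANSWER count of 0. As a small sketch for comparing many responses at a glance, the hypothetical helper `dig_summary` below pulls those three fields out of full dig output. It assumes the header layout printed by dig as shown above.

```shell
# dig_summary: read full dig output on stdin and print
# "status=<rcode> flags=<flags> answers=<n>".
# (Hypothetical helper; assumes the header layout shown in the output above.)
dig_summary() {
    awk '
        /->>HEADER<<-/ {
            for (i = 1; i <= NF; i++)
                if ($i == "status:") { status = $(i + 1); sub(/,$/, "", status) }
        }
        /^;; flags:/ {
            flags = $0
            sub(/^;; flags: */, "", flags)   # drop the leading label
            sub(/;.*/, "", flags)            # keep only the flag names
            for (i = 1; i <= NF; i++)
                if ($i == "ANSWER:") { answers = $(i + 1); sub(/,$/, "", answers) }
        }
        END { printf "status=%s flags=%s answers=%s\n", status, flags, answers }
    '
}

# Usage (not run here):
#   dig +norecurse @216.218.130.2 digitalwitness.org. RRSIG | dig_summary
```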
Sometimes it will show "Truncated, retrying in TCP mode" and then successfully retrieve all (or almost all?) of the RRSIG records.
If I try explicitly with TCP, results also vary: sometimes I get all or nearly all of the records, and sometimes I get nothing at all:
$ dig +noall +norecurse +tcp +answer @216.218.130.2 digitalwitness.org. RRSIG
$
(no results at all returned)
Retrying (the failure seems intermittent), I then get all (or nearly all) of the records:
$ dig +noall +norecurse +tcp +answer @216.218.130.2 digitalwitness.org. RRSIG
[very long listing of all or most all the RRSIG records]
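To put a number on "intermittent", the same query can be repeated and the results tallied. This is a minimal sketch only: `tally_runs` is a hypothetical helper that runs any command a given number of times and reports how many runs produced how many output lines. With `dig +noall +answer`, each output line is one answer record.

```shell
# tally_runs N CMD [ARGS...]: run CMD N times, count the lines each run
# prints on stdout, and print a histogram of those line counts.
# (Hypothetical helper, not part of dig.)
tally_runs() {
    n=$1; shift
    i=0
    while [ "$i" -lt "$n" ]; do
        # One number per run: how many lines the command printed.
        "$@" 2>/dev/null | wc -l | tr -d ' '
        i=$((i + 1))
    done | sort -n | uniq -c |
        awk '{ printf "%d run(s) returned %d line(s)\n", $1, $2 }'
}

# Usage (not run here):
#   tally_runs 20 dig +noall +norecurse +tcp +answer @216.218.130.2 digitalwitness.org. RRSIG
```

Running the same tally against one of the other he.net. servers would give a baseline to compare against.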
tracepath shows a pmtu of 1500 and no particular issues:
$ tracepath -n -p 53 216.218.130.2
1?: [LOCALHOST]                                         pmtu 1500
1:  198.144.194.233                                     116.810ms
1:  198.144.194.233                                     117.157ms
2:  216.131.94.209                                       50.632ms asymm  3
3:  173.205.60.193                                       45.465ms asymm  4
4:  89.149.129.137                                       46.115ms asymm  5
5:  173.205.54.170                                       46.908ms asymm  9
6:  80.239.167.174                                       45.395ms
7:  184.105.213.65                                       54.517ms
8:  216.218.130.2                                        50.487ms reached
     Resume: pmtu 1500 hops 8 back 8
$
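tracepath exercises the path but not DNS itself. One further check, offered as a sketch rather than a diagnosis, is to vary the EDNS buffer size that dig advertises and watch for the tc flag or an empty answer section. The helper name `probe_bufsizes` and the `DIG` override variable are hypothetical; `+bufsize`, `+ignore` and `+norecurse` are standard dig options. `+ignore` stops dig from retrying over TCP, so a truncated UDP reply stays visible.

```shell
# probe_bufsizes SERVER NAME SIZE...: query NAME RRSIG at SERVER once per
# advertised EDNS buffer size and print the header flags and answer count.
# (Hypothetical helper; set DIG to substitute another dig binary.)
probe_bufsizes() {
    server=$1; name=$2; shift 2
    for size in "$@"; do
        # +ignore: do not retry over TCP, so truncation (tc) remains visible.
        out=$("${DIG:-dig}" +norecurse +ignore +bufsize="$size" \
              "@$server" "$name" RRSIG 2>&1)
        flags=$(printf '%s\n' "$out" | sed -n 's/^;; flags: \([^;]*\);.*/\1/p')
        answers=$(printf '%s\n' "$out" | sed -n 's/.*ANSWER: \([0-9]*\),.*/\1/p')
        printf 'bufsize=%s flags=[%s] answers=%s\n' "$size" "$flags" "${answers:-?}"
    done
}

# Usage (not run here):
#   probe_bufsizes 216.218.130.2 digitalwitness.org. 512 1232 1472 4096
```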
And all appears fine - and consistently fine - when I check against the other he.net NS servers.

So ... where from here?  Are folks seeing similar asymmetries and/or intermittent issues with ns1.he.net. 216.218.130.2 ?

snarked

Note that the dns.he.net web page indicates that DNSSEC implementation is being explored.  Therefore, there is no guarantee that DNSSEC is working with HE's servers.  I don't see any part of your complaint dealing with a non-DNSSEC DNS issue.