• Welcome to Hurricane Electric's IPv6 Tunnel Broker Forums.

(crosspost) Problem with quagga BGP and Hurricane Electric

Started by lkenter, August 27, 2009, 01:07:10 PM

lkenter

Hello,

I'm trying to set up a BGP session with HE but I cannot seem to get it to work. I've contacted support, but we cannot find what is causing this problem, so maybe somebody in the forums has a few pointers.

I've installed Debian with quagga 0.99.10-1lenny1 from the repository (and have also compiled the latest 0.99.14 version, but this gave the same problem) and set up my HE tunnel so I can ping6 the whole world and the other way around. My quagga config looks as follows:

log file /var/log/quagga/bgpd.log
!
debug bgp events
debug bgp updates
debug bgp fsm
!
router bgp 35383
 bgp router-id 195.X.X.X
 no bgp default ipv4-unicast
 neighbor 2001:470:15:80::1 remote-as 6939
 neighbor 2001:470:15:80::1 passive
!
 address-family ipv6
 network 2001:67c:18::/48
 neighbor 2001:470:15:80::1 activate
 exit-address-family
!
line vty
!
end

When I start quagga I can see the following connection when connected to the BGP daemon:

show bgp summary

Neighbor           V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
2001:470:15:80::1  4  6939       0       2        0    0    0 never    OpenConfirm


And my bgpd.log shows the following:

2009/08/26 02:15:44 BGP: [Event] BGP connection from host 2001:470:15:80::1
2009/08/26 02:15:44 BGP: [Event] Make dummy peer structure until read Open packet
2009/08/26 02:15:44 BGP: 2001:470:15:80::1 [FSM] TCP_connection_open (Active->OpenSent)
2009/08/26 02:15:44 BGP: 2001:470:15:80::1 [Event] Transfer temporary BGP peer to existing one
2009/08/26 02:15:44 BGP: 2001:470:15:80::1 [Event] Accepting BGP peer delete
2009/08/26 02:15:44 BGP: 2001:470:15:80::1 [FSM] Receive_OPEN_message (OpenSent->OpenConfirm)
2009/08/26 02:15:56 BGP: Import timer expired.
2009/08/26 02:16:11 BGP: Performing BGP general scanning
2009/08/26 02:16:11 BGP: scanning IPv4 Unicast routing tables
2009/08/26 02:16:11 BGP: scanning IPv6 Unicast routing tables
2009/08/26 02:16:11 BGP: Import timer expired.
2009/08/26 02:16:26 BGP: Import timer expired.
2009/08/26 02:16:41 BGP: Import timer expired.
2009/08/26 02:16:44 BGP: 2001:470:15:80::1 [FSM] Timer (keepalive timer expire)
2009/08/26 02:16:44 BGP: 2001:470:15:80::1 [FSM] bgp_ignore called
2009/08/26 02:16:56 BGP: Import timer expired.
2009/08/26 02:17:11 BGP: Performing BGP general scanning
2009/08/26 02:17:11 BGP: scanning IPv4 Unicast routing tables
2009/08/26 02:17:11 BGP: scanning IPv6 Unicast routing tables
2009/08/26 02:17:11 BGP: Import timer expired.
2009/08/26 02:17:26 BGP: Import timer expired.
2009/08/26 02:17:41 BGP: Import timer expired.
2009/08/26 02:17:44 BGP: 2001:470:15:80::1 [FSM] Timer (keepalive timer expire)
2009/08/26 02:17:44 BGP: 2001:470:15:80::1 [FSM] bgp_ignore called
2009/08/26 02:17:56 BGP: Import timer expired.
2009/08/26 02:18:11 BGP: Performing BGP general scanning
2009/08/26 02:18:11 BGP: scanning IPv4 Unicast routing tables
2009/08/26 02:18:11 BGP: scanning IPv6 Unicast routing tables
2009/08/26 02:18:11 BGP: Import timer expired.
2009/08/26 02:18:26 BGP: Import timer expired.
2009/08/26 02:18:41 BGP: Import timer expired.
2009/08/26 02:18:44 BGP: 2001:470:15:80::1 [FSM] Timer (holdtime timer expire)
2009/08/26 02:18:44 BGP: 2001:470:15:80::1 [FSM] Hold_Timer_expired (OpenConfirm->Idle)
2009/08/26 02:18:44 BGP: 2001:470:15:80::1 [FSM] Hold timer expire
2009/08/26 02:18:56 BGP: Import timer expired.
2009/08/26 02:18:59 BGP: 2001:470:15:80::1 [FSM] Timer (start timer expire).
2009/08/26 02:18:59 BGP: 2001:470:15:80::1 [FSM] BGP_Start (Idle->Connect)
2009/08/26 02:18:59 BGP: 2001:470:15:80::1 [FSM] TCP_connection_open_failed (Connect->Active)


I've set up another BGP peer at a remote location, and with that setup I could get a session up and running. I did notice that I cannot connect to port 179 on the HE router. This is normal according to HE support, but I still think it might be the reason my session isn't coming up.

To get around this I added the neighbor passive option, but this didn't help.

While preparing this email I just noticed that when I shut down quagga I still seem to send reset packets:

02:42:51.437031 IP6 2001:470:15:80::1.45211 > 2001:470:15:80::2.179: S 3110697190:3110697190(0) win 16384 <mss 1420>
02:42:51.437170 IP6 2001:470:15:80::2.179 > 2001:470:15:80::1.45211: R 0:0(0) ack 3110697191 win 0


but....

ipv6-gateway:~# netstat -6na
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State     
tcp6       0      0 :::80                   :::*                    LISTEN     
tcp6       0      0 :::53                   :::*                    LISTEN     
tcp6       0      0 :::22                   :::*                    LISTEN     
tcp6       0      0 :::25                   :::*                    LISTEN     
tcp6       0      0 ::1:953                 :::*                    LISTEN
udp6       0      0 :::53                   :::*
raw6       0      0 :::58                   :::*                    7



I have nothing listening on port 179 so how can this happen?!

I hope someone can help me out with this, or send me the details of their quagga setup.

Regards,
Lucas


an_ipv6_user

It's a bug in Quagga on any sort of Linux.  Google for details.

broquea

They also just came out with 0.99.15 (or v1.0RC2) with some fixes for BGPd. Might want to give that a try.

lkenter

Hi,

Thanks for your replies!

However, I just installed quagga version 0.99.15, and I still have the same problem.


Quote from: tigerfishdaisy on August 29, 2009, 08:48:15 AM
It's a bug in Quagga on any sort of Linux.  Google for details.


Can you be a bit more specific? I've googled (quite a lot  :-\ ) but have not found any bugs that explain this behaviour.

Regards

maestroevolution

Quote from: lkenter on August 27, 2009, 01:07:10 PM

I've set up another BGP peer at a remote location, and with that setup I could get a session up and running. I did notice that I cannot connect to port 179 on the HE router. This is normal according to HE support, but I still think it might be the reason my session isn't coming up.

I have nothing listening on port 179 so how can this happen?!

I hope someone can help me out with this, or send me the details of their quagga setup.

Regards,
Lucas


HE support is incorrect...  If it's not listening on port 179, it's not running BGP.  Telnetting to port 179 is my first test that the far side router is configured.

I'm surprised that your netstat doesn't show it listening on 179... Is it possible that quagga is listening only on IPv4?  Have you tried running netstat -nat | grep 179?
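
The telnet test described above can also be scripted. What follows is only a minimal Python sketch and was not posted in this thread; the helper name tcp_port_open is made up for illustration. Keep in mind that a successful connect only shows that something accepts TCP on the port. It does not show that the BGP session itself will establish, and a refusal may simply mean the router only answers configured peers.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) can be opened.

    Works for IPv4 and IPv6 literals alike, because
    socket.create_connection() resolves the address family itself.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable network and timeouts.
        return False


# Example call using the tunnel address from this thread (needs IPv6 reachability):
# tcp_port_open("2001:470:15:80::2", 179)
```

Running it against both tunnel endpoints, from both sides, gives roughly the same information as the telnet attempts shown in the thread.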

Regards,

Joel
(JNCIE, CCIP, CCNP, CCDP, and others...)

maestroevolution

Looking at the details of your debugs, it looks like the BGP session is being torn down due to a lack of a keepalive.  I believe Cisco defaults to a keepalive every minute.  It looks like Quagga is also running a one-minute keepalive timer, and tearing down the connection when the hold timer expires without a keepalive having arrived (OpenConfirm at 02:15:44, keepalive timer expiring at 02:16:44 and 02:17:44, torn down at 02:18:44 on hold timer expiry).

Also based on the logs... TCP comes up, but the BGP session is never fully established.  The output of show bgp summary lists last up/down as never, with the current state of OpenConfirm.  The TCP connection on port 179 is establishing, but BGP isn't actually doing anything.  This does make the keepalive issue moot, as BGP isn't doing anything anyway.
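
The timing in the log can be checked mechanically. The sketch below is illustrative only: the function names are invented here, and the regular expression assumes the bgpd.log line layout shown earlier in this thread. It pulls the FSM state transitions out of the log and measures how long the session sat in OpenConfirm before it was torn down.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2009/08/26 02:15:44 BGP: 2001:470:15:80::1 [FSM] Receive_OPEN_message (OpenSent->OpenConfirm)
FSM_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) BGP: (\S+) \[FSM\] (\w+) \((\w+)->(\w+)\)"
)

SAMPLE_LOG = """\
2009/08/26 02:15:44 BGP: 2001:470:15:80::1 [FSM] TCP_connection_open (Active->OpenSent)
2009/08/26 02:15:44 BGP: 2001:470:15:80::1 [FSM] Receive_OPEN_message (OpenSent->OpenConfirm)
2009/08/26 02:16:44 BGP: 2001:470:15:80::1 [FSM] Timer (keepalive timer expire)
2009/08/26 02:17:44 BGP: 2001:470:15:80::1 [FSM] Timer (keepalive timer expire)
2009/08/26 02:18:44 BGP: 2001:470:15:80::1 [FSM] Hold_Timer_expired (OpenConfirm->Idle)
"""


def fsm_transitions(log_text):
    """Return (timestamp, peer, event, old_state, new_state) for each FSM transition."""
    found = []
    for line in log_text.splitlines():
        m = FSM_RE.match(line.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S")
            found.append((ts, m.group(2), m.group(3), m.group(4), m.group(5)))
    return found


def seconds_in_state(transitions, state):
    """Seconds between first entering `state` and the next transition out of it."""
    entered = None
    for ts, _peer, _event, old, new in transitions:
        if new == state:
            entered = ts
        elif old == state and entered is not None:
            return (ts - entered).total_seconds()
    return None


# 180.0 for the sample lines above
stuck_for = seconds_in_state(fsm_transitions(SAMPLE_LOG), "OpenConfirm")
```

For the lines posted here the result is 180 seconds, so the session waits in OpenConfirm for three one-minute intervals without ever receiving a KEEPALIVE from the peer.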

Is HE's side a real router, or are they running BGP on a *nix box? 

A wireshark capture of the connection would be useful in seeing how far the session gets.

Joel

broquea

Strange then, that the other BGP sessions deny telnet to port 179 on other tunnel addresses configured on the router, yet the BGP sessions are alive and working. Nary a filter rule is in place that would deny access to 179 on the tunnel-server either.

At any rate, everything is configured correctly on our side, and the only thing I've noted with some (not all) quagga machines that try and establish BGP with our Cisco BGP tunnel-servers, is that it needs to be killed and started again (quagga's bgpd) and then magically works. I'm fairly certain this has been tried in this case, and I can keep rebuilding both the tunnel and BGP session on our side until I turn Cisco green-blue in the face, none of the other sessions on the same router are having any issues.

Also, from the tunnel-server:

telnet 2001:470:15:80::2 bgp
Trying 2001:470:15:80::2, 179 ...
% Connection refused by remote host


Also #2, does Debian have the same daemon control file as Ubuntu? If so, does /etc/quagga/daemons have bgpd=yes set in it?

Also #3, maybe also add a neighbor 2001:470:15:80::1 update-source 2001:470:15:80::2 line in case you've also got other v6 tunnels doing BGP or other IPv6 addresses configured on that machine?

lkenter

Hello,

Thanks for your reactions!

I've already tried the update-source line, but with no success.

The reason your connection was refused was probably that the daemon was down. I've tested a lot and didn't turn it back on after my latest failure...

Now when you connect from somewhere on the globe you should see:

lucas@patman:~$ telnet 2001:470:15:80::2 179
Trying 2001:470:15:80::2...
Connected to 2001:470:15:80::2.
Escape character is '^]'.
Connection closed by foreign host.


Is it perhaps possible to remove the filter on port 179? I know it should work with the passive option, but I think there is something going wrong there because it tries to connect back to that port.

Regards,




broquea

Quote from: lkenter on September 02, 2009, 02:42:16 PM
Is it perhaps possible to remove the filter on port 179? I know it should work with the passive option, but I think there is something going wrong there because it tries to connect back to that port.

Regards,

We aren't filtering anything, there is no filter in place. I've got 47 eBGP sessions configured, 46 established, and yours is the only one with this error in the logs (actually the only error in the logs). Trying to connect from this router, we see:

>telnet 2001:470:15:80::2 179    
Trying 2001:470:15:80::2, 179 ... Open

[Connection to 2001:470:15:80::2 closed by foreign host]


Trying to connect from well, anywhere else, namely linux workstations, we get:

$ telnet 2001:470:15:80::2 179
Trying 2001:470:15:80::2...
Connected to 2001:470:15:80::2.
Escape character is '^]'.
Connection closed by foreign host.


Tried clearing the session once more, and it goes from Idle to OpenSent, and back and forth.

lkenter

YES I DID IT!!!

ipv6-gateway-bgp# show bgp sum
BGP router identifier 195.81.39.58, local AS number 35383
RIB entries 3879, using 242 KiB of memory
Peers 1, using 2520 bytes of memory

Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
2001:470:15:80::1
               4  6939    1618       8        0    0    0 00:00:49     2015


After having a better look at a capture file I noticed that my packets were sent with TTL=1.
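
For anyone wanting to check the same thing in their own capture, here is a rough Python sketch. It is not taken from this thread, and the function name sixin4_ttls is hypothetical. It assumes you already have the raw bytes of one 6in4 packet starting at the outer IPv4 header (how to get them out of a capture file is not shown), and it reads both the outer IPv4 TTL and the inner IPv6 hop limit.

```python
def sixin4_ttls(packet: bytes):
    """Return (outer_ipv4_ttl, inner_ipv6_hop_limit) for a raw 6in4 packet.

    `packet` must start at the outer IPv4 header.
    """
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    ihl = (packet[0] & 0x0F) * 4          # IPv4 header length in bytes
    if ihl < 20:
        raise ValueError("invalid IPv4 header length")
    if packet[9] != 41:                   # protocol 41 = IPv6 encapsulated in IPv4
        raise ValueError("IPv4 payload is not protocol 41")
    inner = packet[ihl:]
    if len(inner) < 40 or inner[0] >> 4 != 6:
        raise ValueError("inner packet is not IPv6")
    # IPv4 TTL is byte 8 of the outer header, IPv6 hop limit is byte 7 of the inner one.
    return packet[8], inner[7]


# Hand-built example: outer TTL 1, inner next header 6 (TCP), inner hop limit 1.
outer = bytes([0x45, 0, 0, 60, 0, 0, 0, 0, 1, 41, 0, 0]) + bytes(8)
inner = bytes([0x60, 0, 0, 0, 0, 0, 6, 1]) + bytes(32)
example = sixin4_ttls(outer + inner)      # (1, 1)
```

Looking at both values separately matters with tunnels, because the outer and the inner header each carry their own counter.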

Googling for this I found that I needed the following extra line in the configuration of my network interface:

For Debian, in /etc/network/interfaces:

auto he-ipv6
iface he-ipv6 inet6 v4tunnel
 address 2001:470:15:80::2
 netmask 64
 endpoint 216.66.84.54
 local 196.81.38.5
 ttl 255
 up ip route add ::/0 dev he-ipv6 metric 1
 post-up sysctl -w net.ipv6.conf.all.forwarding=1

Not everything is working yet, but this means I can go to sleep now with happy dreams ;-)

Thanks for all your support!

jimb

That's interesting.  The man page for "ip" on my distro (gentoo) shows the default TTL to be 64. 

Quote
ttl N
    set a fixed TTL N on tunneled packets. N is a number in the range 1--255. 0 is a special value meaning that packets inherit the TTL value. The default value for IPv4 tunnels is: inherit. The default value for IPv6 tunnels is: 64.


Confirmed with this:
{root@gtoojimb/pts/2}~# cat /proc/sys/net/ipv6/conf/he6/hop_limit
64

(don't know how to get it out of the ip command)

On deb it's one?

lkenter

cat /proc/sys/net/ipv6/conf/ipv6-he/hop_limit shows 64 for me as well even now that I added the ttl 255 line in /etc/network/interfaces.

Don't understand it, but I'm willing to accept it anyway :-)
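
One possible explanation, offered as an assumption and not confirmed anywhere in this thread: the ttl option of a 6in4 tunnel controls the outer IPv4 header, while hop_limit under /proc/sys/net/ipv6/conf/ is the default for the inner IPv6 header, so the two values need not agree. With "inherit" (the IPv4 tunnel default according to the man page excerpt quoted earlier), the outer TTL is copied from the inner packet, so a BGP packet sent with hop limit 1 would leave the box with an outer TTL of 1 as well. The small Python model below illustrates that rule; the names are made up for this sketch.

```python
INHERIT = 0  # per the quoted man page, 0 means "inherit the TTL value"


def outer_ipv4_ttl(tunnel_ttl, inner_hop_limit):
    """TTL placed in the outer IPv4 header of a tunneled packet (simplified model)."""
    if not 0 <= tunnel_ttl <= 255:
        raise ValueError("tunnel ttl must be in the range 0..255")
    if tunnel_ttl == INHERIT:
        # Outer header copies the hop limit of the packet being encapsulated.
        return inner_hop_limit
    # A fixed tunnel TTL ignores the inner packet entirely.
    return tunnel_ttl


before_fix = outer_ipv4_ttl(INHERIT, 1)   # 1: dropped at the first IPv4 router
after_fix = outer_ipv4_ttl(255, 1)        # 255: survives the IPv4 path
ordinary = outer_ipv4_ttl(INHERIT, 64)    # 64: ping6 and normal traffic unaffected
```

Under this model ordinary traffic with hop limit 64 works either way, which would fit the fact that ping6 worked all along while the BGP session did not.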

maestroevolution

Quote from: lkenter on September 05, 2009, 03:22:54 PM
cat /proc/sys/net/ipv6/conf/ipv6-he/hop_limit shows 64 for me as well even now that I added the ttl 255 line in /etc/network/interfaces.

Don't understand it, but I'm willing to accept it anyway :-)

(Sorry for delay in getting back; been busy)

eBGP (external BGP) defaults to a TTL of 1, as it expects its remote peer to be directly connected, normally via POS/Gig/TenGig or whatever.

To run eBGP when not directly connected (such as through a firewall), you need to increase the TTL to allow for the intermediate routers/firewalls decrementing it.
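
As a toy illustration of the point above (a Python sketch with invented names, not a description of how any router implements forwarding): every router in the path decrements the TTL by one and drops the packet when it reaches zero, so a packet sent with TTL 1 only reaches a directly connected peer.

```python
def reaches_peer(initial_ttl, routers_in_path):
    """True if a packet sent with `initial_ttl` survives `routers_in_path` hops.

    Each intermediate router decrements the TTL and discards the packet
    once the value hits zero.
    """
    ttl = initial_ttl
    for _ in range(routers_in_path):
        ttl -= 1
        if ttl <= 0:
            return False
    return ttl > 0


direct = reaches_peer(1, 0)      # True: directly connected eBGP peer
one_hop = reaches_peer(1, 1)     # False: a single router in between kills it
multihop = reaches_peer(255, 5)  # True: a raised TTL leaves plenty of headroom
# In Quagga the TTL for a neighbor is normally raised with the
# "neighbor <peer> ebgp-multihop" setting.
```

The minimum workable TTL is therefore the number of routers between the two peers plus one.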

With tunneling protocols, your mileage may vary.  I've seen a few firewalls decrement the TTL of both the outer packet (IPv4, protocol 41) and the inner packet (IPv6, TCP, port 179).  This is the exception, not the rule, though.  IMHO, the inner packet is payload and shouldn't be touched, but I once troubleshot an issue where this was the cause: a device was decrementing the outer and inner headers as it routed the packet.

My two bits,

Joel

jimb

Quote from: maestroevolution on September 11, 2009, 11:41:42 AM
Quote from: lkenter on September 05, 2009, 03:22:54 PM
cat /proc/sys/net/ipv6/conf/ipv6-he/hop_limit shows 64 for me as well even now that I added the ttl 255 line in /etc/network/interfaces.

Don't understand it, but I'm willing to accept it anyway :-)

(Sorry for delay in getting back; been busy)

eBGP (external BGP) defaults to a TTL of 1, as it expects its remote peer to be directly connected, normally via POS/Gig/TenGig or whatever.

To run eBGP when not directly connected (such as through a firewall), you need to increase the TTL to allow for the intermediate routers/firewalls decrementing it.

With tunneling protocols, your mileage may vary.  I've seen a few firewalls decrement the TTL of both the outer packet (IPv4, protocol 41) and the inner packet (IPv6, TCP, port 179).  This is the exception, not the rule, though.  IMHO, the inner packet is payload and shouldn't be touched, but I once troubleshot an issue where this was the cause: a device was decrementing the outer and inner headers as it routed the packet.

My two bits,

Joel

Even odder then, since it seems that this would be something that should be configured in quagga rather than on the interface itself.  If quagga is setting up its options such that the TTL is set to 1 for its transmitted packets, or reconfiguring the interface to a TTL of 1 (doubt it ... that'd be rude), why would changing the TTL in the interface set up fix things?

Also, I'm curious.  Which firewalls have you seen which decrement a transiting tunnel packet's payload packet TTL?  That'd be good to know for future reference.

maestroevolution

Quote
Even odder then, since it seems that this would be something that should be configured in quagga rather than on the interface itself.  If quagga is setting up its options such that the TTL is set to 1 for its transmitted packets, or reconfiguring the interface to a TTL of 1 (doubt it ... that'd be rude), why would changing the TTL in the interface set up fix things?

Also, I'm curious.  Which firewalls have you seen which decrement a transiting tunnel packet's payload packet TTL?  That'd be good to know for future reference.

I believe the setting is set under quagga, not the interface itself.  I think someone else posted a man page output of adjusting ttl on a per interface basis, but I was referring to the earlier "I DID IT!" post, where (IIRC), I saw an ebgp-multihop value set.

Regarding firewalls that decrement the TTL of inner tunnel packets, the two I've personally seen were 1) a UTStarcom (fka Commworks, fka 3Com) PDSN, and 2) an older PIX (ASA?) that was somehow configured to inspect tunnelled traffic.  I was told it would wait for all of the outer packets in order to validate the inner packet, and then decrement the TTL of both headers.

Both of these cases were with IP-in-IP tunnels, though, not 6-in-4.  As with all things that do deep inspection of packets, your mileage may vary.

Regards,

Joel