
network timeouts

Every developer knows that heart-sinking moment when you find out your website is down, or when you inadvertently run something against the production db rather than the development one. Unfortunately I was that guy this week, though thankfully it was nothing quite so catastrophic.

Our digital signage platform www.neonburn.com has a nifty feature which lets you search a collection of royalty-free images, provided by the guys over at pixabay.com. It gets you nice-looking backgrounds for your slides in an instant, so you can knock out your digital signs in no time. When it works.

Anyway, this is a tech blog so where is the tech?? Well, the problem was that our server couldn’t download any images from pixabay. The search was working great in the browser, but when our server tried to download the image selected by the user it timed out.

Now the debug. First things first, I tried to use curl -O to download the image from the terminal. Nope. Next I tried to use curl to download the BBC logo from their site. Working. Shit. Right, so I was beginning to think that I’d been barred from Pixabay, so much so that I sent them a support request! I tried ping pixabay.com from the server and it worked. Next I tried traceroute pixabay.com to see the route packets were taking to the server. This showed that the Pixabay servers were from the same provider (Hetzner) as ours. Mmmmmm.

Next I attempted to curl a file from hetzner.de. Didn’t work! Oh, now we’re getting somewhere, but I’m sort of shitting myself as I’m getting out of my depth.

It’s at this point that, let’s call it magic but some might say luck, I had a hunch, and I’m not even sure where the fuck it came from. The hunch? Maybe this is something to do with IPv6. I’m still confused as to why I thought it might be, as I hardly even knew what IPv6 was. But bang!

So it turns out that both sites I’ve had problems accessing are reached over IPv6 (I’ll be honest, I didn’t even know that my server had an IPv6 address set up). Breakthrough. However, now I’m really, really out of my depth. I now need to figure out how this whole IPv6 thing works, both in general and on CentOS.

Now here is what I learned: my pings and traceroutes were working because, on my CentOS install at least, the plain ping and traceroute commands were going over IPv4. To check the same things over IPv6 you need ping6 and traceroute6 respectively (newer versions of ping also accept -4 and -6), otherwise you are being misled! Lesson #1. The same goes for the ip command when listing the kernel routes: ip addr lists the addresses for both families, but ip route only shows the IPv4 table unless you pass the -6 switch. For example, use ip -6 route show to see the kernel routing for IPv6 comms. Lesson #2.
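With hindsight, a quicker way to spot this kind of thing is to force the same request over each address family in turn. This is only a sketch of the idea, not what I actually ran at the time: check_families is a made-up helper name, and the -4 and -6 switches are curl's own options for pinning the address family.

```shell
#!/bin/sh
# Fetch the same URL once over IPv4 and once over IPv6, so that a failure
# in only one address family stands out.
# check_families is a hypothetical helper name, not a standard tool.
check_families() {
    url="$1"
    for family in 4 6; do
        # -4 / -6 force the address family; --max-time stops a dead
        # route from hanging the whole check.
        if curl "-$family" --silent --show-error --max-time 10 \
                --output /dev/null "$url"; then
            echo "IPv$family: ok"
        else
            echo "IPv$family: FAILED"
        fi
    done
}
```

Calling it as check_families https://pixabay.com/ prints one line per family. If only the IPv6 line fails, as I'd expect it would have on my server, you know where to start digging.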

Even at this point I still could not see what was causing the problem. Next I started looking at the configuration for the eth0 interface, which is found in /etc/sysconfig/network-scripts/ifcfg-eth0, on Hetzner CentOS anyway. Everything seemed fine: it contained both the static IPv4 and IPv6 addresses as described in the Hetzner docs. Bummer. I’d read about issues if the IPv6 loopback was not included in the config, but running ip addr to list the network interfaces confirmed that the loopback address was there. Running out of ideas now...

The final check was to make sure that the kernel was routing the IPv6 packets via the correct default gateway and that the rules had not somehow become messed up. First though I checked that I could indeed ping6 the default gateway, which is at fe80::1. Yep, working fine. I next used ip -6 route to check the route to 2a01:4f8:b0:a097::2, which is the IP of pixabay, and yep, it seemed to be selecting the correct gateway. Incidentally, you can use nslookup -query=AAAA pixabay.com to get its IPv6 address.
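For reference, those two checks can be rolled into one small helper. Treat it as a rough sketch rather than a record of my exact commands: check_v6_path is a name I've made up, ip -6 route get is one way of asking the kernel which route it would pick, and the interface argument is there because a link-local gateway like fe80::1 normally has to be pinged via a named interface.

```shell
#!/bin/sh
# Ask the kernel which route it would use for an IPv6 destination, then
# ping the gateway over a specific interface.
# check_v6_path is a hypothetical helper name.
check_v6_path() {
    if [ "$#" -ne 3 ]; then
        echo "usage: check_v6_path <ipv6-address> <gateway> <interface>" >&2
        return 2
    fi
    addr="$1"
    gateway="$2"
    dev="$3"

    echo "== route chosen for $addr"
    ip -6 route get "$addr"

    echo "== pinging gateway $gateway via $dev"
    # -I names the outgoing interface, which link-local addresses need.
    ping6 -c 3 -I "$dev" "$gateway"
}
```

On my box the call would have looked like check_v6_path 2a01:4f8:b0:a097::2 fe80::1 eth0. Bear in mind that both checks can pass, as they did for me, while the traffic still goes nowhere further upstream.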

That was me, I was out! Completely stumped. At this point I simply commented out the lines in the ifcfg-eth0 file that set up the IPv6 address and restarted the network service with service network restart to pick up the changes. This basically disabled IPv6 and in turn forced comms via IPv4, and everything worked as before. Not exactly satisfactory, but you need to give up at some point. I bet this sounds like a shit blog post now...
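To give a rough idea of what that looks like, here is the IPv6 part of an ifcfg-eth0 file with the lines commented out. The key names are the usual CentOS network-scripts ones, but this is an illustration and not a copy of my file: the address is a documentation placeholder and your file may well use different keys.

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth0 (IPv6 lines only)
# Values below are placeholders, not the real server's settings.
# IPV6INIT=yes
# IPV6ADDR=2001:db8::2/64
# IPV6_DEFAULTGW=fe80::1
# IPV6_DEFAULTDEV=eth0
```

After saving the file, service network restart picks up the change.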

The final task was to mail Hetzner support, who to their credit got back to me within 10 minutes. I suspected that they would tell me to get lost, as it was looking like a problem with my OS install. However, to my surprise they told me exactly what the problem was!! Should have emailed earlier.

Turns out that when they moved my server to a new data center 6 weeks ago they removed the IPv6 subnet I was on, but it was kept alive for 6 weeks to allow people to request a new one. All of this was in the email they sent me about the server move, but since I didn’t even know I was using IPv6 I just glossed over it. Whoops, schoolboy error. Still, I think I’ll just live with my IPv4 setup for the meantime to save wasting any more time!

At the end of the day I wouldn’t have learned half the stuff about IPv6 if I hadn’t faced this issue. So there are positives even in disasters, even more so when you write them down like this.
