Over the years, we’ve really enjoyed the various projects Ryan MacDonald has created, which help our customers run more reliable and more secure servers.
While we do customize our implementations of APF and BFD (making some core changes so that APF integrates with our other managed security offerings), one issue we run into from time to time with APF is that if local DNS resolution is not working when the server reboots, the server hangs while starting APF.
Most of the time, the fix is making sure local DNS resolution works correctly at boot.
Sometimes that is as simple as editing /etc/resolv.conf to use the data center’s recommended name servers; other times it means using the free public DNS service from OpenDNS or Google.
But what about the co-location customer, or the customer of a small mom-and-pop data center, who cannot get help with how the server integrates with the data center’s network for local DNS?
Then you are stuck deciding whether to skip APF altogether or to reboot the server without APF, in which case someone has to remember to start APF once local DNS resolution is working.
The latter is not a good option for security, and the former doesn’t work for customers who want an integrated picture that includes APF and BFD.
I asked Ryan about the issue roughly a year ago (the problem doesn’t crop up often), and his reply is accurate and makes sense:
This is a long standing issue that is more to do with accepting host names in the trust rules, that if there is any network issues they are not resolvable and iptables has no built in timeout feature for resolving DNS.
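Ryan’s explanation points at the root cause: any host name in the trust rules has to be resolved when APF loads them. As a rough pre-reboot audit, a helper along these lines can list trust entries that are not plain IPv4 addresses. It is a minimal sketch, not part of APF: the function name is hypothetical, and it assumes the simple one-entry-per-line trust format with `#` comments, skipping advanced rules (anything containing a colon, which also skips IPv6 entries).

```bash
# Hypothetical helper (not part of APF): print trust-file entries that look
# like host names rather than IPv4 addresses or CIDR blocks.
# Assumes one entry per line with '#' comments; lines containing ':'
# (advanced rules, IPv6) are skipped rather than parsed.
find_hostname_trusts() {
    # Pipeline: strip comments and surrounding whitespace, drop blank lines,
    # drop colon-containing rules, then drop dotted quads (optional /mask).
    # Whatever remains would need a DNS lookup when the rules load.
    sed -e 's/#.*//' -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$1" |
        grep -v '^$' |
        grep -v ':' |
        grep -Ev '^[0-9]{1,3}(\.[0-9]{1,3}){3}(/[0-9]{1,2})?$'
}
```

For example, `find_hostname_trusts /etc/apf/allow_hosts.rules`; on a stock install the trust files usually live in /etc/apf/, but check your own layout. Replacing any names it prints with IP addresses should remove the boot-time lookups for those entries, at the cost of tracking address changes by hand.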
Until recently, we worked around the issue with standard techniques, such as the /etc/resolv.conf change mentioned above. Then one of our long-standing customers got a good deal on a Mac-based server and put it in a data center where everything is “you are on your own other than ping, power, and pipe” (which is more typical than not).
While it would have been easy and convenient to put the burden of this problem on the customer and the data center they picked, I was determined not to just go through the status-quo motions.
The crux of the matter was local DNS resolution. How could I test it in a way that would not itself hang or take long to run?
Creating the failure condition is easy: comment out all entries in /etc/resolv.conf.
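The outage is easy to reproduce and undo with a couple of helpers like the ones below, which comment out every nameserver line and later restore the file from a backup. This is a sketch with hypothetical function names; it assumes GNU sed (for in-place editing with -i) and should only be run on a test box or during a maintenance window.

```bash
# Hypothetical helpers for simulating a local DNS outage.
# They act on the file passed in (normally /etc/resolv.conf).

disable_nameservers() {
    # Keep an exact copy (with permissions) so it can be restored later,
    # then prefix every "nameserver" line with '#' (GNU sed -i).
    cp -p "$1" "$1.bak" &&
        sed -i 's/^[[:space:]]*nameserver/#&/' "$1"
}

restore_nameservers() {
    # Put the untouched backup back in place.
    mv "$1.bak" "$1"
}
```

For example, `disable_nameservers /etc/resolv.conf`, run your lookups, then `restore_nameservers /etc/resolv.conf`.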
Yet the tools commonly recommended over the years for checking DNS issues (local and external), such as host and nslookup, do not appear to offer a way to control the timeout, only how often to retry. Comment out the entries in /etc/resolv.conf, run host or nslookup against a valid public domain name, and then wait… and wait… and wait… Hmm… isn’t that exactly what causes the APF problem?
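One general way to keep host or nslookup from waiting indefinitely is to wrap them in timeout(1) from GNU coreutils, which kills the command after a fixed number of seconds and exits with status 124. The sketch below assumes a coreutils release new enough to include timeout (older distributions such as CentOS 5 may lack it); bounded_lookup is a hypothetical name.

```bash
# Sketch: run any lookup command with a hard ceiling on how long it may take.
# Usage: bounded_lookup SECONDS command [args...]
bounded_lookup() {
    local limit=$1
    shift
    # Discard the command's own output; only its exit status matters here
    timeout "$limit" "$@" >/dev/null 2>&1
    local status=$?
    if [ "$status" -eq 124 ]; then
        # GNU timeout returns 124 when it had to kill the command
        echo "timed out after ${limit}s"
    elif [ "$status" -eq 0 ]; then
        echo "completed"
    else
        echo "failed (exit $status)"
    fi
    return "$status"
}
```

For example, `bounded_lookup 2 host yahoo.com` should report either a failure or `timed out after 2s` within about two seconds when no name server answers, instead of hanging.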
All this thought about “digging into local DNS resolution” brought me to the dig command, which is part of the bind-utils package available for CentOS through yum (yum install bind-utils -y).
Looking at the dig man page, I saw that you can pass a timeout, a number of tries, and a number of retries on the command line. Would this work quickly enough to test local DNS resolution?
Using dig +time=1 +tries=1 +retry=0 yahoo.com, I tested response times on various servers with local DNS both down and up. Responses came back in two seconds or less, which was acceptable.
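To repeat the timing comparison, a small wrapper can report how long any probe takes. This sketch uses whole-second timestamps from date +%s, so results are only accurate to about a second; time_probe is a hypothetical name.

```bash
# Sketch: print how many whole seconds a probe command took to finish.
# Usage: time_probe command [args...]
time_probe() {
    local start end
    start=$(date +%s)
    # Run the probe quietly; only the elapsed time is reported
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}
```

`time_probe dig +time=1 +tries=1 +retry=0 yahoo.com` prints the elapsed seconds; run it once with resolv.conf intact and once with the name servers commented out.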
I wrote a test script to test my theory and to build a framework for something that could start APF if it was down because it had deliberately not been set to start at boot (to avoid the hang).
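```bash
# Query a well-known name with a 1-second timeout, one try, and no retries
DNS_CHECK=$(/usr/bin/dig +time=1 +tries=1 +retry=0 yahoo.com | /bin/grep 'timed out')
# The line dig prints when none of the configured name servers answers
DNS_FAILED=';; connection timed out; no servers could be reached'
if [ "$DNS_CHECK" != "$DNS_FAILED" ]; then
    echo "local DNS is working"
else
    echo "local DNS is not working"
fi
```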
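Matching dig’s text output is fragile: the exact wording of its timeout message can differ between BIND versions, and any message that differs from DNS_FAILED makes the script above report DNS as working. An alternative sketch relies on dig’s documented exit status instead, which is 0 when any DNS response arrives (including NXDOMAIN) and 9 when no server replies. Because a response can still be empty (a SERVFAIL, for example), it also requires a non-empty +short answer. The function names are hypothetical.

```bash
# Hypothetical helper: turn dig's exit status and +short output into a verdict.
# $1 = dig exit status, $2 = dig +short output
dns_verdict() {
    if [ "$1" -eq 0 ] && [ -n "$2" ]; then
        echo "local DNS is working"
        return 0
    fi
    # Exit 9 means no server replied; exit 0 with no answer means the
    # server replied but could not resolve the name (e.g. SERVFAIL)
    echo "local DNS is not working"
    return 1
}

# Run the same fast probe as above (1-second timeout, one try, no retries)
check_local_dns() {
    local answer
    answer=$(/usr/bin/dig +short +time=1 +tries=1 +retry=0 "${1:-yahoo.com}" 2>/dev/null)
    # $? here is dig's exit status, carried through the command substitution
    dns_verdict "$?" "$answer"
}
```

Calling `check_local_dns` (or `check_local_dns example.com`) prints the verdict and returns 0 or 1, so it can drive an if statement in a wrapper or monitor.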
Running the test proved the concept, but now what? Think, think, think. While I could write a wrapper and put it in /etc/rc.local (which runs as part of the boot sequence), I preferred to run the check after the server had been up a little longer, even if only a few minutes, without adding a sleep command to the boot process.
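One way to get “a few minutes after boot” without a sleep is to let a periodically scheduled check (for example, from cron) do nothing until uptime passes a threshold. The sketch reads seconds since boot from the first field of /proc/uptime, so it is Linux-specific; uptime_at_least and the 300-second threshold in the usage example are hypothetical choices.

```bash
# Hypothetical helper: succeed only if the machine has been up at least
# MIN_SECONDS. Optional second argument overrides the uptime file.
# Usage: uptime_at_least MIN_SECONDS [uptime_file]
uptime_at_least() {
    local up
    # /proc/uptime looks like "350.27 1200.50"; keep whole seconds of field 1
    up=$(cut -d. -f1 "${2:-/proc/uptime}")
    [ "$up" -ge "$1" ]
}
```

For example, `if uptime_at_least 300; then ...; fi` runs its body only once the server has been up for five minutes, and a scheduled check that calls it simply exits early until then.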
Then I remembered the System Integrity Monitor (S.I.M.) add-on module we had created just a few weeks earlier to detect IP addresses that had become unbound from the network interface. I wrote about that adventure in “the elusive hunt to protect against IP blackouts.”
Now that I had code to check whether local DNS is working, incorporating it into another S.I.M. add-on module was easy.
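The core decision such an add-on module has to make can be written as a small table. This is an illustrative sketch, not the contents of dni_apf.mod: apf_action is a hypothetical name, and the caller is assumed to supply APF’s state (up or down) and the DNS check result (ok or fail), however those are detected on your system.

```bash
# Hypothetical decision table for an APF/DNS monitor check.
# $1 = APF state ("up" or "down"), $2 = local DNS state ("ok" or "fail")
apf_action() {
    case "$1:$2" in
        up:*)      echo "none" ;;       # APF already running, nothing to do
        down:ok)   echo "start apf" ;;  # trust rules can be resolved now
        down:fail) echo "wait" ;;       # starting now could hang on lookups
        *)         echo "unknown state"; return 1 ;;
    esac
}
```

Walking the four manual test steps through it gives none, start apf, wait, and start apf respectively.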
While the real test will come when our client with the Mac does a controlled reboot of his server (we have APF turned off at boot), testing the add-on module is as simple as the following:
- Everything running normally (APF up, local DNS working) -> run sim -s
- service apf stop to shut down APF -> run sim -s
- service apf stop to shut down APF, and edit /etc/resolv.conf to comment out all name servers -> run sim -s
- service apf stop to shut down APF, and edit /etc/resolv.conf to uncomment the name servers commented out earlier -> run sim -s
Once you are satisfied that everything works and want APF to come up after the reboot rather than during it (when it might hang), run “chkconfig --level 0123456 apf off” to keep APF from starting at boot.
Since Ryan MacDonald is kind enough to share S.I.M. with the world, I thought it would be nice to share dni_apf.mod with other R-fx Networks fans.
Please contact us if you have any questions.