HOWTO Implement GeoDNS using BIND
This HOWTO documents an elegant Linux BASH script and a new unified Python script that can be used to help make BIND geo-aware. The scripts use the information contained within the freely downloadable GeoIP CSV file published monthly by MaxMind (the Python script can also use IP2Location and DB-IP data) to generate a GeoIP.acl include file for BIND. Unlike other methods documented online, no patching of the BIND source code is required for this to work. This makes it easier to manage GeoIP updates to BIND whenever MaxMind publishes an updated version of its GeoIP CSV file or ISC releases a newer version of BIND. If you are seeking to implement geo-aware DNS with BIND on the IPv6 network, you will probably find this extremely useful.
Licensing & Copyright
The copyrighted material on this page is made available to anyone wishing to use, modify, copy, or redistribute it subject to the terms and conditions of the GNU General Public License. The scripts published on this page are distributed in the hope that they will be useful, but WITHOUT ANY WARRANTY expressed or implied, including the implied warranties of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. For further information, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
MaxMind: a single unified ACL include file for BIND that spans 98.137% of the IPv4 and 0.084% of the IPv6 global unicast address space. The latest update from MaxMind contains 251 countries spanning 485,064 IPv4 and 477,598 IPv6 networks.
IP2Location: a single unified ACL include file for BIND that spans 100.000% of the IPv4 and 100.000% of the IPv6 global unicast address space. The latest update from IP2Location contains 250 countries spanning 346,456 IPv4 and 1,661,178 IPv6 networks.
DB-IP: a single unified ACL include file for BIND that spans 100.000% of the IPv4 and 100.000% of the IPv6 global unicast address space. The latest update from DB-IP contains 251 countries spanning 622,536 IPv4 and 614,356 IPv6 networks.
New all-in-one Python script to auto-generate a single unified GeoIP.acl file for BIND that spans both the IPv4 and IPv6 address space!
This has been on my to-do list for a while now; I just had to find the time to write it. Having witnessed the adoption of the scripts documented on this page across various open source projects over the past several years, I felt the time had finally arrived to unify them into a single script that runs on most modern-day Linux distributions. That script is now documented below.
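Simply change to the directory where you would like the GeoIP.acl file to be created and invoke this script with the name of a data provider as its first argument: one of DB-IP, DB-IP.continent, DB-IP.region, IP2Location, MaxMind, MaxMind.area, MaxMind.continent or MaxMind.region. Running it without an argument prints this list. It will source all the necessary GeoIP data directly from that provider's website and create a single ACL file containing country (or continent/region) specific ACL entries for both the IPv4 and IPv6 address space. The MaxMind providers additionally require a GeoLite2 licence key, supplied via the MAXMIND_LICENSE_KEY environment variable.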
#!/usr/bin/env python3
#
# All-in-one Python script for auto-generating the GeoIP.acl file for BIND,
# the Berkeley Internet Name Domain.
#
# It sources GeoIP data from MaxMind/IP2Location/DB-IP and processes it to
# generate a unified GeoIP.acl file, in the current working directory,
# containing country specific ACLs for both the IPv4 and IPv6 address space.
#
# For the latest version, including any updates and/or bug fixes, visit
# https://geoip.site/GeoIP.py
#
# Copyright © 2020 Mark Hedges
#
# Redistribution and use in source and binary forms, with or without
# modification, are permitted provided that the following conditions are met:
#
# * Redistributions of source code must retain the above copyright
# notice, this list of conditions and the following disclaimer.
#
# * Redistributions in binary form must reproduce the above copyright
# notice, this list of conditions and the following disclaimer in the
# documentation and/or other materials provided with the distribution.
#
# * Neither the name of the copyright holder nor the
# names of its contributors may be used to endorse or promote products
# derived from this software without specific prior written permission.
#
# THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
# AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
# IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
# ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
# LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
# SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
# INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
# CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
# ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
# POSSIBILITY OF SUCH DAMAGE.
import re
import gzip
import pandas
import socket
import struct
import fnmatch
import zipfile
import requests
from sys import argv
from mpmath import mag
from datetime import date
from tempfile import mkstemp
from os import environ, remove
USER_AGENT = 'https://geoip.site/GeoIP.py'
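def fetch(url):
    '''
    Helper function for fetching a file.
    '''
    fd, filename = mkstemp()
    try:
        # Write through the descriptor returned by mkstemp() so it gets closed
        with open(fd, 'wb') as file:
            response = requests.get(url, headers={'User-Agent': USER_AGENT})
            # Fail early on HTTP errors instead of saving an error page
            response.raise_for_status()
            file.write(response.content)
    except BaseException:
        remove(filename)
        raise
    return filename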
def MaxMind(url, kwargs):
    '''
    Custom function for processing MaxMind GeoLite2 Country & City CSV databases.
    '''
    def read(pattern, **kwargs):
        pattern = f'GeoLite2-{db}-CSV_*/GeoLite2-{db}-{pattern}.csv'
        files = fnmatch.filter(i.namelist(), pattern)
        if len(files) != 1:
            raise Exception('Failed to locate exactly one file matching:\n  %s\nwithin:\n  %s' % (pattern, url))
        return pandas.read_csv(i.open(files[0]), **kwargs)
    db = kwargs['db']
    key = kwargs['key']
    dtype = {x: str for x in key}
    dtype['postal_code'] = str
    filename = fetch(url)
    acls = {}
    try:
        with zipfile.ZipFile(filename) as i:
            locations = read('Locations-en', keep_default_na=False)
            for v in '46':
                seen = set()
                # Read each Blocks file once, then fall back from geoname_id to the
                # registered/represented country IDs for networks not yet seen
                blocks = read('Blocks-IPv' + v, dtype=dtype).fillna(0)
                for geoname_id in ['geoname_id', 'registered_country_geoname_id', 'represented_country_geoname_id']:
                    blocks[geoname_id] = blocks[geoname_id].astype(int)
                    df = pandas.merge(locations, blocks, left_on='geoname_id', right_on=geoname_id)[key + ['network']]
                    if df.empty:
                        continue
                    df['key'] = df[key].apply(lambda x: ':'.join(filter(None, x)) or 'ZZ', axis=1)
                    for acl in df.key.unique():
                        networks = df.loc[acl == df.key].network.to_list()
                        acls.setdefault(acl, []).extend(x for x in networks if x not in seen)
                        seen.update(networks)
    finally:
        # Always remove the downloaded ZIP file, even if processing fails
        remove(filename)
    return acls
'''
Define source of GeoIP data and how to process it.
'''
# Only the MaxMind providers need a licence key, so default to an empty string
# rather than failing at start-up when MAXMIND_LICENSE_KEY is not set
MAXMIND_LICENSE_KEY = environ.get('MAXMIND_LICENSE_KEY', '')
PROVIDERS = {
    'DB-IP': [
        [
            dict(range=[0, 1], key=[2]),
            'https://download.db-ip.com/free/dbip-country-lite-{}.csv.gz'.format(date.today().strftime('%Y-%m')),
            None,
        ],
    ],
    'DB-IP.continent': [
        [
            dict(range=[0, 1], key=[2]),
            'https://download.db-ip.com/free/dbip-city-lite-{}.csv.gz'.format(date.today().strftime('%Y-%m')),
            None,
        ],
    ],
    'DB-IP.region': [
        [
            dict(range=[0, 1], key=[3, 4]),
            'https://download.db-ip.com/free/dbip-city-lite-{}.csv.gz'.format(date.today().strftime('%Y-%m')),
            None,
        ],
    ],
    'IP2Location': [
        [
            dict(range=[0, 1], key=[2]),
            'https://download.ip2location.com/lite/IP2LOCATION-LITE-DB1.CSV.ZIP',
            'IP2LOCATION-LITE-DB1.CSV',
        ],
        [
            dict(range=[0, 1], key=[2]),
            'https://download.ip2location.com/lite/IP2LOCATION-LITE-DB1.IPV6.CSV.ZIP',
            'IP2LOCATION-LITE-DB1.IPV6.CSV',
        ],
    ],
    'MaxMind': [
        [
            MaxMind,
            'https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-Country-CSV&suffix=zip&license_key=' + MAXMIND_LICENSE_KEY,
            dict(db='Country', key=['country_iso_code']),
        ],
    ],
    'MaxMind.area': [
        [
            MaxMind,
            'https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-City-CSV&suffix=zip&license_key=' + MAXMIND_LICENSE_KEY,
            dict(db='City', key=['country_iso_code', 'subdivision_1_iso_code', 'subdivision_2_iso_code', 'metro_code']),
        ],
    ],
    'MaxMind.continent': [
        [
            MaxMind,
            'https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-Country-CSV&suffix=zip&license_key=' + MAXMIND_LICENSE_KEY,
            dict(db='Country', key=['continent_code']),
        ],
    ],
    'MaxMind.region': [
        [
            MaxMind,
            'https://download.maxmind.com/app/geoip_download?edition_id=GeoLite2-City-CSV&suffix=zip&license_key=' + MAXMIND_LICENSE_KEY,
            dict(db='City', key=['country_iso_code', 'subdivision_1_iso_code']),
        ],
    ],
}
'''
Upper limit for IPv4 global unicast address space, excluding multicast.
This value also acts as a boundary for considering when integers are to
be treated as IPv4 or IPv6 addresses; IPv4 for integers less than this,
IPv6 for integers greater than or equal to this.
'''
IPv4 = 7 << 29  # 224.0.0.0, the start of IPv4 multicast
'''
Network and mask for 2000::/3 IPv6 global unicast address space.
'''
IPv6n = 1 << 125
IPv6m = 7 * IPv6n
def _i(s):
    '''
    Converts a string to an integer IP address. If no '.' or ':'
    characters are found, assumes that the string represents a number
    and returns the direct integer conversion, otherwise proceeds with
    conversion on the presumption that the existence of '.' characters
    indicates an IPv4 address and ':' characters an IPv6 address.
    '''
    if ':' in s:
        x, y = struct.unpack('!2Q', socket.inet_pton(socket.AF_INET6, s))
        return y | x << 64
    elif '.' in s:
        return struct.unpack('!I', socket.inet_aton(s))[0]
    else:
        return int(s)
def _a(x):
    '''
    IP range aggregator and splitter function.
    First merges adjacent ranges together. Then
    splits those ranges on network boundaries.
    '''
    for k, v in x.items():
        i, v = 1, sorted(map(_r, v))
        while i < len(v):
            b1, e1 = v[i]
            b0, e0 = v[i-1]
            if b1 == e0 + 1:
                # Merge the adjacent range into its predecessor
                v[i] = (b0, e1)
                del v[i-1]
                continue
            i += 1
        x[k] = []
        for r in v:
            x[k].extend(_s(*r))
    return x
def _r(x):
    '''
    Given a network range, as a string containing a '-' character,
    or a network address and netmask, as a string containing a '/'
    character, returns the actual IP range as a two element tuple
    containing the begin and end integers for that network.
    '''
    if '-' in x:
        return tuple(map(_i, x.split('-')))
    elif '/' in x:
        n, m = map(_i, x.split('/'))
        return n, n-1+2**((32 if n < IPv4 else 128)-m)
def _s(b, e):
    '''
    Recursive IP range splitter function.
    '''
    x = []
    if e < b:
        b, e = e, b
    if e < IPv4:
        # Range is IPv4
        s = 32
    else:
        # Presume range is IPv6 and force beginning to be at least the IPv4 boundary
        s, b = 128, max(b, IPv4)
    l = mag(e-b+1)-1
    m = 2**s-2**l
    n = m & e
    if n == m & b:
        # The range is exactly one network; keep it if it is global unicast
        if _v(b):
            x.append((b, s-l))
    else:
        x.extend(_s(b, n-1))
        x.extend(_s(n, e))
    return x
def _v(n):
    '''
    Rudimentary IP address validation function.
    Checks IP address resides in global unicast address space.
    '''
    return n < IPv4 or IPv6n == n & IPv6m
def main():
    ARG = argv[1] if len(argv) > 1 else None
    PROVIDER = PROVIDERS.get(ARG)
    if not PROVIDER:
        print('First argument must be one of the following:\n')
        for s in sorted(PROVIDERS):
            print('  * ' + s)
        raise SystemExit(1)
    # Raw string avoids the invalid escape sequence warning for \w
    ACLs, RE = {}, re.compile(r'[^\w-]+')
    for META, URL, FILE in PROVIDER:
        if callable(META):
            # MaxMind data needs custom processing (see the MaxMind function)
            ACLs = META(URL, FILE)
            break
        l = URL.lower()
        filename = fetch(URL)
        key = META['key']
        cols = META['range']
        try:
            if l.endswith('.gz'):
                i = gzip.open(filename)
            elif l.endswith('.zip'):
                i = zipfile.ZipFile(filename).open(FILE)
            df = pandas.read_csv(i, header=None, keep_default_na=False, dtype={x: str for x in key + cols})
            df['key'] = df[key].apply(lambda x: ':'.join(filter(None, map(lambda x: RE.sub('.', x), x))), axis=1)
            df['range'] = df[cols].apply(lambda x: '-'.join(x), axis=1)
            if ARG == 'DB-IP.region':
                df.key = df.key.str.replace('.Og.', '.og.')
            df = df[['key', 'range']]
            for acl in df.key.unique():
                ACLs.setdefault(acl if len(acl) > 1 else 'ZZ', []).extend(df.loc[acl == df.key].range.to_list())
        finally:
            remove(filename)
    with open('GeoIP.acl', 'w') as file:
        for k, v in sorted(_a(ACLs).items()):
            file.write('acl ' + k + ' {\n')
            for n, m in sorted(v):
                file.write('\t{}/{};\n'.format(
                    socket.inet_ntop(socket.AF_INET, struct.pack('!I', n)) if n < IPv4 else
                    socket.inet_ntop(socket.AF_INET6, struct.pack('!2Q', n >> 64, n & (1 << 64) - 1)),
                    m)
                )
            file.write('};\n\n')
if __name__ == '__main__':
    main()
BUG FIX ANNOUNCEMENT
If you have accessed this page before the 1st of January 2010, and thus are using these scripts as they were published on this page before this date, changes have since been made to them to address a couple of discovered issues.
- The first is a change to the fastest recursive script. The change is nothing major but effectively reduces execution time slightly by splitting IP ranges when generating the GeoIP.acl file rather than splitting IP ranges when creating the CBE (Country,Begin,End) CSV file. The change is purely in relation to where the range splitting takes place, resulting in grep pattern matching against fewer lines, thus marginally reducing the execution time of the script.
- The second fix has been made to all scripts. It was discovered when I noticed that the recursive awk function could not correctly split extremely large IP ranges, with a magnitude exceeding about 2^31. For example, giving the script the range 0 to 2147483647 would result in it printing 0.0.0.0/0 rather than 0.0.0.0/1. I traced this issue to a rounding anomaly with the printf function within awk, and the solution is simply to ensure that every occurrence of the logarithmic division calculation in each script is truncated to a whole number using the int function. This bug has probably not caused people much grief because the ranges supplied within the MaxMind GeoIP CSV file are nowhere near a magnitude of 2^31 (the largest IP range listed as of writing has a magnitude of 2^26, representing the network 28.0.0.0/6 in the United States). Nevertheless, this was a bug and it has now been fixed in the scripts published below.
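The quantity this fix truncates, int(log(e-b+1)/log(2)), is simply floor(log2) of the range size. As a minimal sketch in Python (prefix_length is a hypothetical name, not part of the scripts), it can be computed with exact integer arithmetic, which sidesteps the rounding anomaly altogether. It reproduces both examples above, assuming the range already sits on a network boundary:

```python
def prefix_length(begin, end, width=32):
    """Return width - floor(log2(end - begin + 1)) using exact integer arithmetic."""
    size = end - begin + 1
    # bit_length() - 1 equals floor(log2(size)) for any positive integer, so
    # there is no floating-point result to truncate or round
    return width - (size.bit_length() - 1)


print(prefix_length(0, 2147483647))          # 1, i.e. 0.0.0.0/1
print(prefix_length(469762048, 536870911))   # 6, i.e. 28.0.0.0/6
```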
Overview
I was recently asked by my employer to bring our DNS in-house from UltraDNS, where we originally hosted all our domain names. Due to various requirements within the company, they were utilising UltraDNS's geo-targeting feature so that internet users in different parts of the world would resolve hosts on their domains to different IP addresses, depending on the geographical (country) location of those users.
Having already been exposed to BIND's views feature some years ago, I searched online for ways to make BIND geo-aware. There is not much documentation about this, but I found one solution which involved patching the BIND source code. All well and good but, in all honesty, this seemed like using a sledgehammer to crack a nut. Besides, our company does not like patching (hacking) source code unless there is a real requirement to do so, as it normally entails ongoing maintenance: the changes have to be refitted into each revision of the BIND source code whenever ISC releases a newer version of BIND.
I analysed the BIND patching method further and found that it still relies on two fundamental things to achieve a geo-aware DNS setup: BIND's views feature and the freely downloadable GeoIP data available from MaxMind. It was then I realised that to make BIND geo-aware, all that is required is to reformat the data in the MaxMind GeoIP CSV file into something that BIND likes and will accept in its configuration file. The easiest and most manageable way to achieve this is by using the BIND Access Control List (acl) clause, but herein lies the problem: the MaxMind GeoIP CSV file operates on IP ranges, whereas BIND ACLs operate on IP networks in classic net/mask notation. So, basically, I had to formulate a method to transform MaxMind IP ranges into BIND ACLs. This is what the Linux BASH script(s) shown below do.
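To illustrate the transformation, Python's standard ipaddress module can turn an arbitrary inclusive range into the list of net/mask networks that cover it exactly. This is a sketch for comparison only, not part of the scripts below, and range_to_networks is a hypothetical helper name:

```python
import ipaddress


def range_to_networks(first, last):
    """Return the CIDR networks that exactly cover the inclusive range first..last."""
    start = ipaddress.ip_address(first)
    end = ipaddress.ip_address(last)
    # summarize_address_range() yields aligned networks covering start..end
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]


# A hypothetical range that does not sit on a single network boundary
print(range_to_networks('1.0.0.0', '1.0.2.255'))  # ['1.0.0.0/23', '1.0.2.0/24']
```

A range like this one cannot be written as a single net/mask entry, so it becomes two ACL lines.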
The result is the automatic creation of a single and maintainable GeoIP.acl include file that can be instantly added into any already running BIND DNS server, without the requirement for source code patching and recompilation, producing a geo-aware production-ready DNS server in a matter of minutes.
Linux BASH script(s) to fetch, unzip, reformat and generate the GeoIP.acl include file for BIND
There are two different BASH scripts documented below which will generate the GeoIP.acl include file for BIND. The first uses an iterative BASH loop (slower) whereas the second uses a recursive AWK function (much faster). Both achieve exactly the same result using different programming constructs. The second is an improvement over the first, but I have left the first documented anyway as it was my original implementation. For speed and efficiency, I recommend using the second, recursive script.
NOTE: By default, some distributions of Linux use a non-GNU version of AWK which lacks the bitwise AND function. In this instance, GAWK must be installed (the GNU version of AWK) for the scripts below to function correctly (thanks to Ruben for pointing this out).
Each script will attempt to download the latest MaxMind GeoIP CSV file (which is actually a ZIP file). Once downloaded, the script reuses and reprocesses this file each time it is executed; removing the ZIP file and rerunning the script forces it to fetch a fresh copy from MaxMind. Each script then unzips the file, reformats the enclosed GeoIP CSV file (taking several passes to do this if the iterative version is used) and generates the file GeoIP.acl, the include file that can be added to BIND's configuration to make it geo-aware. Note that MaxMind has since discontinued the legacy GeoLite CSV files these scripts download, so the geolite.maxmind.com URL they use may no longer work; the unified Python script at the top of this page sources the current GeoLite2 data instead.
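The reformatting step can be sketched in Python as follows. It assumes the legacy column order implied by the awk field numbers used in the scripts ($10 for the country code, $6 and $8 for the begin and end integers), and the sample line is illustrative rather than taken from a current MaxMind file; to_cbe is a hypothetical name:

```python
import csv
import io


def to_cbe(line):
    """Convert one legacy GeoIP country CSV line into a Country,Begin,End line."""
    # Assumed legacy column order: first IP, last IP, first integer, last integer,
    # country code, country name (awk -F \" sees these as $2, $4, $6, $8, $10, $12)
    row = next(csv.reader(io.StringIO(line)))
    return ','.join((row[4], row[2], row[3]))


# An illustrative line in the legacy format
print(to_cbe('"1.0.0.0","1.0.0.255","16777216","16777471","AU","Australia"'))  # AU,16777216,16777471
```

Moving the country code to the front of each line is what lets the later grep "^$c," pattern pick out one country quickly.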
Iterative Version (slowest)
#!/bin/bash
[ -f GeoIPCountryCSV.zip ] || wget -T 5 -t 1 https://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip
echo -n "Creating initial CBE (Country,Begin,End) CSV file..."
unzip -p GeoIPCountryCSV.zip GeoIPCountryWhois.csv | awk -F \" '{print $10","$6","$8}' > cbe0.csv
echo -ne "DONE\nSplitting CBE CSV file..."
lc0=0; lc1=$(wc -l cbe0.csv | awk '{print $1}')
while [ $lc0 -lt $lc1 ]
do
lc0=$lc1; echo -ne "\n$lc0\t"
awk -F , '{m = 2^32-2^int(log($3-$2+1)/log(2)); n = and(m,$3); if (n == and(m,$2)) print; else printf "%s,%u,%u\n%s,%u,%u\n",$1,$2,n-1,$1,n,$3}' cbe0.csv > cbe1.csv
mv -f cbe1.csv cbe0.csv; lc1=$(wc -l cbe0.csv | awk '{print $1}')
echo -ne "+$((lc1-lc0))\t"; [ $lc0 -lt $lc1 ] && echo -n "OK"
done
echo -ne "DONE\nGenerating BIND GeoIP.acl file..."
(for c in $(awk -F , '{print $1}' cbe0.csv | sort -u)
do
echo "acl $c {"
grep "^$c," cbe0.csv | awk -F , '{printf "\t%u.%u.%u.%u/%u;\n",$2/2^24%256,$2/2^16%256,$2/2^8%256,$2%256,32-int(log($3-$2+1)/log(2))}'
echo -e "};\n"
done) > GeoIP.acl
rm -f cbe0.csv
echo "DONE"
exit 0
Here's this script in action!
$ ./GeoIP.sh
--00:00:00-- https://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip
=> `GeoIPCountryCSV.zip'
Resolving geolite.maxmind.com... 64.246.48.99
Connecting to geolite.maxmind.com|64.246.48.99|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1,556,500 (1.5M) [application/zip]
100%[================================================================================>] 1,556,500 820.41K/s
00:00:02 (818.35 KB/s) - `GeoIPCountryCSV.zip' saved [1556500/1556500]
Creating initial CBE (Country,Begin,End) CSV file...DONE
Splitting CBE CSV file...
106184 +31276 OK
137460 +23038 OK
160498 +11755 OK
172253 +6413 OK
178666 +3544 OK
182210 +1905 OK
184115 +949 OK
185064 +463 OK
185527 +202 OK
185729 +94 OK
185823 +38 OK
185861 +19 OK
185880 +5 OK
185885 +2 OK
185887 +0 DONE
Generating BIND GeoIP.acl file...DONE
Recursive Version (fastest)
#!/bin/bash
[ -f GeoIPCountryCSV.zip ] || wget -T 5 -t 1 https://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip
echo -n "Creating CBE (Country,Begin,End) CSV file..."
unzip -p GeoIPCountryCSV.zip GeoIPCountryWhois.csv | awk -F \" '{print $10","$6","$8}' > cbe.csv
echo -ne "DONE\nGenerating BIND GeoIP.acl file..."
(for c in $(awk -F , '{print $1}' cbe.csv | sort -u)
do
echo "acl $c {"
grep "^$c," cbe.csv | awk -F , 'function s(b,e,l,m,n) {l = int(log(e-b+1)/log(2)); m = 2^32-2^l; n = and(m,e); if (n == and(m,b)) printf "\t%u.%u.%u.%u/%u;\n",b/2^24%256,b/2^16%256,b/2^8%256,b%256,32-l; else {s(b,n-1); s(n,e)}} s($2,$3)'
echo -e "};\n"
done) > GeoIP.acl
rm -f cbe.csv
echo "DONE"
exit 0
Here's this script in action!
$ ./GeoIP.sh
--00:00:00-- https://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip
=> `GeoIPCountryCSV.zip'
Resolving geolite.maxmind.com... 64.246.48.99
Connecting to geolite.maxmind.com|64.246.48.99|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1,556,500 (1.5M) [application/zip]
100%[================================================================================>] 1,556,500 820.41K/s
00:00:02 (818.35 KB/s) - `GeoIPCountryCSV.zip' saved [1556500/1556500]
Creating CBE (Country,Begin,End) CSV file...DONE
Generating BIND GeoIP.acl file...DONE
Both of these scripts will generate the file GeoIP.acl in the current working directory which looks something like this:
acl A1 {
64.46.32.0/23;
64.46.35.0/24;
64.46.40.64/26;
64.46.42.0/23;
64.46.47.0/24;
66.38.243.0/24;
67.15.183.0/25;
69.10.130.128/26;
69.10.139.0/25;
69.10.140.192/26;
...
acl GB {
2.6.190.56/29;
9.20.0.0/17;
12.129.72.32/29;
23.0.0.0/9;
25.0.0.0/8;
32.58.57.0/29;
32.58.58.0/28;
32.58.59.0/29;
32.60.34.96/27;
51.0.0.0/8;
...
217.204.159.96/29;
217.204.159.104/30;
217.204.159.112/28;
217.204.159.128/25;
217.204.160.0/19;
217.204.192.0/18;
217.205.0.0/16;
217.206.0.0/15;
217.237.189.240/29;
217.243.204.144/29;
};
...
217.194.132.0/24;
217.194.145.144/29;
217.194.146.192/26;
217.194.147.240/28;
217.194.149.32/28;
217.194.149.168/29;
217.194.156.0/26;
217.194.157.48/28;
217.194.157.144/29;
217.194.157.168/29;
};
How do these scripts work?
I won't go into the technicalities of how these scripts work (this is left as an exercise for the reader), but in outline: the first, iterative script creates a new CSV file containing three fields (Country,Begin,End) and then repeatedly searches for and splits these IP ranges on network boundaries. This leaves a CSV file with exactly the same coverage of IPs as before, but in which every range begins and ends on values that allow it to be expressed concisely in net/mask notation. The final part of the script then uses this CSV file to generate the GeoIP.acl include file.
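A rough Python equivalent of one splitting pass, and of the loop that repeats it until the line count stops growing, is sketched below. It follows the awk line in the iterative script but, as an assumption of this sketch, computes floor(log2(size)) with int.bit_length() instead of a floating-point logarithm; the function names are hypothetical:

```python
def split_pass(ranges, width=32):
    """Perform one splitting pass over (begin, end) pairs, as the awk line does."""
    out = []
    for b, e in ranges:
        l = (e - b + 1).bit_length() - 1   # floor(log2(range size))
        m = (1 << width) - (1 << l)        # mask of the bits above bit l
        n = m & e                          # start of the 2^l block containing e
        if n == m & b:
            out.append((b, e))             # already a single net/mask block
        else:
            out.extend([(b, n - 1), (n, e)])  # split on the block boundary
    return out


def split_until_stable(ranges, width=32):
    """Repeat passes until one adds no lines, like the script's while loop."""
    ranges = list(ranges)
    while True:
        result = split_pass(ranges, width)
        if len(result) == len(ranges):
            return result
        ranges = result
```

Each pass splits any given range at most once, which is why the iterative script needs several passes before the line count stops increasing.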
The second recursive script achieves the same result faster by creating a new CSV file as before, containing 3 fields (Country,Begin,End), and then performs recursive range splitting "on the fly" within awk itself, for each country, to generate the GeoIP.acl include file.
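The recursive awk function s() translates almost line for line into Python. In this sketch (hypothetical names, with bit_length() again standing in for awk's logarithm) a width parameter is added so that the same function also handles 128-bit IPv6 ranges, which awk cannot:

```python
import ipaddress


def split_range(b, e, width=32):
    """Recursively split [b, e] into (network, prefix length) pairs."""
    l = (e - b + 1).bit_length() - 1   # floor(log2(range size))
    m = (1 << width) - (1 << l)        # mask of the bits above bit l
    n = m & e                          # start of the 2^l block containing e
    if n == m & b:
        return [(b, width - l)]        # [b, e] is exactly one network
    return split_range(b, n - 1, width) + split_range(n, e, width)


def acl_lines(b, e, width=32):
    """Format the networks covering [b, e] as GeoIP.acl entry lines."""
    address = ipaddress.IPv4Address if width == 32 else ipaddress.IPv6Address
    return ['\t{}/{};'.format(address(n), p) for n, p in split_range(b, e, width)]
```

Because Python integers have arbitrary precision, the 128-bit case needs no external calculator.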
Once either of these scripts has finished running, you can slot the newly created GeoIP.acl file straight into your existing BIND configuration file by adding the line:
include "/path/to/GeoIP.acl";
to named.conf. It will then be possible to create custom geo-views within BIND, like this:
view "north_america" {
    match-clients { US; CA; MX; };
    recursion no;
    zone "example555.com" {
        type master;
        file "pri/example555-north-america.db";
    };
};
view "south_america" {
    match-clients { AR; CL; BR; PY; PE; EC; CO; VE; BO; UY; };
    recursion no;
    zone "example555.com" {
        type master;
        file "pri/example555-south-america.db";
    };
};
view "other" {
    match-clients { any; };
    recursion no;
    zone "example555.com" {
        type master;
        file "pri/example555-other.db";
    };
};
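A simplified model of how BIND chooses between these views may help: views are tried in the order they appear in named.conf, and the first one whose match-clients list matches the client's source address answers the query. The sketch below ignores negation (!), keys and nested address match lists, uses hypothetical helper names, and its ACL data uses documentation address ranges rather than real country allocations:

```python
import ipaddress
import re

# Hypothetical ACL file using documentation-only address ranges (RFC 5737)
SAMPLE_ACL = '''
acl US {
\t192.0.2.0/24;
};

acl AR {
\t203.0.113.0/24;
};
'''

# Views in the same order as named.conf; the first match wins
VIEWS = [
    ('north_america', ['US', 'CA', 'MX']),
    ('south_america', ['AR', 'CL', 'BR', 'PY', 'PE', 'EC', 'CO', 'VE', 'BO', 'UY']),
    ('other', ['any']),
]


def parse_acls(text):
    """Parse 'acl NAME { net/len; ... };' blocks into {name: [networks]}."""
    acls = {}
    for name, body in re.findall(r'acl\s+(\S+)\s*\{(.*?)\};', text, re.S):
        entries = [tok.strip() for tok in body.split(';') if tok.strip()]
        acls[name] = [ipaddress.ip_network(tok) for tok in entries]
    return acls


def pick_view(address, views, acls):
    """Return the first view whose match-clients list covers the address."""
    ip = ipaddress.ip_address(address)
    for view, clients in views:
        for client in clients:
            if client == 'any':
                return view
            # An address never matches a network of the other IP version
            if any(ip in net for net in acls.get(client, [])):
                return view
    return None
```

The catch-all view with match-clients { any; } must come last, otherwise it would capture every client before the geo-specific views are consulted.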
If you decide to run these scripts from cron on your BIND name server(s), do remember to reload named afterwards (normally achieved by running the command service named reload on RedHat/CentOS, or rndc reload) so that the new ACL definitions within the GeoIP.acl file are loaded into BIND's memory.
Summary
I hope this article proves useful for others (that's why I have documented it). Interestingly, my original implementation of this used a PHP script coupled with MySQL, loading the MaxMind CSV file into a database table and then running SELECT, UPDATE and INSERT queries to split up the IP ranges. Whilst this worked, it depended on having PHP and MySQL installed and configured. The above scripts achieve exactly the same thing using only BASH commands and utilities, such as awk, grep and sort, which, in my view, is far cleaner!
Incidentally, it is actually possible to produce the GeoIP.acl file without using grep or any intermediate CSV file (shown below). These scripts may be used instead, but with markedly longer execution times; because of this, an echo statement that outputs the current country code to standard error has been added to their main loops to give an indication of progress while the scripts are running.
Recursive Versions (smallest)
#!/bin/bash
[ -f GeoIPCountryCSV.zip ] || wget -T 5 -t 1 https://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip
unzip GeoIPCountryCSV.zip || exit 1
(for c in $(awk -F \" '{print $10}' GeoIPCountryWhois.csv | sort -u)
do
echo "$c" >&2
echo "acl $c {"
awk -F \" 'function s(b,e,l,m,n) {l = int(log(e-b+1)/log(2)); m = 2^32-2^l; n = and(m,e); if (n == and(m,b)) printf "\t%u.%u.%u.%u/%u;\n",b/2^24%256,b/2^16%256,b/2^8%256,b%256,32-l; else {s(b,n-1); s(n,e)}} c == $10 {s($6,$8)}' c=$c GeoIPCountryWhois.csv
echo -e "};\n"
done) > GeoIP.acl
rm -f GeoIPCountryWhois.csv
exit 0
We can marginally reduce the execution time of the above script by adjusting its awk line to match the current country using a regular expression, as opposed to setting the awk variable c and then checking if c == $10, as follows:
#!/bin/bash
[ -f GeoIPCountryCSV.zip ] || wget -T 5 -t 1 https://geolite.maxmind.com/download/geoip/database/GeoIPCountryCSV.zip
unzip GeoIPCountryCSV.zip || exit 1
(for c in $(awk -F \" '{print $10}' GeoIPCountryWhois.csv | sort -u)
do
echo "$c" >&2
echo "acl $c {"
awk -F \" 'function s(b,e,l,m,n) {l = int(log(e-b+1)/log(2)); m = 2^32-2^l; n = and(m,e); if (n == and(m,b)) printf "\t%u.%u.%u.%u/%u;\n",b/2^24%256,b/2^16%256,b/2^8%256,b%256,32-l; else {s(b,n-1); s(n,e)}} '"/,\"$c\",/"' {s($6,$8)}' GeoIPCountryWhois.csv
echo -e "};\n"
done) > GeoIP.acl
rm -f GeoIPCountryWhois.csv
exit 0
Do note, however, that I personally prefer the previous grep method, as it is much faster than these two scripts. It first reformats the data within the CSV file so that the country field is at the beginning of each line, which allows fast regex pattern matching on that field, and leaves awk to take care of the more complicated task of IP range splitting on the begin (2nd) and end (3rd) integer IP fields.
Geo-aware BIND for the IPv6 network? Without patching? Absolutely!
Over the last decade, IPv6 has become more and more mainstream. More recently, on the 3rd of February 2011, IANA allocated the last five remaining IPv4 /8 blocks, one to each RIR, completely exhausting the IANA pool; there is now no further free IPv4 address space available for allocation from IANA. Because of this, I predict demand for IPv6 adoption is likely to rise over the coming years.
Although I have not yet seen any requirement for geo-aware DNS serving on the IPv6 network, I imagine this will gradually become necessary as services begin to migrate from IPv4 to IPv6. BIND already handles IPv6 addresses within its ACLs, so I have published further scripts below that create a GeoIPv6.acl include file containing IPv6 net/mask entries, using the freely downloadable GeoIPv6 CSV file available from MaxMind.
It was a challenge to come up with a working solution using the same principles as the above scripts across a much larger address space: IPv6 uses a 128-bit address space, compared to only 32 bits for IPv4. The scripts above get away with using simple BASH utilities such as awk for the necessary IP range splitting with 32 bits but, as I found out, awk is unable to handle numbers in the realm of 64 bits and beyond, because it stores numbers as double-precision floating point values, which represent integers exactly only up to 2^53. So I have had to bring various other Linux utilities into play to achieve this.
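The underlying limitation is easy to demonstrate, assuming awk stores numbers as IEEE 754 double-precision values (as gawk does by default). A double has a 53-bit significand, so larger integers can be rounded silently, whereas Python integers are exact at any size. survives_double is a hypothetical helper:

```python
def survives_double(n):
    """Return True if integer n is unchanged after a round trip through a double."""
    return int(float(n)) == n


for bits in (32, 53, 64, 128):
    n = 2**bits - 1   # the largest unsigned integer of this width
    print(bits, survives_double(n))
# prints: 32 True, 53 True, 64 False, 128 False (one per line)
```

Once the end of a range has been rounded like this, the mask-and-compare steps of the splitting function no longer operate on the real address.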
In order to handle numbers up to and beyond 64 bits in magnitude, one has to look at other programming languages and the libraries they offer. After evaluating languages like Python (which handles large numbers out of the box) and PHP (which can only handle large numbers with an additional library installed), I decided to go with Perl. On most standard installs, Perl has the bignum library available and ready to go. This library is transparent: as soon as it is included in a script, all number processing automatically uses it, and it provides all the operations the above scripts rely on, such as bitwise AND. However, when writing the Perl script below, I ran into an inconsistency with the log function whilst using the bignum library and, for anything above 64 bits, bignum also exhibits major rounding anomalies. To avoid this curveball, I brought the common Linux arbitrary precision calculator bc into play to take over both of these roles. Together, Perl and bc offer the accuracy and speed required to split decimal IP ranges with magnitudes of 64 bits and beyond.
So, here are the scripts. The first script is, as before, a standard BASH script (called GeoIPv6.sh). It is much the same as before but rather than piping the filtered grep lines to awk, it pipes them to a newly created Perl script instead. It also contains some further adjustments at the top to download the latest GeoIPv6 CSV file from MaxMind's servers, as well as an optional pipe of the Perl script output to sed to abbreviate IPv6 addresses to their "double-colon (::) notation" equivalent.
#!/bin/bash
[ -f GeoIPv6.csv.gz ] || wget -T 5 -t 1 https://geolite.maxmind.com/download/geoip/database/GeoIPv6.csv.gz
echo -n "Creating CBE (Country,Begin,End) CSV file..."
gunzip -c GeoIPv6.csv.gz | awk -F \" '{print $10","$6","$8}' > cbe.csv
echo -e "DONE\nGenerating BIND GeoIPv6.acl file..."
(for c in $(awk -F , '{print $1}' cbe.csv | sort -u)
do
echo "$c" >&2
echo "acl ${c}v6 {"
grep "^$c," cbe.csv | ./GeoIPv6.pl | sed 's \(:0\)\+/ ::/ '
echo -e "};\n"
done) > GeoIPv6.acl
rm -f cbe.csv
echo "DONE"
exit 0
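The sed expression in this script collapses only the run of :0 groups immediately before the '/'. The sketch below (hypothetical helper names) mimics it in Python and compares it with the compressed form produced by the standard ipaddress module. The two can differ when a longer run of zero groups sits elsewhere in the address, but both are valid textual forms of the same address:

```python
import ipaddress
import re


def sed_abbreviate(entry):
    """Mimic sed 's \\(:0\\)\\+/ ::/ ': collapse the trailing :0 groups before '/'."""
    # sed without the g flag replaces only the first match, hence count=1
    return re.sub(r'(:0)+/', '::/', entry, count=1)


def canonical(entry):
    """Rewrite net/len using the compressed form that ipaddress produces."""
    net, prefix = entry.split('/')
    return '{}/{}'.format(ipaddress.IPv6Address(net).compressed, prefix)


print(sed_abbreviate('2001:630:0:0:0:0:0:0/32'))   # 2001:630::/32
print(sed_abbreviate('2001:db8:0:0:0:0:1:0/128'))  # 2001:db8:0:0:0:0:1::/128
print(canonical('2001:db8:0:0:0:0:1:0/128'))       # 2001:db8::1:0/128
```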
The Perl script I have called GeoIPv6.pl, with the following contents:
#!/usr/bin/perl
use strict;
use bignum;
use IPC::Open2; open2(*BCOUT,*BCIN,'bc -l');
sub rs {
    my ($b,$e) = @_;
    print BCIN "scale=40; l($e-$b+1)/l(2)\n";
    my ($l) = split('\.',<BCOUT>);
    my $m = 2**128-2**$l;
    my $n = $m & $e;
    if ($n == ($m & $b)) {
        my @x; for (my $p = 112; $p > 0; $p -= 16) {
            print BCIN "scale=0; $b/2^$p\n";
            push(@x,<BCOUT>%65536);
        }
        printf "\t%x:%x:%x:%x:%x:%x:%x:%x/%u;\n",$x[0],$x[1],$x[2],$x[3],$x[4],$x[5],$x[6],$b%65536,128-$l;
    } else {
        rs($b,$n-1); rs($n,$e);
    }
}
while (<STDIN>) {chomp($_); my ($c,$b,$e) = split(',',$_); rs($b,$e)}
This Perl script effectively reads from standard input in precisely the same way as the original awk script does (expecting each line to be in the format of a CBE (Country,Begin,End) CSV file) but, unlike awk, can perform IP range splitting on 128 bit decimal numbers, printing IPv6 net/mask entries to standard output. Note the use of a dual pipe to the Linux arbitrary precision calculator bc to manage the logarithmic division calculation and also to accurately truncate values before they are passed to the printf function (done by a small for loop that places these entries into an array). Most importantly, note that we must increase the default scale of 20 within bc to at least 40 to be able to accurately cope with the logarithmic division calculation. Observe:
$ echo 'l(2^128-1)/l(2)' | bc -l
128.00000000000000000132
$ echo 'scale=20; l(2^128-1)/l(2)' | bc -l
128.00000000000000000132
$ echo 'scale=39; l(2^128-1)/l(2)' | bc -l
128.000000000000000000000000000000000000088
$ echo 'scale=40; l(2^128-1)/l(2)' | bc -l
127.9999999999999999999999999999999999999956
The reason we open a dual pipe to bc within Perl is to avoid forking a separate bc process each time we need to perform a calculation (forking a new process is costly in terms of CPU time). By opening a dual pipe to a single persistent bc process, we can quickly send each calculation to it and read back the result. The IPC::Open2 Perl module is required for dual pipes, and this may need to be installed on your system.
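Since Python integers are arbitrary precision, the work the Perl script hands to bc for printing (dividing by 2^112, 2^96 and so on, then taking the remainder modulo 65536) reduces to shifts and masks. A minimal sketch with hypothetical helper names, producing entries in the same uncompressed form that GeoIPv6.pl prints before sed abbreviates them:

```python
import ipaddress


def hextets(n):
    """Split a 128-bit integer into eight 16-bit groups, most significant first."""
    # Shifting right by 112, 96, ..., 0 bits and masking with 0xFFFF matches the
    # Perl loop's integer division by 2^p followed by % 65536
    return [(n >> shift) & 0xFFFF for shift in range(112, -1, -16)]


def perl_style_entry(n, prefix):
    """Format an entry the way GeoIPv6.pl's printf does (before the sed step)."""
    return '\t' + ':'.join('{:x}'.format(g) for g in hextets(n)) + '/{};'.format(prefix)


n = int(ipaddress.IPv6Address('2001:67c:18::'))
print(perl_style_entry(n, 48).strip())  # 2001:67c:18:0:0:0:0:0/48;
```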
Once these two scripts have been created, it will be possible to run ./GeoIPv6.sh to generate the GeoIPv6.acl include file for BIND. Note that the execution time here will be far greater than before, since we are using Perl with bignum support, and passing division calculations to a separate persistent bc process. As such, the BASH script has been modified to output the current country code being processed to standard error to indicate progress. Once the script has completed execution, the GeoIPv6.acl include file will have been created in the current working directory, which looks something like this:
acl ADv6 {
2001:4df8::/32;
};
acl AEv6 {
2001:8f8::/32;
2a00:d30::/32;
2a00:f28::/32;
};
acl AMv6 {
2001:1bb0::/32;
2001:4d00::/32;
2a00:f38::/32;
2a00:1290::/32;
2a00:1500::/32;
2a02:d18::/32;
...
acl GBv6 {
2001:630::/32;
2001:678:4::/47;
2001:67c:18::/48;
2001:67c:90::/48;
2001:67c:b4::/48;
2001:67c:c0::/48;
2001:67c:d4::/48;
2001:6f8::/32;
2001:710::/32;
2001:768::/32;
...
2a02:ce8::/32;
2a02:da0::/32;
2a02:df8::/32;
2a02:e38::/32;
2a02:e68::/32;
2a02:eb0::/32;
2a02:ef8::/32;
2a02:f70::/32;
2a02:fb0::/32;
2a02:fb8::/32;
};
...
2001:43d8::/32;
2001:43f8:20::/48;
2001:43f8:30::/48;
2001:43f8:40::/48;
2001:43f8:50::/48;
2001:43f8:70::/45;
2001:43f8:90::/48;
2001:43f8:a0::/48;
2001:43f8:d0::/48;
};
acl ZWv6 {
2001:42b0::/32;
};
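GeoIPv6.sh itself is not reproduced in this section, but the shape of its output shown above (one `acl XXv6 { ... };` block per country, with progress reported on standard error) can be approximated in a single Python 3 sketch. This swaps in the standard library's ipaddress.summarize_address_range for the hand-written range splitting. The input format (one Country,Begin,End row per line with decimal 128-bit integers, as in cbe.csv) is taken from the text; the function name cbe_to_acl and the alphabetical ordering of countries are assumptions.

```python
import ipaddress
import sys
from collections import defaultdict

def cbe_to_acl(lines, out=None, err=None):
    """Write one BIND `acl XXv6 { ... };` block per country to `out`.

    `lines` yields Country,Begin,End rows whose Begin/End are decimal
    128-bit integers. Country codes are written to `err` as progress.
    """
    if out is None:
        out = sys.stdout
    if err is None:
        err = sys.stderr
    ranges = defaultdict(list)
    for line in lines:
        fields = line.strip().split(",")
        if len(fields) < 3:
            continue  # skip blank or malformed rows
        ranges[fields[0]].append((int(fields[1]), int(fields[2])))

    for country in sorted(ranges):
        err.write(country + "\n")
        out.write("acl %sv6 {\n" % country)
        for begin, end in ranges[country]:
            first = ipaddress.IPv6Address(begin)
            last = ipaddress.IPv6Address(end)
            # Fewest CIDR blocks that exactly cover first..last.
            for net in ipaddress.summarize_address_range(first, last):
                out.write("\t%s;\n" % net)
        out.write("};\n")
```

Because ipaddress already prints networks in compressed notation (for example `2001:630::/32`), no sed post-processing step is needed; `cbe_to_acl(open("cbe.csv"))` would write the whole include file to standard output.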
Faster Python Implementation of the above Perl Script
I have recently been learning Python at my current place of employment. After much procrastination, I have written a Python implementation (for Python 2) of my original Perl script above, shown below. When used within the BASH script, it reduces the time taken to generate the GeoIPv6.acl include file to about a quarter of that taken with the Perl script. This is mostly because it is self-contained: rather than calling out to bc as a separate external process, it uses the mag function from the additional Python library mpmath to determine the magnitude of each IPv6 range it may have to split.
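#!/usr/bin/python
from sys import stdin
from mpmath import mag

def s(b,e):
    # l is the exponent of the largest power of two not exceeding the range size
    l = mag(e-b+1)-1
    # m keeps the top 128-l bits; n is the 2**l-aligned boundary at or below e
    m = 2**128-2**l
    n = m & e
    if n == m & b:
        # b..e is exactly one aligned block: print it as an IPv6 net/mask entry
        print '\t%x:%x:%x:%x:%x:%x:%x:%x/%u;' % tuple([b/2**p%65536 for p in xrange(112,-1,-16)]+[128-l])
    else:
        # split at the boundary n and recurse on both parts
        s(b,n-1)
        s(n,e)

# each line on stdin is Country,Begin,End; pass Begin and End to s()
for r in (map(int,l.split(',')[1:3]) for l in stdin): s(*r)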
To use this Python script instead, just place the above inside an executable file called GeoIPv6.py and then change the line:
grep "^$c," cbe.csv | ./GeoIPv6.pl | sed 's \(:0\)\+/ ::/ '
in GeoIPv6.sh to:
grep "^$c," cbe.csv | ./GeoIPv6.py | sed 's \(:0\)\+/ ::/ '
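The script above relies on Python 2 syntax (the print statement, xrange, and integer division with /) and on the third-party mpmath package. A hedged Python 3 adaptation that keeps the same algorithm and output format might look like the following. The names split_range and main are illustrative, and int.bit_length() is used in place of mag() because, for a positive integer n, n.bit_length() - 1 is exactly floor(log2(n)), the value l is meant to hold.

```python
#!/usr/bin/env python3
"""Python 3 sketch of GeoIPv6.py: reads Country,Begin,End rows on stdin
and prints uncompressed IPv6 net/mask lines for the sed step."""
import sys

def split_range(b, e):
    """Yield net/mask lines covering the inclusive integer range b..e."""
    l = (e - b + 1).bit_length() - 1    # floor(log2(range size)), exact
    m = 2**128 - 2**l                   # keeps the top 128-l bits
    n = m & e                           # aligned 2**l boundary at or below e
    if n == m & b:
        # b and e share their top 128-l bits, so b..e is one aligned block.
        groups = [b // 2**p % 65536 for p in range(112, -1, -16)]
        yield "\t%x:%x:%x:%x:%x:%x:%x:%x/%u;" % tuple(groups + [128 - l])
    else:
        # Split at the boundary n and cover each side separately.
        yield from split_range(b, n - 1)
        yield from split_range(n, e)

def main(stream):
    for line in stream:
        fields = line.split(",")
        if len(fields) >= 3:
            for entry in split_range(int(fields[1]), int(fields[2])):
                print(entry)

if __name__ == "__main__":
    main(sys.stdin)
```

Since the output format is unchanged, it can be dropped in as GeoIPv6.py with the sed step in GeoIPv6.sh left as it is.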
Performance versus Maintainability (pros/cons for/against this ACL method compared to BIND source code patching)
John 'Warthog9' Hawley, the chief administrator of www.kernel.org (a high-traffic site which implemented BIND GeoDNS on the 19th of September 2008 via patching), recently contacted me about this HOWTO with some interesting points concerning the implications of using this ACL method over BIND source code patching. I will briefly discuss this here, as it will affect which route you take when implementing GeoDNS within BIND.
In a nutshell, patching BIND for GeoDNS support results in a DNS server that can answer queries far faster than one using this ACL method (I have confirmed this, and it is quite easy to test; see below). This is because the MaxMind binary database is a binary search tree, so determining the country of an IPv4 address takes at most 32 iterations (and usually far fewer). For their IPv6 binary database, the worst case rises to 128 iterations. Patching the MaxMind GeoIP C library directly into BIND therefore yields a server that can process, look up and answer DNS queries using very few CPU cycles. If your DNS servers are high-traffic servers responding to many DNS requests per second, the source code patching route is advisable.
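The bounded-lookup argument can be illustrated with a generic binary trie keyed on address bits: each step consumes one bit, so a lookup can never take more than 32 steps for IPv4 or 128 for IPv6. This is a sketch of the general technique under that assumption; the class and method names are hypothetical, and it does not reproduce the on-disk format of MaxMind's binary databases.

```python
import ipaddress

class PrefixTrie:
    """Binary trie from CIDR prefixes to values such as country codes.

    Each node is a dict whose keys 0 and 1 hold child nodes and whose
    optional key "value" holds the value for the prefix ending there.
    """

    def __init__(self, bits):
        self.bits = bits  # 32 for IPv4, 128 for IPv6
        self.root = {}

    def _bit(self, addr, i):
        # i-th bit of the address, counting from the most significant bit
        return (addr >> (self.bits - 1 - i)) & 1

    def insert(self, network, value):
        net = ipaddress.ip_network(network)
        addr = int(net.network_address)
        node = self.root
        for i in range(net.prefixlen):
            node = node.setdefault(self._bit(addr, i), {})
        node["value"] = value

    def lookup(self, address):
        """Return (longest-prefix value or None, number of steps taken)."""
        addr = int(ipaddress.ip_address(address))
        node, best, steps = self.root, self.root.get("value"), 0
        for i in range(self.bits):  # never more than self.bits steps
            child = node.get(self._bit(addr, i))
            if child is None:
                break
            node, steps = child, steps + 1
            best = node.get("value", best)
        return best, steps
```

The step count depends only on the prefix length of the deepest match, never on how many prefixes have been loaded, which is the property the patched approach benefits from.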
Alternatively, if maintainability matters more to you, the ACL method described in this HOWTO is still a viable option, but at the cost of a substantial performance hit. According to John (who has been chatting with Paul Vixie, the primary author and architect of BIND until release 8), the ACL feature was never designed to hold as many entries as the above scripts generate for GeoDNS purposes. This I can believe, as the scripts above (for IPv4) produce an ACL definition file containing over 200,000 entries, which BIND has to load and hold in memory once launched. I am not fully aware of the data structures BIND uses to store ACLs, but they are likely to be far less efficient for this purpose than the simple binary search tree that MaxMind offer with their binary GeoIP databases. For this reason, the ACL method described in this HOWTO will result in a far slower DNS server, to a degree that depends on how many views you create and which ACLs are assigned to them.
To give you an idea of just how much of a performance hit this ACL method induces, I have a small low-power server on my network running a CentaurHauls VIA Nehemiah CPU @ 1 GHz (2000 BogoMips) with a 192.168.0.0/16 IP address (see RFC 1918; all other hosts on my LAN are in this network so none of them would be a match in any of the above ACLs). When loading BIND with the GeoIP.acl include file, and creating a catch-all view that matches any client (not using any of the ACLs in the GeoIP.acl include file), the DNS response time tends to be about 2 ms. If, however, another view is created before this catch-all one in named.conf, and the clause:
match-clients { A1; A2; AD; AE; AF; AG; AI; AL; AM; AN; ... VI; VN; VU; WF; WS; YE; YT; ZA; ZM; ZW; };
is added to this view (forcing BIND to attempt a match against every single ACL definition in the GeoIP.acl file), the response time soars to around 85 ms. In other words, the extra work we have asked BIND to do, checking whether any of the ACLs match a client in 192.168.0.0/16, has slowed it down by a factor of roughly 40 (85 ms versus 2 ms; a rough guesstimate only), which is a substantial performance hit that needs to be considered. For this reason, if you use the ACL method described in this HOWTO, try to limit the number of views you create and the number of ACLs assigned to them, as this reduces the work BIND has to do when answering DNS queries.
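The measurement above is consistent with a client being tested against ACL entries one after another until something matches: an address in 192.168.0.0/16 matches nothing, so every entry gets examined. The toy model below counts those tests for a naive linear scan and contrasts it with a binary search over sorted, non-overlapping ranges. It is an illustration under that assumption only, not a description of how BIND actually stores or evaluates ACLs, and all function names are hypothetical.

```python
import bisect
import ipaddress

def parse_acls(pairs):
    """Turn (name, "cidr") pairs into (name, network) pairs, in order."""
    return [(name, ipaddress.ip_network(cidr)) for name, cidr in pairs]

def linear_match(acls, addr):
    """Test each entry in turn: returns (first matching name or None, tests made)."""
    tests = 0
    for name, net in acls:
        tests += 1
        if addr in net:
            return name, tests
    return None, tests

def build_index(acls):
    """Sort the (assumed non-overlapping) networks by start address."""
    rows = sorted((int(net.network_address), int(net.broadcast_address), name)
                  for name, net in acls)
    return [row[0] for row in rows], rows

def bisect_match(index, addr):
    """Binary search: about log2(len(rows)) probes whether or not anything matches."""
    starts, rows = index
    a = int(addr)
    i = bisect.bisect_right(starts, a) - 1  # last network starting at or below a
    if i >= 0 and a <= rows[i][1]:
        return rows[i][2]
    return None
```

For a client that matches nothing, linear_match makes one test per entry (200,000 or so for the IPv4 file), whereas bisect_match needs about 18 probes, since log2(200,000) is about 17.6.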
In short, you should determine if speed (source code patching) or maintainability (ACL include file) is of more importance to you and be fully aware of the pros and cons of each method of GeoDNS implementation within BIND. As a systems administrator, use your head to decide which method to go with. As www.kernel.org is a global site, ranked around 10,000 across all sites on the internet (according to Alexa), John has done the right thing and gone with the patching method when deploying BIND GeoDNS servers for Kernel.org.