One of the most desirable features of MongoDB is its ability to automatically recover from a failed node. Setting this up with AWS instances can raise challenges, depending on how you address your machines. We see many examples online of people using IP addresses in their configurations, like the replica set config reproduced later in this post.
Stick that little function (shown later in this post) in your .bashrc or a similar file; then "meta" will list all the available parameters, and e.g. "meta local-ipv4" will return your local IP.
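For example, invoking it looks roughly like this (the output is abridged and illustrative, not captured from a real instance):

$ meta
hostname
instance-id
local-ipv4
public-hostname
public-ipv4
...
$ meta local-ipv4
10.1.2.3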
The first thing to set up is your DNS server. Of course, you must have redundancy, with at least a primary and a backup spread across different regions. For the purposes of this demonstration, however, I used just a single DNS server. Obviously, do not do this in production!
Here are the significant changes when setting up your BIND server:
1. In the "options" block, ensure your "listen-on" block will work for all servers that need DNS. In my case, I set it to "any". You may need to do something more restrictive.
2. "allow-recursion" also needs to be set for the same servers.
3. As in Ducea's article, add your keys and your key blocks (a sample key-generation command is sketched after this list). You'll need to copy these keys to your instance, but be careful about leaving copies lying around, and make the keys' permissions as strict as possible.
4. Add blocks for your internal and external zones. You can probably get by with just an internal zone, but you may find it more convenient to have both zones.
5. Each of the zones will need to reference the key.
6. The purpose of the "controls" block in there is so that I can use rndc to control my named server. Although not technically necessary, it becomes useful if you need to edit your zone files after any dynamic updates have been made: you'll need to run "rndc freeze" before editing and "rndc thaw" afterwards, otherwise the leftover .jnl journal files will make named complain. You'll also need an /etc/rndc.conf file, which contains at a minimum a "key" and an "options" block; a minimal sketch for CentOS-style systems follows this list.
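To round out steps 3 and 6: the key pair used throughout this post was presumably generated with dnssec-keygen (as in Ducea's article; the Kpalomino.mongo.+157+29687.* filenames you'll see in the update script come from exactly this kind of command), and the rndc.conf below is only a minimal sketch that reuses the same key as the named.conf listing. Adapt both to your own domain and secret.

# generate an HMAC-MD5 key pair for dynamic updates and rndc
# (produces Kpalomino.mongo.+157+NNNNN.key and .private)
dnssec-keygen -a HMAC-MD5 -b 512 -n HOST palomino.mongo.

A minimal /etc/rndc.conf might then look like:

# /etc/rndc.conf (minimal sketch)
key palomino.mongo. {
        algorithm HMAC-MD5;
        secret "SEcreT FRom Your Generated Key file";
};

options {
        default-key palomino.mongo.;
        default-server 127.0.0.1;
        default-port 953;
};

And the zone-editing workflow described in step 6 becomes:

rndc freeze int.palomino.mongo
# edit /var/named/int.palomino.mongo.zone and bump the serial
rndc thaw int.palomino.mongo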
/etc/named.conf:
-------
In this example, I chose an obviously non-public domain name to avoid confusion. One feature not shown in this example, but which you will need, is allow-transfer lines for communicating with your other nameserver(s). Also note that in this case I used the same key for both dynamic updates from the other hosts and for rndc updates; you may decide to do otherwise. Finally, you may want to add reverse-lookup zones as well.
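As a rough illustration only, the relevant additions might look like the fragment below; the transfer IP is the (commented-out) secondary address from the named.conf listing later in this post, and the reverse zone name assumes a 10.160.0.0/16 network, so adjust both for your environment.

zone "int.palomino.mongo" IN {
        type master;
        file "int.palomino.mongo.zone";
        allow-update { key palomino.mongo.; };
        allow-query { any; };
        // local IP of your secondary nameserver
        allow-transfer { 10.160.34.184; };
};

// hypothetical reverse-lookup zone for 10.160.0.0/16
zone "160.10.in-addr.arpa" IN {
        type master;
        file "160.10.in-addr.arpa.zone";
        allow-update { key palomino.mongo.; };
};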
External Zone File: palomino.mongo.zone
Internal Zone File: int.palomino.mongo.zone
You don't need to put any hosts in these zone files other than your DNS server(s); all the other hosts will be added dynamically. The local IP of this DNS server is what you'll need to bootstrap DNS when you start your other hosts. Start your DNS server.
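On the CentOS-based Amazon image used here, that looks something like the following sketch (adjust for your init system):

service named start
chkconfig named on        # come back up after a reboot

# quick sanity checks
rndc status
dig @127.0.0.1 dns1.int.palomino.mongo +short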
Next, for each mongod host, I used a script similar to Ducea's, with the exception that it takes the local IP of the DNS server in addition to the hostname. It also inserts the DNS server's IP into /etc/resolv.conf. This is probably a bit of a hack, and I don't doubt there's a more efficient way to do it. In this example, I set the EC2 instance's user-data to "host:DNS-local-ip", so, for example --user-data "ar1:10.1.2.3". Hopefully this will make more sense later.
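For instance, a single host launched with that user-data format might look roughly like this; the AMI, keypair, availability zone, and security group are placeholders, and the full command actually used appears later in this post:

ec2-run-instances <ami-image-id> -n 1 -k <your-keypair-name> -d "ar1:10.1.2.3" -t m1.large -z <your-availability-zone> -g <your-security-group>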
update_dns.sh:
This file can live anywhere; I happened to put it in /etc/named, alongside the keys. It will be run by root and contains secrets, so you should chmod 700 the script and the two key files.
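Concretely, that could be something like the sketch below; the key filenames are the generated pair referenced by update_dns.sh, and yours will differ:

chown root:root /etc/named/update_dns.sh /etc/named/Kpalomino.mongo.+157+29687.key /etc/named/Kpalomino.mongo.+157+29687.private
chmod 700 /etc/named/update_dns.sh /etc/named/Kpalomino.mongo.+157+29687.key /etc/named/Kpalomino.mongo.+157+29687.private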
The Mongo instances:
I used the excellent guide on the MongoDB site to set up a single instance. Go through the entire process there, including Launching an Instance and Configure Storage, up to Install and Configure MongoDB. For this example, we are only setting up a replica set; sharding will be covered another time. The main goal here is to create a generic image that you can kick off and have it automatically set its DNS and connect to the rest of the replica set, with as little human interaction as possible.
As in the example given, in /etc/mongod.conf, the only changes I made were:
This way, when this mongod starts up, it will already be configured as a member of rs_a. Another option would be to generate this setting dynamically from the user-data sent to the instance, similar to the DNS update script, but in this case I left it hard-coded; a rough sketch of that idea follows.
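Purely as a sketch of that idea, and assuming you extended the user-data format to "host:DNS-ip:replset" (which the rest of this post does not do), something like this could run before mongod starts:

#!/bin/bash
# Sketch only: derive the replica set name from a hypothetical third user-data field,
# e.g. "ar1:10.1.2.3:rs_a". The examples in this post send only "host:DNS-ip" and
# keep replSet hard-coded in /etc/mongod.conf.
USER_DATA=`/usr/bin/curl -s http://169.254.169.254/latest/user-data`
REPLSET=`echo $USER_DATA | cut -s -d : -f3`

if [ -n "$REPLSET" ]; then
        if grep -q '^replSet' /etc/mongod.conf; then
                # replace the existing replSet line
                sed -i -e "s/^replSet.*/replSet = $REPLSET/" /etc/mongod.conf
        else
                echo "replSet = $REPLSET" >> /etc/mongod.conf
        fi
fi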
You can go ahead and start up mongod, connect to it, and ensure it functions. The final piece of the puzzle is to ensure our DNS update script runs when the host starts up. At first, I had it running as part of /etc/rc.local, as in Ducea's article. However, I soon found out that this means DNS gets updated _after_ mongod has already started, and mongod then couldn't find any of the new hosts.
So, to ensure that DNS is updated before mongod starts, I added these lines to /etc/init.d/mongod, right at the beginning of the start() function:
At this point, we are almost ready to save our image. If you did a test run of mongod, your /data directory will already contain journal and/or other files, which might hold data we don't necessarily want baked into the image. So stop mongod and remove all your data:
Now your instance is ready to be turned into an image. You can stop it first (otherwise it will reboot when you create the image). All you need is the ID of your instance. Create your image:
Once you get an image ID from that, we can very easily set up our replica set. All we need to bootstrap this is the local IP of your DNS server. For this example, my hostnames will be ar1.palomino.mongo, ar2.palomino.mongo, and ar3.palomino.mongo (public), and ar1.int.palomino.mongo, ar2.int.palomino.mongo, and ar3.int.palomino.mongo (internal). In this example, my DNS server's local IP address is 10.160.121.203.
This will launch three instances of the mongod server, each expecting to be in a replica set named "rs_a" (from the setting we added to /etc/mongod.conf above). When they boot, right before mongod starts, they'll set the hostnames provided, connect to the DNS server, and register their public and private IPs.
Now, if you use the nameserver that you set up, you should be able to connect using one of the hostnames, for example ar1.palomino.mongo:
mongo ar1.palomino.mongo
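If the shell can't resolve the name, a quick check with dig against the DNS server (using the example IP from above) will confirm whether the dynamic updates actually landed; this is just a troubleshooting aid, not a required step:

dig @10.160.121.203 ar1.palomino.mongo +short
dig @10.160.121.203 ar1.int.palomino.mongo +short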
Set up your replica set:
--------
Now, to demonstrate our automatic new host configuration, let's completely take out one of our hosts and replace it with a brand new instance:
Now, just spin up another instance that will replace ar1:
> ec2-run-instances ami-cda2fa88 -n 1 -k engineering-west -d ar1:10.160.121.203 -t m1.large -z us-west-1b -g mm-database
With no further action, your instance will acquire the new hostname, get polled by the new primary ar2, and become part of the Replica Set again:
Now, this exercise had a lot missing from it, most notably multiple DNS servers for redundancy, and sharding. But I hope it was helpful in seeing how you can use DNS servers to make failover of a MongoDB host a little closer to automatic. I've run through this scenario several times, and I did have some issues the first time I set up the replica set on the cluster. Once that got going, though, the failover and promotion of the brand new host seemed to work fine. I welcome your comments.
cfg = { _id : "rs_a",
        members : [
            { _id : 0, host : "10.2.3.4", priority : 1 },
            { _id : 1, host : "10.2.3.5", priority : 1 },
            { _id : 2, host : "10.2.3.6", priority : 0 } ] }
That is fine for giving examples, or maybe for a single cluster where you control the hardware. But in the ephemeral world of virtual instances, where IPs change constantly, it just isn't feasible. You need to use hostnames and have DNS set up. That's easier said than done, so I've tried to come up with a proof of concept for automatically assigning hostnames to new instances in AWS. I hope that some of my experiences will be valuable in setting up your own MongoDB clusters.
Here is the scenario: a replica set with three nodes set up in AWS using EC2 instances. One of the nodes goes down. You want to be able to bring that same node back - or a replacement node - with a minimum of configuration changes. To make this happen, you need to prepare for this event beforehand. All of this is done on Amazon's default OS, which is based on CentOS.
The MongoDB pages on setting up instances in AWS (here and here) are very valuable for most of the setup; however, the details of how to set up hostnames could use some elaboration. For one thing, it doesn't really help to put the public DNS names into your zone files: if you bring up a new instance, you'll get a new public DNS name, and you'll need to go back into your zone file and update it. I wanted to minimize the reconfiguration necessary when bringing up a new instance to replace one that went down.
My next reference is Marius Ducea's very helpful article, How To update DNS hostnames automatically for your Amazon EC2 instances. One of the great things I discovered there was the really useful service Amazon provides to get info about the instance you're on: basically, curl http://169.254.169.254/latest/meta-data/ to get a list of available parameters, then append a parameter to the end of the URL to get its value. I made a little bash function that I used often while working through this:
meta () {
    curl http://169.254.169.254/latest/meta-data/$1
    echo
}
//
// named.conf
//
// Provided by Red Hat bind package to configure the ISC BIND named(8) DNS
// server as a caching only nameserver (as a localhost DNS resolver only).
//
// See /usr/share/doc/bind*/sample/ for example named configuration files.
//

options {
//      listen-on port 53 { 127.0.0.1; };
        listen-on port 53 { any; };
        listen-on-v6 port 53 { ::1; };
        directory       "/var/named";
        dump-file       "/var/named/data/cache_dump.db";
        statistics-file "/var/named/data/named_stats.txt";
        memstatistics-file "/var/named/data/named_mem_stats.txt";
        allow-query     { localhost; };
        allow-recursion { any; };
        recursion yes;

        dnssec-enable yes;
        dnssec-validation yes;
        dnssec-lookaside auto;

        /* Path to ISC DLV key */
        bindkeys-file "/etc/named.iscdlv.key";
};

controls {
        inet 127.0.0.1 allow { localhost; } keys { palomino.mongo.; };
};

key palomino.mongo. {
        algorithm HMAC-MD5;
        secret "SEcreT FRom Your Generated Key file";
};

logging {
        channel default_debug {
                file "data/named.run";
                severity dynamic;
        };
};

zone "." IN {
        type hint;
        file "named.ca";
};

include "/etc/named.rfc1912.zones";

zone "int.palomino.mongo" IN {
        type master;
        file "int.palomino.mongo.zone";
        allow-update { key palomino.mongo.; };
        allow-query { any; };
        // this should be local network only
        //allow-transfer { 10.160.34.184; };
};

zone "palomino.mongo" IN {
        type master;
        file "palomino.mongo.zone";
        allow-update { key palomino.mongo.; };
        allow-query { any; };
        //allow-transfer { 10.160.34.184; };
};
$ORIGIN .
$TTL 86400      ; 1 day
palomino.mongo          IN SOA  dns1.palomino.mongo. mgross.palomino.mongo. (
                                2012032919 ; serial
                                21600      ; refresh (6 hours)
                                3600       ; retry (1 hour)
                                604800     ; expire (1 week)
                                86400      ; minimum (1 day)
                                )
                        NS      dns1.palomino.mongo.
                        A       local.ip.of.this.instance
$ORIGIN palomino.mongo.
$TTL 60         ; 1 minute
dns1                    A       public.ip.of.this.instance
$ORIGIN .
$TTL 86400      ; 1 day
int.palomino.mongo      IN SOA  dns1.palomino.mongo. mgross.palomino.mongo. (
                                2012032917 ; serial
                                21600      ; refresh (6 hours)
                                3600       ; retry (1 hour)
                                604800     ; expire (1 week)
                                86400      ; minimum (1 day)
                                )
                        NS      dns1.palomino.mongo.
                        A       local.ip.of.this.instance
$ORIGIN int.palomino.mongo.
$TTL 60         ; 1 minute
dns1                    A       local.ip.of.this.instance
#!/bin/bash

USER_DATA=`/usr/bin/curl -s http://169.254.169.254/latest/user-data`
HOSTNAME=`echo $USER_DATA | cut -d : -f1`
DNS_IP=`echo $USER_DATA | cut -d : -f2`

line="nameserver $DNS_IP"
# this should skip if there is no DNS_IP sent
if ! grep "$line" /etc/resolv.conf 1>/dev/null
then
    # insert our DNS server before the existing nameserver entries
    sed -i -e "/nameserver/i $line" /etc/resolv.conf
fi

DNS_KEY=/etc/named/Kpalomino.mongo.+157+29687.private
DOMAIN=palomino.mongo

# set also the hostname on the running instance
hostname $HOSTNAME.$DOMAIN

PUBIP=`/usr/bin/curl -s http://169.254.169.254/latest/meta-data/public-ipv4`
cat <<EOF | /usr/bin/nsupdate -k $DNS_KEY -v
server dns1.int.$DOMAIN
zone $DOMAIN
update delete $HOSTNAME.$DOMAIN A
update add $HOSTNAME.$DOMAIN 60 A $PUBIP
send
EOF

LOCIP=`/usr/bin/curl -s http://169.254.169.254/latest/meta-data/local-ipv4`
cat <<EOF | /usr/bin/nsupdate -k $DNS_KEY -v
server dns1.int.$DOMAIN
zone int.$DOMAIN
update delete $HOSTNAME.int.$DOMAIN A
update add $HOSTNAME.int.$DOMAIN 60 A $LOCIP
send
EOF
fork = true
dbpath = /data
replSet = rs_a
start()
{
  echo "Updating dns: "
  /etc/named/update_dns.sh

  echo -n $"Starting mongod: "
service mongod stop
cd /data
rm -rf *
ec2-create-image <instance-id> --name mongod-replset-server --description "Autoboot image of mongod server"
[mgross@dev ~]$ for i in 1 2 3
> do
> ec2-run-instances <ami-image-id> -n 1 -k <your-keypair-name> -d ar$i:10.160.121.203 -t m1.large -z <your-availability-zone> -g <your-security-group-id>
> done
> rs.config()
null
> cfg = {
...     _id : "rs_a",
...     members : [
...         { _id : 0, host : "ar1.int.palomino.mongo", priority : 1 },
...         { _id : 1, host : "ar2.int.palomino.mongo", priority : 1 },
...         { _id : 2, host : "ar3.int.palomino.mongo", priority : 0 }
...     ]
... }
{
    "_id" : "rs_a",
    "members" : [
        { "_id" : 0, "host" : "ar1.int.palomino.mongo", "priority" : 1 },
        { "_id" : 1, "host" : "ar2.int.palomino.mongo", "priority" : 1 },
        { "_id" : 2, "host" : "ar3.int.palomino.mongo", "priority" : 0 }
    ]
}
> rs.initiate(cfg);
{
    "info" : "Config now saved locally. Should come online in about a minute.",
    "ok" : 1
}
> rs.status();
{
    "set" : "rs_a",
    "date" : ISODate("2012-03-30T04:20:49Z"),
    "myState" : 2,
    "members" : [
        { "_id" : 0, "name" : "ar1.int.palomino.mongo:27017", "health" : 1, "state" : 2,
          "stateStr" : "SECONDARY", "optime" : { "t" : 1333081239000, "i" : 1 },
          "optimeDate" : ISODate("2012-03-30T04:20:39Z"), "self" : true },
        { "_id" : 1, "name" : "ar2.int.palomino.mongo:27017", "health" : 1, "state" : 6,
          "stateStr" : "UNKNOWN", "uptime" : 2, "optime" : { "t" : 0, "i" : 0 },
          "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
          "lastHeartbeat" : ISODate("2012-03-30T04:20:49Z"), "pingMs" : 0,
          "errmsg" : "still initializing" },
        { "_id" : 2, "name" : "ar3.int.palomino.mongo:27017", "health" : 1, "state" : 5,
          "stateStr" : "STARTUP2", "uptime" : 4, "optime" : { "t" : 0, "i" : 0 },
          "optimeDate" : ISODate("1970-01-01T00:00:00Z"),
          "lastHeartbeat" : ISODate("2012-03-30T04:20:49Z"), "pingMs" : 0,
          "errmsg" : "initial sync need a member to be primary or secondary to do our initial sync" }
    ],
    "ok" : 1
}
PRIMARY> rs.status();
{
    "set" : "rs_a",
    "date" : ISODate("2012-03-30T04:21:46Z"),
    "myState" : 1,
    "members" : [
        { "_id" : 0, "name" : "ar1.int.palomino.mongo:27017", "health" : 1, "state" : 1,
          "stateStr" : "PRIMARY", "optime" : { "t" : 1333081239000, "i" : 1 },
          "optimeDate" : ISODate("2012-03-30T04:20:39Z"), "self" : true },
        { "_id" : 1, "name" : "ar2.int.palomino.mongo:27017", "health" : 1, "state" : 2,
          "stateStr" : "SECONDARY", "uptime" : 59, "optime" : { "t" : 1333081239000, "i" : 1 },
          "optimeDate" : ISODate("2012-03-30T04:20:39Z"),
          "lastHeartbeat" : ISODate("2012-03-30T04:21:45Z"), "pingMs" : 0 },
        { "_id" : 2, "name" : "ar3.int.palomino.mongo:27017", "health" : 1, "state" : 2,
          "stateStr" : "SECONDARY", "uptime" : 61, "optime" : { "t" : 1333081239000, "i" : 1 },
          "optimeDate" : ISODate("2012-03-30T04:20:39Z"),
          "lastHeartbeat" : ISODate("2012-03-30T04:21:45Z"), "pingMs" : 0 }
    ],
    "ok" : 1
}
PRIMARY>
[root@ar1 ~]# shutdown -h now

We can see that ar2 has been elected primary:

[ec2-user@ar2 ~]$ mongo
MongoDB shell version: 2.0.4
connecting to: test
PRIMARY> rs.status()
{
    "set" : "rs_a",
    "date" : ISODate("2012-03-30T04:25:24Z"),
    "myState" : 1,
    "syncingTo" : "ar1.int.palomino.mongo:27017",
    "members" : [
        { "_id" : 0, "name" : "ar1.int.palomino.mongo:27017", "health" : 0, "state" : 8,
          "stateStr" : "(not reachable/healthy)", "uptime" : 0,
          "optime" : { "t" : 1333081239000, "i" : 1 },
          "optimeDate" : ISODate("2012-03-30T04:20:39Z"),
          "lastHeartbeat" : ISODate("2012-03-30T04:23:38Z"), "pingMs" : 0,
          "errmsg" : "socket exception" },
        { "_id" : 1, "name" : "ar2.int.palomino.mongo:27017", "health" : 1, "state" : 1,
          "stateStr" : "PRIMARY", "optime" : { "t" : 1333081239000, "i" : 1 },
          "optimeDate" : ISODate("2012-03-30T04:20:39Z"), "self" : true },
        { "_id" : 2, "name" : "ar3.int.palomino.mongo:27017", "health" : 1, "state" : 2,
          "stateStr" : "SECONDARY", "uptime" : 267,
          "optime" : { "t" : 1333081239000, "i" : 1 },
          "optimeDate" : ISODate("2012-03-30T04:20:39Z"),
          "lastHeartbeat" : ISODate("2012-03-30T04:25:24Z"), "pingMs" : 14 }
    ],
    "ok" : 1
}
PRIMARY> rs.status()
{
    "set" : "rs_a",
    "date" : ISODate("2012-03-30T04:34:09Z"),
    "myState" : 1,
    "syncingTo" : "ar1.int.palomino.mongo:27017",
    "members" : [
        { "_id" : 0, "name" : "ar1.int.palomino.mongo:27017", "health" : 1, "state" : 2,
          "stateStr" : "SECONDARY", "uptime" : 77,
          "optime" : { "t" : 1333081239000, "i" : 1 },
          "optimeDate" : ISODate("2012-03-30T04:20:39Z"),
          "lastHeartbeat" : ISODate("2012-03-30T04:34:08Z"), "pingMs" : 0 },
        { "_id" : 1, "name" : "ar2.int.palomino.mongo:27017", "health" : 1, "state" : 1,
          "stateStr" : "PRIMARY", "optime" : { "t" : 1333081239000, "i" : 1 },
          "optimeDate" : ISODate("2012-03-30T04:20:39Z"), "self" : true },
        { "_id" : 2, "name" : "ar3.int.palomino.mongo:27017", "health" : 1, "state" : 2,
          "stateStr" : "SECONDARY", "uptime" : 792,
          "optime" : { "t" : 1333081239000, "i" : 1 },
          "optimeDate" : ISODate("2012-03-30T04:20:39Z"),
          "lastHeartbeat" : ISODate("2012-03-30T04:34:08Z"), "pingMs" : 19 }
    ],
    "ok" : 1
}
PRIMARY>