Friday, December 26, 2014

Orchestrating Docker with dockerized Mesos/Marathon (Consul?)

There are many solutions on the market for managing a cluster that supports Docker. Among them, Apache Mesos is one of the more mature and popular choices. On top of Mesos there are several frameworks to choose from, and Marathon appears to be a very common one. Google Kubernetes is also worth looking into.

Apache Mesos

To build a Mesos cluster, we need to start multiple Mesos masters, which control the cluster and deploy jobs to the slaves.  Mesos recommends 3 masters with a quorum of 2 (quorum = total number of masters / 2 + 1, using integer division).  We'll also set up ZooKeeper so Mesos can do leader election.  Both masters and slaves need to connect to zk to learn the current leader, because a slave only takes requests from the current leader; if the connection is lost, the slave will not respond to any requests.  Zk recommends 5 nodes with a quorum of 3.  Marathon (a Mesos framework) can also use zk for leader election.  So we'll be installing zk, Mesos and Marathon on each of 5 VMs.  (We could probably get away with fewer Mesos masters and Marathon instances.)
Instead of installing all those components on the VMs directly, we'll be using Docker.  The host OS is CentOS in this example.
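The quorum sizes quoted above follow from the majority formula. A minimal sketch of the arithmetic (the quorum function name is only for illustration):

```shell
#!/bin/bash
# Majority quorum for an ensemble of N nodes: floor(N / 2) + 1.
quorum() {
  echo $(( $1 / 2 + 1 ))
}

# The two cluster sizes mentioned in the text.
for n in 3 5; do
  echo "nodes=$n quorum=$(quorum "$n")"
done
```

With 5 nodes the cluster keeps working as long as any 3 of them can still talk to each other.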

CentOS

A few simple but useful instructions.
Changing the hostname:
sudo hostnamectl set-hostname <hostname.domainname>
sudo hostnamectl status

Docker

To install Docker:
sudo yum install docker
To set up a proxy for Docker, add the following to /etc/sysconfig/docker:
HTTP_PROXY=<proxy url with protocol & port>
http_proxy=$HTTP_PROXY
HTTPS_PROXY=$HTTP_PROXY
https_proxy=$HTTP_PROXY
export HTTP_PROXY HTTPS_PROXY http_proxy https_proxy
To set up Docker so you don't need sudo:
sudo gpasswd -a ${USER} docker
sudo service docker restart
If you're adding the current user to the docker group, you'll have to log out and log in again.
A few useful Docker commands:
attaching to a running container
docker attach <container id> 
docker exec -it <container id> bash
delete all containers except those that are running
docker rm `docker ps -a -q`
docker ps lists all running containers; docker ps -a also lists those that have exited.  You can restart a stopped container with docker start <container id>.
If the container is named (with --name in docker run), you can use the name instead of the container id.  However, you can't have two containers with the same name, even if one of them is not running.  That's very useful, because you won't end up with a lot of unused containers taking up space.
get the logs from the container
docker logs <container id>
docker logs -f <container id>
docker logs only shows what the container writes to stdout/stderr.  If the server persists its own log files, we should use a data volume for them instead.
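The cleanup command above also hands running containers to docker rm, which refuses them with an error each. A slightly tidier sketch, assuming your Docker version supports the status filter of docker ps; remove_exited_containers is a hypothetical helper name:

```shell
#!/bin/bash
# Remove only containers that have exited, so running ones are never
# passed to `docker rm`.
remove_exited_containers() {
  exited=$(docker ps -a -q -f status=exited)
  if [ -n "$exited" ]; then
    # Word splitting is intended here: one argument per container id.
    docker rm $exited
  else
    echo "no exited containers"
  fi
}

# Usage: remove_exited_containers
```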

Master Machine

Zookeeper

create a simple bash shell script
#!/bin/bash
docker run -d \
 --name=zk \
 --net=host \
 -e ZOOKEEPER_ID=1 \
 -e ZOOKEEPER_SERVER_1=mesos-m1 \
 -e ZOOKEEPER_SERVER_2=mesos-m2 \
 -e ZOOKEEPER_SERVER_3=mesos-m3 \
 -e ZOOKEEPER_SERVER_4=mesos-m4 \
 -e ZOOKEEPER_SERVER_5=mesos-m5 \
 digitalwonderland/zookeeper
Each ZooKeeper node in a cluster needs a unique id (1-255), so the script has to be updated with a different ZOOKEEPER_ID on each zk node.
We probably don't need --net=host here (it matters more for the Mesos master, details to follow).  If you remove --net=host, you need to add
-p 2888:2888 \
-p 3888:3888 \
-p 2181:2181 \
to the docker run command to map the ports from the host to the container.
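Since only ZOOKEEPER_ID differs between nodes, one option is to derive it from the host name instead of editing the script on every machine. A minimal sketch, assuming the hosts are named mesos-m1 to mesos-m5 as in this post; zk_id_from_hostname is a hypothetical helper, not something the image provides:

```shell
#!/bin/bash
# Derive the ZooKeeper id from the trailing digits of a host name,
# e.g. mesos-m3 -> 3. Fails if the result is not in the range 1-255.
zk_id_from_hostname() {
  short=${1%%.*}          # drop any domain part
  id=${short##*[!0-9]}    # keep only the trailing digits
  if [ -z "$id" ] || [ "$id" -lt 1 ] || [ "$id" -gt 255 ]; then
    echo "cannot derive a ZooKeeper id (1-255) from '$1'" >&2
    return 1
  fi
  echo "$id"
}

zk_id_from_hostname mesos-m3
```

In the script above, the fixed value could then be replaced with -e ZOOKEEPER_ID=$(zk_id_from_hostname "$(hostname)").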

Mesos Master

create a simple bash shell script
#!/bin/bash
docker run -d \
 --name=mesos \
 --restart=always \
 --net=host \
 -e MESOS_LOG_DIR=/var/log/mesos \
 -e MESOS_ZK=zk://mesos-m1:2181,mesos-m2:2181,mesos-m3:2181,mesos-m4:2181,mesos-m5:2181/mesos \
 -e MESOS_QUORUM=3 \
 -e MESOS_WORK_DIR=/var/lib/mesos \
 -e MESOS_CLUSTER=mycluster \
 redjack/mesos-master
We're using redjack's mesos-master image rather than the one from mesos, as the latter has no documentation at all.
MESOS_ZK points to all 5 zk nodes, and the quorum is set to 3.
Setting the Docker networking mode to host makes the container share the host's TCP stack, so when the Mesos master registers itself with zk, it uses the host's IP address.  By default, without --net=host, Docker creates a network interface for the container that is not reachable from outside the host, so registering the container's IP would not work when a Mesos master tries to communicate with another master.  (Docker also generates a hostname, which Mesos picks up by default, but this can be overridden with MESOS_HOSTNAME.)
All Mesos masters can also easily be installed on the same machine.  Taking out --net=host uses the default Docker networking mode, where each container has its own IP address.  We'd have to link those containers together so they can resolve each other's IPs.  However, this setup is more for testing and not much use in real production.
To have a mixture of co-located masters and masters on different hosts, we have to set the port Mesos uses with MESOS_PORT.
Setting --restart=always makes sure the Mesos master keeps restarting itself, so in case of failure the cluster won't easily fall out of quorum.  We can also use --restart=on-failure:10 to limit the number of retries.  Setting it to always also makes sure the container is started after a server reboot; not sure about on-failure.
https://issues.apache.org/jira/browse/MESOS-2014 - it is recommended that the master be restarted right away.
The work directory and log directory should be bind mounted to a data volume so they can be persisted and backed up.
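As a sketch of that last point, the two directories can be mapped to host paths with -v. The host paths below are hypothetical, and volume_flags is only an illustrative helper; the container paths match MESOS_WORK_DIR and MESOS_LOG_DIR from the script above:

```shell
#!/bin/bash
# Hypothetical host directories; pick locations that are backed up.
HOST_WORK_DIR=/srv/mesos/work
HOST_LOG_DIR=/srv/mesos/log

# Print the extra flags for `docker run`: host path on the left,
# container path on the right of each -v.
volume_flags() {
  echo "-v $HOST_WORK_DIR:/var/lib/mesos -v $HOST_LOG_DIR:/var/log/mesos"
}

volume_flags
# Usage: docker run -d --name=mesos ... $(volume_flags) redjack/mesos-master
```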

Mesosphere Marathon

create a simple bash shell script
#!/bin/bash
docker run -d \
 --restart=always \
 --name=marathon \
 --net=host \
 -e MARATHON_MASTER=zk://mesos-m1:2181,mesos-m2:2181,mesos-m3:2181,mesos-m4:2181,mesos-m5:2181/mesos \
 -e MARATHON_ZK=zk://mesos-m1:2181,mesos-m2:2181,mesos-m3:2181,mesos-m4:2181,mesos-m5:2181/marathon \
 mesosphere/marathon
MARATHON_MASTER points to Mesos' path in zk, which Marathon uses to find the leading master.
MARATHON_ZK points to Marathon's own path in zk, which it uses for its own leader election.
more config here
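The two zk URLs above share the same host list and differ only in the path. To avoid typing the list twice, something like the following could build them; zk_url is a hypothetical helper and assumes every node uses client port 2181:

```shell
#!/bin/bash
# Build a zk:// URL from a znode path and a list of hosts.
zk_url() {
  znode=$1
  shift
  hosts=""
  for h in "$@"; do
    # Append "host:2181", comma-separated after the first entry.
    hosts="${hosts:+$hosts,}$h:2181"
  done
  echo "zk://$hosts$znode"
}

MASTERS="mesos-m1 mesos-m2 mesos-m3 mesos-m4 mesos-m5"
# $MASTERS is left unquoted on purpose so it splits into one host per word.
MARATHON_MASTER=$(zk_url /mesos $MASTERS)
MARATHON_ZK=$(zk_url /marathon $MASTERS)
echo "$MARATHON_MASTER"
echo "$MARATHON_ZK"
```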

Iptables

By default, iptables blocks all incoming traffic.
#!/bin/bash
# zookeeper
iptables -I INPUT -p tcp --dport 2888 -j ACCEPT
iptables -I INPUT -p tcp --dport 3888 -j ACCEPT
iptables -I INPUT -p tcp --dport 2181 -j ACCEPT
# mesos/marathon
iptables -I INPUT -p tcp --dport 5050 -j ACCEPT
iptables -I INPUT -p tcp --dport 8080 -j ACCEPT
service iptables save

Slave Machine

Mesos Slave

We can install the Mesos slave on the master machines too.  On the non-leader masters especially, there's actually not much going on.
create a simple bash shell script
#!/bin/sh
docker run -d \
 --privileged=true \
 --net=host \
 --name=mesos \
 -e MESOS_LOG_DIR=/var/log \
 -e MESOS_MASTER=zk://mesos-m1:2181,mesos-m2:2181,mesos-m3:2181,mesos-m4:2181,mesos-m5:2181/mesos \
 -e MESOS_EXECUTOR_REGISTRATION_TIMEOUT=5mins \
 -e MESOS_ISOLATOR=cgroups/cpu,cgroups/mem \
 -e MESOS_CONTAINERIZERS=docker,mesos \
 -v /run/docker.sock:/run/docker.sock \
 -v /sys:/sys \
 -v /proc:/proc \
 redjack/mesos-slave
This Mesos slave is set to use the Docker containerizer and will use the Docker installed on the host, so a Mesos job creates a container on the host rather than inside the Mesos container (although running a container in a container is possible).  Since it bind mounts the host's /proc, /sys and /run/docker.sock (Docker's unix socket, owned by root), it needs to be privileged.
We probably don't need --net=host; without it, we need to add -p 5051:5051.
Again, it's probably a good idea to bind mount a data volume for logs.

Iptables

#!/bin/bash
# mesos
iptables -I INPUT -p tcp --dport 5051 -j ACCEPT
service iptables save
When we deploy an application to the slave, more ports have to be opened for the application.  We can either configure a port range in Marathon and open all of those ports, or try to automatically add/remove a port when an application is deployed/undeployed.
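For the first option, iptables accepts a low:high range in --dport. A minimal sketch that only prints the rule rather than applying it; the 31000-32000 range is an assumption for illustration, so use whatever range your slaves actually offer, and open_port_range_rule is a hypothetical helper:

```shell
#!/bin/bash
# Print the iptables rule that opens a contiguous range of TCP ports.
# Nothing is applied; run the printed command as root to apply it.
open_port_range_rule() {
  if [ "$1" -gt "$2" ]; then
    echo "low port must not exceed high port" >&2
    return 1
  fi
  echo "iptables -I INPUT -p tcp --dport $1:$2 -j ACCEPT"
}

open_port_range_rule 31000 32000
```

As with the scripts above, service iptables save is still needed afterwards to persist the rule.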

Usage

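Now we can access the Mesos interface on port 5050 of any master, e.g. http://mesos-m1:5050
It'll automatically forward to the leading master, and you should see the slaves connected and one of the Marathon instances registered as a framework.
To access Marathon's console, go to port 8080 on any master, e.g. http://mesos-m1:8080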

To Be Continued.
Deploy apps!