Dockerizing a whole physical Linux server

Docker is usually used in microservice architectures because containers are lightweight (compared to VMs at least), easy to configure, communicate with each other efficiently, and can be deployed very quickly. However, Docker can also be used perfectly well to Dockerize a full physical/VPS server into one single container. Let me show you how and why.

Context

I recently had to work on a project developed by people who had left the company before I arrived. I never had the opportunity to meet them, and contacting them was not an option either. Unfortunately, most of the project lacked documentation. On top of that, only some parts of the project were managed in a VCS (Git here). Of course there was no dev or staging server: everything was located on a production server… Start seeing the problem now?

It was a web scraping project doing quite a lot of complex things. The technical stack of the prod server was more or less the following:

  • Ubuntu 16.04
  • Nginx
  • Postgresql 9.3
  • Python/Django
  • Python virtualenvs
  • Gunicorn
  • Celery
  • RabbitMQ
  • Scrapy/Scrapyd

First attempt: failed

I tried hard to reverse engineer this production server. My ultimate goal was to isolate each application, Dockerize it, and make these containers communicate with each other.

But I failed!

I successfully Dockerized Nginx, the Django app, and the Celery asynchronous task management. But I was still struggling with Scrapy and Scrapyd. I think it was mainly because the former developers had made changes directly to the Scrapy and Scrapyd source files (that is to say, in the python/site-packages directory itself!) without any documentation. On top of that, some of the Python libraries used at the time were very specific libs that are no longer available today, or not in the correct version (you can forget about pip freeze and pip install -r requirements.txt here).

Second attempt: failed

I eventually gave up on building a microservice system based on the production server. But I still had to secure the existing production server before it ran into trouble. The database was backed up at that point, but nothing else on the server was.

I thought about making a snapshot of the whole server by using a tool like CloneZilla or a simple rsync command like this one. But a mere backup would not allow me to work easily on new features of the project.

So I thought about converting the physical server to a VMware virtual machine using their VMware vCenter Converter, but the VMware download link was broken, and so few people talked about this tool on the Internet that I got scared and gave up.

Lastly, I tried this Dockerization solution based on Blueprint, but I could not make it work, and Blueprint seemed to be a discontinued project.

Third attempt: successful

Actually the solution was pretty simple: I decided to Dockerize the whole prod server myself - except the Postgresql data - so that I would have a backup of the server and could commit new features to this Docker container whenever I want, without being afraid of breaking the server forever. Here is how I did it:

1. Install and set up Docker on the server

  1. Install Docker following this guide.
  2. Login to your Docker Hub account: docker login
  3. Create a docker network (if needed): docker network create --subnet=172.20.0.0/16 my_network
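
For reference, on an Ubuntu 16.04 host those three steps could look roughly like this (the curl convenience script is only one way to install Docker, the linked guide covers the official repository method, and the network name and subnet are just examples):

# Install Docker (run as root; the official guide describes the apt repository method)
curl -fsSL https://get.docker.com | sh

# Log in to your Docker Hub account
docker login

# Create a dedicated bridge network with a fixed subnet
docker network create --subnet=172.20.0.0/16 my_network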

2. Create a Docker image of your server

Go to the root of the server:

cd /

Create the following Dockerfile, based on Ubuntu 16.04 LTS (without the part dedicated to Nginx and Rabbitmq if your server does not use them, of course):

FROM ubuntu:xenial

# Copy the whole system except what is specified in .dockerignore
COPY / /

# Reinstall nginx and rabbitmq because of permissions issues in Docker
RUN apt remove -y nginx
RUN apt install -y nginx
RUN apt remove -y rabbitmq-server
RUN apt install -y rabbitmq-server

# Launch all services
COPY startup.sh /
RUN chmod 777 /startup.sh
CMD ["bash","/startup.sh"]

Create a .dockerignore file that lists all the files or folders you want to exclude from the COPY above. This is where you need to use your intuition. Exclude as many files as possible so the Docker image's size is not too big, but do not exclude files that are vital to your application. Here is my example, which you should customize based on your own server:

# Remove folders mentioned here:
# https://wiki.archlinux.org/index.php/Rsync#As_a_backup_utility
/dev 
/proc
/sys
/tmp
/run
/mnt
/media
/lost+found

# Remove database's data
/var/lib/postgresql

# Remove useless heavy files like /var/lib/scrapyd/reports.old
**/*.old
**/*.log
**/*.bak

# Remove docker
/var/lib/lxcfs
/var/lib/docker
/etc/docker
/root/.docker
/etc/init/docker.conf

# Remove the current program
/.dockerignore
/Dockerfile

Create a startup.sh script in order to launch all the services and set up the database connection redirection. This is my script, but yours will of course be totally different:

# Redirect all traffic from 127.0.0.1:5432 to 172.20.0.1:5432
# so any connection to Postgresql keeps working without any other modification.
# Requires the --privileged flag when creating container:
sysctl -w net.ipv4.conf.all.route_localnet=1
iptables -t nat -A OUTPUT -p tcp -s 127.0.0.1 --dport 5432 -j DNAT --to-destination 172.20.0.1:5432
iptables -t nat -A POSTROUTING -j MASQUERADE

# Start RabbitMQ.
rabbitmq-server -detached

# Start Nginx.
service nginx start

# Start Scrapyd
/root/.virtualenvs/my_project_2/bin/python /root/.virtualenvs/my_project_2/bin/scrapyd >> /var/log/scrapyd/scrapyd.log 2>&1 &

# Use Python virtualenvwrapper
source /root/.profile

# Start virtualenv and start Django/Gunicorn
workon my_project_1
cd /home/my_project_1
export DJANGO_SETTINGS_MODULE='my_project_1.settings.prod'
gunicorn -c my_project_1/gunicorn.py -p /tmp/gunicorn.pid my_project_1.wsgi &

# Start Celery
export C_FORCE_ROOT=True
celery -A my_project_1 beat &
celery -A my_project_1 worker -l info -Q queue1,queue2 -P gevent -c 1000 &

# Little hack to keep the container running in foreground
tail -f /dev/null

As you can see, I'm using iptables redirections so that all connections to the Postgresql database (port 5432) keep working without any additional change in the configuration files. Indeed, my database was initially located on localhost, but it is now located on the Docker host, whose IP is 172.20.0.1 (I moved everything to the Docker container except the database). Redirections at the kernel level are pretty convenient when you don't know where all the configuration files are located, and they are necessary when you cannot modify those config files at all (like in a compiled application you don't have the source code of).
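
Note that this trick assumes Postgresql on the host accepts connections coming from the Docker bridge, not only from localhost. With the stock Ubuntu Postgresql 9.3 packages, that roughly means something like the following (paths, subnet, and authentication method are to be adapted to your own setup):

# /etc/postgresql/9.3/main/postgresql.conf
# Listen on the Docker bridge IP in addition to localhost
listen_addresses = 'localhost,172.20.0.1'

# /etc/postgresql/9.3/main/pg_hba.conf
# Accept password-authenticated connections from the Docker network
host    all    all    172.20.0.0/16    md5

# Then restart Postgresql on the host
service postgresql restart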

Now launch the image creation and wait… In my case, the image was about 3 GB and was created in about 5 minutes. Make sure you have enough free space on your server before launching this command:

docker build -t your_repo/your_project:your_tag .

If you got no errors here, congrats: you've done the hardest part! Now test your image and see if everything works fine. If not, you need to adapt one of the three files above.
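
For example, a test container could be launched like this (names, IP, and ports are placeholders to adapt to your own setup):

# --privileged is required for the sysctl/iptables commands in startup.sh
docker run -d --name server_test \
    --privileged \
    --network my_network --ip 172.20.0.2 \
    -p 80:80 -p 443:443 \
    your_repo/your_project:your_tag

# Check that all the services came up correctly
docker logs server_test
docker exec -it server_test bash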

3. Save the newly created Docker image

Just push the new image to Docker Hub:

docker push your_repo/your_project:your_tag

4. Add features to your server image

Now, if you need to work on this server image, you can do the following:

  1. launch a container based on this image with docker run (do not forget to specify the network name, IP address, and port forwarding, and to add the --privileged flag so that the sysctl command in startup.sh works)
  2. work in the container
  3. commit changes in the container to a new Docker image with docker commit
  4. push the new image to Docker Hub with docker push and deploy it to staging or production
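
Concretely, one iteration of that loop could look like this (container and tag names are just examples):

# 1. Launch a container based on the image (same flags as for the test run above)
docker run -dit --name feature_work --privileged --network my_network --ip 172.20.0.2 -p 80:80 your_repo/your_project:your_tag

# 2. Work inside the container
docker exec -it feature_work bash

# 3. Save your changes as a new image
docker commit feature_work your_repo/your_project:new_tag

# 4. Push the new image and deploy it
docker push your_repo/your_project:new_tag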

Conclusion

This solution literally saved my life and is proof that Docker is great not only for microservice architectures but for whole-server containerization as well. Dockerizing a whole server can be a perfect option if you need to secure an existing prod server like mine here, with no documentation, no GitHub repo, no initial developers…

The first image can be pretty big, but every subsequent commit should not be that heavy thanks to the Docker layer architecture.
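
If you are curious about how those layers stack up, you can inspect them directly:

# Show each layer of the image and its size
docker history your_repo/your_project:your_tag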

Is it a hack? Maybe it is, but it works like a charm!

I would love to hear other devs' opinions on this.

Also available in French
