Traefik Reverse Proxy with Docker Compose and Docker Swarm

My last article about Docker Swarm was the first of a series of articles I wanted to write about the stack used behind NLP Cloud. NLP Cloud is an API based on spaCy and HuggingFace transformers that serves Named Entity Recognition (NER), sentiment analysis, text classification, summarization, and much more. One challenge is that each model runs inside its own container, and new models are added to the cluster on a regular basis. So we need a reverse proxy in front of all these containers that is both efficient and flexible.

The solution we chose is Traefik.

I thought it would be interesting to write an article about how we implemented Traefik and why we chose it over standard reverse proxies like Nginx.

Why Traefik

Traefik is still a relatively new reverse proxy solution compared to Nginx or Apache, but it’s been gaining a lot of popularity. Traefik’s main advantage is that it seamlessly integrates with Docker, Docker Compose and Docker Swarm (and even Kubernetes and more): basically your whole Traefik configuration can be in your docker-compose.yml file which is very handy, and, whenever you add new services to your cluster, Traefik discovers them on the fly without having to restart anything.

So Traefik makes maintainability easier and is good from a high-availability standpoint.

It is developed in Go while Nginx is coded in C so I guess it makes a slight difference in terms of performance, but nothing that I could perceive, and in my opinion it is negligible compared to the advantages it gives you.

Traefik does have a bit of a learning curve though and, even if their documentation is pretty good, it is still easy to make mistakes and hard to find where a problem comes from, so let me give you a couple of ready-to-use examples below.

Install Traefik

Basically you don’t have much to do here. Traefik is just another Docker image you’ll need to add to your cluster as a service in your docker-compose.yml:

version: '3.8'
services:
    traefik:
        image: traefik:v2.3

There are several ways to integrate Traefik but, like I said above, we are going to go for the Docker Compose integration.

Basic Configuration

90% of Traefik's configuration is done through Docker labels.

Let's say we have 3 services:

  • a corporate website
  • a FastAPI API serving the spaCy en_core_web_sm model
  • a FastAPI API serving the spaCy en_core_web_lg model

More details about spaCy NLP models and FastAPI can be found in their respective documentation.

Here is a basic local staging configuration routing the requests to the correct services in your docker-compose.yml:

version: '3.8'
services:
    traefik:
        image: traefik:v2.4
        ports:
            - "80:80"
        command:
            - --providers.docker
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
    corporate:
        image: <your corporate image>
        labels:
            - traefik.http.routers.corporate.rule=Host(`localhost`)
    en_core_web_sm:
        image: <your en_core_web_sm model API image>
        labels:
            - traefik.http.routers.en_core_web_sm.rule=Host(`api.localhost`) && PathPrefix(`/en_core_web_sm`)
    en_core_web_lg:
        image: <your en_core_web_lg model API image>
        labels:
            - traefik.http.routers.en_core_web_lg.rule=Host(`api.localhost`) && PathPrefix(`/en_core_web_lg`)

You can now access your corporate website at http://localhost, your en_core_web_sm model at http://api.localhost/en_core_web_sm, and your en_core_web_lg model at http://api.localhost/en_core_web_lg.

As you can see it’s dead simple.

That was for our local staging only; now we want to do the same for production in a Docker Swarm cluster:

version: '3.8'
services:
    traefik:
        image: traefik:v2.4
        ports:
            - "80:80"
        command:
            - --providers.docker.swarmmode
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
        deploy:
            placement:
                constraints:
                    - node.role == manager
    corporate:
        image: <your corporate image>
        deploy:
            labels:
                - traefik.http.routers.corporate.rule=Host(`nlpcloud.io`)
    en_core_web_sm:
        image: <your en_core_web_sm model API image>
        deploy:
            labels:
                - traefik.http.services.en_core_web_sm.loadbalancer.server.port=80
                - traefik.http.routers.en_core_web_sm.rule=Host(`api.nlpcloud.io`) && PathPrefix(`/en_core_web_sm`)
    en_core_web_lg:
        image: <your en_core_web_lg model API image>
        deploy:
            labels:
                - traefik.http.services.en_core_web_lg.loadbalancer.server.port=80
                - traefik.http.routers.en_core_web_lg.rule=Host(`api.nlpcloud.io`) && PathPrefix(`/en_core_web_lg`)

You can now access your corporate website at http://nlpcloud.io, your en_core_web_sm model at http://api.nlpcloud.io/en_core_web_sm, and your en_core_web_lg model at http://api.nlpcloud.io/en_core_web_lg.

It’s still fairly simple but the important things to notice are the following:

  • We have to explicitly use the docker.swarmmode provider instead of docker
  • Labels now have to be put in the deploy section
  • We need to manually declare the port of each service with the loadbalancer directive (this has to be done manually because Docker Swarm lacks port auto-discovery)
  • We have to make sure that Traefik is deployed on a manager node of the Swarm by using placement constraints

You now have a fully fledged cluster thanks to Docker Swarm and Traefik. Now it's likely that you have specific requirements, and no doubt the Traefik documentation will help. But let me show you a couple of features we use at NLP Cloud.

Forwarded Authentication

Let’s say your NLP API endpoints are protected and users need a token to reach them. A good solution for this use case is to leverage Traefik’s ForwardAuth.

Basically Traefik will forward all the user requests to a dedicated page you created for the occasion. This page takes care of checking the headers of the request (and maybe extracting an authentication token, for example) and determines whether the user has the right to access the resource. If they do, the page should return an HTTP 2XX code.

If a 2XX code is returned, Traefik will then make the actual request to the final API endpoint. Otherwise, it will return an error.

Please note that, for performance reasons, Traefik only forwards the user request headers to your authentication page, not the request body. So it’s not possible to authorize a user request based on the body of the request.

Here’s how to achieve it:

version: '3.8'
services:
    traefik:
        image: traefik:v2.4
        ports:
            - "80:80"
        command:
            - --providers.docker.swarmmode
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
        deploy:
            placement:
                constraints:
                    - node.role == manager
    corporate:
        image: <your corporate image>
        deploy:
            labels:
                - traefik.http.routers.corporate.rule=Host(`nlpcloud.io`)
    en_core_web_sm:
        image: <your en_core_web_sm model API image>
        deploy:
            labels:
                - traefik.http.services.en_core_web_sm.loadbalancer.server.port=80
                - traefik.http.routers.en_core_web_sm.rule=Host(`api.nlpcloud.io`) && PathPrefix(`/en_core_web_sm`)
                - traefik.http.middlewares.forward_auth_api_en_core_web_sm.forwardauth.address=https://api.nlpcloud.io/auth/
                - traefik.http.routers.en_core_web_sm.middlewares=forward_auth_api_en_core_web_sm
    api_auth:
        image: <your api_auth image>
        deploy:
            labels:
                - traefik.http.services.api_auth.loadbalancer.server.port=80
                - traefik.http.routers.api_auth.rule=Host(`api.nlpcloud.io`) && PathPrefix(`/auth`)

At NLP Cloud, the api_auth service is actually a Django + Django Rest Framework image in charge of authenticating the requests.
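
To make this more concrete, here is a minimal sketch of what such an authentication endpoint could look like with Django Rest Framework. It is not our actual implementation: the APIToken model and the token scheme are illustrative assumptions.

# views.py of the api_auth service (illustrative sketch)
from rest_framework import status
from rest_framework.response import Response
from rest_framework.views import APIView

from .models import APIToken  # hypothetical model storing valid tokens


class ForwardAuth(APIView):
    """Endpoint called by Traefik's ForwardAuth middleware.

    Traefik only forwards the request headers, so the decision is based
    on the Authorization header. Any 2XX response lets the request through.
    """

    authentication_classes = []  # this endpoint does the checking itself
    permission_classes = []

    def get(self, request):
        return self._check(request)

    def post(self, request):
        return self._check(request)

    def _check(self, request):
        token = request.headers.get("Authorization", "").replace("Token ", "", 1)
        if token and APIToken.objects.filter(key=token).exists():
            return Response(status=status.HTTP_200_OK)
        return Response(status=status.HTTP_401_UNAUTHORIZED)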

Custom Error Pages

Maybe you don’t want to show raw Traefik error pages to users. If so, it’s possible to replace error pages with your custom error pages.

Traefik does not keep any custom error page in memory, but it can use error pages served by one of your services. When contacting your service to retrieve the custom error page, Traefik passes the HTTP status code in the query (the {status} placeholder below), so you can show different error pages based on the initial HTTP error.

Let's say we have a small static website, served by Nginx, that hosts our custom error pages. We want to use its pages for HTTP errors from 400 to 599. Here's how you would do it:

version: '3.8'
services:
    traefik:
        image: traefik:v2.4
        ports:
            - "80:80"
        command:
            - --providers.docker.swarmmode
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
        deploy:
            placement:
                constraints:
                    - node.role == manager
            labels:
                - traefik.http.middlewares.handle-http-error.errors.status=400-599
                - traefik.http.middlewares.handle-http-error.errors.service=errors_service
                - traefik.http.middlewares.handle-http-error.errors.query=/{status}.html
    corporate:
        image: <your corporate image>
        deploy:
            labels:
                - traefik.http.routers.corporate.rule=Host(`nlpcloud.io`)
                - traefik.http.routers.corporate.middlewares=handle-http-error
    errors_service:
        image: <your static website image>
        deploy:
            labels:
                - traefik.http.services.errors_service.loadbalancer.server.port=80
                - traefik.http.routers.errors_service.rule=Host(`nlpcloud.io`) && PathPrefix(`/errors`)

For example, with the configuration above, a 404 error would now use this page: http://nlpcloud.io/errors/404.html

HTTPS

A cool feature of Traefik is that it can automatically provision and use TLS certificates with Let's Encrypt.

They have a nice tutorial about how to set it up with Docker so I’m just pointing you to the right resource: https://doc.traefik.io/traefik/user-guides/docker-compose/acme-tls/
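
To give you a rough idea before you dive into their guide, the relevant additions look roughly like this; the resolver name, email placeholder and storage path are illustrative, so adapt them from the official documentation:

version: '3.8'
services:
    traefik:
        image: traefik:v2.4
        ports:
            - "80:80"
            - "443:443"
        command:
            - --providers.docker.swarmmode
            - --entrypoints.web.address=:80
            - --entrypoints.websecure.address=:443
            - --certificatesresolvers.myresolver.acme.tlschallenge=true
            - --certificatesresolvers.myresolver.acme.email=<your email>
            - --certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
            - /local/path/to/letsencrypt:/letsencrypt
        deploy:
            placement:
                constraints:
                    - node.role == manager
    corporate:
        image: <your corporate image>
        deploy:
            labels:
                - traefik.http.routers.corporate.rule=Host(`nlpcloud.io`)
                - traefik.http.routers.corporate.entrypoints=websecure
                - traefik.http.routers.corporate.tls.certresolver=myresolver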

Raising Upload Size Limit

The default upload size limit is pretty low for performance reasons (it seems to be 4194304 bytes, but I'm not 100% sure as it's not clearly stated in their docs).

In order to increase it, you need to use the maxRequestBodyBytes directive:

version: '3.8'
services:
    traefik:
        image: traefik:v2.4
        ports:
            - "80:80"
        command:
            - --providers.docker.swarmmode
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
        deploy:
            placement:
                constraints:
                    - node.role == manager
    corporate:
        image: <your corporate image>
        deploy:
            labels:
                - traefik.http.routers.corporate.rule=Host(`nlpcloud.io`)
                - traefik.http.middlewares.upload-limit.buffering.maxRequestBodyBytes=20000000
                - traefik.http.routers.corporate.middlewares=upload-limit

In the example above, we raised the upload limit to 20MB.

But don't forget that uploading a huge file all at once is not necessarily the best option. Instead you may want to split the file into chunks and upload each chunk independently. I might write an article about this in the future.

Debugging

There are a couple of things you can enable to help you debug Traefik.

The first thing is to enable debug logging, which will show you a lot of detail about what Traefik is doing.

The second thing is to enable access logs in order to see all incoming HTTP requests.

Last of all, Traefik provides a cool built-in dashboard that helps debug your configuration. It is really useful as it is sometimes tricky to understand why things are not working.

In order to turn on the above features, you could do the following:

version: '3.8'
services:
    traefik:
        image: traefik:v2.4
        ports:
            - "80:80"
        command:
            - --providers.docker.swarmmode
            - --log.level=DEBUG
            - --accesslog
            - --api.dashboard=true
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
        deploy:
            placement:
                constraints:
                    - node.role == manager
            labels:
                - traefik.http.routers.dashboard.rule=Host(`dashboard.nlpcloud.io`)
                - traefik.http.routers.dashboard.service=api@internal
                - traefik.http.middlewares.auth.basicauth.users=<your basic auth user>:<your basic auth hashed password>
                - traefik.http.routers.dashboard.middlewares=auth

In this example we enabled debugging, access logs, and the dashboard that can be accessed at http://dashboard.nlpcloud.io with basic auth.

Conclusion

As you can see, Traefik is perfectly integrated with your Docker Compose configuration. If you want to change the config for a service, or add or remove services, just modify your docker-compose.yml and redeploy your Docker Swarm cluster. New changes will be taken into account, and services that were not modified don’t even have to restart, which is great for high availability.

I will keep writing a couple of articles about the stack we use at NLP Cloud. I think next one will be about our frontend and how we are using HTMX instead of big javascript frameworks.

If you have any questions, don't hesitate to ask!

Container Orchestration With Docker Swarm

NLP Cloud is a service I have contributed to recently. It is an API based on spaCy and HuggingFace transformers that serves Named Entity Recognition (NER), sentiment analysis, text classification, summarization, and much more. It uses several interesting technologies under the hood, so I thought I would create a series of articles about them. This first one is about container orchestration and how we implement it thanks to Docker Swarm. Hope it will be useful!

Why Container Orchestration

NLP Cloud is using tons of containers, mainly because each NLP model is running inside its own container. Not only do pre-trained models have their own containers, but also each user’s custom model has a dedicated container. It is very convenient for several reasons:

  • It is easy to run an NLP model on the server that has the best resources for it. Machine learning models are very resource hungry: they consume a lot of memory, and it is sometimes interesting to run them on a GPU (in case you are using NLP transformers for example). It is then best to deploy them onto a machine with specific hardware.
  • Horizontal scalability can be ensured by simply adding more replicas of the same NLP model
  • High availability is made easier thanks to redundancy and automatic failover
  • It helps lower costs: scaling horizontally on a myriad of small machines is much more cost effective than scaling vertically on a couple of big machines

Of course setting up such an architecture takes time and skills but in the end it often pays off when you’re building a complex application.

Why Docker Swarm

Docker Swarm is usually opposed to Kubernetes and Kubernetes is supposed to be the winner of the container orchestration match today. But things are not so simple…

Kubernetes has tons of settings that make it perfect for very advanced use cases, but this versatility comes at a cost: Kubernetes is hard to install, configure, and maintain. It is actually so hard that today most companies using Kubernetes rely on a managed version of Kubernetes, on GCP for example, and cloud providers don't all have the same implementation of Kubernetes in their managed offers.

Let's not forget that Google initially built Kubernetes for their internal needs, the same way Facebook built React for their own needs. But you might not have to manage the same complexity in your project, and many projects could be delivered faster and maintained more easily by using simpler tools…

At NLP Cloud, we have a lot of containers but we do not need the complex advanced configuration capabilities of Kubernetes. We do not want to use a managed version of Kubernetes either: first for cost reasons, but also because we want to avoid vendor lock-in, and lastly for privacy reasons.

Docker Swarm also has an interesting advantage: it integrates seamlessly with Docker and Docker Compose. It makes configuration a breeze and for teams already used to working with Docker it creates no additional difficulty.

Install the Cluster

Let’s say we want to have 5 servers in our cluster:

  • 1 manager node that will orchestrate the whole cluster. It will also host the database (just an example, the DB could perfectly be on a worker too).
  • 1 worker node that will host our Python/Django backoffice
  • 3 worker nodes that will host the replicated FastAPI Python API serving an NLP model

We are deliberately omitting the reverse proxy that will load balance requests to the right nodes as it will be the topic of a next blog post.

Provision the Servers

Order 5 servers where you want. It can be OVH, Digital Ocean, AWS, GCP… doesn’t matter.

It’s important for you to think about the performance of each server depending on what it will be dedicated to. For example, for the node hosting a simple backoffice you might not need huge performance. For the node hosting the reverse proxy (not addressed in this tutorial) you might need more CPU than usual. And for the API nodes serving the NLP model you might want a lot of RAM, and maybe even GPU.

Install a Linux distribution on each server. I would go for the latest Ubuntu LTS version as far as I’m concerned.

On each server, install the Docker engine.
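
For example, one quick way to install it is Docker's convenience script (your distribution's packages or the official apt/yum repositories work just as well):

curl -fsSL https://get.docker.com -o get-docker.sh
sudo sh get-docker.sh

# Optional: run Docker without sudo
sudo usermod -aG docker $USER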

Now give each server a human friendly hostname. It is useful so that next time you ssh into the server you will see this hostname in your prompt, which is a good practice to avoid working on the wrong server… But it will also be used by Docker Swarm as the name of the node. Run the following on each server:

echo <node name> > /etc/hostname; hostname -F /etc/hostname

On the manager, login to your Docker registry so Docker Swarm can pull your images (no need to do this on the worker nodes):

docker login

Initialize the Cluster and Attach Nodes

On the manager node, run:

docker swarm init --advertise-addr <server IP address>

--advertise-addr <server IP address> is only needed if your server has several IP addresses on the same interface so Docker knows which one to choose.

Then, in order to attach worker nodes, run the following on the manager:

docker swarm join-token worker

The output will be something like docker swarm join --token SWMTKN-1-5tl7ya98erd9qtasdfml4lqbosbhfqv3asdf4p13-dzw6ugasdfk0arn0 172.173.174.175:2377

Copy this output and paste it to a worker node. Then repeat the join-token operation for each worker node.

You should now be able to see all your nodes by running:

docker node ls

Give Labels to your Nodes

It's important to label your nodes properly as you will need these labels later in order for Docker Swarm to determine on which node a container should be deployed. If you do not specify which node you want your container to be deployed to, Docker Swarm will deploy it on any available node. This is clearly not what you want.

Let's say that your backoffice requires few resources and is basically stateless, so it can be deployed to any cheap worker node. Your API is stateless too but, on the contrary, it is memory hungry and requires specific hardware dedicated to machine learning, so you want to deploy it only to the machine learning worker nodes. Last of all, your database is not stateless so it always has to be deployed to the very same server: let's say this server will be our manager node (but it could very well be a worker node too).

Do the below on the manager.

The manager will host the database so give it the “database” label:

docker node update --label-add type=database <manager name>

Give the "cheap" label to the worker that has poor performance and that will host the backoffice:

docker node update --label-add type=cheap <backoffice worker name>

Last of all, give the “machine-learning” label to all the workers that will host NLP models:

docker node update --label-add type=machine-learning <api worker 1 name>
docker node update --label-add type=machine-learning <api worker 2 name>
docker node update --label-add type=machine-learning <api worker 3 name>

Set Up Configuration With Docker Compose

If you used Docker Compose already you will most likely find the transition to Swarm fairly easy.

If you do not add anything to an existing docker-compose.yml file it will work with Docker Swarm but basically your containers will be deployed anywhere without your control, and they won’t be able to talk to each other.

Network

In order for containers to communicate, they should be on the same virtual network. For example a Python/Django application, a FastAPI API, and a PostgreSQL database should be on the same network to work together. We will manually create the main_network network later right before deploying, so let’s use it now in our docker-compose.yml:

version: "3.8"

networks:
  main_network:
    external: true

services:
  backoffice:
    image: <path to your custom Django image>
    depends_on:
      - database
    networks:
      - main_network
  api:
    image: <path to your custom FastAPI image>
    depends_on:
      - database
    networks:
      - main_network
  database:
    image: postgres:13
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=db_name
    volumes:
      - /local/path/to/postgresql/data:/var/lib/postgresql/data
    networks:
      - main_network

Deployment Details

Now you want to tell Docker Swarm which server each service will be deployed to. This is where you are going to use the labels that you created earlier.

Basically all this is about using the constraints directive like this:

version: "3.8"

networks:
  main_network:
    external: true

services:
  backoffice:
    image: <path to your custom Django image>
    depends_on:
      - database
    networks:
      - main_network
    deploy:
      placement: 
        constraints:
          - node.role == worker
          - node.labels.type == cheap
  api:
    image: <path to your custom FastAPI image>
    depends_on:
      - database
    networks:
      - main_network
    deploy:
      placement: 
        constraints:
          - node.role == worker
          - node.labels.type == machine-learning
  database:
    image: postgres:13
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=db_name
    volumes:
      - /local/path/to/postgresql/data:/var/lib/postgresql/data
    networks:
      - main_network
    deploy:
      placement: 
        constraints:
          - node.role == manager
          - node.labels.type == database

Resources Reservation and Limitation

It can be dangerous to ship your containers as is for 2 reasons:

  • The orchestrator might deploy them to a server that doesn’t have enough resources available (because other containers consume the whole memory available for example)
  • One of your containers might consume more resources than expected and eventually cause trouble on the host server. For example, if your machine learning model happens to consume too much memory, it can cause the host to trigger the OOM protection and start killing processes in order to free some RAM. By default the Docker engine is among the very last processes to be killed by the host, but if that happens it means all your containers on this host will shut down…

In order to mitigate the above, you can use the reservations and limits directives:

  • reservations makes sure a container is deployed only if the target server has enough resources available. If it doesn't, the orchestrator won't deploy it until the necessary resources are available.
  • limits prevents a container from consuming too many resources once it is deployed somewhere.

Let’s say we want our API container - embedding a machine learning model - to be deployed only if 5GB of RAM and half the CPU are available. Let’s also say the API can consume up to 10GB of RAM and 80% of the CPU. Here’s what we should do:

version: "3.8"

networks:
  main_network:
    external: true

services:
  api:
    image: <path to your custom FastAPI image>
    depends_on:
      - database
    networks:
      - main_network
    deploy:
      placement: 
        constraints:
          - node.role == worker
          - node.labels.type == machine-learning
      resources:
        limits:
          cpus: '0.8'
          memory: 10G
        reservations:
          cpus: '0.5'
          memory: 5G

Replication

In order to implement horizontal scalability, you might want to replicate some of your stateless applications. You just need to use the replicas directive for this. For example let’s say we want our API to have 3 replicas, here’s how to do it:

version: "3.8"

networks:
  main_network:
    external: true

services:
  api:
    image: <path to your custom FastAPI image>
    depends_on:
      - database
    networks:
      - main_network
    deploy:
      placement: 
        constraints:
          - node.role == worker
          - node.labels.type == machine-learning
      resources:
        limits:
          cpus: '0.8'
          memory: 10G
        reservations:
          cpus: '0.5'
          memory: 5G
      replicas: 3

More

More settings are available for more control on your cluster orchestration. Don’t hesitate to refer to the docs for more details.

Secrets

Docker Compose has a convenient built-in way to manage secrets by storing each secret in an individual external file. These files are thus not part of your configuration and can even be encrypted if necessary, which is great for security.

Let’s say you want to secure the PostgreSQL DB credentials.

First create 3 secret files on your local machine:

  • Create a db_name.secret file and put the DB name in it
  • Create a db_user.secret file and put the DB user in it
  • Create a db_password.secret file and put the DB password in it

Then in your Docker Compose file you can use the secrets this way:

version: "3.8"

networks:
  main_network:
    external: true

secrets:
  db_name:
    file: "./secrets/db_name.secret"
  db_user:
    file: "./secrets/db_user.secret"
  db_password:
    file: "./secrets/db_password.secret"

services:
  database:
    image: postgres:13
    secrets:
      - "db_name"
      - "db_user"
      - "db_password"
    # Adding the _FILE suffix makes the Postgres image automatically
    # detect secrets and load them from the corresponding files.
    environment:
      - POSTGRES_USER_FILE=/run/secrets/db_user
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
      - POSTGRES_DB_FILE=/run/secrets/db_name
    volumes:
      - /local/path/to/postgresql/data:/var/lib/postgresql/data
    deploy:
      placement:
        constraints:
          - node.role == manager
          - node.labels.type == database
    networks:
      - main_network

Secret files are automatically injected into the containers in /run/secrets by Docker Compose. Careful though: these secrets are located in files, not in environment variables. So you then need to manually open these files and read the secrets. The PostgreSQL image has a convenient feature: if you append the _FILE suffix to the environment variable, the image will automatically read the secrets from files.

Staging VS Production

You most likely want to have at least 2 different types of Docker Compose configurations:

  • 1 for your local machine, used both for building the Docker images and as a staging environment
  • 1 for production

You have 2 choices. Either leverage the Docker Compose inheritance feature, so you only write one big docker-compose.yml base file plus a small staging.yml file dedicated to staging and another small production.yml file dedicated to production. Or maintain two completely separate files, one per environment.
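
If you go for the inheritance approach, combining the files looks roughly like this (the file names are the ones mentioned above):

# Local staging: merge the base file with the staging overrides
docker-compose -f docker-compose.yml -f staging.yml up -d

# Production: docker stack deploy can also merge several compose files
docker stack deploy --with-registry-auth -c docker-compose.yml -c production.yml <stack name>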

At NLP Cloud we ended up realizing that our staging and production configurations were so different that it was easier to just maintain 2 different big files: one for staging and one for production. The main reason is that our production environment uses Docker Swarm but our staging environment doesn't, so juggling both in the same base file is pretty impractical.

Deploy

Now we assume that you have locally built your images and pushed them to your Docker registry. Let’s say we only have one single production.yml file for production.

Copy your production.yml file to the server using scp:

scp production.yml <server user>@<server IP>:/remote/path/to/project

Copy your secrets too (and make sure to upload them to the folder you declared in the secrets section of your Docker Compose file):

scp /local/path/to/secrets <server user>@<server IP>:/remote/path/to/secrets

Manually create the network that we're using in our Docker Compose file. Please note it's also possible to skip this and let Docker Swarm create it automatically if it's declared in your Docker Compose file. But we noticed it creates erratic behavior when recreating the stack, because Docker does not recreate the network fast enough.

docker network create --driver=overlay main_network

You also need to create the volume directories manually. The only volume we have in this tutorial is for the database. So let's create it on the node hosting the DB (i.e. the manager):

mkdir -p /local/path/to/postgresql/data

Ok, everything is set, so now it's time to deploy the whole stack!

docker stack deploy --with-registry-auth -c production.yml <stack name>

The --with-registry-auth option is needed if you pull images from password-protected registries.

Wait a moment as Docker Swarm is now pulling all the images and installing them on the nodes. Then check if everything went fine:

docker service ls

You should see something like the following:

ID             NAME                       MODE         REPLICAS   IMAGE
f1ze8qgf24c7   <stack name>_backoffice    replicated   1/1        <path to your custom Python/Django image>     
gxboram56dka   <stack name>_database      replicated   1/1        postgres:13      
3y1nmb6g2xoj   <stack name>_api           replicated   3/3        <path to your custom FastAPI image>      

The important thing is that REPLICAS should all be at their maximum. Otherwise it means that Docker is still pulling or installing your images, or that something went wrong.

Manage the Cluster

Now that your cluster is up and running, here are a couple of useful things you might want to do to administer it:

  • See all applications and where they are deployed: docker stack ps <stack name>
  • See applications on a specific node: docker node ps <node name>
  • See logs of an application: docker service logs <stack name>_<service name>
  • Completely remove the whole stack: docker stack rm <stack name>

Every time you want to deploy a new image to the cluster, first upload it to your registry, then just run the docker stack deploy command again on the manager.

Conclusion

As you can see, setting up a Docker Swarm cluster is far from complex, especially when you think about the actual complexity that has to be handled under the hood in such distributed systems.

Of course many more options are available and you will most likely want to read the documentation. Also, we did not talk about the reverse proxy/load balancing aspect, but it's an important one. In a future tutorial we will see how to achieve this with Traefik.

At NLP Cloud our configuration is obviously much more complex than what we showed above, and we had to face several tricky challenges in order for our architecture to be both fast and easy to maintain. For example, we have so many machine learning containers that manually writing the configuration file for each container was not an option, so new auto generation mechanisms had to be implemented.

If you are interested in more in-depth details please don't hesitate to ask, it will be a pleasure to share.

Crawling large volumes of web pages

Crawling and scraping data from the web is a funny thing. It's fairly easy to get started and it gives you immediate results. However, scaling from a basic crawler (a quick Python script for example) to a full-speed, large-volume crawler is hard to achieve. I'll try to tell you about a couple of typical challenges one faces when building such a web crawler.

Concurrency

Concurrency is absolutely central to more and more modern applications, and it's especially true for applications that rely heavily on network access, like web crawlers. Indeed, as every HTTP request you trigger takes a long time to return, you'd better launch requests in parallel rather than sequentially. Basically it means that if you're crawling 10 web pages taking 1 second each, it will roughly take 1 second overall rather than 10 seconds.

So concurrency is critical to web crawlers, but how to achieve it?

The naive approach, which works well for a small application, is to write logic that triggers jobs in parallel, waits for all the results, and processes them. Typically in Python you would spawn several parallel processes, and in Golang (which is better suited for this kind of thing) you would create goroutines. But handling this manually can quickly become a hassle: as your RAM and CPU resources are limited there's no way you can crawl millions of web pages in parallel, so how do you handle job queues, and how do you handle retries in case some jobs fail (and they will for sure) or your server stops for some reason?
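
As an illustration of the naive approach, here is a rough Go sketch using a buffered channel as a semaphore to cap the number of pages fetched in parallel (the URLs and pool size are arbitrary):

package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	urls := []string{"https://example.com/a", "https://example.com/b", "https://example.com/c"}

	sem := make(chan struct{}, 10) // at most 10 requests in flight
	var wg sync.WaitGroup

	for _, url := range urls {
		wg.Add(1)
		sem <- struct{}{} // acquire a slot
		go func(u string) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot

			resp, err := http.Get(u)
			if err != nil {
				fmt.Println("failed:", u, err)
				return
			}
			defer resp.Body.Close()
			fmt.Println(u, resp.StatusCode)
		}(url)
	}
	wg.Wait()
}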

The most robust approach is to leverage a messaging system like RabbitMQ. Every new URL parsed by your application should now be enqueued in RabbitMQ, and every new page your application needs to crawl should be dequeued from RabbitMQ. The amount of concurrent requests you want to reach is just a simple setting in RabbitMQ.
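
Here is a rough sketch of what a crawl worker consuming URLs from RabbitMQ could look like in Go, using the github.com/rabbitmq/amqp091-go client; the queue name, connection URL and prefetch value are illustrative. The Qos prefetch count is the "simple setting" mentioned above: it caps how many unacknowledged URLs a worker handles at once.

package main

import (
	"log"

	amqp "github.com/rabbitmq/amqp091-go"
)

func main() {
	conn, err := amqp.Dial("amqp://guest:guest@localhost:5672/")
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	ch, err := conn.Channel()
	if err != nil {
		log.Fatal(err)
	}
	defer ch.Close()

	// The queue holding URLs to crawl.
	q, err := ch.QueueDeclare("urls_to_crawl", true, false, false, false, nil)
	if err != nil {
		log.Fatal(err)
	}

	// Prefetch: never hold more than 50 unacknowledged URLs per worker.
	if err := ch.Qos(50, 0, false); err != nil {
		log.Fatal(err)
	}

	msgs, err := ch.Consume(q.Name, "", false, false, false, false, nil)
	if err != nil {
		log.Fatal(err)
	}

	for msg := range msgs {
		go func(m amqp.Delivery) {
			crawl(string(m.Body)) // fetch and parse the page (not shown)
			m.Ack(false)          // acknowledge only once the page is processed
		}(msg)
	}
}

func crawl(url string) {
	log.Println("crawling", url)
}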

Of course, even when using a messaging system, the choice of the underlying programming language remains important: triggering 100 parallel jobs in Go will cost you far fewer resources than in Python, for example (which is partly why I really like Go!).

Scalability

At some point, no matter how lightweight and optimized your web crawler is, you’ll be limited by hardware resources.

The first solution is to upgrade your server (which is called "vertical scalability"). It's easy, but once you reach a certain level of RAM and CPU, it's cheaper to favor "horizontal scalability".

Horizontal scalability is about adding several modest servers to your infrastructure, rather than turning one single server into a supercomputer. Achieving this is harder though, because your servers might have to communicate about a shared state, and a refactoring of your application might be needed. The good news is that a web crawler can fairly easily become "stateless": several instances of your application can run in parallel, and shared information will most likely live in your messaging system and/or your database. It's then easy to increase or decrease the number of servers based on the speed you want to achieve. Each server should handle a certain amount of concurrent requests consumed from the messaging server. It's up to you to define how many concurrent requests each server can handle depending on its RAM/CPU resources.

Container orchestrators like Kubernetes make horizontal scalability easier. It's easy to scale up to more instances simply by clicking a button, and you can even let Kubernetes auto-scale your instances for you (always set limits though, in order to keep costs under control).

If you want to have a deeper understanding of the scalability challenges, you should read this amazing book by Martin Kleppmann: Designing Data-Intensive Applications.

Report Errors Wisely

Tons of ugly things can happen during a crawl: connectivity issues (on the client side and on the server side), network congestion, target page too big, memory limit reached,…

It’s crucial that you handle these errors gracefully and that you report them wisely in order not to get overwhelmed by errors.

A good practice is to centralize all errors into Sentry. Some errors are never sent to Sentry because we don't consider them critical and we don't want to be alerted about them. For example, we want to know when an instance is reaching memory issues, but we don't want to know when a URL cannot be downloaded because a website timed out (this kind of error is business as usual for a crawler). It's up to you to fine-tune which errors are worth being reported urgently and which ones are not.

File Descriptors and RAM Usage

When dealing with web crawlers, it's worth being familiar with file descriptors. Every HTTP request you launch opens a file descriptor, and a file descriptor consumes memory.

On Linux systems, the max number of open file descriptors in parallel is capped by the OS in order to avoid breaking the system. Once this limit is reached you won’t be able to concurrently open any new webpage.

You might want to increase this limit but proceed carefully as it could lead to excessive RAM usage.
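
As an example, here is how the limit can be inspected and raised on most Linux systems (the values are illustrative):

# Check the current per-process limit of open file descriptors
ulimit -n

# Raise it for the current shell session, before starting the crawler
ulimit -n 65536

# For a permanent change, set it in /etc/security/limits.conf
# or in the systemd unit running your crawler (LimitNOFILE=65536)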

Avoid Some Traps

Here are 2 typical tricks that drastically help improve performance when crawling large volumes of data:

  • Abort if excessive page size: some pages are too big and should be ignored not only for stability reasons (you don’t want to fill your disk with it) but also for efficiency reasons.
  • Fine-tune timeouts wisely: a web request might time out for several reasons and it's important that you understand the underlying concept in order to adopt different levels of timeouts. See this great Cloudflare article for more details. In Go you can set a timeout when creating a net/http client, but a more idiomatic (and maybe more modern) approach is to use contexts for that purpose, as shown below.
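
Here is a rough sketch of both styles mentioned in the last point: a coarse client-wide timeout and a per-request deadline carried by a context (durations and URL are arbitrary):

package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

func main() {
	// Option 1: a coarse timeout covering the whole request, set on the client.
	client := &http.Client{Timeout: 15 * time.Second}

	// Option 2: a per-request deadline carried by a context.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Second)
	defer cancel()

	req, err := http.NewRequestWithContext(ctx, http.MethodGet, "https://example.com", nil)
	if err != nil {
		fmt.Println(err)
		return
	}

	resp, err := client.Do(req)
	if err != nil {
		fmt.Println("request failed or timed out:", err)
		return
	}
	defer resp.Body.Close()
	fmt.Println(resp.StatusCode)
}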

DNS

When crawling millions of web pages, the default DNS server you're using is likely to end up rejecting your requests. It's then worth using a more robust DNS server, like Google's or Cloudflare's, or even rotating resolution requests among several DNS servers.

Refresh Data

Crawling data once is often of little interest. Data should be refreshed asynchronously on a regular basis using periodic tasks, or synchronously upon a user request.

A recent application I worked on refreshed data asynchronously. Every time we crawled a domain, we stored the current date in the database, and then every day a periodic task looked for all the domains in the DB that needed to be refreshed. As Cron was too limited for our needs, we used this more advanced Cron-like tool for Go applications: https://github.com/robfig/cron.
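
A minimal sketch of such a periodic task with robfig/cron could look like this (the schedule and the refresh function are illustrative):

package main

import (
	"log"

	"github.com/robfig/cron/v3"
)

func main() {
	c := cron.New()

	// Every day at 3am: look up domains that need refreshing and re-enqueue them.
	if _, err := c.AddFunc("0 3 * * *", refreshStaleDomains); err != nil {
		log.Fatal(err)
	}

	c.Start()
	select {} // keep the process alive
}

func refreshStaleDomains() {
	log.Println("re-enqueueing domains whose last crawl is too old")
}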

Being Fair

Crawling the web should be done respectfully. It basically means 2 things:

  • don’t crawl a web page if its robots.txt file disallows it
  • don’t hammer a single web server with tons of requests: set a very low concurrency level when you’re crawling several pages from a single domain, and pause for a moment between 2 requests

Conclusion

Setting up a large volume crawler is a fascinating journey which requires both coding and infrastructure considerations.

In this post I'm only scratching the surface, but hopefully some of these concepts will help you in your next project! If you have comments or questions, please don't hesitate to reach out to me, I'll be glad to help.

Build a PWA with push notifications thanks to Vue.js and Django

Setting up a Progressive Web App (PWA) is dead simple with Vue.js, and especially since Vue CLI v3. However implementing push notifications can be pretty tricky.

Vue.js is going to be used for the frontend side, Python/Django and Django Rest Framework for the backend, and Google Firebase Messaging as the messaging intermediary. The latter is necessary as it is the third party in charge of pushing the notifications to the device. I know it's pretty disappointing to be forced to add such a layer to the stack, but there is no real way around using some third-party push service. There are alternatives to Firebase though, like Pusher for example.

Firebase will have to be integrated into several parts of your code:

  • in the frontend for the browser to listen to Firebase for new notifications
  • in the frontend again on the page where you want to ask the user for his permission to enable notifications and, if he agrees, get a notification token from Firebase and send it to the backend to store it in DB. If a user uses several browsers (e.g. Chromium mobile on his smartphone, and Firefox desktop on his PC), several tokens will be associated with him in DB, and notifications will be received in several locations at the same time.
  • in the backend to receive the notification token from frontend and store it in DB
  • in the backend to send push notifications to a user by sending a message to the Firebase API. Firebase will take care of retrieving your message and routing it to the right associated browser.

Please keep in mind that the PWA standard is still evolving and not yet equally implemented in all browsers/platforms. For example push notifications are not yet implemented on iOS as of this writing!

Vue.js PWA

Install the Vue.js CLI thanks to the following npm command (install NPM first if needed):

npm i -g @vue/cli

Create a new PWA project:

vue create <My Project Name>

Select the "Manually select features" option and then select "Progressive Web App (PWA) support".

Select all the other options you need and wait for Vue CLI to create the project. Please notice that Vue CLI automatically creates a registerServiceWorker.js in the src directory and imports it at the top of your main.js. This file will take care of creating a service-worker.js at the root of your website during production build. The latter is needed in order for the browser to detect your website as a PWA.

In your public directory create a manifest.json file which describes your PWA: the name of your app, app icons for various screen sizes, colors… The important fields are start_url, which is the URL opened by default when launching the PWA on your smartphone, and gcm_sender_id, which is the ID that all web apps using Firebase should use (so don't change it). You can specify much more information in this file, just have a look at the docs. You can also use a manifest generator if you like. It should look like the following:

{
  "name": "My App Name",
  "short_name": "My App Short Name",
  "icons": [{
      "src": "./img/icons/android-chrome-192x192.png",
      "sizes": "192x192",
      "type": "image/png"
    },
    {
      "src": "./img/icons/android-chrome-512x512.png",
      "sizes": "512x512",
      "type": "image/png"
    },
    {
      "src": "./img/icons/apple-touch-icon-60x60.png",
      "sizes": "60x60",
      "type": "image/png"
    },
    {
      "src": "./img/icons/apple-touch-icon-76x76.png",
      "sizes": "76x76",
      "type": "image/png"
    },
    {
      "src": "./img/icons/apple-touch-icon-120x120.png",
      "sizes": "120x120",
      "type": "image/png"
    },
    {
      "src": "./img/icons/apple-touch-icon-152x152.png",
      "sizes": "152x152",
      "type": "image/png"
    },
    {
      "src": "./img/icons/apple-touch-icon-180x180.png",
      "sizes": "180x180",
      "type": "image/png"
    },
    {
      "src": "./img/icons/apple-touch-icon.png",
      "sizes": "180x180",
      "type": "image/png"
    },
    {
      "src": "./img/icons/favicon-16x16.png",
      "sizes": "16x16",
      "type": "image/png"
    },
    {
      "src": "./img/icons/favicon-32x32.png",
      "sizes": "32x32",
      "type": "image/png"
    },
    {
      "src": "./img/icons/msapplication-icon-144x144.png",
      "sizes": "144x144",
      "type": "image/png"
    },
    {
      "src": "./img/icons/mstile-150x150.png",
      "sizes": "150x150",
      "type": "image/png"
    }
  ],
  "start_url": ".",
  "display": "standalone",
  "background_color": "#000000",
  "theme_color": "#210499",
  "gcm_sender_id": "103953800507"
}

Please note that your site should be served over HTTPS in order for the browser to read the manifest.json and behave like a PWA.

If everything goes fine, the PWA should now be easily installable on your smartphone. Visit your website with a modern mobile browser like Chrome. If the browser detects the manifest.json, it should automatically propose to install the PWA as a phone application (still not supported by all browsers as of this writing).

Firebase Set Up

In order for your PWA to support push notifications, you should pair with an external service like Firebase Cloud Messaging (FCM). Please note that FCM is a subset of Firebase but you don’t need any of the other Firebase features (like DB, hosting…).

So please create a Firebase account, go to your Firebase console, create a project for your website, and retrieve the following information from your project settings (careful, there are multiple tabs to open, and this is not obvious to get all the information at once):

  • Project ID
  • Web API Key
  • Messaging Sender ID
  • Server Key
  • create a web push certificate and then retrieve the Public Vapid Key generated

Django Backend

I’m assuming that you’re using Django Rest Framework here.

In Django, use the FCM Django third party app to make your FCM integration easier (this app will take care of automatically saving and deleting notification tokens in DB, and will provide you with a helper to easily send notifications to FCM).

Install the app with pip install fcm-django, add it to your Django apps, and set it up (feel free to adapt the below settings, the only required one is FCM_SERVER_KEY for FCM authentication):

INSTALLED_APPS = (
        ...
        "fcm_django"
)

FCM_DJANGO_SETTINGS = {
        # authentication to Firebase
        "FCM_SERVER_KEY": "<Server Key>",
        # true if you want to have only one active device per registered user at a time
        # default: False
        "ONE_DEVICE_PER_USER": False,
        # devices to which notifications cannot be sent,
        # are deleted upon receiving error response from FCM
        # default: False
        "DELETE_INACTIVE_DEVICES": True,
}

Add a route in urls.py to the FCM Django endpoint that will take care of receiving the notification token and store it in DB:

from fcm_django.api.rest_framework import FCMDeviceAuthorizedViewSet

urlpatterns = [
  path('register-notif-token/',
    FCMDeviceAuthorizedViewSet.as_view({'post': 'create'}), name='create_fcm_device'),
]

Now whenever you want to send a push notification to a user do the following (likely to be in your views.py):

from fcm_django.models import FCMDevice

user = <Retrieve the user>
fcm_devices = FCMDevice.objects.filter(user=user)
fcm_devices.send_message(
  title="<My Title>", body="<My Body>", time_to_live=604800,
  click_action="<URL of the page that opens when clicking the notification>")

It’s up to you to adapt the query on the database to define precisely whom you want to send push notifs to. Here I’m sending push notifs to all the browsers of a user, but I could also decide to send notifs to a specific browser (called “device” in the FCM Django terminology).

There are more parameters available in the send_message method, feel free to have a look at the docs but also at the docs of the underlying Python project this library is based on.

Setting the time_to_live was necessary in my case: Firebase says there is a default time to live, but it appeared there wasn't when I tested it (bug?), so when notifications were sent while the user's device was turned off, they were never received once the device was turned back on.

Implementing Push Notifications in Vue.js

Create a firebase-messaging-sw.js file in your public directory and put the following inside:

importScripts('https://www.gstatic.com/firebasejs/5.5.6/firebase-app.js');
importScripts('https://www.gstatic.com/firebasejs/5.5.6/firebase-messaging.js');

var config = {
    apiKey: "<Web API Key>",
    authDomain: "<Project ID>.firebaseapp.com",
    databaseURL: "https://<Project ID>.firebaseio.com",
    projectId: "<Project ID>",
    storageBucket: "<Project ID>.appspot.com",
    messagingSenderId: "<Messenging Sender ID>"
};

firebase.initializeApp(config);

const messaging = firebase.messaging();

You now have a valid service worker which will poll Firebase in the background listening to new incoming push notifications.

It's time now to ask the user for permission to send notifications and, if they agree, get a notification token from FCM and store it in the backend DB. Your backend will use this token to send push notifications through FCM. It's up to you to decide on which page of your app you want to ask the user for permission. For example you could implement this on the home page of your application once the user is logged in. You could do something like this:

import firebase from 'firebase/app'
import 'firebase/messaging'
import axios from 'axios'

export default {
  methods: {
    saveNotificationToken(token) {
      const registerNotifTokenURL = '/register-notif-token/'
      const payload = {
        registration_id: token,
        type: 'web'
      }
      axios.post(registerNotifTokenURL, payload)
        .then((response) => {
          console.log('Successfully saved notification token!')
          console.log(response.data)
        })
        .catch((error) => {
          console.log('Error: could not save notification token')
          if (error.response) {
            console.log(error.response.status)
            // Most of the time a "this field must be unique" error will be returned,
            // meaning that the token already exists in db, which is good.
            if (error.response.data.registration_id) {
              for (let err of error.response.data.registration_id) {
                console.log(err)
              }
            } else {
              console.log('No reason returned by backend')
            }
            // If the request could not be sent because of a network error for example
          } else if (error.request) {
            console.log('A network error occurred.')
            // For any other kind of error
          } else {
            console.log(error.message)
          }
        })
      },
    },
  mounted() {
    var config = {
      apiKey: "<Web API Key>",
      authDomain: "<Project ID>.firebaseapp.com",
      databaseURL: "https://<Project ID>.firebaseio.com",
      projectId: "<Project ID>",
      storageBucket: "<Project ID>.appspot.com",
      messagingSenderId: "<Messenging Sender ID>"
    }
    firebase.initializeApp(config)

    const messaging = firebase.messaging()

    messaging.usePublicVapidKey("<Public Vapid Key>")

    messaging.requestPermission().then(() => {
      console.log('Notification permission granted.')
      messaging.getToken().then((token) => {
        console.log('New token created: ', token)
        this.saveNotificationToken(token)
      })
    }).catch((err) => {
      console.log('Unable to get permission to notify.', err)
    })

    messaging.onTokenRefresh(() => {
      messaging.getToken().then((newToken) => {
        console.log('Token refreshed: ', newToken)
        this.saveNotificationToken(newToken)
      }).catch((err) => {
        console.log('Unable to retrieve refreshed token ', err)
      })
    })
  }
}

Conclusion

Setting up push notifications within a PWA is definitely NOT straightforward! Many parts of your application are involved, and you need to understand how the third party you chose (here Firebase) works.

Please keep in mind that PWAs are still pretty new and supported features are constantly evolving. More importantly, don't rely on push notifications alone for critical information, as they are less reliable than other systems like SMS or emails…

Also, don’t forget to use push notifications carefully as notification flooding can be very annoying!

I hope you liked this how-to. Please don't hesitate to send me feedback or add some ideas in the comments!

Leveraging Django Rest Framework and generic views for rapid API development

As a seasoned API developer, you end up doing very repetitive tasks, so you might be looking for tools that make your development faster. As a novice, you might be looking for a way to implement best practices and REST standards out of the box without too much hesitation.

In both cases, Django Rest Framework (DRF) is a great solution. It is a standard, widely used, and fully featured API framework that will not only save you a lot of time but also show you the right way to develop RESTful APIs. In particular, DRF offers generic views, that is to say pre-built endpoints for your API. Let's see how to leverage this feature to achieve rapid API development.

I put the below code in a little working Django project right here.

Concept

DRF’s Generic views are perfect for simple APIs that basically do CRUD (create, read, update, delete) on the database without too much data processing. For example, let’s say you have a product table that contains all your store products and you want to expose these products as is to customers through an API, then it’s a perfect use case for the ListAPIView (see below).

From now on I'm assuming that you have installed Python, Django, and DRF, and that you know the basics of Django.

Basic Example 1: Reading Data

Let's create an API endpoint showing all the products to customers. In your views.py do the following:

from rest_framework import generics
from .serializers import ProductsSerializer
from .models import Product

class GetProducts(generics.ListAPIView):
    """Return all products."""
    queryset = Product.objects.all()
    serializer_class = ProductsSerializer

ProductsSerializer is the serializer that will convert your data from the database to API friendly data. This serializer should be put in serializers.py and will be in charge of retrieving data from your Product model and transforming them:

from rest_framework import serializers
from .models import Product

class ProductsSerializer(serializers.ModelSerializer):
    """Serialize products."""

    class Meta:
        model = Product
        fields = ("__all__")

Now in your urls.py create the route to this endpoint:

from django.urls import path
from .views import GetProducts

urlpatterns = [
    path('get-products/', GetProducts.as_view(), name='get_products'),
]

As you can see this is dead simple, as DRF does many things for you under the hood! You now have an endpoint (/get-products/) that you can consume with GET HTTP requests, and that outputs all products in an API-friendly format (usually JSON, but it depends on your settings).
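
Assuming the project runs locally on port 8000, consuming the endpoint could look like this (host and port are illustrative):

curl -H "Accept: application/json" http://localhost:8000/get-products/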

Basic Example 2: Deleting Data

Now let's create an endpoint dedicated to deleting a product, for authenticated users only. It's even simpler as it does not require serializing data (once the product is deleted, no data needs to be returned to the user).

In views.py:

from rest_framework import generics, permissions
from .models import Product

class DeleteProduct(generics.DestroyAPIView):
    """Remove product"""
    permission_classes = (permissions.IsAuthenticated,) # Limit to authenticated users only
    queryset = Product.objects.all()

In urls.py

from django.urls import path
from .views import DeleteProduct

urlpatterns = [
    path('delete-product/<int:pk>/', DeleteProduct.as_view(), name='delete_product'),
]

Now you have a /delete-product/<pk>/ endpoint that you can use to delete one product at a time using DELETE HTTP requests, and that only accepts authenticated requests (the authentication mechanism depends on your settings).
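
For example, assuming token authentication and a product with primary key 42, the call could look like this (host, port and token are illustrative):

curl -X DELETE -H "Authorization: Token <your token>" http://localhost:8000/delete-product/42/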

Customizing Generic Views’ Behavior

Each generic view can be customized by overriding its get_queryset() method. For example, let's say you only want to show products that have an active flag set to True in the DB. You could do this:

from rest_framework import generics, permissions
from .serializers import ProductsSerializer
from .models import Product

class GetProducts(generics.ListAPIView):
    """Return all active products."""
    permission_classes = (permissions.IsAuthenticated,)
    serializer_class = ProductsSerializer

    def get_queryset(self):
        """Filter active products."""
        return Product.objects.filter(active=True)

get_queryset() is a common method that exists on every generic view. Some generic views also have their own methods to control the behavior of the endpoint more precisely. For example, let's say that you don't really want to delete products but just mark them as inactive. You could use the destroy() method:

from django.shortcuts import get_object_or_404
from rest_framework.response import Response
from rest_framework import generics, permissions, status
from .models import Product

class DeleteProduct(generics.DestroyAPIView):
    """Remove product"""
    permission_classes = (permissions.IsAuthenticated,)
    queryset = Product.objects.all()

    def destroy(self, request, pk):
        """
        By default, DestroyAPIView deletes the product from db.
        Here we only want to flag it as inactive.
        """
        product = get_object_or_404(self.get_queryset(), pk=pk)
        product.active = False
        product.save()
        return Response(status=status.HTTP_204_NO_CONTENT)

In the above example we first look for the product that the user wants to delete. If we can't find it, we return a 404 code to the user. If the product is successfully marked as inactive, we return a 204 code to the user, meaning that the product was successfully deleted.

Generic views are perfect for simple use cases, but it's sometimes wiser to use the classic APIView for edge cases. For example, let's say you want not only to return products to the user but also to enrich the data with information that is not in the Product model (e.g. orders related to this product, the product manufacturer, etc.). In that case, if you wanted to use generic views, you would have to define new fields in the serializer through additional SerializerMethodField fields and their get_<field>() methods, which can easily make your serializer very ugly…
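
For illustration, here is roughly what such an enriched serializer could look like; the manufacturer relation and the listed fields are hypothetical:

from rest_framework import serializers
from .models import Product

class EnrichedProductsSerializer(serializers.ModelSerializer):
    """Serialize products plus data that doesn't live on the Product model."""

    manufacturer_name = serializers.SerializerMethodField()

    class Meta:
        model = Product
        fields = ("id", "name", "price", "manufacturer_name")  # hypothetical fields

    def get_manufacturer_name(self, obj):
        # One extra method per extra field: this quickly clutters the serializer.
        return obj.manufacturer.name if obj.manufacturer else None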

Conclusion

As you could see, DRF's generic views make API endpoint development very easy thanks to a bit of magic under the hood. However, you should keep in mind that generic views cannot apply to every use case, as sometimes tweaking generic views is harder than developing things yourself from scratch!

I hope you liked this little how-to. I'd love to hear your feedback!
