Storing Stripe Payment Data in the Database

It’s hard to know whether Stripe payment data should be stored in the local database or not. Many developers wonder which kind of Stripe data they should save in their local DB, and some are even tempted not to store anything locally and rely on Stripe API calls only.

Let me show you how we’re dealing with this problem at NLP Cloud, and why.

NLP Cloud is a Natural Language Processing API based on spaCy and HuggingFace transformers that proposes Named Entity Recognition (NER), sentiment analysis, text classification, summarization, text generation, and much more. Customers are charged monthly and their payments are processed by Stripe. It is important to us that the API and the user dashboard are lightning fast, so we want to rely on the Stripe API as little as possible. We also don’t want to depend too much on Stripe in case of a data loss on their end.

Typical Scenario

Here is a standard scenario for a subscription-based service like the one we have at NLP Cloud:

  1. A customer registers on your website
  2. You save the customer in DB
  3. You create the customer in Stripe through the Stripe API
  4. You save the Stripe customer ID in the local DB

You cannot really do better than this.

Now the fun begins.

For example, you might want to keep track of your customer’s subscription in order to grant him access to paid features or to more API requests, or to disable some features if he’s a free user. Sometimes you are going to update a subscription yourself, but sometimes Stripe will (for example, if a payment fails multiple times, Stripe will mark the subscription as canceled). When a subscription is updated by Stripe, they let you know through a webhook call. If you are using Stripe Portal, everything is handled on the Stripe end, and any change is sent to you through webhooks.

So dealing with Stripe is a bidirectional thing: sometimes you initiate a change, sometimes they do. It is easy to get out of sync!

Speed Considerations

One might be tempted to delegate as much information as possible to Stripe so Stripe is the single source of truth. In such a situation, you would need to make a call to the Stripe API very often. This is a bad idea.

For example, if your customer subscription data lives in Stripe only, you will first need to call Stripe before deciding whether a customer can access a specific paid feature. That adds critical milliseconds to your website’s response time, which is not good. And if Stripe is temporarily lagging, your website lags too. For an API, this is out of the question: you cannot slow down an API call because you’re waiting for Stripe to respond.

Disaster Recovery Considerations

Delegating information to Stripe without local data is risky. Even if Stripe is a solid player, you can never be sure that they’re not going to lose your data.

From a safety standpoint, it is paramount to store the customers’ data locally so you can restart your service somewhere else in case of a disaster, without losing any customer subscription (which would be terrible).

Caching Everything Locally

The strategy we follow at NLP Cloud is to cache everything related to Stripe customers and Stripe subscriptions locally. It is simpler than it might sound, because modern databases like PostgreSQL can store JSON data seamlessly with almost no performance tradeoff.

Basically, what you should do - if you want to follow this strategy - is the following:

  1. When you create a Stripe customer with their API, save the Stripe JSON response in a JSON DB field (with PostgreSQL, use the JSONB type)
  2. When you create a Stripe subscription for this customer, do the same
  3. Whenever you need to access Stripe customer or subscription information, just query the customer or the subscription DB fields

Here is an example of data INSERT in a PostgreSQL JSONB field:

CREATE TABLE customers (
  id serial NOT NULL,
  stripe_customer jsonb,
  stripe_subscription jsonb
);
INSERT INTO customers VALUES (1, '{"id": 1, ...}', '{"id": 1, ...}');

And here is how you could retrieve the Stripe subscription id for example:

SELECT stripe_subscription->'id' AS id FROM customers;  

2 fields in the DB and that’s it! No need to create a bunch of new columns for every customer and subscription field.
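On the application side, here is a minimal Python sketch of step 1 of the strategy above (the subscription works the same way). It assumes the stripe and psycopg2 packages, and the API key and connection string are placeholders:

import json

import psycopg2
import stripe

stripe.api_key = "sk_test_..."  # placeholder
conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection string

# Create the customer in Stripe through their API...
customer = stripe.Customer.create(email="customer@example.com")

# ...and cache the raw Stripe JSON response in the JSONB field
with conn, conn.cursor() as cur:
    cur.execute(
        "INSERT INTO customers (stripe_customer) VALUES (%s)",
        [json.dumps(customer)],
    )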

Staying in Sync

In order to make sure that your local cache is perfectly in sync with Stripe you should properly listen to Stripe webhooks.

Every time you get a Stripe webhook about a customer or a subscription, update the customer field or the subscription field in DB.
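As a sketch (the framework-specific routing and the Stripe signature verification with stripe.Webhook.construct_event are omitted), the webhook handler could update the cached JSON like this:

import json

import psycopg2

conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection string

def handle_stripe_event(event):
    """Update the local cache when Stripe notifies us of a change."""
    obj = event["data"]["object"]

    if event["type"].startswith("customer.subscription."):
        # The object is a subscription; its "customer" field is the Stripe customer ID
        with conn, conn.cursor() as cur:
            cur.execute(
                "UPDATE customers SET stripe_subscription = %s "
                "WHERE stripe_customer->>'id' = %s",
                [json.dumps(obj), obj["customer"]],
            )
    elif event["type"].startswith("customer."):
        # The object is a customer
        with conn, conn.cursor() as cur:
            cur.execute(
                "UPDATE customers SET stripe_customer = %s "
                "WHERE stripe_customer->>'id' = %s",
                [json.dumps(obj), obj["id"]],
            )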

If you really want to be safe, you should also be prepared for potential Stripe webhook failures. In that case, the best strategy is to proactively poll Stripe customers and subscriptions on a regular basis, in order to make sure you never end up out of sync.
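A hedged sketch of such a polling job (run from cron, Celery beat, or similar; the stripe package and the connection string are assumed as before) could look like this:

import json

import psycopg2
import stripe

stripe.api_key = "sk_test_..."  # placeholder
conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection string

def refresh_stripe_cache():
    # Re-sync every subscription from Stripe into the local JSONB cache
    for subscription in stripe.Subscription.list(status="all").auto_paging_iter():
        with conn, conn.cursor() as cur:
            cur.execute(
                "UPDATE customers SET stripe_subscription = %s "
                "WHERE stripe_customer->>'id' = %s",
                [json.dumps(subscription), subscription["customer"]],
            )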

Conclusion

As you can see, it is quite easy to create a simple and robust local Stripe cache. This strategy saves a lot of development time, it is fast, it is safe in case of a Stripe failure, and you no longer have to wonder which Stripe fields you need to store locally.

I hope you found this useful. If you have feedback, or if you think of a better strategy, please let me know!

Also available in French
Production-Ready Machine Learning NLP API with FastAPI and spaCy

FastAPI is a new Python API framework that is increasingly used in production today. We are using FastAPI under the hood at NLP Cloud. NLP Cloud is an API based on spaCy and HuggingFace transformers that proposes Named Entity Recognition (NER), sentiment analysis, text classification, summarization, and much more. FastAPI helped us quickly build a fast and robust machine learning API serving NLP models.

Let me tell you why we made such a choice, and show you how to implement an API based on FastAPI and spaCy for Named Entity Recognition (NER).

Why FastAPI?

Until recently, I’ve always used Django Rest Framework for Python APIs. But FastAPI offers several interesting features:

  • It is very fast
  • It is well documented
  • It is easy to use
  • It automatically generates API schemas for you (like OpenAPI)
  • It uses type validation with Pydantic under the hood. For a Go developer like myself who is used to static typing, it’s very cool to be able to leverage type hints like this. It makes the code clearer, and less error-prone.

FastAPI’s performance is supposed to make it a great candidate for machine learning APIs. Given that we’re serving a lot of demanding NLP models based on spaCy and transformers at NLP Cloud, FastAPI is a great solution.

Set Up FastAPI

The first option you have is to install FastAPI and Uvicorn (the ASGI server in front of FastAPI) by yourself:

pip install fastapi[all]

As you can see, FastAPI is running behind an ASGI server, which means it can natively work with asynchronous Python requests with asyncio.

Then you can run your app with something like this:

uvicorn main:app
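For reference, here is a minimal (illustrative) main.py that such a command could serve; the async endpoint takes advantage of the native asyncio support mentioned above:

# main.py -- a minimal sketch served by "uvicorn main:app"
from fastapi import FastAPI

app = FastAPI()

@app.get("/health")
async def health():
    # async endpoints are handled natively by the ASGI server
    return {"status": "ok"}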

Another option is to use one of the Docker images generously provided by Sebastián Ramírez, the creator of FastAPI. These images are maintained and work out of the box.

For example the Uvicorn + Gunicorn + FastAPI image adds Gunicorn to the stack in order to handle parallel processes. Basically Uvicorn handles multiple parallel requests within one single Python process, and Gunicorn handles multiple parallel Python processes.

The application is supposed to automatically start with docker run if you properly follow the image documentation.

These images are customizable. For example, you can tweak the number of parallel processes created by Gunicorn. It’s important to play with such parameters depending on the resources demanded by your API. If your API is serving a machine learning model that takes several GBs of memory, you might want to decrease Gunicorn’s default concurrency, otherwise your application will quickly consume too much memory.
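If you prefer to run Gunicorn yourself rather than through the pre-built image, a minimal gunicorn_conf.py capping concurrency could look like this (a sketch; the values are illustrative and should be tuned to your model’s memory footprint):

# gunicorn_conf.py -- a sketch for running FastAPI under Gunicorn directly
workers = 2  # keep this low if each worker loads a multi-GB model
worker_class = "uvicorn.workers.UvicornWorker"
bind = "0.0.0.0:80"
timeout = 120  # machine learning inference can be slow

You would then start it with gunicorn -c gunicorn_conf.py main:app. With the pre-built Docker image, the same kind of tuning is done through the environment variables documented by the image.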

Simple FastAPI + spaCy API for NER

Let’s say you want to create an API endpoint that is doing Named Entity Recognition (NER) with spaCy. Basically, NER is about extracting entities like name, company, job title… from a sentence. More details about NER here if needed.

This endpoint will take a sentence as an input, and will return a list of entities. Each entity is made up of the position of the first character of the entity, the last position of the entity, the type of the entity, and the text of the entity itself.

The endpoint will be queried with POST requests this way:

curl "http://127.0.0.1/entities" \
  -X POST \
  -d '{"text":"John Doe is a Go Developer at Google"}'

And it will return something like this:

[
  {
    "end": 8,
    "start": 0,
    "text": "John Doe",
    "type": "PERSON"
  },
  {
    "end": 25,
    "start": 13,
    "text": "Go Developer",
    "type": "POSITION"
  },
  {
    "end": 35,
    "start": 30,
    "text": "Google",
    "type": "ORG"
  }
]

Here is how we could do it:

import spacy
from fastapi import FastAPI
from pydantic import BaseModel
from typing import List

model = spacy.load("en_core_web_lg")

app = FastAPI()

class UserRequestIn(BaseModel):
    text: str

class EntityOut(BaseModel):
    start: int
    end: int
    type: str
    text: str

class EntitiesOut(BaseModel):
    entities: List[EntityOut]

@app.post("/entities", response_model=EntitiesOut)
def read_entities(user_request_in: UserRequestIn):
    doc = model(user_request_in.text)

    return {
        "entities": [
            {
                "start": ent.start_char,
                "end": ent.end_char,
                "type": ent.label_,
                "text": ent.text,
            } for ent in doc.ents
        ]
    }

The first important thing here is that we’re loading the spaCy model. For our example we’re using a large spaCy pre-trained model for the English language. Large models take more memory and more disk space, but give better accuracy as they were trained on bigger datasets.

model = spacy.load("en_core_web_lg")

Later, we are using this spaCy model for NER by doing the following:

doc = model(user_request_in.text)
# [...]
doc.ents

The second thing, which is an amazing feature of FastAPI, is the ability to force data validation with Pydantic. Basically, you need to declare in advance the format of your user input and the format of the API response. If you’re a Go developer, you’ll find it very similar to JSON unmarshalling with structs. For example, we are declaring the format of a returned entity this way:

class EntityOut(BaseModel):
    start: int
    end: int
    type: str
    text: str

Note that start and end are positions in the sentence, so they are integers, and type and text are strings. If the API tries to return an entity that does not match this format (for example if start is not an integer), FastAPI will raise an error.

As you can see, it is possible to embed a validation class into another one. Here we are returning a list of entities, so we need to declare the following:

class EntitiesOut(BaseModel):
    entities: List[EntityOut]

Some simple types like int and str are built-in, but more complex types like List need to be explicitly imported.

The response validation itself is declared concisely within the route decorator, via the response_model argument:

@app.post("/entities", response_model=EntitiesOut)

More Advanced Data Validation

You can do many more advanced validation things with FastAPI and Pydantic. For example, if you need the user input to have a minimum length of 10 characters, you can do the following:

from pydantic import BaseModel, constr

class UserRequestIn(BaseModel):
    text: constr(min_length=10)

Now, what if Pydantic validation passes, but you later realize that there’s something wrong with the data so you want to return an HTTP 400 code?

Simply raise an HTTPException:

from fastapi import HTTPException

raise HTTPException(status_code=400, detail="Your request is malformed")
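For example, here is how it could fit into the NER endpoint above (a sketch reusing the imports and Pydantic models defined earlier, assuming you want to reject requests for which spaCy finds no entity at all):

@app.post("/entities", response_model=EntitiesOut)
def read_entities(user_request_in: UserRequestIn):
    doc = model(user_request_in.text)

    if not doc.ents:
        # Pydantic validation passed, but there is nothing useful to return
        raise HTTPException(status_code=400, detail="No entity found in your text")

    return {
        "entities": [
            {
                "start": ent.start_char,
                "end": ent.end_char,
                "type": ent.label_,
                "text": ent.text,
            } for ent in doc.ents
        ]
    }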

These are just a couple of examples; you can do much more! Just have a look at the FastAPI and Pydantic docs.

Root Path

It’s very common to run such APIs behind a reverse proxy. For example we’re using the Traefik reverse proxy behind NLPCloud.io.

A tricky thing when running behind a reverse proxy is that your sub-application (here the API) does not necessarily know about the whole URL path. And actually that’s great because it shows that your API is loosely coupled to the rest of your application.

For example here we want our API to believe that the endpoint URL is /entities, but actually the real URL might be something like /api/v1/entities. Here’s how to do it by setting a root path:

app = FastAPI(root_path="/api/v1")

You can also achieve it by passing an extra parameter to Uvicorn in case you’re starting Uvicorn manually:

uvicorn main:app --root-path /api/v1

Conclusion

As you can see, creating an API with FastAPI is dead simple, and the validation with Pydantic makes the code very expressive (and thus requires less documentation) and less error-prone.

FastAPI comes with great performance and the possibility to use asynchronous requests out of the box with asyncio, which is great for demanding machine learning models. The example above about Named Entity Recognition with spaCy and FastAPI can almost be considered production-ready (of course the API code is only a small part of a full clustered application). So far, FastAPI has never been the bottleneck in our NLPCloud.io infrastructure.

If you have any question, please don’t hesitate to ask!

Also available in French
Htmx and Django for Single Page Applications

We are not fond of big Javascript frameworks at NLP Cloud. NLP Cloud is an API based on spaCy and HuggingFace transformers that proposes Named Entity Recognition (NER), sentiment analysis, text classification, summarization, and much more. Our backoffice is very simple. Users can retrieve their API token, upload their custom spaCy models, upgrade their plan, send support messages… Nothing too complex, so we didn’t feel the need for Vue.js or React.js. Instead we used this very cool combination of htmx and Django.

Let me show you how it works and tell you more about the advantages of this solution.

What is htmx and why use it?

htmx is the successor of intercooler.js. The concept behind these 2 projects is that you can do all sorts of advanced things like AJAX, CSS transitions, websockets, etc. with HTML only (meaning without writing a single line of Javascript). And the library is very light (only 9kB).

Another very interesting thing is that, when doing asynchronous calls to your backend, htmx does not expect a JSON response but an HTML fragment response. So basically, contrary to Vue.js or React.js, your frontend does not have to deal with JSON data, but simply replaces some parts of the DOM with HTML fragments already rendered on the server side. So it allows you to 100% leverage your good old backend framework (templates, sessions, authentication, etc.) instead of turning it into a headless framework that only returns JSON. The idea is that the overhead of an HTML fragment compared to JSON is negligible during an HTTP request.

So, to sum up, here is why htmx is interesting when building a single page application (SPA):

  • No Javascript to write
  • Excellent backend frameworks like Django, Ruby On Rails, Laravel… can be fully utilized
  • Very small library (9kB) compared to the Vue or React frameworks
  • No preprocessing needed (Webpack, Babel, etc.) which makes the development experience much nicer

Installation

Installing htmx is just a matter of loading the script in your HTML <head>:

<script src="https://unpkg.com/htmx.org@1.2.1"></script>

I won’t go into the details of Django’s installation here as this article essentially focuses on htmx.

Load Content Asynchronously

The most important thing when creating an SPA is that you want everything to load asynchronously. For example, when clicking a menu entry to open a new page, you don’t want the whole webpage to reload, but only the content that changes. Here is how to do that.

Let’s say our site is made up of 2 pages:

  • The token page showing the user his API token
  • The support page basically showing the support email to the user

We also want to display a loading bar while the new page is loading.

Frontend

On the frontend side, you would create a menu with 2 entries. And clicking an entry would show the loading bar and change the content of the page without reloading the whole page.

<progress id="content-loader" class="htmx-indicator" max="100"></progress>
<aside>
    <ul>
        <li><a hx-get="/token" hx-push-url="true"
                hx-target="#content" hx-swap="innerHTML" 
                hx-indicator="#content-loader">Token</a></li>
        <li><a hx-get="/support"
                hx-push-url="true" hx-target="#content" hx-swap="innerHTML"
                hx-indicator="#content-loader">Support</a></li>
    </ul>
</aside>
<div id="content">Hello and welcome to NLP Cloud!</div>

In the example above, the loader is the <progress> element. It is hidden by default thanks to its htmx-indicator class. When a user clicks one of the 2 menu entries, the loader becomes visible thanks to hx-indicator="#content-loader".

When a user clicks the token menu entry, it performs an asynchronous GET call to the Django token URL thanks to hx-get="/token". Django returns an HTML fragment that htmx puts in <div id="content"></div> thanks to hx-target="#content" hx-swap="innerHTML".

Same thing for the support menu entry.

Even if the page did not reload, we still want to update the URL in the browser in order to help the user understand where he is. That’s why we use hx-push-url="true".

As you can see, we now have an SPA that is using HTML fragments under the hood rather than JSON, with a mere 9kB library and only a couple of directives.

Backend

Of course the above does not work without the Django backend.

Here’s your urls.py:

from django.urls import path

from . import views

urlpatterns = [
    path('', views.index, name='index'),
    path('token', views.token, name='token'),
    path('support', views.support, name='support'),
]

Now your views.py:

from django.shortcuts import render

def index(request):
    return render(request, 'backoffice/index.html')

def token(request):
    api_token = 'fake_token'

    return render(request, 'backoffice/token.html', {'token': api_token})

def support(request):
    return render(request, 'backoffice/support.html')

And last of all, in a templates/backoffice directory add the following templates.

index.html (i.e. basically the code we wrote above, but with Django url template tags):

<!DOCTYPE html>
<html>
    <head>
        <script src="https://unpkg.com/htmx.org@1.2.1"></script>
    </head>

    <body>
        <progress id="content-loader" class="htmx-indicator" max="100"></progress>
        <aside>
            <ul>
                <li><a hx-get="{% url 'token' %}" hx-push-url="true"
                        hx-target="#content" hx-swap="innerHTML"
                        hx-indicator="#content-loader">Token</a></li>
                <li><a hx-get="{% url 'support' %}" hx-push-url="true"
                        hx-target="#content" hx-swap="innerHTML"
                        hx-indicator="#content-loader">Support</a></li>
            </ul>
        </aside>
        <div id="content">Hello and welcome to NLP Cloud!</div>
    </body>
</html>

token.html:

Here is your API token: {{ token }}

support.html:

For support questions, please contact support@nlpcloud.io

As you can see, all this is pure Django code using routing and templating as usual. No need for an API or Django Rest Framework here.

Allow Manual Page Reloading

The problem with the above is that if a user manually reloads the token or the support page, he will only end up with the HTML fragment instead of the whole HTML page.

The solution, on the Django side, is to render 2 different templates depending on whether the request is coming from htmx or not.

Here is how you could do it.

In your views.py you need to check whether the HTTP_HX_REQUEST header was passed in the request. If it was, it means this is a request from htmx and in that case you can show the HTML fragment only. If it was not, you need to render the full page.

from django.shortcuts import render

def index(request):
    return render(request, 'backoffice/index.html')

def token(request):
    api_token = 'fake_token'

    if request.META.get("HTTP_HX_REQUEST") != 'true':
        return render(request, 'backoffice/token_full.html', {'token': api_token})

    return render(request, 'backoffice/token.html', {'token': api_token})

def support(request):
    if request.META.get("HTTP_HX_REQUEST") != 'true':
        return render(request, 'backoffice/support_full.html')

    return render(request, 'backoffice/support.html')

Now in your index.html template you want to use blocks in order for the index page to be extended by all the other pages:

<!DOCTYPE html>
<html>
    <head>
        <script src="https://unpkg.com/htmx.org@1.2.1"></script>
    </head>

    <body>
        <progress id="content-loader" class="htmx-indicator" max="100"></progress>
        <aside>
            <ul>
                <li><a hx-get="{% url 'token' %}" hx-push-url="true"
                        hx-target="#content" hx-swap="innerHTML"
                        hx-indicator="#content-loader">Token</a></li>
                <li><a hx-get="{% url 'support' %}" hx-push-url="true"
                        hx-target="#content" hx-swap="innerHTML"
                        hx-indicator="#content-loader">Support</a></li>
            </ul>
        </aside>
        <div id="content">{% block content %}{% endblock %}</div>
    </body>
</html>

Your token.html template is the same as before but now you need to add a second template called token_full.html in case the page is manually reloaded:


{% extends "backoffice/index.html" %}

{% block content %}
    {% include "backoffice/token.html" %}
{% endblock %}

Same for support.html, add a support_full.html file:


{% extends "backoffice/index.html" %}

{% block content %}
    {% include "backoffice/support.html" %}
{% endblock %}

We are basically extending the index.html template in order to build the full page all at once on the server side.

This is a small hack, but it is not very complex, and a middleware can even be created for the occasion in order to make things even simpler.
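As a sketch (the class name is illustrative, not a published package), such a middleware could simply flag htmx requests so your views stay terse:

# middleware.py -- a minimal sketch of an htmx-detection middleware
class HtmxMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        # Flag the request so views can simply check request.is_htmx
        request.is_htmx = request.META.get("HTTP_HX_REQUEST") == "true"
        return self.get_response(request)

You would then register it in the MIDDLEWARE setting and check request.is_htmx in your views instead of reading the raw header every time.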

What Else?

We only scratched the surface of htmx. This library (or framework?) includes tons of other useful features, such as:

  • You can use the HTTP verb you want for your requests. Use hx-get for GET, hx-post for POST, etc.
  • You can use polling, websockets, and server side events, in order to listen to events coming from the server
  • You can use only a part of the HTML fragment returned by the server (hx-select)
  • You can leverage CSS transitions
  • You can easily work with forms and file uploads
  • You can use htmx’s hyperscript, which is a pseudo Javascript language that can easily be embedded in HTML tags for advanced usage

Conclusion

I’m very enthusiastic about this htmx library as you can see, and I do hope more and more people will realize they don’t necessarily need a huge JS framework for their project.

For the moment I’ve only integrated htmx into small codebases in production, but I’m pretty sure that htmx fits into large projects too. So far it’s been easy to maintain, lightweight, and its seamless integration with backend frameworks like Django is a must!

If some of you use htmx in production, I’d love to hear your feedback too!

Also available in French
Traefik Reverse Proxy with Docker Compose and Docker Swarm

My last article about Docker Swarm was the first of a series of articles I wanted to write about the stack used behind NLP Cloud. NLP Cloud is an API based on spaCy and HuggingFace transformers that proposes Named Entity Recognition (NER), sentiment analysis, text classification, summarization, and much more. One challenge is that each model runs inside its own container, and new models are added to the cluster on a regular basis. So we need a reverse proxy in front of all these containers that is both efficient and flexible.

The solution we chose is Traefik.

I thought it would be interesting to write an article about how we implemented Traefik and why we chose it over standard reverse proxies like Nginx.

Why Traefik

Traefik is still a relatively new reverse proxy solution compared to Nginx or Apache, but it’s been gaining a lot of popularity. Traefik’s main advantage is that it seamlessly integrates with Docker, Docker Compose and Docker Swarm (and even Kubernetes and more): basically your whole Traefik configuration can be in your docker-compose.yml file which is very handy, and, whenever you add new services to your cluster, Traefik discovers them on the fly without having to restart anything.

So Traefik makes maintainability easier and is good from a high-availability standpoint.

It is developed in Go while Nginx is coded in C so I guess it makes a slight difference in terms of performance, but nothing that I could perceive, and in my opinion it is negligible compared to the advantages it gives you.

Traefik has a bit of a learning curve though, and even if their documentation is pretty good, it is still easy to make mistakes and hard to find where a problem is coming from, so let me give you a couple of ready-to-use examples below.

Install Traefik

Basically you don’t have much to do here. Traefik is just another Docker image you’ll need to add to your cluster as a service in your docker-compose.yml:

version: '3.8'
services:
    traefik:
        image: traefik:v2.3

There are several ways to integrate Traefik but, like I said above, we are going to go for the Docker Compose integration.

Basic Configuration

90% of Traefik’s configuration is done through Docker labels.

Let’s say we have 3 services:

  • A corporate website that is simply served as a static website at http://nlpcloud.io
  • An en_core_web_sm spaCy model served through a FastAPI Python API at http://api.nlpcloud.io/en_core_web_sm
  • An en_core_web_lg spaCy model served through a FastAPI Python API at http://api.nlpcloud.io/en_core_web_lg

More details about spaCy NLP models here and FastAPI here.

Here is a basic local staging configuration routing the requests to the correct services in your docker-compose.yml:

version: '3.8'
services:
    traefik:
        image: traefik:v2.4
        ports:
            - "80:80"
        command:
            - --providers.docker
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
    corporate:
        image: <your corporate image>
        labels:
            - traefik.http.routers.corporate.rule=Host(`localhost`)
    en_core_web_sm:
        image: <your en_core_web_sm model API image>
        labels:
            - traefik.http.routers.en_core_web_sm.rule=Host(`api.localhost`) && PathPrefix(`/en_core_web_sm`)
    en_core_web_lg:
        image: <your en_core_web_lg model API image>
        labels:
            - traefik.http.routers.en_core_web_lg.rule=Host(`api.localhost`) && PathPrefix(`/en_core_web_lg`)

You can now access your corporate website at http://localhost, your en_core_web_sm model at http://api.localhost/en_core_web_sm, and your en_core_web_lg model at http://api.localhost/en_core_web_lg.

As you can see it’s dead simple.

It was for our local staging only, so we now want to do the same for production in a Docker Swarm cluster:

version: '3.8'
services:
    traefik:
        image: traefik:v2.4
        ports:
            - "80:80"
        command:
            - --providers.docker.swarmmode
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
        deploy:
            placement:
                constraints:
                    - node.role == manager
    corporate:
        image: <your corporate image>
        deploy:
            labels:
                - traefik.http.routers.corporate.rule=Host(`nlpcloud.io`)
    en_core_web_sm:
        image: <your en_core_web_sm model API image>
        deploy:
            labels:
                - traefik.http.services.en_core_web_sm.loadbalancer.server.port=80
                - traefik.http.routers.en_core_web_sm.rule=Host(`api.nlpcloud.io`) && PathPrefix(`/en_core_web_sm`)
    en_core_web_lg:
        image: <your en_core_web_lg model API image>
        deploy:
            labels:
                - traefik.http.services.en_core_web_lg.loadbalancer.server.port=80
                - traefik.http.routers.en_core_web_lg.rule=Host(`api.nlpcloud.io`) && PathPrefix(`/en_core_web_lg`)

You can now access your corporate website at http://nlpcloud.io, your en_core_web_sm model at http://api.nlpcloud.io/en_core_web_sm, and your en_core_web_lg model at http://api.nlpcloud.io/en_core_web_lg.

It’s still fairly simple but the important things to notice are the following:

  • We should explicitly use the docker.swarmmode provider instead of docker
  • Labels should now be put in the deploy section
  • We need to manually declare the port of each service by using the loadbalancer directive (this is necessary because Docker Swarm lacks the port auto-discovery feature)
  • We have to make sure that Traefik will be deployed on a manager node of the Swarm by using constraints

You now have a fully fledged cluster thanks to Docker Swarm and Traefik. Now it’s likely that you have specific requirements, and no doubt the Traefik documentation will help. But let me show you a couple of features we use at NLP Cloud.

Forwarded Authentication

Let’s say your NLP API endpoints are protected and users need a token to reach them. A good solution for this use case is to leverage Traefik’s ForwardAuth.

Basically Traefik will forward all the user requests to a dedicated page you created for the occasion. This page will take care of checking the headers of the request (and maybe extract an authentication token, for example) and determine whether the user has the right to access the resource. If they do, the page should return an HTTP 2XX code.

If a 2XX code is returned, Traefik will then make the actual request to the final API endpoint. Otherwise, it will return an error.

Please note that, for performance reasons, Traefik only forwards the user request headers to your authentication page, not the request body. So it’s not possible to authorize a user request based on the body of the request.

Here’s how to achieve it:

version: '3.8'
services:
    traefik:
        image: traefik:v2.4
        ports:
            - "80:80"
        command:
            - --providers.docker.swarmmode
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
        deploy:
            placement:
                constraints:
                    - node.role == manager
    corporate:
        image: <your corporate image>
        deploy:
            labels:
                - traefik.http.routers.corporate.rule=Host(`nlpcloud.io`)
    en_core_web_sm:
        image: <your en_core_web_sm model API image>
        deploy:
            labels:
                - traefik.http.services.en_core_web_sm.loadbalancer.server.port=80
                - traefik.http.routers.en_core_web_sm.rule=Host(`api.nlpcloud.io`) && PathPrefix(`/en_core_web_sm`)
                - traefik.http.middlewares.forward_auth_api_en_core_web_sm.forwardauth.address=https://api.nlpcloud.io/auth/
                - traefik.http.routers.en_core_web_sm.middlewares=forward_auth_api_en_core_web_sm
    api_auth:
        image: <your api_auth image>
        deploy:
            labels:
                - traefik.http.services.api_auth.loadbalancer.server.port=80
                - traefik.http.routers.api_auth.rule=Host(`api.nlpcloud.io`) && PathPrefix(`/auth`)

At NLP Cloud, the api_auth service is actually a Django + Django Rest Framework image in charge of authenticating the requests.
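As an illustration only (this is not our actual implementation, and the token check is deliberately simplistic), a minimal Django view playing the ForwardAuth role could look like this:

# views.py -- a hedged sketch of a ForwardAuth endpoint
from django.http import HttpResponse

VALID_TOKENS = {"fake_token"}  # in reality you would check the database

def auth(request):
    # Traefik forwards the original request headers, so the token can be read here
    token = request.META.get("HTTP_AUTHORIZATION", "").replace("Token ", "")
    if token in VALID_TOKENS:
        return HttpResponse(status=200)  # Traefik will then forward the request to the API
    return HttpResponse(status=401)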

Custom Error Pages

Maybe you don’t want to show raw Traefik error pages to users. If so, it’s possible to replace error pages with your custom error pages.

Traefik does not keep any custom error page in memory, but it can use error pages served by one of your services. When contacting your service in order to retrieve the custom error page, Traefik passes the HTTP error code in the request path (the {status} placeholder below), so you can show different error pages based on the initial HTTP error.

Let’s say we have a small static website served by Nginx that hosts your custom error pages. We want to use its error pages for HTTP errors from 400 to 599. Here’s how you would do it:

version: '3.8'
services:
    traefik:
        image: traefik:v2.4
        ports:
            - "80:80"
        command:
            - --providers.docker.swarmmode
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
        deploy:
            placement:
                constraints:
                    - node.role == manager
            labels:
                - traefik.http.middlewares.handle-http-error.errors.status=400-599
                - traefik.http.middlewares.handle-http-error.errors.service=errors_service
                - traefik.http.middlewares.handle-http-error.errors.query=/{status}.html
    corporate:
        image: <your corporate image>
        deploy:
            labels:
                - traefik.http.routers.corporate.rule=Host(`nlpcloud.io`)
                - traefik.http.routers.corporate.middlewares=handle-http-error
    errors_service:
        image: <your static website image>
        deploy:
            labels:
                - traefik.http.routers.errors.rule=Host(`nlpcloud.io`) && PathPrefix(`/errors`)

For example, with the configuration above, a 404 error would now use this page: http://nlpcloud.io/errors/404.html

HTTPS

A cool feature of Traefik is that it can automatically provision and use TLS certificates with Let’s Encrypt.

They have a nice tutorial about how to set it up with Docker so I’m just pointing you to the right resource: https://doc.traefik.io/traefik/user-guides/docker-compose/acme-tls/

Raising Upload Size Limit

The default upload size limit is pretty low for performance reasons (I think it’s 4194304 bytes but I’m not 100% sure as it’s not in their docs).

In order to increase it, you need to use the maxRequestBodyBytes directive:

version: '3.8'
services:
    traefik:
        image: traefik:v2.4
        ports:
            - "80:80"
        command:
            - --providers.docker.swarmmode
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
        deploy:
            placement:
                constraints:
                    - node.role == manager
    corporate:
        image: <your corporate image>
        deploy:
            labels:
                - traefik.http.routers.corporate.rule=Host(`nlpcloud.io`)
                - traefik.http.middlewares.upload-limit.buffering.maxRequestBodyBytes=20000000
                - traefik.http.routers.corporate.middlewares=upload-limit

In the example above, we raised the upload limit to 20MB.

But don’t forget that uploading a huge file all at once is not necessarily the best option. Instead you may want to cut the file into chunks and upload each chunk independently. I might write an article about this in the future.

Debugging

There are a couple of things you can enable to help you debug Traefik.

The first thing is to enable debug logging, which will show you tons of information about what Traefik is doing.

The second thing is to enable access logs in order to see all incoming HTTP requests.

Last of all, Traefik provides a cool built-in dashboard that helps debug your configuration. It is really useful as it is sometimes tricky to understand why things are not working.

In order to turn on the above features, you could do the following:

version: '3.8'
services:
    traefik:
        image: traefik:v2.4
        ports:
            - "80:80"
        command:
            - --providers.docker.swarmmode
            - --log.level=DEBUG
            - --accesslog
            - --api.dashboard=true
        volumes:
            - /var/run/docker.sock:/var/run/docker.sock:ro
        deploy:
            placement:
                constraints:
                    - node.role == manager
            labels:
                - traefik.http.routers.dashboard.rule=Host(`dashboard.nlpcloud.io`)
                - traefik.http.routers.dashboard.service=api@internal
                - traefik.http.middlewares.auth.basicauth.users=<your basic auth user>:<your basic auth hashed password>
                - traefik.http.routers.dashboard.middlewares=auth

In this example we enabled debugging, access logs, and the dashboard that can be accessed at http://dashboard.nlpcloud.io with basic auth.

Conclusion

As you can see, Traefik is perfectly integrated with your Docker Compose configuration. If you want to change the config for a service, or add or remove services, just modify your docker-compose.yml and redeploy your Docker Swarm cluster. New changes will be taken into account, and services that were not modified don’t even have to restart, which is great for high availability.

I will keep writing a couple of articles about the stack we use at NLP Cloud. I think the next one will be about our frontend and how we are using htmx instead of big Javascript frameworks.

If you have any questions, don’t hesitate to ask!

Also available in French
Container Orchestration With Docker Swarm

NLP Cloud is a service I have contributed to recently. It is an API based on spaCy and HuggingFace transformers that proposes Named Entity Recognition (NER), sentiment analysis, text classification, summarization, and much more. It uses several interesting technologies under the hood, so I thought I would create a series of articles about them. This first one is about container orchestration and how we implement it with Docker Swarm. I hope it will be useful!

Why Container Orchestration

NLP Cloud uses tons of containers, mainly because each NLP model runs inside its own container. Not only do pre-trained models have their own containers, but each user’s custom model also has a dedicated container. This is very convenient for several reasons:

  • It is easy to run an NLP model on the server that has the best resources for it. Machine learning models are very resource hungry: they consume a lot of memory, and it is sometimes interesting to run them on a GPU (in case you are using NLP transformers for example). It is then best to deploy them onto a machine with specific hardware.
  • Horizontal scalability can be ensured by simply adding more replicas of the same NLP model
  • High availability is made easier thanks to redundancy and automatic failover
  • It helps lower costs: scaling horizontally on a myriad of small machines is much more cost-effective than scaling vertically on a couple of big machines

Of course setting up such an architecture takes time and skills but in the end it often pays off when you’re building a complex application.

Why Docker Swarm

Docker Swarm is usually pitted against Kubernetes, and Kubernetes is supposed to be the winner of the container orchestration match today. But things are not so simple…

Kubernetes has tons of settings that make it perfect for very advanced use cases, but this versatility comes at a cost: Kubernetes is hard to install, configure, and maintain. It is actually so hard that today most companies using Kubernetes are actually using a managed version of Kubernetes, on GCP for example, and cloud providers don’t all have the same implementation of Kubernetes in their managed offering.

Let’s not forget that Google initially built Kubernetes for their internal needs, the same way that Facebook built React for their own needs too. But you might not have to manage the same complexity for your project, and many projects could be delivered faster and maintained more easily by using simpler tools…

At NLP Cloud, we have a lot of containers but we do not need the complex advanced configuration capabilities of Kubernetes. We do not want to use a managed version of Kubernetes either: first for cost reasons, but also because we want to avoid vendor lock-in, and lastly for privacy reasons.

Docker Swarm also has an interesting advantage: it integrates seamlessly with Docker and Docker Compose. It makes configuration a breeze and for teams already used to working with Docker it creates no additional difficulty.

Install the Cluster

Let’s say we want to have 5 servers in our cluster:

  • 1 manager node that will orchestrate the whole cluster. It will also host the database (just an example, the DB could perfectly be on a worker too).
  • 1 worker node that will host our Python/Django backoffice
  • 3 worker nodes that will host the replicated FastAPI Python API serving an NLP model

We are deliberately omitting the reverse proxy that will load balance requests to the right nodes, as it will be the topic of a future blog post.

Provision the Servers

Order 5 servers where you want. It can be OVH, Digital Ocean, AWS, GCP… doesn’t matter.

It’s important for you to think about the performance of each server depending on what it will be dedicated to. For example, for the node hosting a simple backoffice you might not need huge performance. For the node hosting the reverse proxy (not addressed in this tutorial) you might need more CPU than usual. And for the API nodes serving the NLP model you might want a lot of RAM, and maybe even GPU.

Install a Linux distribution on each server. I would go for the latest Ubuntu LTS version as far as I’m concerned.

On each server, install the Docker engine.

Now give each server a human-friendly hostname. It will be useful: next time you ssh into the server you will see this hostname in your prompt, which is a good practice in order to avoid working on the wrong server… It will also be used by Docker Swarm as the name of the node. Run the following on each server:

echo <node name> > /etc/hostname; hostname -F /etc/hostname

On the manager, login to your Docker registry so Docker Swarm can pull your images (no need to do this on the worker nodes):

docker login

Initialize the Cluster and Attach Nodes

On the manager node, run:

docker swarm init --advertise-addr <server IP address>

--advertise-addr <server IP address> is only needed if your server has several IP addresses on the same interface so Docker knows which one to choose.

Then, in order to attach worker nodes, run the following on the manager:

docker swarm join-token worker

The output will be something like docker swarm join --token SWMTKN-1-5tl7ya98erd9qtasdfml4lqbosbhfqv3asdf4p13-dzw6ugasdfk0arn0 172.173.174.175:2377

Copy this output and paste it to a worker node. Then repeat the join-token operation for each worker node.

You should now be able to see all your nodes by running:

docker node ls

Give Labels to your Nodes

It’s important to label your nodes properly, as you will need these labels later in order for Docker Swarm to determine on which node a container should be deployed. If you do not specify which node you want your container to be deployed to, Docker Swarm will deploy it on any available node. This is clearly not what you want.

Let’s say that your backoffice requires few resources and is basically stateless, so it can be deployed to any cheap worker node. Your API is stateless too but, on the contrary, it is memory-hungry and requires specific hardware dedicated to machine learning, so you want to deploy it only to the machine learning worker nodes. Last of all, your database is not stateless, so it always has to be deployed to the very same server: let’s say this server will be our manager node (but it could very well be a worker node too).

Run the commands below on the manager.

The manager will host the database so give it the “database” label:

docker node update --label-add type=database <manager name>

Give the “cheap” label to the low-spec worker that will host the backoffice:

docker node update --label-add type=cheap <backoffice worker name>

Last of all, give the “machine-learning” label to all the workers that will host NLP models:

docker node update --label-add type=machine-learning <api worker 1 name>
docker node update --label-add type=machine-learning <api worker 2 name>
docker node update --label-add type=machine-learning <api worker 3 name>

Set Up Configuration With Docker Compose

If you used Docker Compose already you will most likely find the transition to Swarm fairly easy.

If you do not add anything to an existing docker-compose.yml file it will work with Docker Swarm but basically your containers will be deployed anywhere without your control, and they won’t be able to talk to each other.

Network

In order for containers to communicate, they should be on the same virtual network. For example a Python/Django application, a FastAPI API, and a PostgreSQL database should be on the same network to work together. We will manually create the main_network network later right before deploying, so let’s use it now in our docker-compose.yml:

version: "3.8"

networks:
  main_network:
    external: true

services:
  backoffice:
    image: <path to your custom Django image>
    depends_on:
      - database
    networks:
      - main_network
  api:
    image: <path to your custom FastAPI image>
    depends_on:
      - database
    networks:
      - main_network
  database:
    image: postgres:13
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=db_name
    volumes:
      - /local/path/to/postgresql/data:/var/lib/postgresql/data
    networks:
      - main_network

Deployment Details

Now you want to tell Docker Swarm which server each service will be deployed to. This is where you are going to use the labels that you created earlier.

Basically all this is about using the constraints directive like this:

version: "3.8"

networks:
  main_network:
    external: true

services:
  backoffice:
    image: <path to your custom Django image>
    depends_on:
      - database
    networks:
      - main_network
    deploy:
      placement: 
        constraints:
          - node.role == worker
          - node.labels.type == cheap
  api:
    image: <path to your custom FastAPI image>
    depends_on:
      - database
    networks:
      - main_network
    deploy:
      placement: 
        constraints:
          - node.role == worker
          - node.labels.type == machine-learning
  database:
    image: postgres:13
    environment:
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password
      - POSTGRES_DB=db_name
    volumes:
      - /local/path/to/postgresql/data:/var/lib/postgresql/data
    networks:
      - main_network
    deploy:
      placement: 
        constraints:
          - node.role == manager
          - node.labels.type == database

Resources Reservation and Limitation

It can be dangerous to ship your containers as is for 2 reasons:

  • The orchestrator might deploy them to a server that doesn’t have enough resources available (because other containers consume the whole memory available for example)
  • One of your containers might consume more resources than expected and eventually cause trouble on the host server. For example, if your machine learning model happens to consume too much memory, it can cause the host to trigger the OOM killer and start killing processes in order to free some RAM. By default the Docker engine is among the very last processes to be killed by the host, but if it happens it means that all your containers on this host will shut down…

In order to mitigate the above, you can use the reservations and limits directives:

  • reservations makes sure a container is deployed only if the target server has enough resources available. If it doesn’t, the orchestrator won’t deploy it until the necessary resources are available.
  • limits prevents a container from consuming too many resources once it is deployed somewhere.

Let’s say we want our API container - embedding a machine learning model - to be deployed only if 5GB of RAM and half the CPU are available. Let’s also say the API can consume up to 10GB of RAM and 80% of the CPU. Here’s what we should do:

version: "3.8"

networks:
  main_network:
    external: true

services:
  api:
    image: <path to your custom FastAPI image>
    depends_on:
      - database
    networks:
      - main_network
    deploy:
      placement: 
        constraints:
          - node.role == worker
          - node.labels.type == machine-learning
      resources:
        limits:
          cpus: '0.8'
          memory: 10G
        reservations:
          cpus: '0.5'
          memory: 5G

Replication

In order to implement horizontal scalability, you might want to replicate some of your stateless applications. You just need to use the replicas directive for this. For example let’s say we want our API to have 3 replicas, here’s how to do it:

version: "3.8"

networks:
  main_network:
    external: true

services:
  api:
    image: <path to your custom FastAPI image>
    depends_on:
      - database
    networks:
      - main_network
    deploy:
      placement: 
        constraints:
          - node.role == worker
          - node.labels.type == machine-learning
      resources:
        limits:
          cpus: '0.8'
          memory: 10G
        reservations:
          cpus: '0.5'
          memory: 5G
      replicas: 3

More

More settings are available for more control on your cluster orchestration. Don’t hesitate to refer to the docs for more details.

Secrets

Docker Compose has a convenient built-in way to manage secrets by storing each secret in an external individual file. These files are not part of your configuration and can even be encrypted if necessary, which is great for security.

Let’s say you want to secure the PostgreSQL DB credentials.

First create 3 secret files on your local machine:

  • Create a db_name.secret file and put the DB name in it
  • Create a db_user.secret file and put the DB user in it
  • Create a db_password.secret file and put the DB password in it

Then in your Docker Compose file you can use the secrets this way:

version: "3.8"

networks:
  main_network:
    external: true

secrets:
  db_name:
    file: "./secrets/db_name.secret"
  db_user:
    file: "./secrets/db_user.secret"
  db_password:
    file: "./secrets/db_password.secret"

services:
  database:
    image: postgres:13
    secrets:
      - "db_name"
      - "db_user"
      - "db_password"
    # Adding the _FILE suffix makes the Postgres image automatically
    # detect the secrets and load them from files.
    environment:
      - POSTGRES_USER_FILE=/run/secrets/db_user
      - POSTGRES_PASSWORD_FILE=/run/secrets/db_password
      - POSTGRES_DB_FILE=/run/secrets/db_name
    volumes:
      - /local/path/to/postgresql/data:/var/lib/postgresql/data
    deploy:
      placement:
        constraints:
          - node.role == manager
          - node.labels.type == database
    networks:
      - main_network

Secret files are automatically injected into the containers in /run/secrets by Docker Compose. Careful though: these secrets are located in files, not in environment variables. So you then need to manually open these files and read the secrets. The PostgreSQL image has a convenient feature: if you append the _FILE suffix to the environment variable, the image will automatically read the secrets from files.
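For your own application containers (which do not have the convenient _FILE behavior of the PostgreSQL image), reading a secret is just a matter of reading a file under /run/secrets. Here is a minimal Python sketch (the secret name is illustrative):

# A minimal sketch: reading a Docker secret from within your own Python service
from pathlib import Path

def read_secret(name: str) -> str:
    # Docker mounts each secret as a plain file under /run/secrets
    return (Path("/run/secrets") / name).read_text().strip()

db_password = read_secret("db_password")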

Staging VS Production

You most likely want to have at least 2 different types of Docker Compose configurations:

  • 1 for your local machine that will be used both for the Docker images creation, but also as a staging environment
  • 1 for production

You have 2 choices: either leverage the Docker Compose inheritance feature, so you only write one big docker-compose.yml base file plus a small staging.yml file dedicated to staging and another small production.yml file dedicated to production, or maintain 2 completely separate files.

In the end at NLP Cloud we ended up realizing that our staging and production configurations were so different that it was easier to just maintain 2 different big files: one for staging and one for production. The main reason is that our production environment uses Docker Swarm but our staging environment doesn’t, so playing with both is pretty impractical.

Deploy

Now we assume that you have locally built your images and pushed them to your Docker registry. Let’s say we only have one single production.yml file for production.

Copy your production.yml file to the server using scp:

scp production.yml <server user>@<server IP>:/remote/path/to/project

Copy your secrets too (and make sure to upload them to the folder you declared in the secrets section of your Docker Compose file):

scp /local/path/to/secrets <server user>@<server IP>:/remote/path/to/secrets

Manually create the network that we’re using in our Docker Compose file. Note that it’s also possible to skip this step and let Docker Swarm automatically create the network if it’s declared in your Docker Compose file. But we noticed this creates erratic behavior when recreating the stack, because Docker does not recreate the network fast enough.

docker network create --driver=overlay main_network

You also need to create the volume directories manually. The only volume we have in this tutorial is for the database, so let’s create it on the node hosting the DB (i.e. the manager):

mkdir -p /local/path/to/postgresql/data

OK, everything is set, so now it’s time to deploy the whole stack!

docker stack deploy --with-registry-auth -c production.yml <stack name>

The --with-registry-auth option is needed if you need to pull images located on password protected registries.

Wait a moment as Docker Swarm is now pulling all the images and installing them on the nodes. Then check if everything went fine:

docker service ls

You should see something like the following:

ID             NAME                       MODE         REPLICAS   IMAGE
f1ze8qgf24c7   <stack name>_backoffice    replicated   1/1        <path to your custom Python/Django image>     
gxboram56dka   <stack name>_database      replicated   1/1        postgres:13      
3y1nmb6g2xoj   <stack name>_api           replicated   3/3        <path to your custom FastAPI image>      

The important thing is that REPLICAS should all be at their maximum. Otherwise it means that Docker is still pulling or installing your images, or that something went wrong.

Manage the Cluster

Now that your cluster is up and running, here are a couple of useful things you might want to do to administer your cluster:

  • See all applications and where they are deployed: docker stack ps <stack name>
  • See applications on a specific node: docker node ps <node name>
  • See logs of an application: docker service logs <stack name>_<service name>
  • Completely remove the whole stack: docker stack rm <stack name>

Every time you want to deploy a new image to the cluster, first upload it to your registry, and just run the docker stack deploy command again on the manager.

Conclusion

As you can see, setting up a Docker Swarm cluster is far from complex, especially when you think about the actual complexity that has to be handled under the hood in such distributed systems.

Of course many more options are available and you will most likely want to read the documentation. Also, we did not talk about the reverse proxy/load balancing aspect, but it’s an important one. In a future tutorial we will see how to achieve this with Traefik.

At NLP Cloud our configuration is obviously much more complex than what we showed above, and we had to face several tricky challenges in order for our architecture to be both fast and easy to maintain. For example, we have so many machine learning containers that manually writing the configuration file for each container was not an option, so new auto generation mechanisms had to be implemented.

If you are interested in more in-depth details, please don’t hesitate to ask; it will be a pleasure to share.

Also available in French