FastAPI is a recent Python API framework that is increasingly used in production. We use FastAPI under the hood at NLP Cloud, an API based on spaCy and HuggingFace transformers that offers Named Entity Recognition (NER), sentiment analysis, text classification, summarization, and much more. FastAPI helped us quickly build a fast and robust machine learning API serving NLP models.

Let me tell you why we made such a choice, and show you how to implement an API based on FastAPI and spaCy for Named Entity Recognition (NER).

Why FastAPI?

Until recently, I had always used Django Rest Framework for Python APIs. But FastAPI offers several interesting features:

  • It is very fast
  • It is well documented
  • It is easy to use
  • It automatically generates API schemas for you (like OpenAPI)
  • It uses type validation with Pydantic under the hood. For a Go developer like myself who is used to static typing, it’s very cool to be able to leverage type hints like this. It makes the code clearer and less error-prone.

FastAPI’s performance is supposed to make it a great candidate for machine learning APIs. Given that we’re serving a lot of demanding NLP models based on spaCy and transformers at NLP Cloud, FastAPI is a great solution.

Set Up FastAPI

The first option you have is to install FastAPI and Uvicorn (the ASGI server in front of FastAPI) by yourself:

pip install fastapi[all]

As you can see, FastAPI is running behind an ASGI server, which means it can natively work with asynchronous Python requests with asyncio.
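
If asyncio is new to you, here is a standard-library-only sketch of what handling several requests inside one process looks like. `handle_request` is a hypothetical stand-in for an endpoint that waits on I/O; it is not FastAPI code:

```python
import asyncio
import time


async def handle_request(request_id: int) -> str:
    # Stand-in for awaiting I/O (a database call, another HTTP service...)
    await asyncio.sleep(0.1)
    return f"response {request_id}"


async def serve_three() -> list:
    # The three handlers wait concurrently on a single event loop
    return await asyncio.gather(*(handle_request(i) for i in range(3)))


started = time.perf_counter()
results = asyncio.run(serve_three())
elapsed = time.perf_counter() - started

# Total time is close to one wait (about 0.1s), not the sum of the three
print(results, f"{elapsed:.2f}s")
```

Keep in mind that awaiting only helps while a handler is waiting on I/O. CPU-heavy work still occupies the process while it runs.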

Then you can run your app with something like this:

uvicorn main:api

Another option is to use one of the Docker images generously provided by Sebastián Ramírez, the creator of FastAPI. These images are maintained and work out of the box.

For example the Uvicorn + Gunicorn + FastAPI image adds Gunicorn to the stack in order to handle parallel processes. Basically Uvicorn handles multiple parallel requests within one single Python process, and Gunicorn handles multiple parallel Python processes.

The application is supposed to automatically start with docker run if you properly follow the image documentation.

These images are customizable. For example, you can tweak the number of parallel processes created by Gunicorn. It’s important to play with such parameters depending on the resources demanded by your API. If your API is serving a machine learning model that takes several GBs of memory, you might want to decrease Gunicorn’s default concurrency, otherwise your application will quickly consume too much memory.
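
As a rough way to reason about this, here is a back-of-the-envelope sketch. It assumes every Gunicorn worker loads its own copy of the model, and `workers_for_memory` is a hypothetical helper, not part of the image. The resulting number could then be passed to the image through an environment variable such as WEB_CONCURRENCY or MAX_WORKERS (check the image documentation for the exact variables your version supports):

```python
def workers_for_memory(total_ram_gb: float, model_ram_gb: float,
                       cpu_cores: int, reserved_gb: float = 1.0) -> int:
    """Estimate how many worker processes fit in memory.

    Assumes each worker holds a full copy of the model and that
    `reserved_gb` is kept free for the OS and other processes.
    """
    usable_gb = max(total_ram_gb - reserved_gb, 0.0)
    by_memory = int(usable_gb // model_ram_gb)
    # Never go above one worker per core, never below one worker
    return max(1, min(cpu_cores, by_memory))


# 16 GB server, 3 GB model, 8 cores: memory is the limiting factor
print(workers_for_memory(16, 3, 8))  # 5
# 8 GB server, 6 GB model, 4 cores: only one worker fits
print(workers_for_memory(8, 6, 4))   # 1
```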

Simple FastAPI + spaCy API for NER

Let’s say you want to create an API endpoint that performs Named Entity Recognition (NER) with spaCy. Basically, NER is about extracting entities such as names, companies, or job titles from a sentence.

This endpoint will take a sentence as an input, and will return a list of entities. Each entity is made up of the position of the first character of the entity, the last position of the entity, the type of the entity, and the text of the entity itself.
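
To illustrate what these positions mean, here is a small sketch in plain Python, without spaCy. The sentence and the `span_of` helper are made up for the example. It assumes the end position is exclusive, which is how spaCy’s `end_char` behaves, so slicing the sentence with both positions gives back the entity text:

```python
def span_of(sentence: str, fragment: str) -> tuple:
    """Return (first, last) character offsets of fragment, last being exclusive."""
    first = sentence.find(fragment)
    if first == -1:
        raise ValueError(f"{fragment!r} not found in sentence")
    return first, first + len(fragment)


sentence = "Ada Lovelace met Charles Babbage in London"

first, last = span_of(sentence, "Ada Lovelace")
print(first, last)            # 0 12
print(sentence[first:last])   # Ada Lovelace

first, last = span_of(sentence, "London")
print(sentence[first:last])   # London
```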

The endpoint will be queried with POST requests this way:

curl "" \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"sentence":"John Doe is a Go Developer at Google"}'

And it will return something like this:

{
  "extractions": [
    {
      "first_index": 0,
      "last_index": 8,
      "name": "PERSON",
      "content": "John Doe"
    },
    {
      "first_index": 14,
      "last_index": 26,
      "name": "POSITION",
      "content": "Go Developer"
    },
    {
      "first_index": 30,
      "last_index": 36,
      "name": "ORG",
      "content": "Google"
    }
  ]
}

Here is how we could do it:

import spacy
from fastapi import FastAPI
from pydantic import BaseModel
from typing import List

en_core_web_lg = spacy.load("en_core_web_lg")

api = FastAPI()

class Input(BaseModel):
    sentence: str

class Extraction(BaseModel):
    first_index: int
    last_index: int
    name: str
    content: str

class Output(BaseModel):
    extractions: List[Extraction]

@api.post("/extractions", response_model=Output)
def extractions(input: Input):
    # Run the spaCy pipeline on the submitted sentence
    document = en_core_web_lg(input.sentence)

    extractions = []
    for entity in document.ents:
        extraction = {}
        extraction["first_index"] = entity.start_char
        extraction["last_index"] = entity.end_char
        extraction["name"] = entity.label_
        extraction["content"] = entity.text
        # Collect each entity, otherwise the returned list stays empty
        extractions.append(extraction)

    return {"extractions": extractions}

The first important thing here is that we’re loading the spaCy model. For our example we’re using a large pre-trained spaCy model for the English language. Large models take more memory and more disk space, but give better accuracy as they were trained on bigger datasets.

en_core_web_lg = spacy.load("en_core_web_lg")

Later, we are using this spaCy model for NER by doing the following:

document = en_core_web_lg(input.sentence)
# [...]

The second thing, which is an amazing feature of FastAPI, is the ability to enforce data validation with Pydantic. Basically, you need to declare in advance the format of your user input and the format of the API response. If you’re a Go developer, you’ll find it very similar to JSON unmarshalling with structs. For example, we are declaring the format of a returned entity this way:

class Extraction(BaseModel):
    first_index: int
    last_index: int
    name: str
    content: str

Note that first_index and last_index are positions in the sentence, so they are integers, and name and content are strings. If the API tries to return an entity that does not match this format (for example if first_index is not an integer), FastAPI will raise an error.
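
Here is a minimal sketch of that failure mode using Pydantic alone, outside of FastAPI. The invalid value is made up for the example. Pydantic raises a `ValidationError` that names the offending field:

```python
from pydantic import BaseModel, ValidationError


class Extraction(BaseModel):
    first_index: int
    last_index: int
    name: str
    content: str


# A well-formed entity is accepted
valid = Extraction(first_index=0, last_index=8, name="PERSON", content="John Doe")

# A position that is not an integer is rejected
error_fields = []
try:
    Extraction(first_index="zero", last_index=8, name="PERSON", content="John Doe")
except ValidationError as exc:
    error_fields = [error["loc"][0] for error in exc.errors()]

print(error_fields)  # ['first_index']
```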

As you can see, it is possible to embed a validation class into another one. Here we are returning a list of entities, so we need to declare the following:

class Output(BaseModel):
    extractions: List[Extraction]

Some simple types like int and str are built-in, but more complex types like List need to be explicitly imported.
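
The nesting can be tried on its own, without FastAPI. In this sketch the input dictionary is made-up sample data. Pydantic converts each nested dictionary into an `Extraction` instance while validating it:

```python
from typing import List

from pydantic import BaseModel


class Extraction(BaseModel):
    first_index: int
    last_index: int
    name: str
    content: str


class Output(BaseModel):
    extractions: List[Extraction]


output = Output(extractions=[
    {"first_index": 0, "last_index": 8, "name": "PERSON", "content": "John Doe"},
])

# The nested dict has been validated and turned into an Extraction object
first = output.extractions[0]
print(type(first).__name__, first.content)  # Extraction John Doe
```

On Python 3.9 and later, the built-in `list[Extraction]` can be used instead of `typing.List`, which removes the extra import.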

For brevity, the response validation can be declared within the decorator:

@api.post("/extractions", response_model=Output)

More Advanced Data Validation

You can do much more advanced validation with FastAPI and Pydantic. For example, if you need the user input to have a minimum length of 10 characters, you can do the following:

from pydantic import BaseModel, constr

class Input(BaseModel):
    sentence: constr(min_length=10)

Now, what if Pydantic validation passes, but you later realize that there’s something wrong with the data so you want to return an HTTP 400 code?

Simply raise an HTTPException:

from fastapi import HTTPException

raise HTTPException(
            status_code=400, detail="Your request is malformed")

These are just a couple of examples; you can do much more! Just have a look at the FastAPI and Pydantic docs.

Root Path

It’s very common to run such APIs behind a reverse proxy. For example, we’re using the Traefik reverse proxy behind NLP Cloud.

A tricky thing when running behind a reverse proxy is that your sub-application (here the API) does not necessarily know about the whole URL path. And actually that’s great because it shows that your API is loosely coupled to the rest of your application.

For example, here we want our API to believe that the endpoint URL is /extractions, but the real URL might actually be something like /api/v1/extractions. Here’s how to do it by setting a root path:

api = FastAPI(root_path="/api/v1")

You can also achieve it by passing an extra parameter to Uvicorn in case you’re starting Uvicorn manually:

uvicorn main:api --root-path /api/v1


As you can see, creating an API with FastAPI is dead simple, and validation with Pydantic makes the code very expressive (so it needs less documentation) and less error-prone.

FastAPI comes with great performance and the ability to use asynchronous requests out of the box with asyncio, which is great for demanding machine learning models. The example above about Named Entity Recognition with spaCy and FastAPI can almost be considered production-ready (of course the API code is only a small part of a full clustered application). So far, FastAPI has never been the bottleneck in our infrastructure.

If you have any questions, please don’t hesitate to ask!
