Named Entity Recognition (NER) is a useful NLP task that spaCy makes very easy. If you want to expose your NER model to the world, you can easily build an API with FastAPI.
FastAPI is a new API framework that was released a couple of months ago. It makes API development both fast and convenient.
As for spaCy, in case you don’t know it yet, it’s a great open-source framework for NLP, and especially NER.
Code example
We want to build an API endpoint that will return entities from a simple sentence: “John Doe is a Go Developer at Google”.
The following code comes mostly from the great “spacy-api-docker” repo by jgontrum (thanks!): https://github.com/jgontrum/spacy-api-docker/, and more specifically from this file: https://github.com/jgontrum/spacy-api-docker/blob/master/displacy_service/parse.py.
The API will return each entity along with its position.
[
  {
    "end": 8,
    "start": 0,
    "text": "John Doe",
    "type": "PERSON"
  },
  {
    "end": 26,
    "start": 14,
    "text": "Go Developer",
    "type": "POSITION"
  },
  {
    "end": 36,
    "start": 30,
    "text": "Google",
    "type": "ORG"
  }
]
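To make the position fields concrete, here is a small sketch using only the standard library. It assumes the offsets are character indices with an exclusive end, which is how spaCy's `start_char` and `end_char` behave, so slicing the sentence with them gives back the entity text. The `locate` helper is hypothetical and only there for illustration; it is not part of spaCy or FastAPI.

```python
sentence = "John Doe is a Go Developer at Google"


def locate(text: str, fragment: str) -> dict:
    """Return the character offsets of the first occurrence of fragment.

    Hypothetical helper for illustration: "end" is exclusive,
    so text[start:end] == fragment.
    """
    start = text.find(fragment)
    if start == -1:
        raise ValueError(f"{fragment!r} not found in text")
    return {"start": start, "end": start + len(fragment), "text": fragment}


spans = [locate(sentence, name) for name in ("John Doe", "Go Developer", "Google")]

for span in spans:
    # Slicing with the offsets recovers the entity text
    assert sentence[span["start"]:span["end"]] == span["text"]
    print(span)
```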
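Here is the code. Note that it names the fields differently from the sample output above (first_index, last_index, name, content) and wraps the list in an "extractions" key: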
import spacy
from fastapi import FastAPI
from pydantic import BaseModel
from typing import List

# Load the model once at startup rather than on every request
en_core_web_lg = spacy.load("en_core_web_lg")

api = FastAPI()


class Input(BaseModel):
    sentence: str


class Extraction(BaseModel):
    first_index: int  # character offset where the entity starts
    last_index: int   # character offset just past the entity's last character
    name: str         # entity label, e.g. "PERSON"
    content: str      # entity text


class Output(BaseModel):
    extractions: List[Extraction]


@api.post("/extractions", response_model=Output)
def extractions(input: Input):
    # Run the spaCy pipeline on the submitted sentence
    document = en_core_web_lg(input.sentence)

    extractions = []
    for entity in document.ents:
        extraction = {}
        extraction["first_index"] = entity.start_char
        extraction["last_index"] = entity.end_char
        extraction["name"] = entity.label_
        extraction["content"] = entity.text
        extractions.append(extraction)

    return {"extractions": extractions}
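To try the endpoint, you need a running server and a client. The sketch below is one way to call it with the standard library only. It assumes the code above is saved as `main.py` and served locally with something like `uvicorn main:api --port 8000`; the file name, host, and port are assumptions, as are the helper names `build_request` and `fetch_extractions`.

```python
import json
import urllib.request

# Assumption: the app is served locally, e.g. with `uvicorn main:api --port 8000`
BASE_URL = "http://127.0.0.1:8000"


def build_request(sentence: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request matching the Input model ({"sentence": ...})."""
    body = json.dumps({"sentence": sentence}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/extractions",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_extractions(sentence: str) -> list:
    """Send the request and return the "extractions" list (needs a running server)."""
    with urllib.request.urlopen(build_request(sentence)) as response:
        return json.load(response)["extractions"]


# Building the request does not need the server to be up
request = build_request("John Doe is a Go Developer at Google")
print(request.get_method(), request.full_url)
```

With the server running, `fetch_extractions("John Doe is a Go Developer at Google")` should return a list of dictionaries with the `first_index`, `last_index`, `name`, and `content` keys.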
First we load the spaCy model:
en_core_web_lg = spacy.load("en_core_web_lg")
Then we perform NER:
document = en_core_web_lg(input.sentence)
# [...]
document.ents
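If you want to see what the attributes on each entity look like without downloading a trained model, here is a minimal sketch. It swaps in a blank English pipeline (tokenizer only) and sets the entities by hand, so nothing here is predicted by a model; the labels and token positions are my own assumptions for this sentence. With `en_core_web_lg`, `document.ents` is filled in by the model instead.

```python
import spacy
from spacy.tokens import Span

# Blank pipeline: tokenizer only, no trained NER component
nlp = spacy.blank("en")
doc = nlp("John Doe is a Go Developer at Google")

# Entities assigned manually for illustration (token indices, end exclusive):
# tokens 0-1 are "John Doe", token 7 is "Google"
doc.ents = [
    Span(doc, 0, 2, label="PERSON"),
    Span(doc, 7, 8, label="ORG"),
]

# The same four attributes the endpoint reads from each entity
rows = [(ent.text, ent.label_, ent.start_char, ent.end_char) for ent in doc.ents]
for row in rows:
    print(row)
```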
Data validation
FastAPI relies on pydantic models to validate both input and output data, which makes validation easy. Here are the output models:
class Extraction(BaseModel):
    first_index: int
    last_index: int
    name: str
    content: str


class Output(BaseModel):
    extractions: List[Extraction]
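To see what this validation does in practice, here is a small sketch that uses the same models directly with pydantic, outside of FastAPI. The sample values are made up for illustration. When a request body fails this kind of check, FastAPI answers with a 422 error instead of calling your function.

```python
from typing import List

from pydantic import BaseModel, ValidationError


class Extraction(BaseModel):
    first_index: int
    last_index: int
    name: str
    content: str


class Output(BaseModel):
    extractions: List[Extraction]


# Well-formed data is accepted and converted into model instances
good = Output(
    extractions=[
        {"first_index": 0, "last_index": 8, "name": "PERSON", "content": "John Doe"}
    ]
)
print(good.extractions[0].content)

# Malformed data is rejected: wrong type for first_index, content is missing
try:
    Output(extractions=[{"first_index": "zero", "last_index": 8, "name": "PERSON"}])
    rejected = False
except ValidationError as error:
    rejected = True
    problems = error.errors()
    for problem in problems:
        print(problem["loc"], problem["msg"])
```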
Conclusion
Thanks to spaCy and FastAPI, building an entity extraction API has never been easier. I hope this helps you with your next project!
If you have questions, please let me know!