Building a modern application with a Golang API backend + a Vue.js SPA frontend using Docker

The goal of this article is to show a real application I built recently with Go, Vue.js and Docker, which is in production today. Tutorials are sometimes disappointing because they do not cover real-life situations, so I tried to do things differently here. I won’t comment on the whole code as it would take ages, but I will explain the overall project structure, which important choices I made, and why. I’ll also highlight the parts of the code that are worth commenting on.

The code of the whole app is here on my GitHub; you may want to open it alongside this article.

Purpose of the Application

This application is dedicated to presenting data from various databases in a user friendly way. The main features are the following:

  • user has to enter credentials in order to use the Single Page Application (SPA) frontend
  • user can select various interfaces in a left panel in order to retrieve data from various db tables
  • user can decide either to only count results returned by db or get the full results
  • if results returned by db are lightweight enough then they are returned by the API and displayed within the SPA app inside a nice data table. User can also decide to export it as CSV.
  • if results are too heavyweight, then results are sent asynchronously to the user by email within a .zip archive
  • as input criteria, the user can enter text or CSV files listing a large number of criteria
  • some user inputs are select lists whose values are loaded dynamically from db

Project’s Structure and Tooling

This project is made up of 2 Docker containers:

  • a container for a backend API written in Go. There is no need for a separate HTTP server here since Go already ships a very efficient one in its standard library (net/http). This application exposes a RESTful API in order to receive requests from the frontend and return results retrieved from several databases.
  • a container for a frontend interface using a Vue.js SPA. Here an Nginx server is needed in order to serve static files.

Here is the Dockerfile of my Go application:

FROM golang
VOLUME /var/log/backend
COPY src /go/src
RUN go install go_project
CMD /go/bin/go_project
EXPOSE 8000

Dead simple as you can see. I’m using a pre-built Docker Golang image based on Debian.

My frontend Dockerfile is slightly bigger because I need to install Nginx, but still very simple:

FROM ubuntu:xenial

RUN apt-get update && apt-get install -y \
    nginx \
    && rm -rf /var/lib/apt/lists/*

COPY site.conf /etc/nginx/sites-available
RUN ln -s /etc/nginx/sites-available/site.conf /etc/nginx/sites-enabled
COPY .htpasswd /etc/nginx

COPY startup.sh /home/
RUN chmod 777 /home/startup.sh
CMD ["bash","/home/startup.sh"]

EXPOSE 9000

COPY vue_project/dist /home/html/

The startup.sh simply starts the Nginx server. Here is my Nginx configuration (site.conf):

server {

    listen 9000;

    server_name api.example.com;

    # In order to avoid favicon errors on some browsers like IE
    # which would pollute Nginx logs (do use the "=")
    location = /favicon.ico { access_log off; log_not_found off; }

    # Static folder that Nginx must serve
    location / {
        root /home/html;
        auth_basic "Restricted";
        auth_basic_user_file /etc/nginx/.htpasswd;
    }

    # robots.txt file generated on the fly
    location /robots.txt {
        return 200 "User-agent: *\nDisallow: /";
    }

}

As you can see, authentication is needed in order to use the frontend app. I implemented this within a .htpasswd file.

Actually, using Docker for the Go application is not a big advantage: once compiled, a Go binary needs no external dependency, which already makes deployment very easy. Shipping a Go app inside Docker can still be useful if you need external files in addition to your Go binary (like HTML templates or config files, for example). This is not the case here, but I still used Docker for consistency: all my services are deployed through Docker, so I do not want to deal with special cases.

The Go application is made up of multiple files. This is just for readability, and everything could have been put into one single file. Note that files belonging to the same package share all their package-level identifiers (variables, structs, functions, …), so nothing needs to be exported for that; a capitalized first letter is only required for things you want to use from another package. During development, since the main package spans several files, you need to pass all of them to go run, for example with a wildcard like this:

go run src/go_project/*.go

I’m using a couple of external Go libraries (only a few, thanks to Go’s already very comprehensive standard library!):

  • gorilla/mux for the routing of REST API requests, especially for endpoints expecting positional arguments
  • rs/cors for easier handling of CORS (which can be a nightmare)
  • gopkg.in/gomail.v2 for email handling, especially for easy addition of attachments

Structure and tooling are much more complex regarding the frontend part. Here is an article dedicated to this. Actually this complexity only affects the development part because in the end, once everything is compiled, you only get regular HTML/CSS/JS files that you simply copy paste into your Docker container.

Dev vs Prod

Configuration is different in development and production. During development I’m working on a locally replicated database, I’m logging errors to console instead of file, I’m using local servers, … How to manage this seamlessly?

In the Vue.js app I need to either connect to a local development API (127.0.0.1) or a production API (api.example.com). So I created a dedicated http-constants.js which returns either a local address or a production address depending on whether we launched the npm run dev command or npm run build command. See this article for more details.

In the Go app, multiple parameters change depending on whether I’m in development mode or production mode. In order to manage this, I’m using environment variables passed to the Go app by Docker. Setting configuration through environment variables is supposed to be best practice according to the 12 factor app. First we need to set environment variables during container creation thanks to the -e option:

docker run --net my_network \
--ip 172.50.0.10 \
-p 8000:8000 \
-e "CORS_ALLOWED_ORIGIN=http://api.example.com:9000" \
-e "REMOTE_DB_HOST=10.10.10.10" \
-e "LOCAL_DB_HOST=172.50.0.1" \
-e "LOG_FILE_PATH=/var/log/backend/errors.log" \
-e "USER_EMAIL=me@example.com" \
-v /var/log/backend:/var/log/backend \
-d --name backend_v1_container myaccount/myrepo:backend_v1

Then those variables are retrieved within the Go program thanks to the os.Getenv() function. Here is how I managed it in main.go:

// Initialize db parameters
var localHost string = getLocalHost()
var remoteHost string = getRemoteHost()
const (
	// Local DB:
	localPort     = 5432
	localUser     = "my_local_user"
	localPassword = "my_local_pass"
	localDbname   = "my_local_db"

	// Remote DB:
	remotePort     = 5432
	remoteUser     = "my_remote_user"
	remotePassword = "my_remote_pass"
	remoteDbname   = "my_remote_db"
)

// getLogFilePath gets log file path from env var set by Docker run
func getLogFilePath() string {
	envContent := os.Getenv("LOG_FILE_PATH")
	return envContent
}

// getLocalHost gets local db host from env var set by Docker run.
// If no env var set, set it to localhost.
func getLocalHost() string {
	envContent := os.Getenv("LOCAL_DB_HOST")
	if envContent == "" {
		envContent = "127.0.0.1"
	}
	return envContent
}

// getRemoteHost gets remote db host from env var set by Docker run.
// If no env var set, set it to localhost.
func getRemoteHost() string {
	envContent := os.Getenv("REMOTE_DB_HOST")
	if envContent == "" {
		envContent = "127.0.0.1"
	}
	return envContent
}

// getCorsAllowedOrigin gets the allowed CORS origin from env var set by Docker run.
// If no env var set, set it to the local frontend dev server.
func getCorsAllowedOrigin() string {
	envContent := os.Getenv("CORS_ALLOWED_ORIGIN")
	if envContent == "" {
		envContent = "http://localhost:8080"
	}
	return envContent
}

// getUserEmail gets user email of the person who will receive the results
// from env var set by Docker run.
// If no env var set, set it to admin.
func getUserEmail() string {
	envContent := os.Getenv("USER_EMAIL")
	if envContent == "" {
		envContent = "admin@example.com"
	}
	return envContent
}

As you can see, if the production env variable is not set, we set a default value for local development. Then we can use those dedicated functions anywhere in the program. For example, here is how I’m handling the logging feature (log to console in development mode, and log to file in production):

log.SetFlags(log.LstdFlags | log.Lshortfile)            // add line number to logger
if logFilePath := getLogFilePath(); logFilePath != "" { // write to log file only if logFilePath is set
	f, err := os.OpenFile(logFilePath, os.O_RDWR|os.O_CREATE|os.O_APPEND, 0666)
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	log.SetOutput(f)
}

Note that logging also involves the use of a shared volume. Indeed I want my log files to be accessed from the Docker host easily. That’s why I added -v /var/log/backend:/var/log/backend to the docker run command above and put a specific VOLUME directive in the Dockerfile.

Design of the Frontend App with Vuetify.js

I have never been fond of spending days working on design, especially for small apps like this one. That’s why I’m using Vuetify.js which is a great framework to be used on top of Vue.js providing you with ready to use beautiful components. Vuetify uses Google’s material design, which looks very good to me.

Memory Usage

I’ve faced quite a lot of memory issues while building this app due to the fact that some SQL queries can possibly return a huge amount of data.

In the Go Backend

Rows returned from db are put in a slice of structs. When millions of rows are returned, manipulating this slice becomes very costly in terms of memory. The solution is to put as much logic as you can in your SQL query instead of your Go program. PostgreSQL is excellent at optimizing performance, and in my case the databases run on PostgreSQL 10, which improved performance considerably thanks to parallel execution of some operations. Plus, my databases have dedicated resources, so I should use them as much as possible.

Regarding the CSV generation, you also need to consider whether you should store the CSV in memory or write it to disk. Personally I’m writing it to disk in order to reduce memory usage.

Still, I also had to increase the RAM of my server.

In the Vue.js Frontend

Clearly, a browser cannot handle too much content. If too many rows are to be displayed in the browser, rendering will fail. The first solution is what I did: above a certain number of rows returned by db, send the results by email in a .zip archive. Another solution would be to paginate the results in the browser, with each new page triggering a new request to the server (under the hood, you would need to use LIMIT and OFFSET in your SQL query).
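The pagination alternative is not implemented in this app, but a minimal sketch of the server side could look like the following. The paginate function, the column names and the page size are assumptions made for the example; the placeholders follow PostgreSQL's $1, $2, … style.

```go
package main

import "fmt"

// paginate appends LIMIT and OFFSET placeholders to a query that already
// uses len(args) positional parameters, and returns the extended argument
// list. page is 1-based; values below 1 are treated as the first page.
func paginate(query string, args []interface{}, page, pageSize int) (string, []interface{}) {
	if page < 1 {
		page = 1
	}
	n := len(args)
	q := fmt.Sprintf("%s LIMIT $%d OFFSET $%d", query, n+1, n+2)
	// Copy args so the caller's slice is never modified.
	out := append(append([]interface{}{}, args...), pageSize, (page-1)*pageSize)
	return q, out
}

func main() {
	// Hypothetical query: the name and country columns are made up.
	q, args := paginate(
		"SELECT id, name FROM company WHERE country = $1 ORDER BY id",
		[]interface{}{"France"}, 3, 50)
	fmt.Println(q)
	fmt.Println(args...)
	// With a real *sql.DB: rows, err := db.Query(q, args...)
}
```

An ORDER BY clause matters here: without a stable order, consecutive pages may overlap or skip rows.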

Touchy Parts of Code

Here are some parts of the code that are worth commenting on, in my opinion, because they are fairly original or tricky.

Multiple Asynchronous Calls with Axios

My frontend contains multiple HTML selects and I want the values of these lists to be loaded dynamically from the API. For this I need to use axios.all() and axios.spread() in order to make multiple parallel API calls with Axios. Axios’ documentation is not that good in my opinion. It is important to understand that you have 2 choices here:

  • catching error for each request in axios.all: HTTP.get('/get-countries-list').catch(...)
  • catching error globally after axios.spread: .then(axios.spread(...)).catch(...)

The first option lets you display precise error messages depending on which request failed, but it is non-blocking: we still enter axios.spread() despite the error, and the corresponding parameters will be undefined there, so you need to handle that. With the second option, a global error is raised as soon as at least one of the requests fails, and we do not enter axios.spread().

I chose the 2nd option: if at least one of the API calls fails, then all the calls fail:

created () {
    axios.all([
      HTTP.get('/get-countries-list'),
      HTTP.get('/get-companies-industries-list'),
      HTTP.get('/get-companies-sizes-list'),
      HTTP.get('/get-companies-types-list'),
      HTTP.get('/get-contacts-industries-list'),
      HTTP.get('/get-contacts-functions-list'),
      HTTP.get('/get-contacts-levels-list')
    ])
    // If all requests succeed
    .then(axios.spread(function (
      // Each response comes from the get query above
      countriesResp,
      companyIndustriesResp,
      companySizesResp,
      companyTypesResp,
      contactIndustriesResp,
      contactFunctionsResp,
      contactLevelsResp
    ) {
      // Put countries retrieved from API into an array available to Vue.js
      this.countriesAreLoading = false
      this.countries = []
      for (let i = countriesResp.data.length - 1; i >= 0; i--) {
        this.countries.push(countriesResp.data[i].countryName)
      }
      // Remove France (if present) and put it at the top for convenience.
      // The check matters: splice(-1, 1) would remove the last country.
      let indexOfFrance = this.countries.indexOf('France')
      if (indexOfFrance !== -1) {
        this.countries.splice(indexOfFrance, 1)
      }
      // Sort the data alphabetically for convenience
      this.countries.sort()
      this.countries.unshift('France')

      // Put company industries retrieved from API into an array available to Vue.js
      this.companyIndustriesAreLoading = false
      this.companyIndustries = []
      for (let i = companyIndustriesResp.data.length - 1; i >= 0; i--) {
        this.companyIndustries.push(companyIndustriesResp.data[i].industryName)
      }
      this.companyIndustries.sort()

    [...]

    }
    // bind(this) is needed in order to inject this of Vue.js (otherwise
    // this would not refer to the component inside the callback)
    .bind(this)))
    // In case one of the get request failed, stop everything and tell the user
    .catch(e => {
      alert('Could not load the full input lists in form.')
      this.countriesAreLoading = false
      this.companyIndustriesAreLoading = false
      this.companySizesAreLoading = false
      this.companyTypesAreLoading = false
      this.contactIndustriesAreLoading = false
      this.contactFunctionsAreLoading = false
      this.contactLevelsAreLoading = false
    })
},

Generate CSV in JavaScript

I wish there were a straightforward way to create a CSV in JavaScript and serve it to the user as a download, but it seems there isn’t, so here is my solution:

generateCSV: function () {
      let csvArray = [
        'data:text/csv;charset=utf-8,' +
        'Company Id;' +
        'Company Name;' +
        'Company Domain;' +
        'Company Website;' +
        [...]
        'Contact Update Date'
      ]
      this.resultsRows.forEach(function (row) {
        let csvRow = row['compId'] + ';' +
          row['compName'] + ';' +
          row['compDomain'] + ';' +
          row['compWebsite'] + ';' +
          [...]
          row['contUpdatedOn']
        csvArray.push(csvRow)
      })
      let csvContent = csvArray.join('\r\n')
      let encodedUri = encodeURI(csvContent)
      let link = document.createElement('a')
      link.setAttribute('href', encodedUri)
      link.setAttribute('download', 'companies_and_contacts_extracted.csv')
      document.body.appendChild(link)
      link.click()
    }
}

Get Data Sent by Axios in Go

By default, Axios serializes a JavaScript object passed as POST data to JSON. It can send form-encoded data instead (for example by passing a URLSearchParams object), but here the data arrives as JSON. Go has a useful PostFormValue method that easily retrieves POST data encoded as form data, but it does not handle JSON-encoded data, so I had to unmarshal the JSON into a struct in order to retrieve the POST data:

body, err := ioutil.ReadAll(r.Body)
if err != nil {
	err = CustErr(err, "Cannot read request body.\nStopping here.")
	log.Println(err)
	http.Error(w, "Internal server error", http.StatusInternalServerError)
	return
}

// Store JSON data in a userInput struct
var userInput UserInput
err = json.Unmarshal(body, &userInput)
if err != nil {
	err = CustErr(err, "Cannot unmarshal json.\nStopping here.")
	log.Println(err)
	http.Error(w, "Internal server error", http.StatusInternalServerError)
	return
}

Variadic Functions in Go

The user can enter a variable number of criteria that will be used within a single SQL query. Basically, each new criterion adds a condition to the SQL WHERE clause. As we do not know in advance how many parameters will be passed to the database/sql Query() method, we rely on the fact that Query() is variadic. A variadic function is a function that accepts a variable number of arguments. In Python you would use *args or **kwargs. In Go we use the ... notation. The first argument of Query() is the SQL query string, and the remaining arguments are supplied by expanding a slice of empty interfaces that contains all the parameters:

rows, err := db.Query(sqlStmtStr, sqlArgs...)
if err != nil {
	err = CustErr(err, "SQL query failed.\nStopping here.")
	log.Println(err)
	http.Error(w, "Internal server error", http.StatusInternalServerError)
	return compAndContRows, err
}
defer rows.Close()
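How sqlStmtStr and sqlArgs get built is not shown above. As a rough sketch under assumptions (the criterion type, the buildQuery function and the column names are all made up for this example), the query string and the argument slice can be grown together so that placeholders and values always stay in sync:

```go
package main

import (
	"fmt"
	"strings"
)

// criterion is a hypothetical type for this sketch. Column must come from
// a fixed list in the program, never from user input, because it is
// concatenated into the SQL text. Value is sent as a bound parameter.
type criterion struct {
	Column string
	Value  interface{}
}

// buildQuery appends one "column = $n" condition per criterion and
// returns the final query together with the matching argument slice.
func buildQuery(base string, criteria []criterion) (string, []interface{}) {
	if len(criteria) == 0 {
		return base, nil
	}
	conds := make([]string, 0, len(criteria))
	args := make([]interface{}, 0, len(criteria))
	for i, c := range criteria {
		conds = append(conds, fmt.Sprintf("%s = $%d", c.Column, i+1))
		args = append(args, c.Value)
	}
	return base + " WHERE " + strings.Join(conds, " AND "), args
}

func main() {
	q, args := buildQuery("SELECT id FROM company", []criterion{
		{"country", "France"},
		{"size", 50},
	})
	fmt.Println(q)
	// args... expands the slice into individual variadic arguments,
	// exactly as in db.Query(sqlStmtStr, sqlArgs...).
	fmt.Println(args...)
}
```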

Managing CORS

Basically, browsers enforce the same-origin policy, a security measure that prevents a frontend from reading data from a backend that is not located at the same origin (scheme, host and port). CORS is the mechanism that lets the backend relax this restriction for the origins it trusts. Here is a nice explanation of why CORS is important. In order to make this work you should handle CORS properly on the API server side. The most important CORS property to set is the Allowed Origins property. It’s not that easy to handle by hand in Go since it implies first answering a “preflight” request (using HTTP OPTIONS) and then setting the proper HTTP headers.

The best solution in Go in my opinion seems to be the rs/cors library that allows us to handle CORS like this:

router := mux.NewRouter()

c := cors.New(cors.Options{
	AllowedOrigins: []string{"http://localhost:8080"},
})
handler := c.Handler(router)

NULL Values in Go

When making SQL requests to db, you’ll probably get some NULL values. Those NULL values must be handled explicitly in Go, especially if you want to marshal those results to JSON. You have 2 solutions:

  • use pointers for nullable values in the struct that will receive the values. It works, and a nil pointer is marshaled as null (or dropped entirely when the field is tagged with 'omitempty'), but you then have to nil-check and dereference those fields everywhere in your Go code.
  • use the sql lib nullable types: replace string with sql.NullString, int with sql.NullInt64, bool with sql.NullBool, and time.Time with sql.NullTime. But then you obtain something like {"String":"Smith","Valid":true} when marshaling, which is not what you want in the JSON output, so it requires an extra step before marshaling to JSON.

I implemented the 2nd option and created a custom type + method that implements the json.Marshaler. Note that, by using this method, I could have turned NULL into an empty string so that it is not included in the final JSON, but here I wanted the NULL values to be kept and sent to frontend in JSON as null:

type JsonNullString struct {
	sql.NullString
}

func (v JsonNullString) MarshalJSON() ([]byte, error) {
	if v.Valid {
		return json.Marshal(v.String)
	} else {
		return json.Marshal(nil)
	}
}

type CompAndContRow struct {
	CompId                       string         `json:"compId"`
	CompName                     JsonNullString `json:"compName"`
	CompDomain                   JsonNullString `json:"compDomain"`
	CompWebsite                  JsonNullString `json:"compWebsite"`
	[...]
}

Concatenation of Multiple Rows in SQL

SQL is a very old but still very powerful language. In addition to that, PostgreSQL provides very useful functions that let us do a lot of things within SQL instead of post-processing the results in scripts (which is not memory/CPU efficient). Here I have quite a lot of SQL LEFT JOINs that return many very similar rows. The problem is that I want some of these rows to be concatenated into one single row. For example, a company can have multiple emails and I want all these emails to appear in the same row separated by this symbol: ¤. Doing this in Go would mean parsing the slice of SQL results a huge number of times. With millions of rows it would take very long and could even crash if the server does not have enough memory. Fortunately, doing it with PostgreSQL is very easy using the string_agg() function combined with GROUP BY and DISTINCT:

SELECT comp.id, string_agg(DISTINCT companyemail.email,'¤')
FROM company AS comp
LEFT JOIN companyemail ON companyemail.company_id = comp.id
WHERE comp.id = $1
GROUP BY comp.id

Conclusion

I’m covering a wide range of topics inside a single article here: Go, Vue.js, JavaScript, SQL, Docker, Nginx… I hope you found useful tips that you’ll be able to reuse in your own application.

If you have questions about the app, feel free to ask. If you think I could have optimized some parts of this code better, I would love to hear it. This article is also a way for me to get critical feedback and question my own work!

Also available in French
Plugging a Vue.js SPA frontend to an API backend

Vue.js is a great JavaScript frontend framework and its documentation is very clear and straight to the point. You can either choose to integrate Vue into an already existing app (the jQuery way) or build a Webpack-based Single Page Application (SPA) in the React.js way. Let’s build a basic SPA here connecting to a REST API backend using Node.js, Webpack, Vue Loader, Vue Router, and Axios. Setting up such a project is not that easy in my opinion, so here is a short memo about how to do it on Ubuntu. For your information, here is why we’re using Webpack.

Set Up a Vue.js Project

Install Node.js and npm.

Then use npm to install vue-cli, and use vue-cli to install the vue-loader loader for Webpack, which enables you to use Vue components mixing HTML, CSS, and JS in a single .vue file. It also installs the whole ecosystem needed for an SPA, like vue-router. Last but not least, install Axios, which we’ll be using for API fetching.

sudo npm install -g vue-cli
vue init webpack vue_project  # Answer various questions here
cd vue_project
npm install
npm install --save axios

More information about the project’s structure here.

Workflow

Dev

Dev is made dead easy thanks to the local web server and the hot reloading feature (web page is modified on the fly when you change source code). Just run:

npm run dev

and start coding.

Note: I personally ran into a bug where hot reloading was broken (I’m running Ubuntu 17.10). It went away once I raised the inotify watch limit with the following command:

echo 100000 | sudo tee /proc/sys/fs/inotify/max_user_watches

Deployment

When you need to push your app to production, run:

npm run build

and your app is now compiled into the dist directory. It’s up to you to decide how you want to deploy it. Personally I’m putting my app into a Docker container running Nginx and make Nginx point to the folder containing my app thanks to the following block:

location / {
    root /my_app;
}

Of course if you already have another service running on port 80 of the same server you’ll have to think about how you want to organize your services and modify your nginx config accordingly.

Plug to API Backend

Everything is happening inside the src folder from now on.

Set Dev and Prod Server Names Once For All

My API development server is running on http://127.0.0.1 while my API production is running on http://api.example.com so in order to avoid changing my config for every deployment I created the following http-constants.js file at the root of the src folder:

import axios from 'axios'

let baseURL

if (!process.env.NODE_ENV || process.env.NODE_ENV === 'development') {
  baseURL = 'http://127.0.0.1/'
} else {
  baseURL = 'http://api.example.com'
}

export const HTTP = axios.create(
  {
    baseURL: baseURL
  })

and then in every .vue file needing Axios, import HTTP instead of Axios:

import {HTTP} from '../http-constants'

HTTP.get(...).then(...).catch(...)

Note: this proxying feature might be done more easily in the config/index.js using the proxyTable directive but for some reason it did not work for me.

Create The App

Let’s create an app called ShowGreetings that gets a Hello World message from an API. The API endpoint is /greetings and returns the following JSON message when sending a GET request:

{"message": "Hello World"}

First create the new Vue component called ShowGreetings.vue in src/components:

<template>
  <div>
    <button @click="getGreetings">Get Greetings</button>
    <h1 v-if="greetings">{{ greetings }}</h1>
    <p class="error" v-if="errors">{{ errors }}</p>
  </div>
</template>

<script>
import {HTTP} from '../http-constants'
export default {
  name: 'ShowGreetings',
  data () {
    return {
      greetings: '',
      errors: ''
    }
  },
  methods: {
    getGreetings: function () {
      HTTP.get('/greetings')
        .then(response => {
          this.greetings = response.data.message
        })
        .catch(e => {
          this.errors = e
        })
    }
  }
}
</script>

<style scoped>
.error {
  color: red;
}
</style>

This component tries to fetch the API backend when you click on a button and displays the returned message. If an error is returned, it displays the error.

Now update the router in order to take this new component into account. Here is our new index.js in src/router:

import Vue from 'vue'
import Router from 'vue-router'
import HelloWorld from '@/components/HelloWorld'
import ShowGreetings from '@/components/ShowGreetings'

Vue.use(Router)

export default new Router({
  routes: [
    {
      path: '/',
      name: 'HelloWorld',
      component: HelloWorld
    },
    {
      path: '/show-greetings',
      name: 'ShowGreetings',
      component: ShowGreetings
    }
  ]
})

We created a named route called “ShowGreetings” so we can now refer to the route by its name rather than its path (much more flexible).

Last of all, edit the App.vue component in src so a link to our new component appears on the home page:

<template>
  <div id="app">
    <img src="./assets/logo.png">
    <router-link :to="{ name: 'ShowGreetings'}">Show Greetings</router-link>
    <router-view/>
  </div>
</template>
<script>
export default {
  name: 'App'
}
</script>
<style>
#app {
  font-family: 'Avenir', Helvetica, Arial, sans-serif;
  -webkit-font-smoothing: antialiased;
  -moz-osx-font-smoothing: grayscale;
  text-align: center;
  color: #2c3e50;
  margin-top: 60px;
}
</style>

Here we just added a new router-link tag.

Conclusion

Once you understand how all the layers are interacting, it becomes very easy to build an SPA with Vue.js that can scale from a basic project to a complex project in production.

This whole little project is available on my GitHub if needed.

Also available in French
Dockerizing a whole physical Linux server

Docker is usually used in microservice architectures because containers are lightweight (compared to VMs at least), easy to configure, communicate with each other efficiently, and can be deployed very quickly. However, Docker also works perfectly well if you want to Dockerize a full physical/VPS server into one single container. Let me show you how and why.

Context

I recently had to work on a project developed by people who had left the company before I arrived. I never had the opportunity to meet those guys and contacting them was not an option either. Unfortunately, most of the project lacked documentation. On top of that, only some parts of the project were managed in a VCS (Git here). Of course there was no dev or staging server: everything was located on a production server… Starting to see the problem now?

It was a web scraping project doing quite a lot of complex things. Technical stack of the prod server was more or less the following:

  • Ubuntu 16.04
  • Nginx
  • PostgreSQL 9.3
  • Python/Django
  • Python virtualenvs
  • Gunicorn
  • Celery
  • RabbitMQ
  • Scrapy/Scrapyd

First attempt: failed

I tried hard reverse engineering this production server. My ultimate goal was to isolate each application, Dockerize it, and make these containers communicate with each other.

But I failed!

I successfully Dockerized Nginx, the Django app, and the Celery asynchronous task management. But I was still struggling with Scrapy and Scrapyd. I think it was mainly because the former developers had made changes directly to the Scrapy and Scrapyd source files (that is to say, in the Python site-packages directory itself!) without any documentation. In addition to that, some of the Python libraries used at the time were very specific libs which are no longer available today, or not in the correct version (you can forget about pip freeze and pip install -r requirements.txt here).

Second attempt: failed

I eventually gave up building a microservice system based on the production server. But I still had to secure the existing production server before it ran into trouble. The database was backed up at the time, but nothing else on the server was.

I thought about making a snapshot of the whole server by using a tool like CloneZilla or a simple rsync command like this one. But a mere backup would not allow me to work easily on new features of the project.

So I thought about converting the physical server to a VMware virtual machine by using their VMware vCenter Converter, but the VMware download link was broken and so few people talked about this tool on the Internet that I got scared and gave up.

Last of all I tried this Dockerization solution based on Blueprint but could not make it work and Blueprint seemed to be a discontinued project.

Third attempt: successful

Actually the solution was pretty simple: I decided to Dockerize the whole prod server by myself - except the Postgresql data - so I have a backup of the server and I can commit new features to this Docker container whenever I want without being afraid of breaking the server forever. Here is how I did it:

1. Install and setup Docker on the server

  1. Install Docker following this guide.
  2. Login to your Docker Hub account: docker login
  3. Create a docker network (if needed): docker network create --subnet=172.20.0.0/16 my_network

2. Create a Docker image of your server

Go to the root of the server:

cd /

Create the following Dockerfile based on Ubuntu 16.04 LTS (leave out the part dedicated to Nginx and RabbitMQ if it does not apply to your server, of course):

FROM ubuntu:xenial

# Copy the whole system except what is specified in .dockerignore
COPY / /

# Reinstall nginx and rabbitmq because of permissions issues in Docker
RUN apt remove -y nginx
RUN apt install -y nginx
RUN apt remove -y rabbitmq-server
RUN apt install -y rabbitmq-server

# Launch all services
COPY startup.sh /
RUN chmod 777 /startup.sh
CMD ["bash","/startup.sh"]

Create a .dockerignore file that lists all the files or folders you want to exclude from the COPY above. This is where you need to use your intuition. Remove as many files as possible so the Docker image does not get too big, but do not exclude files that are vital to your application. Here is my example that you should customize based on your own server:

# Remove folders mentioned here:
# https://wiki.archlinux.org/index.php/Rsync#As_a_backup_utility
/dev
/proc
/sys
/tmp
/run
/mnt
/media
/lost+found

# Remove database's data
/var/lib/postgresql

# Remove useless heavy files like /var/lib/scrapyd/reports.old
**/*.old
**/*.log
**/*.bak

# Remove docker
/var/lib/lxcfs
/var/lib/docker
/etc/docker
/root/.docker
/etc/init/docker.conf

# Remove the current program
/.dockerignore
/Dockerfile

Create a startup.sh script in order to launch all the services and set database connection redirection. This is my script here but yours will be totally different of course:

# Redirect all traffic from 127.0.0.1:5432 to 172.20.0.1:5432
# so any connection to Postgresql keeps working without any other modification.
# Requires the --privileged flag when creating container:
sysctl -w net.ipv4.conf.all.route_localnet=1
iptables -t nat -A OUTPUT -p tcp -s 127.0.0.1 --dport 5432 -j DNAT --to-destination 172.20.0.1:5432
iptables -t nat -A POSTROUTING -j MASQUERADE

# Start RabbitMQ.
rabbitmq-server -detached

# Start Nginx.
service nginx start

# Start Scrapyd
/root/.virtualenvs/my_project_2/bin/python /root/.virtualenvs/my_project_2/bin/scrapyd >> /var/log/scrapyd/scrapyd.log 2>&1 &

# Use Python virtualenvwrapper
source /root/.profile

# Start virtualenv and start Django/Gunicorn
workon my_project_1
cd /home/my_project_1
export DJANGO_SETTINGS_MODULE='my_project_1.settings.prod'
gunicorn -c my_project_1/gunicorn.py -p /tmp/gunicorn.pid my_project_1.wsgi &

# Start Celery
export C_FORCE_ROOT=True
celery -A my_project_1 beat &
celery -A my_project_1 worker -l info -Q queue1,queue2 -P gevent -c 1000 &

# Little hack to keep the container running in foreground
tail -f /dev/null

As you can see, I’m using iptables redirections so that all connections to the Postgresql database (port 5432) keep working without any additional change in the configuration files. Indeed my database was initially located on localhost, but it is now located on the Docker host, whose IP is 172.20.0.1 (I moved everything to the Docker container except the database). Redirections at the kernel level are pretty convenient when you don’t know where all the configuration files are located, and they are necessary when you cannot modify those config files (as in a compiled application you don’t have the source code of).
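
A quick way to check that the redirection behaves as expected is to open a TCP connection to 127.0.0.1:5432 from inside the container. The snippet below is only a minimal sketch in Python; the can_connect helper is a name I made up for the illustration and is not part of the original setup.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Inside the container, this connection should be redirected by the
    # iptables rule to the Postgresql server running on the Docker host.
    print("Postgresql reachable:", can_connect("127.0.0.1", 5432))
```

If this prints False inside the container, the first things to check are probably the --privileged flag and the IP address of the Docker host.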

Now launch the image creation and wait… In my case, the image was about 3 GB and was created in about 5 minutes. Make sure you have enough free space on your server before launching this command:

docker build -t your_repo/your_project:your_tag .

If you’ve got no error here, congrats, you’ve done the hardest part! Now test your image and see if everything works fine. If not, then you need to adapt one of the 3 files above.

3. Save the newly created Docker image

Just push the new image to Docker Hub:

docker push your_repo/your_project:your_tag

4. Add features to your server image

Now, if you need to work on this server image, you can do the following:

  1. launch a container based on this image with docker run (do not forget to specify network name, ip address, port forwarding, and add the --privileged flag in order for the sysctl command to work in startup.sh)
  2. work in the container
  3. commit changes in the container to a new Docker image with docker commit
  4. push the new image to Docker Hub with docker push and deploy it to staging or production
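
The four steps above can be sketched as a list of docker commands. The small Python helper below only builds and prints them; the container name, the IP address 172.20.0.2 and the 80:80 port mapping are assumptions chosen for the example, so adapt them to your own server.

```python
import shlex


def workflow_commands(image, new_image, container="my_server",
                      network="my_network", ip="172.20.0.2", port="80:80"):
    """Build the docker commands for the run / work / commit / push cycle."""
    return [
        # 1. Launch a container (privileged so sysctl works in startup.sh)
        ["docker", "run", "-d", "--privileged", "--name", container,
         "--network", network, "--ip", ip, "-p", port, image],
        # 2. Open a shell in the running container to work in it
        ["docker", "exec", "-it", container, "bash"],
        # 3. Commit the modified container to a new image
        ["docker", "commit", container, new_image],
        # 4. Push the new image to Docker Hub
        ["docker", "push", new_image],
    ]


if __name__ == "__main__":
    for cmd in workflow_commands("your_repo/your_project:your_tag",
                                 "your_repo/your_project:your_new_tag"):
        print(shlex.join(cmd))
```

Printing the commands first, rather than running them blindly, makes it easy to review them before touching a production server.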

Conclusion

This solution literally saved my life and is proof that Docker is great not only for microservice architectures but for whole server containerization as well. Dockerizing a whole server can be a perfect option if you need to secure an existing prod server like mine here, with no documentation, no GitHub repo, no initial developers…

The first image can be pretty big, but every subsequent commit should not be that heavy thanks to Docker’s layered architecture.

Is it a hack? Maybe it is, but it works like a charm!

I would love to hear other devs’ opinions on this.

CTOs, developers: how to assess quality of an external API?

Nowadays finding an external API in order to improve your service is getting easier and easier. More and more companies offer APIs. The problem is that many developers and CTOs start the API integration right away, while it should be the very last step! Before that you need to figure out whether the quality of this API matches some minimum requirements. Let me tell you how I do it. I hope it will help other CTOs and developers.

Quality of data

A lot of APIs expose data in order for you to enrich your system (this is not always the case of course, Stripe is not an enrichment API for example). It is essential that you check the quality of this data. It will take you a long time and I know you do not like testing! Neither do I, but you cannot avoid building a serious test scenario here. If you realize that data quality is not good enough only 2 weeks after finishing your API integration, trust me, you’ll regret it…
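
A test scenario can be as simple as comparing the API’s answers with a sample you have verified by hand. Here is a minimal sketch of that idea in Python; the sample data and the fake_api stand-in are made up for the example, and the real lookup function would call the API you are evaluating.

```python
def accuracy(samples, lookup):
    """Share of samples for which lookup(input) equals the expected value."""
    if not samples:
        raise ValueError("samples must not be empty")
    hits = sum(1 for given, expected in samples if lookup(given) == expected)
    return hits / len(samples)


if __name__ == "__main__":
    # Hand-verified reference data (made up for this example).
    samples = [("Petra Meyer", "FEMALE"), ("John Smith", "MALE")]
    # Stand-in for the real API call; replace it with an actual request.
    fake_api = {"Petra Meyer": "MALE", "John Smith": "MALE"}.get
    print(f"accuracy: {accuracy(samples, fake_api):.0%}")
```

The bigger and the more representative of your real data the sample is, the more you can trust the resulting figure.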

Documentation

I recently came across an API which exposed great data (much better than its competitors in my opinion), but its documentation was… awful! Actually it almost did not exist. In addition to that it did not always respect the basic REST standards. How can you possibly integrate an external API if error codes are not properly documented? Well, the only solution is for you to test again and again in order to understand how things work under the hood. Reverse-engineering might be fun but it takes a lot of time. Remember you have no GitHub repo to explore here since the source code is not available… Bad documentation means a lot of lost time for the devs and certainly bad surprises in the mid term.

Libraries

Can you consume the API with dedicated libraries in your favorite language? As a Python and Go developer I’m always glad to see APIs offering a Python lib (I know I can forget about Go for the moment). It can save you quite a lot of time, but first make sure the lib is mature enough and covers all the API features (this is not always the case).

Reputation of the vendor

Reputation can help you find out whether you’ll have bad surprises with your API in the future. By bad surprises I mean service interruptions, feature regressions, or even the end of the service… You can partly tackle that by asking yourself the following questions:

  • is this API popular on the internet (in general if you find little information, run away)? Are there a lot of articles/tutorials talking about it? Are those articles positive?
  • are some popular companies using it?
  • if the company developed libs, are they popular on GitHub? Are the issues on GitHub solved regularly?
  • were there recent updates of the API or was the last update released a long time ago?

Technical support

Make sure someone answers you quickly by email when you have an issue, and that the answer is relevant. If you’re based in Europe and the API is run by an American company, check that the time difference is not too much of a problem.

Respect standards

In my humble opinion, you should only choose RESTful APIs today. If the API you’re in love with does not respect the REST standard, be suspicious. But keep in mind that it’s not perfectly clear what the REST standard is about, and each API implements its own rules (HTTP codes, POST request encoding, …). Still, have a close look at the docs, and check that you do not see anything unusual. Originality will slow you down…

Price

Of course price is very important. But be careful, API prices are not always easy to understand. Are you going to be charged per month for an unlimited amount of requests? Charged per request? If so, are you going to be charged twice for 2 identical requests (in case of an enrichment API) or will the second request be free? Are you going to be charged for a request returning no result (HTTP 404)? Make sure you understand all the implications of pricing.
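
If identical requests are billed twice, a small cache on your side can limit the damage, provided the vendor’s terms allow you to store results. The sketch below is one possible approach and not a feature of any particular API; CachedClient and the fake lookup function are hypothetical names used for the illustration.

```python
class CachedClient:
    """Wrap a paid lookup function so identical requests are sent only once."""

    def __init__(self, fetch):
        self._fetch = fetch      # function performing the real (billed) call
        self._cache = {}         # results already paid for, keyed by request
        self.billed_calls = 0    # number of requests actually sent

    def get(self, key):
        if key not in self._cache:
            self.billed_calls += 1
            self._cache[key] = self._fetch(key)
        return self._cache[key]


if __name__ == "__main__":
    # Stand-in for the real API call (hypothetical).
    client = CachedClient(lambda email: {"email": email, "company": "ACME"})
    client.get("john@example.com")
    client.get("john@example.com")  # served from the cache, not billed
    print("billed calls:", client.billed_calls)
```

In a real application the cache would more likely live in a database than in memory, but the principle stays the same.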

Quality of Service (QoS)

QoS is highly important. Basically you want the API to be fast and have as little downtime as possible. Unfortunately this is not easy to test. Indeed QoS may vary a lot over time, and many APIs offer 2 levels of QoS depending on whether you’re using the free version of the API or you paid for it… Sometimes you can also choose between different subscriptions with different levels of response time.
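
Even a rough measurement is better than nothing. As a minimal sketch, the Python helper below calls a function several times and records latencies and failures; the measure name and the fake call are assumptions made for the example. Since QoS varies over time, you would ideally run it at different hours and on different days.

```python
import statistics
import time


def measure(call, attempts=20):
    """Call `call()` repeatedly; return (latencies in seconds, failure count)."""
    latencies, failures = [], 0
    for _ in range(attempts):
        start = time.perf_counter()
        try:
            call()
        except Exception:
            # Any exception raised by the call counts as a failed request.
            failures += 1
            continue
        latencies.append(time.perf_counter() - start)
    return latencies, failures


if __name__ == "__main__":
    # Stand-in for a real API request (hypothetical).
    latencies, failures = measure(lambda: time.sleep(0.01), attempts=5)
    print("median latency (s):", statistics.median(latencies))
    print("failures:", failures)
```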

Parallel queries support

Depending on how you’re planning to integrate your API, you might want to speed things up by making multiple parallel queries to the API instead of using it sequentially. Personally I’m using Golang for that most of the time. If so, be careful: many vendors do not support parallel queries, and when they do they always set a limit. In that case make sure to ask them what this limit is (it is not always stated in the docs) and adapt your script accordingly.
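
Whatever the language, the idea is to cap the number of requests in flight at the vendor’s limit. Here is a minimal sketch in Python using a thread pool; fetch_all, the limit of 5 and the fake fetch function are assumptions for the illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all(fetch, inputs, limit=5):
    """Run fetch(item) for every item, with at most `limit` calls in flight.

    Results are returned in the same order as `inputs`.
    """
    with ThreadPoolExecutor(max_workers=limit) as pool:
        return list(pool.map(fetch, inputs))


if __name__ == "__main__":
    # Stand-in for a real API request (hypothetical).
    def fake_fetch(email):
        return {"email": email, "found": True}

    emails = [f"user{i}@example.com" for i in range(10)]
    results = fetch_all(fake_fetch, emails, limit=3)
    print(len(results), "results")
```

The pool size is the only knob here: set it to the limit the vendor gave you, or lower.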

This post will be a good memo for me. I hope it will be for you too!

REST API fetching: Go vs Python

APIs are everywhere today. Imagine you want to find business prospect information based on an email. Well, there is an API for this. Need to geocode an ugly postal address? There is an API for that. Would you like to make a payment? There are multiple APIs for that too of course. As a developer I regularly fetch external APIs using either Python or Go. The two approaches are quite different, so let’s compare them here on an edge case: JSON data sent through a POST request body.

A real life example

Recently, I’ve used the NameAPI.org API, dedicated to splitting a full name into first name and last name, and determining the gender of the person.

In order to use their API you should send JSON data encoded in the request body through POST. Moreover, the request Content-Type should be set to application/json instead of one of the usual form encodings. This is a slightly tricky case: POST data always travels in the request body, but it is typically encoded as form data, with a Content-Type of application/x-www-form-urlencoded (or multipart/form-data for file uploads). Here we have to send raw JSON instead and declare it explicitly.
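
To make the two body encodings concrete, here is a small sketch using only Python’s standard library. It sends nothing; it just builds the bytes and the matching Content-Type for each case. The form_body and json_body helpers are names made up for the example.

```python
import json
from urllib.parse import urlencode


def form_body(fields):
    """Encode flat key/value fields the way an HTML form does."""
    return urlencode(fields).encode("utf-8"), "application/x-www-form-urlencoded"


def json_body(payload):
    """Serialize an arbitrary (possibly nested) structure as JSON."""
    return json.dumps(payload).encode("utf-8"), "application/json"


if __name__ == "__main__":
    print(form_body({"name": "Petra Meyer"}))
    payload = {"inputPerson": {"type": "NaturalInputPerson", "gender": "UNKNOWN"}}
    print(json_body(payload))
```

Form encoding only handles flat key/value pairs comfortably, which is why an API expecting a nested structure like the one below asks for JSON.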

Here is the JSON data we want to send:

{
  "inputPerson" : {
    "type" : "NaturalInputPerson",
    "personName" : {
      "nameFields" : [ {
        "string" : "Petra",
        "fieldType" : "GIVENNAME"
      }, {
        "string" : "Meyer",
        "fieldType" : "SURNAME"
      } ]
    },
    "gender" : "UNKNOWN"
  }
}

We could do this pretty simply using cURL:

curl -H "Content-Type: application/json" \
-X POST \
-d '{"inputPerson":{"type":"NaturalInputPerson","personName":{"nameFields":[{"string":"Petra Meyer","fieldType":"FULLNAME"}]}}}' \
"http://rc50-api.nameapi.org/rest/v5.0/parser/personnameparser?apiKey=<API-KEY>"

And here is the NameAPI.org’s response (JSON):

{
"matches" : [ {
  "parsedPerson" : {
    "personType" : "NATURAL",
    "personRole" : "PRIMARY",
    "mailingPersonRoles" : [ "ADDRESSEE" ],
    "gender" : {
      "gender" : "MALE",
      "confidence" : 0.9111111111111111
    },
    "addressingGivenName" : "Petra",
    "addressingSurname" : "Meyer",
    "outputPersonName" : {
      "terms" : [ {
        "string" : "Petra",
        "termType" : "GIVENNAME"
      },{
        "string" : "Meyer",
        "termType" : "SURNAME"
      } ]
    }
  },
  "parserDisputes" : [ ],
  "likeliness" : 0.926699401733102,
  "confidence" : 0.7536487758945387
} ]
}

Now let’s see how to do this in Go and Python!

Go implementation

Code

/*
Fetch the NameAPI.org REST API and turn JSON response into a Go struct.

Sent data have to be JSON data encoded into request body.
The request's Content-Type header must be set to 'application/json'.
*/

package main

import (
    "encoding/json"
    "io/ioutil"
    "log"
    "net/http"
    "strings"
)

// url of the NameAPI.org endpoint:
const (
    url = "http://rc50-api.nameapi.org/rest/v5.0/parser/personnameparser?" +
        "apiKey=<API-KEY>"
)

func main() {

    // JSON string to be sent to NameAPI.org:
    jsonString := `{
        "inputPerson": {
            "type": "NaturalInputPerson",
            "personName": {
                "nameFields": [
                    {
                        "string": "Petra",
                        "fieldType": "GIVENNAME"
                    }, {
                        "string": "Meyer",
                        "fieldType": "SURNAME"
                    }
                ]
            },
            "gender": "UNKNOWN"
        }
    }`
    // Convert JSON string to NewReader (expected by NewRequest)
    jsonBody := strings.NewReader(jsonString)

    // Build the request explicitly so that headers can be set on it.
    // (http.Post also accepts the content type as its second argument,
    // but NewRequest + client.Do works for any custom header.)
    client := &http.Client{}
    req, err := http.NewRequest("POST", url, jsonBody)
    if err != nil {
        log.Println("Cannot build request:", err)
        return
    }
    req.Header.Add("Content-Type", "application/json")

    // Send the request and proceed only if no error:
    resp, err := client.Do(req)
    if err != nil {
        log.Println("Network error:", err)
        return
    }
    // Always close the response body once we are done with it.
    defer resp.Body.Close()
    if resp.StatusCode != http.StatusOK {
        log.Println("Bad HTTP status code:", resp.StatusCode)
        return
    }

    // Structs dedicated to receiving the fetched JSON content
    // (only the fields we need are mapped):
    type Level5 struct {
        String   string `json:"string"`
        TermType string `json:"termType"`
    }
    type Level41 struct {
        Gender     string  `json:"gender"`
        Confidence float64 `json:"confidence"`
    }
    type Level42 struct {
        Terms []Level5 `json:"terms"`
    }
    type Level3 struct {
        Gender           Level41 `json:"gender"`
        OutputPersonName Level42 `json:"outputPersonName"`
    }
    type Level2 struct {
        ParsedPerson Level3 `json:"parsedPerson"`
    }
    type RespContent struct {
        Matches []Level2 `json:"matches"`
    }

    // Decode fetched JSON and put it into respContent:
    respContentBytes, err := ioutil.ReadAll(resp.Body)
    if err != nil {
        log.Println(err)
        return
    }
    var respContent RespContent
    err = json.Unmarshal(respContentBytes, &respContent)
    if err != nil {
        log.Println(err)
        return
    }
    log.Println(respContent)

}

Explanations

As you can see we’re facing 2 painful problems with Go:

  • the http lib is a bit verbose when it comes to sending JSON data in the request body with the right Content-Type header, and Go’s documentation is not very clear on this. The shortcut http.Post(url, "application/json", body) does accept a content type as its second argument, but as soon as you need any other custom header you have to create an http.Client, build the request with the NewRequest() function and trigger it with client.Do(req). This is what I did here, setting the header with req.Header.Add("Content-Type", "application/json")
  • decoding the returned JSON into Go data (called unmarshalling in Go) is pretty long and boring. It’s due to the fact that, Go being a statically typed language, we need to know in advance what the final returned JSON will look like. Thus we need to create a dedicated struct that will map the JSON’s structure and receive the data. In case of a nested JSON like the one returned by NameAPI.org, mixing arrays and maps, it is very touchy. Fortunately, our struct does not need to map the whole JSON but only the fields we need. Another approach, if we have no idea what the final JSON will look like, would be to decode it into an interface{} and use type assertions to work out the types of data. Here is a good article on this.

The jsonString input is already a string here. But for a proper comparison with Python, it should have been a struct that we would have turned into a string. I just did not want to make this script too long for the blog.

Python implementation

Code

"""
Fetch the NameAPI.org REST API and turn JSON response into Python dict.

Sent data have to be JSON data encoded into request body.
The request's Content-Type header must be set to 'application/json'.
"""

import requests

# url of the NameAPI.org endpoint:
url = (
    "http://rc50-api.nameapi.org/rest/v5.0/parser/personnameparser?"
    "apiKey=<API-KEY>"
)

# Dict of data to be sent to NameAPI.org:
payload = {
    "inputPerson": {
        "type": "NaturalInputPerson",
        "personName": {
            "nameFields": [
                {
                    "string": "Petra",
                    "fieldType": "GIVENNAME"
                }, {
                    "string": "Meyer",
                    "fieldType": "SURNAME"
                }
            ]
        },
        "gender": "UNKNOWN"
    }
}

# Proceed, only if no error:
try:
    # Send request to NameAPI.org by doing the following:
    # - make a POST HTTP request
    # - encode the Python payload dict to JSON
    # - pass the JSON to request body
    # - set header's 'Content-Type' to 'application/json' instead of
    #   'application/x-www-form-urlencoded' (used for form data)
    resp = requests.post(url, json=payload)
    resp.raise_for_status()
    # Decode JSON response into a Python dict:
    resp_dict = resp.json()
    print(resp_dict)
except requests.exceptions.HTTPError as e:
    print("Bad HTTP status code:", e)
except requests.exceptions.RequestException as e:
    print("Network error:", e)

Explanations

The Python Requests library is an amazing library that saves us a lot of time here compared to Go! In one line, resp = requests.post(url, json=payload), almost everything is done under the hood:

  • build a POST HTTP request
  • encode the Python payload dictionary to JSON
  • pass the JSON to the request body
  • set header’s 'Content-Type' to 'application/json' instead of the form-encoded 'application/x-www-form-urlencoded' thanks to the json keyword argument
  • send the request

Decoding of returned JSON is also a one-liner: resp_dict = resp.json(). No need to create a complicated data structure in advance here!
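
Once decoded, the response is just nested dicts and lists. As a minimal sketch, here is how the fields we mapped in the Go structs could be pulled out of it; the summarize helper is a name I made up, and it assumes the response has the shape of the sample shown earlier.

```python
def summarize(resp_dict):
    """Extract gender and name terms from the first match, or None if absent."""
    matches = resp_dict.get("matches") or []
    if not matches:
        return None
    person = matches[0].get("parsedPerson", {})
    terms = person.get("outputPersonName", {}).get("terms", [])
    return {
        "gender": person.get("gender", {}).get("gender"),
        "names": {t["termType"]: t["string"] for t in terms},
    }


if __name__ == "__main__":
    # Trimmed copy of the sample response shown earlier in this article.
    sample = {
        "matches": [{
            "parsedPerson": {
                "gender": {"gender": "MALE", "confidence": 0.9111111111111111},
                "outputPersonName": {
                    "terms": [
                        {"string": "Petra", "termType": "GIVENNAME"},
                        {"string": "Meyer", "termType": "SURNAME"},
                    ]
                },
            }
        }]
    }
    print(summarize(sample))
```

The flip side of this flexibility is that nothing checks the structure for us, hence the defensive .get() calls.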

Conclusion

Python is clearly the winner. Python’s simplicity combined with its huge set of libraries saves us a lot of development time!

We’re not dealing with performance here of course. If you’re looking for a high-performance API fetcher using concurrency, Go could be a great choice. But simplicity and performance are not good friends as you can see…

Feel free to comment, I would be glad to hear your opinion on this!
