Security of a Go (Golang) website

Web frameworks are not so frequent in the Go world compared to other languages. It gives you more freedom but web frameworks are also a great way to force people to implement basic security practices, especially beginners.

If you’re developing Go websites or APIs without frameworks, like me, here are some security points you should keep in mind.

CSRF

Cross-site Requests Forgery (CSRF) attacks target the password protected pages of your website which are using forms. Authenticated users (via a session cookie in their browser) might post information to a protected form without knowing it if they’re visiting a malicious website. In order to avoid this, every forms should have a hidden field containing a CSRF token that the server will use to check the authenticity of the request.

Let’s use Gorilla Toolkit in this regard. First integrate the CSRF middleware. You can either do it for the whole website:

package main

import (
    "net/http"

    "github.com/gorilla/csrf"
    "github.com/gorilla/mux"
)

func main() {
    r := mux.NewRouter()

    http.ListenAndServe(":8000",
        csrf.Protect([]byte("32-byte-long-auth-key"))(r))
}

Or for some pages only:

package main

import (
    "net/http"

    "github.com/gorilla/csrf"
    "github.com/gorilla/mux"
)

func main() {
    r := mux.NewRouter()
    csrfMiddleware := csrf.Protect([]byte("32-byte-long-auth-key"))

    protectedPageRouter := r.PathPrefix("/protected-page").Subrouter()
    protectedPageRouter.Use(csrfMiddleware)
    protectedPageRouter.HandleFunc("", protectedPage).Methods("POST")

    http.ListenAndServe("8080", r)
}

Pass the CSRF token to your template then:

func protectedPage(w http.ResponseWriter, r *http.Request) {
    var tmplData = ContactTmplData{CSRFField: csrf.TemplateField(r)}
    tmpl.Execute(w, tmplData)
}

Last of all put the hidden field {{.CSRFField}} into your template.

CORS

Cross-origin Resource Sharing (CORS) attacks consist in sending information to a malicious website from a clean website. In order to mitigate this, the clean website should prevent users from sending asynchronous requests (XHR) to another website. Good news: this behavior is implemented by default in every browsers! Bad news: it can lead to false positives so, if you want to consume an API located on another domain or another port from your web page, requests will be blocked by the browser. It’s often a tricky behavior for the API rookies…

You should whitelist some domains in order to avoid the above problem. You can do this using the github.com/rs/cors:

package main

import (
    "net/http"

    "github.com/rs/cors"
)

func main() {
    c := cors.New(cors.Options{
        AllowedOrigins: []string{"http://my-whitelisted-domain"},
    })
    handler = c.Handler(handler)

    http.ListenAndServe(":8080", handler)
}

HTTPS

Switching your website to HTTPS is an essential security point. Here I’m assuming that you’re using the Go built-in server. If not (maybe because you’re using Nginx or Apache), you can skip this section.

Get a A on SSLLabs.com

In order to get the best security grade on SSLLabs (meaning the certificate is perfectly configured and thus avoid security warnings on some web clients), you should disable SSL and only use TLS 1.0 and above. That’s why I’m using the crypto/tls library. In order to serve HTTPS requests we’re using http.ListenAndServeTLS:

package main

import (
    "crypto/tls"
    "net/http"
)

func main() {
    config := &tls.Config{MinVersion: tls.VersionTLS10}
    server := &http.Server{Addr: ":443", Handler: r, TLSConfig: config}
    server.ListenAndServeTLS(tlsCertPath, tlsKeyPath)
}

Redirect HTTP to HTTPS

It’s good practice to force HTTP requests to HTTPS. You should use a dedicated goroutine:

package main

import (
    "crypto/tls"
    "net/http"
)

// httpsRedirect redirects http requests to https
func httpsRedirect(w http.ResponseWriter, r *http.Request) {
    http.Redirect(
        w, r,
        "https://"+r.Host+r.URL.String(),
        http.StatusMovedPermanently,
    )
}

func main() {
    go http.ListenAndServe(":80", http.HandlerFunc(httpsRedirect))

    config := &tls.Config{MinVersion: tls.VersionTLS10}
    server := &http.Server{Addr: ":443", Handler: r, TLSConfig: config}
    server.ListenAndServeTLS(tlsCertPath, tlsKeyPath)
}

Let’s Encrypt’s certificates renewal

Let’s Encrypt has become the most common way of provisioning TLS certs (because it’s free of course) but not necessarily the easiest. Once Let’s Encrypt has been installed and the first certs have been provisioned, the question is how to renew them automatically with Certbot. Certbot does not integrate to the Go HTTP server so it’s necessary to use the Certbot’s standard version (this one for Ubuntu 18.04 for example), and then briefly (a couple of seconds) turn off the production server during the certificates renewal (in order to avoid conflicts on the 80 and 443 ports). It can be done by modifying the renewal command launched by Certbot’s cron (in /etc/cron.d/certbot). On Ubuntu Certbot also uses the systemd timer (as a first choice rather than cron) so it’s better to modify the /lib/systemd/system/certbot.service config file:

[Unit]
Description=Certbot
Documentation=file:///usr/share/doc/python-certbot-doc/html/index.html
Documentation=https://letsencrypt.readthedocs.io/en/latest/
[Service]
Type=oneshot
# Proper command to stop server before renewal and restart server afterwards
ExecStart=/usr/bin/certbot -q renew --pre-hook "command to stop go server" --post-hook "command to start go server"
PrivateTmp=true

One interesting alternative could be to use a Go lib dedicated to the certs renewal inside your program called x/crypto/acme/autocert. Personally I’m not sure I like this solution because - even if it does not create downtime contrary to my solution - it means your code it strongly coupled to a specific type of certs renewal (ACME).

XSS

Cross-site scripting (XSS) attacks consist in executing some Javascript code in the user browser when he’s visiting your website, while it should not be happening. For example in case of a forum, if a user is posting a message containing some Javascript and you’re displaying this message to all users without any filter, the script will execute on all user browsers. In order to mitigate this, strings should be “escaped” before being displayed in order to be harmless.

Good news: when you’re using templates via the html/template lib, Go ensures strings are automatically escaped. Be careful though: text/template does not escape strings!

SQL Injections

SQL Injection attacks consist in posting malicious data containing SQL in an HTML form. When inserting data into database or retrieving data, this malicious code can cause damage.

Once again: good news is this attack is easily mitigated if you’re correctly using the standard SQL libs. For example if you’re using database/sqlSQL injections are automatically escaped when you’re using the $ or ? keywords. Here’s an SQL request properly written for PostgreSQL:

db.Exec("INSERT INTO users(name, email) VALUES($1, $2)",
  "Julien Salinas", "julien@salinas.com")

Personally I’m using the PostgreSQL optimized ORM github.com/go-pg/pg. A properly formed request in that case would be:

user := &User{
    name:       "Julien Salinas",
    email:      "julien@salinas.com",
}
err = db.Insert(user)

Directory Listing

Directory listing is the fact that you can display all the files from a given directory on the website. It might disclose documents to users that you don’t want to show them if they don’t have the exact url. Directory listing is enabled by default if you’re using the standard lib http.FileServer. I’m explaining in this article how to disable this behavior.

Conclusion

Here’s a short insight about the major security points for your Go website.

It might be a good idea to use a tool giving you the ability to easily set various security related essential elements. In my opinion github.com/unrolled/secure is a great lib in this regard. It allows you to easily set up HTTPS redirects, handle CORS, but also filter authorized hosts and many other complex things.

I hope these basic tips helped some of you!

Existe aussi en français
Making an NER API with FastAPI and spaCy

Named Entity Recognition (NER) is an interesting NLP feature that is made very easy thanks to spaCy. If you want to expose your NER model to the world, you can easily build an API with FastAPI.

FastAPI is a new API engine that has just been released a couple of months ago. It makes API development both fast and convenient.

As for spaCy, in case you don’t know it yet, it’s a great open-source framework for NLP, and especially NER.

Code example

We want to build an API endpoint that will return entities from a simple sentence: “John Doe is a Go Developer at Google”.

The following code is mostly coming from this great “spacy-api-docker” repo by jgontrum (thanks!): https://github.com/jgontrum/spacy-api-docker/, and most specifically from this file: https://github.com/jgontrum/spacy-api-docker/blob/master/displacy_service/parse.py.

The API will return each entity along with it’s position.

[
  {
    "end": 8,
    "start": 0,
    "text": "John Doe",
    "type": "PERSON"
  },
  {
    "end": 25,
    "start": 13,
    "text": "Go Developer",
    "type": "POSITION"
  },
  {
    "end": 35,
    "start": 30,
    "text": "Google",
    "type": "ORG"
  },
]

Here is the code:

import spacy
from fastapi import FastAPI
from pydantic import BaseModel
from typing import List

en_core_web_lg = spacy.load("en_core_web_lg")

api = FastAPI()

class Input(BaseModel):
    sentence: str

class Extraction(BaseModel):
    first_index: int
    last_index: int
    name: str
    content: str

class Output(BaseModel):
    extractions: List[Extraction]

@api.post("/extractions", response_model=Output)
def extractions(input: Input):
    document = en_core_web_lg(input.sentence)

    extractions = []
    for entity in document.ents:
      extraction = {}
      extraction["first_index"] = entity.start_char
      extraction["last_index"] = entity.end_char
      extraction["name"] = entity.label_
      extraction["content"] = entity.text
      extractions.append(extraction)

    return {"extractions": extractions}

First we load the spaCy model:

en_core_web_lg = spacy.load("en_core_web_lg")

Then we perform NER:

document = en_core_web_lg(input.sentence)
# [...]
document.ents

Data validation

Thanks to FastAPI it is easy to perform input and output data validation:

class Extraction(BaseModel):
    first_index: int
    last_index: int
    name: str
    content: str
class Output(BaseModel):
    extractions: List[Extraction]

Conclusion

Thanks to spaCy and FastAPI, building an entity extraction API has never been so easy. I hope it will help you for your next project!

If you have questions, please let me know!

Developing and deploying a whole website in Go (Golang)

In my opinion Go (Golang) is a great choice for web development:

  • it makes non-blocking requests and concurrency easy
  • it makes code testing and deployment easy as it does not require any special runtime environment or dependencies (making containerization pretty useless here)
  • it does not require any additional frontend HTTP server like Apache or Nginx as it already ships with a very good one in its standard library
  • it does not force you to use a web framework as everything needed for web development is ready to use in the std lib

A couple of years back the lack of libraries and tutorials around Go could have been a problem, but today it is not anymore. Let me show you the steps to build a website in Go and deploy it to your Linux server from A to Z.

The Basics

Let’s say you are developing a basic HTML page called love-mountains. As you might already know, the rendering of love-mountains is done in a function, and you should launch a web server with a route pointing to that function. This is good practice to use HTML templates in web development so let’s render the page through a basic template here. This is also good practice to load parameters like the path to templates directory from environment variables for better flexibility.

Here is your Go code:

package main

import (
    "html/template"
    "net/http"
)

// Get path to template directory from env variable
var templatesDir = os.Getenv("TEMPLATES_DIR")

// loveMountains renders the love-mountains page after passing some data to the HTML template
func loveMountains(w http.ResponseWriter, r *http.Request) {
    // Build path to template
    tmplPath := filepath.Join(templatesDir, "love-mountains.html")
    // Load template from disk
    tmpl := template.Must(template.ParseFiles(tmplPath))
    // Inject data into template
    data := "La Chartreuse"
    tmpl.Execute(w, data)
}

func main() {
    // Create route to love-mountains web page
    http.HandleFunc("/love-mountains", loveMountains)
    // Launch web server on port 80
    http.ListenAndServe(":80", nil)
}

Retrieving dynamic data in a template is easily achieved with {{.}} here. Here is your love-mountains.html template:

<h1>I Love Mountains<h1>
<p>The mountain I prefer is {{.}}</p>

HTTPS

Nowadays, implementing HTTPS on your website has become almost compulsory. How can you switch your Go website to HTTPS?

Linking TLS Certificates

Firstly, issue your certificate and private key in .pem format. You can issue them by yourself with openssl (but you will end up with a self-signed certificate that triggers a warning in the browser) or you can order your cert from a trusted third-party like Let’s Encrypt. Personally, I am using Let’s Encrypt and Certbot to issue certificates and renew them automatically on my servers. More info about how to use Certbot here.

Then you should tell Go where your cert and private keys are located. I am loading the paths to these files from environment variables.

We are now using the ListenAndServeTLS function instead of the mere ListenAndServe:

[...]

// Load TLS cert info from env variables
var tlsCertPath = os.Getenv("TLS_CERT_PATH")
var tlsKeyPath = os.Getenv("TLS_KEY_PATH")

[...]

func main() {
    [...]
    // Serve HTTPS on port 443
    http.ListenAndServeTLS(":443", tlsCertPath, tlsKeyPath, nil)
}

Forcing HTTPS Redirection

For the moment we have a website listening on both ports 443 and 80. It would be nice to automatically redirect users from port 80 to 443 with a 301 redirection. We need to spawn a new goroutine dedicated to redirecting from http:// to https:// (principle is similar as what you would do in a frontend server like Nginx). Here is how to do it:

[...]

// httpsRedirect redirects HTTP requests to HTTPS
func httpsRedirect(w http.ResponseWriter, r *http.Request) {
    http.Redirect(
        w, r,
        "https://"+r.Host+r.URL.String(),
        http.StatusMovedPermanently,
    )
}

func main() {
    [...]
    // Catch potential HTTP requests and redirect them to HTTPS
    go http.ListenAndServe(":80", http.HandlerFunc(httpsRedirect))
    // Serve HTTPS on port 443
    http.ListenAndServeTLS(":443", tlsCertPath, tlsKeyPath, nil)
}

Static Assets

Serving static assets (like images, videos, Javascript files, CSS files, …) stored on disk is fairly easy but disabling directory listing is a bit hacky.

Serving Files from Disk

In Go, the most secured way to serve files from disk is to use http.FileServer. For example, let’s say we are storing static files in a static folder on disk, and we want to serve them at https://my-website/static, here is how to do it:

[...]
http.Handle("/", http.FileServer(http.Dir("static")))
[...]

Preventing Directory Listing

By default, http.FileServer performs a full directory listing, meaning that https://my-website/static will display all your static assets. We don’t want that for security and intellectual property reasons.

Disabling directory listing requires the creation of a custom FileSystem. Let’s create a struct that implements the http.FileSystem interface. This struct should have an Open() method in order to satisfy the interface. This Open() method first checks if the path to the file or directory exists, and if so checks if it is a file or a directory. If the path is a directory then let’s return a file does not exist error which will be converted to a 404 HTTP error for the user in the end. This way the user cannot know if he reached an existing directory or not.

Once again, let’s retrieve the path to static assets directory from an environment variable.

[...]

// Get path to static assets directory from env variable
var staticAssetsDir = os.Getenv("STATIC_ASSETS_DIR")

// neuteredFileSystem is used to prevent directory listing of static assets
type neuteredFileSystem struct {
    fs http.FileSystem
}

func (nfs neuteredFileSystem) Open(path string) (http.File, error) {
    // Check if path exists
    f, err := nfs.fs.Open(path)
    if err != nil {
        return nil, err
    }

    // If path exists, check if is a file or a directory.
    // If is a directory, stop here with an error saying that file
    // does not exist. So user will get a 404 error code for a file or directory
    // that does not exist, and for directories that exist.
    s, err := f.Stat()
    if err != nil {
        return nil, err
    }
    if s.IsDir() {
        return nil, os.ErrNotExist
    }

    // If file exists and the path is not a directory, let's return the file
    return f, nil
}

func main() {
    [...]
    // Serve static files while preventing directory listing
    mux := http.NewServeMux()
    fs := http.FileServer(neuteredFileSystem{http.Dir(staticAssetsDir)})
    mux.Handle("/", fs)
    [...]
}

Full Example

Eventually, your whole website would look like the following:

package main

import (
    "html/template"
    "net/http"
    "os"
    "path/filepath"
)

var staticAssetsDir = os.Getenv("STATIC_ASSETS_DIR")
var templatesDir = os.Getenv("TEMPLATES_DIR")
var tlsCertPath = os.Getenv("TLS_CERT_PATH")
var tlsKeyPath = os.Getenv("TLS_KEY_PATH")

// neuteredFileSystem is used to prevent directory listing of static assets
type neuteredFileSystem struct {
    fs http.FileSystem
}

func (nfs neuteredFileSystem) Open(path string) (http.File, error) {
    // Check if path exists
    f, err := nfs.fs.Open(path)
    if err != nil {
        return nil, err
    }

    // If path exists, check if is a file or a directory.
    // If is a directory, stop here with an error saying that file
    // does not exist. So user will get a 404 error code for a file/directory
    // that does not exist, and for directories that exist.
    s, err := f.Stat()
    if err != nil {
        return nil, err
    }
    if s.IsDir() {
        return nil, os.ErrNotExist
    }

    // If file exists and the path is not a directory, let's return the file
    return f, nil
}

// loveMountains renders love-mountains page after passing some data to the HTML template
func loveMountains(w http.ResponseWriter, r *http.Request) {
    // Load template from disk
    tmpl := template.Must(template.ParseFiles("love-mountains.html"))
    // Inject data into template
    data := "Any dynamic data"
    tmpl.Execute(w, data)
}

// httpsRedirect redirects http requests to https
func httpsRedirect(w http.ResponseWriter, r *http.Request) {
    http.Redirect(
        w, r,
        "https://"+r.Host+r.URL.String(),
        http.StatusMovedPermanently,
    )
}

func main() {
    // http to https redirection
    go http.ListenAndServe(":80", http.HandlerFunc(httpsRedirect))

    // Serve static files while preventing directory listing
    mux := http.NewServeMux()
    fs := http.FileServer(neuteredFileSystem{http.Dir(staticAssetsDir)})
    mux.Handle("/", fs)

    // Serve one page site dynamic pages
    mux.HandleFunc("/love-mountains", loveMountains)

    // Launch TLS server
    log.Fatal(http.ListenAndServeTLS(":443", tlsCertPath, tlsKeyPath, mux))
}

Plus your love-mountains.html template:

<h1>I Love Mountains<h1>
<p>The mountain I prefer is {{.}}</p>

Testing, Deploying and Daemonizing with Systemd

Having a solid and easy test/deploy process is very important from an efficiency standpoint and Go really helps in this regard. Go is compiling everything within a single executable, including all dependencies (except templates but the latter are not real dependencies and this is better to keep them apart for flexibility reasons anyway). Go also ships with its own frontend HTTP server, so no need to install Nginx or Apache. Thus this is fairly easy to test your application locally and make sure it is equivalent to your production website on the server (not talking about data persistence here of course…). No need to add a container system like Docker to your build/deploy workflow then!

Testing

To test your application locally, compile your Go binary and launch it with the proper environment variables like this:

TEMPLATES_DIR=/local/path/to/templates/dir \
STATIC_ASSETS_DIR=/local/path/to/static/dir \
TLS_CERT_PATH=/local/path/to/cert.pem \
TLS_KEY_PATH=/local/path/to/privkey.pem \
./my_go_website

That’s it! Your website is now running in your browser at https://127.0.0.1.

Deploying

Deployment is just about copying your Go binary to the server (plus your templates, static assets, and certs, if needed). A simple tool like scp is perfect for that. You could also use rsync for more advanced needs.

Daemonizing your App with Systemd

You could launch your website on the server by just issuing the above command, but it is much better to launch your website as a service (daemon) so your Linux system automatically launches it on startup (in case of a server restart) and also tries to restart it in case your app is crashing. On modern Linux distribs, the best way to do so is by using systemd, which is the default tool dedicated to management of system services. Nothing to install then!

Let’s assume you put your Go binary in /var/www on your server. Create a new file describing your service in the systemd directory: /etc/systemd/system/my_go_website.service. Now put the following content inside:

[Unit]
Description=my_go_website
After=network.target auditd.service

[Service]
EnvironmentFile=/var/www/env
ExecStart=/var/www/my_go_website
ExecReload=/var/www/my_go_website
KillMode=process
Restart=always
RestartPreventExitStatus=255
Type=simple

[Install]
WantedBy=multi-user.target

The EnvironmentFile directive points to an env file where you can put all your environment variables. systemd takes care of loading it and passing env vars to your program. I put it in /var/www but feel free to put it somewhere else. Here is what your env file would look like:

TEMPLATES_DIR=/remote/path/to/templates/dir
STATIC_ASSETS_DIR=/remote/path/to/static/dir
TLS_CERT_PATH=/remote/path/to/cert.pem
TLS_KEY_PATH=/remote/path/to/privkey.pem

Feel free to read more about systemd for more details about the config above.

Now:

  • launch the following to link your app to systemd: systemctl enable my_go_website
  • launch the following to start your website right now: systemctl start my_go_website
  • restart with: systemctl restart my_go_website
  • stop with: systemctl stop my_go_website

Replacing Javascript with WebAssembly (Wasm)

Here is a bonus section in case you are feeling adventurous!

As of Go version 1.11, you can now compile Go to Web Assembly (Wasm). More details here. This is very cool as Wasm can work as a substitute for Javascript. In other words you can theoretically replace Javascript with Go through Wasm.

Wasm is supported in modern browsers but this is still pretty experimental. Personally I would only do this as a proof of concept for the moment, but in the mid term it might become a great way to develop your whole stack in Go. Let’s wait and see!

Conclusion

Now you know how to develop a whole website in Go and deploy it on a Linux server. No frontend server to install, no dependency hell, and great performances… Pretty straightforward isn’t it?

If you want to learn how to build a Single Page App (SPA) with Go and Vue.js, have a look at my other post here.

Existe aussi en français
Torrengo: a concurrent torrent searcher CLI in Go

I’ve used the Torrench program for quite a while as this is a great Python tool for easy torrent searching in console (CLI). I figured I could try to develop a similar tool in Go (Golang) as it would be a good opportunity for me to learn new stuffs plus I could make this program concurrent and thus dramatically improve speed (Torrench is a bit slow in this regards).

Let me tell you about this new program’s features here, show you how I handled concurrency in it, and then tell you how I organized this Go library for easy reuse by third-parties.

Torrengo Features

Torrengo is a command line (CLI) program that concurrently searches torrents (torrent files and magnet links) from various torrent websites. Looking for torrents has always been a nightmare to me as it means launching the same search on multiple websites, facing broken websites that are not responding, and being flooded by awful ads… so this little tool is here to solve these problems. Here is an insight of the core features of Torrengo as of this writing.

Simple CLI Program

I realized I did not have much experience writing CLI tools so I figured this little project would be a good way to improve my skills in writing command line programs in Go.

CLI programs for very specific use cases like this one and with very few user interactions needed can prove to be much clearer and lighter than a GUI. They can also easily be plugged to other programs.

You can launch a new search like this:

./torrengo Alexandre Dumas

It will return all results found on all websites sorted by seeders, and it’s up to you to decide which one to download then:

Torrengo output

You can also decide to search only specific websites (let’s say The Pirate Bay and Archive.org):

./torrengo -s tpb,arc Alexandre Dumas

If you want to see more logs (verbose mode), just add the -v flag.

Fast

Thanks to Go being a very efficient language and easy to code concurrent programs with, Torrengo is scraping results very fast. Instead of fetching every website sequentially one by one, it fetches everything in parallel.

Torrengo is as fast as the slowest HTTP response then. Sometimes the slowest HTTP response can take more time than you want (for example Archive.org is pretty slow from France as of this writing). In that case, as Torrengo supports timeouts just set a timeout like this:

./torrengo -t 2000 Alexandre Dumas

The above stops every HTTP requests that take more than 2 seconds and returns the other results found.

Use The Pirate Bay Proxies

The Pirate Bay urls are changing quite often and most of them work erratically. To tackle this, the program concurrently launches a search on all The Pirate Bay urls found on a proxy list website (proxybay.bz as of this writing) and retrieves torrents from the fastest response.

The returned url’s HTML is also checked in-depth because some proxies sometimes return a page with no HTTP error but the page actually does not contain any result…

Bypass Cloudflare’s Protection on Torrent Downloads

Downloading torrent files from the Torrent Downloads’ website is difficult because the latter implements a Cloudflare protection that consists of a Javascript challenge to answer. Torrengo tries to bypass this protection by answering the Javascript challenge, which is working 90% of the times as of this writing.

Authentication

Some torrent websites like Ygg Torrent (previously t411) need users to be authenticated to download the torrent files. Torrengo handles this by asking you for credentials (but of course you need to have an account in the first place).

Word of caution: never trust programs that ask for your credentials without having a look into the source code first.

Open in Torrent Client

Instead of copy pasting magnet links found by Torrengo to your torrent client or looking for the downloaded torrent file and opening it, you can just say that you want to open torrent in local client right away. Only supported for Deluge as of this writing.

Concurrency

Thanks to Go’s simplicity, the way concurrency is used in this program is not hard to understand. This is especially true here because concurrency is perfect for networking programs like this one. Basically, the main goroutine spawns one new goroutine for each scraper and retrieves results in structs through dedicated channels. tpb is a special case as the library spawns n goroutines, n being the number of urls found on the proxy website.

I take the opportunity to say that I learned 2 interesting little things regarding channels best practices:

  • every scraping goroutines return both an error and a result. I decided to create a dedicated channel for the result (a struct) and another one for the error, but I know that some prefer to return both inside a single struct. It’s up to you, both are fine.
  • goroutines spawned by tpb don’t need to return an error but just say when they are done. This is something call an “event”. I did it by sending an empty struct (struct{}{}) through the channel which seems the best to me from a memory footprint standpoint as an empty struct weighs nothing. Instead you could also pass an ok boolean variable which is perfectly fine too.

Library Structure

This lib is made up of multiple sub-libraries: one for each scraper. The goal is that every scraping library is loosely coupled to the Torrengo program so it can be reused easily in other third-party projects. Every scraper has its own documentation.

Here is the current project’s structure:

torrengo
├── arc
│   ├── download.go
│   ├── README.md
│   ├── search.go
├── core
│   ├── core.go
├── otts
│   ├── download.go
│   ├── README.md
│   ├── search.go
├── td
│   ├── download.go
│   ├── README.md
│   ├── search.go
├── tpb
│   ├── proxy.go
│   ├── README.md
│   ├── search.go
├── ygg
│   ├── download.go
│   ├── login.go
│   ├── README.md
│   ├── search.go
├── README.md
├── torrengo.go

Let me give you more details about how I organized this Go project (I wish I had more advice like this regarding Go project structures’ best practice before).

One Go Package Per Folder

Like I said earlier, I created one lib for each scraper. All files contained in the same folder belong to the same Go package. In the chart above there are 5 scraping libs:

  • arc (Archive.org)
  • otts (1337x)
  • td (Torrent Downloads)
  • tpb (The Pirate Bay)
  • ygg (Ygg Torrent)

Each lib can be used in another Go project by importing it like this (example for Archive.org):

import "github.com/juliensalinas/torrengo/arc"

The core lib contains things common to all scraping libs and thus will be imported by all of them.

Each scraping lib has its own documentation (README) in order for it to be really decoupled.

Multiple Files Per Folder

Every folder contain multiple files for easier readability. In Go, you can create as many files as you want for the same package as long as you mention the name of the package at the top of the file. For example here every files contained in the arc folder have the following line at the top:

package arc

For example in arc I organized things so that every structs, methods, functions, variables, related to the download of the final torrent file go into the download.go file.

Command

Some Go developers put everything related to commands, that’s to say the part of your program dedicated to interacting with the user, in a special cmd folder.

Here I preferred to put this in a torrengo.go file at the root of the project which seemed more appropriate for a small program like this.

Conclusion

I hope you’ll like Torrengo and I would love to receive feed-backs about it! Feed-backs can be about the features of Torrengo of course, but if you’re a Go programmer and you think some parts of my code can be improved I’d love to hear your ideas as well.

I am planning to write more articles about what I learned when writing this program!

Existe aussi en français
Nice Readings About Go, Python, and Technical Management

I sometimes find interesting information in books that I do not find on the internet and that’s why I’m reading a lot. I read quite a lot of books recently and some were not worth it, but some were great, so let me share it with you here!

Go In-Depth

If you already know the basics about Go (Golang) and want to go deeper, The Go Programming Language (Donovan & Kernighan) is a great book.

Personally, before reading this book I had developed several projects in Go but I was still missing important points about the Go language, especially about the spirit of the language and how to be idiomatic. This book really helped me. It starts with the basics but goes very deep by talking about reflection, testing, low level programming,… Examples are very concise (I hate reading too much code in books) and related to useful real life scenarios.

Go and Concurrency

If you feel the book above does not go far enough about concurrency, you can read Concurrency In Go (Cox-Buday).

This book is very short and to the point. It really emphasizes best practices about concurrency (in general, and more particularly with the Go language). This book gives you patterns that you can apply in your own concurrent programs.

Tips and Tooling in Python

Python is quite an old language so many great tools exist around this language (libraries, frameworks, etc.). It’s not easy to have a clear view about all the possible options though. The Hitchhiker’s Guide to Python! (Kenneth Reitz) summarizes everything you should know about Python’s ecosystem. It can save you a lot of time. Careful: this is not a coding book!

Technical Management

If you are a in charge of a team of developers you already know that this is a very tough task. If you’re not yet, be prepared! In both cases it’s important to work on it. Managing and leading people is a very difficult job, especially in the programming field (lead developer, technical manager, CTO, …). This book was a great help to me: Managing the Unmanageable (Mantle & Lichty). It does not overwhelm you with theoretical concepts about management but instead gives you lots of very useful tips easy to apply.

Conclusion

If you read one of them please let me know your thoughts!

Existe aussi en français