Open Summer of Code 2020

Once again (as seems to have become something of a tradition), I am late with my osoc blog post. But I still wanted to write something about the project we made and my experience working on it.

This project was done in cooperation with the Flemish ABB (Agentschap Binnenlands Bestuur, which translates to something like Agency for Domestic Governance). They want to use Linked Open Data technology and Solid to simplify and decentralize election data: mostly data surrounding elections, such as donations and campaign expenses.

There were two big differences compared to the previous two editions I attended. The first, of course, was that this edition was remote due to the whole deadly pandemic thing currently going on. This actually went really well, and although I missed the after-work hangouts, I was very impressed with how Miet and all the other organizers handled everything!

The other big difference was that our team was much larger than the last two times. The first time we had a team of three people (four after I joined mid-project), and last year there were also only three of us (although we did get a lot of help from Gilles). This year, there were nine of us. That is a huge group for a hackathon like this, and we realized it would be quite a challenge to communicate well and divide the work so everyone knew what to do, especially in a remote edition.

The first thing we did to keep everything semi-structured was divide our team into three sub-teams: two frontend teams (one for submitting the data and one for end users to view the submitted data) and a database/API team. I was on the database team; our goal was to create a central place to index all the Solid pods. I actually spent most of my time setting up the deployment, so that is what this post will focus on.

The thing about Solid is that data is stored in a decentralized way, on each user's own Solid pod. This ensures that users have full control over their own data. It also means, however, that if we want to fetch the data, we need to keep a list of where it is stored in some central place. Our (maybe not so elegant, but simple and effective) solution was a small API backed by a Postgres database.
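A minimal sketch of what such a central index boils down to, assuming a single table that stores nothing but pod URLs. Python's built-in sqlite3 stands in for Postgres here so the example runs on its own; the table name `pods` and the helper functions are hypothetical, not our actual schema or API.

```python
import sqlite3
from typing import List

# Hypothetical pod index: sqlite3 stands in for the Postgres database,
# and the "pods" table is an illustrative schema, not the real one.


def create_index(conn: sqlite3.Connection) -> None:
    """Create the table that records where each user's data lives."""
    conn.execute("CREATE TABLE IF NOT EXISTS pods (url TEXT PRIMARY KEY)")


def register_pod(conn: sqlite3.Connection, url: str) -> bool:
    """Add a pod URL to the index; return False if it was already registered."""
    cur = conn.execute("INSERT OR IGNORE INTO pods (url) VALUES (?)", (url,))
    conn.commit()
    return cur.rowcount == 1


def list_pods(conn: sqlite3.Connection) -> List[str]:
    """Return every known pod URL so a frontend can fetch data from each pod."""
    return [row[0] for row in conn.execute("SELECT url FROM pods ORDER BY url")]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_index(conn)
    register_pod(conn, "https://alice.example.org/")
    register_pod(conn, "https://bob.example.org/")
    register_pod(conn, "https://alice.example.org/")  # duplicate, ignored
    print(list_pods(conn))
```

The index only stores *where* the data is; the data itself stays on the pods, which is the whole point of using Solid.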

Our API was built with a Python framework called Sanic. We picked it mostly because of our previous experience with Python and Sanic's similarity to Flask (except that it's asynchronous and reportedly ~6 times as fast). The performance difference doesn't really matter much for a proof-of-concept project, but it seemed like a neat little framework.

Originally, I was just going to deploy our application directly to our server, and maybe make a nice Docker image for it if we had time left. But after spending way too much time trying to get an up-to-date version of Python running on our Ubuntu 18.04 LTS server, I gave up and Dockerized the thing. As someone with very limited Docker experience, I found wrapping our app in an image surprisingly easy (deployment was a bit more complicated, as I will discuss later)!

Our production image would be deployed to a single-server Docker Swarm instance. This just felt like a slightly more stable solution than running docker-compose on our server, and the syntax of Swarm stack files is very similar to that of docker-compose files. It's not the same, though, and this will bite me in the ass later on. Hard.

Flowchart roughly describing my process of setting up our server

One of the things I really struggled with was setting up TLS. At first, my plan was to put our API behind a Caddy reverse proxy. I have used Caddy in the past (not in Docker, just installed system-wide) and it was very easy to set up: the config files are simple and it automatically obtains a certificate from Let's Encrypt. Trying to set it up with Docker, however, resulted in a timeout in the ACME challenge (probably some weird network fuckery specific to Swarm). I spent some time trying to fix it but reluctantly gave up after a while (it's only a proof-of-concept site anyway, right?). But our frontend is deployed to a GitHub Pages site with automatic HTTPS, which means it can't send requests to a plain-HTTP API: browsers block those as mixed content. Well, shit.
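The reason an HTTPS frontend can't talk to a plain-HTTP API is the browser's mixed-content policy: a page loaded over HTTPS may not fetch from `http://` URLs. Here is a heavily simplified sketch of that rule. The function name and example URLs are made up, and real browsers apply extra exceptions from the spec (for example for potentially trustworthy URLs such as `http://127.0.0.1`) that this ignores.

```python
from urllib.parse import urlsplit


def is_blocked_mixed_content(page_url: str, request_url: str) -> bool:
    """Simplified mixed-content check: an HTTPS page requesting an http:// URL.

    Ignores the spec's exceptions for potentially trustworthy origins.
    """
    page = urlsplit(page_url)
    target = urlsplit(request_url)
    return page.scheme == "https" and target.scheme == "http"


if __name__ == "__main__":
    # Hypothetical GitHub Pages frontend calling a plain-HTTP API
    print(is_blocked_mixed_content("https://example.github.io/app/",
                                   "http://api.example.org/pods"))
```

So the fix has to happen on the API side: serve it over HTTPS.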

I looked around a bit more for how people typically set up TLS for Docker services, and a lot of people seemed to recommend Traefik. Traefik has pretty nice documentation, with example configurations and everything. However, its examples are only for docker-compose, which I assumed would work with Swarm because the config files are so similar. Yeah, no. For some reason, Traefik kept serving a self-signed certificate (which is the default if there's no CA set up or if the ACME challenge failed). I made a Reddit post about it, and some kind soul (thank you so much /u/5H4D0W_ReapeR, you saved my sanity) pointed out some small but crucial configuration differences between using Traefik with docker-compose and with Swarm.
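To make those differences concrete, here is a hypothetical helper (`compose_to_swarm` is my own name, not part of Docker or Traefik) that takes a docker-compose-style config as a Python dict and applies the two changes that mattered here: moving each service's labels under `deploy.labels`, and adding `--providers.docker.swarmmode=true` to the Traefik command. It assumes labels use the list form (`- "key=value"`) and is nowhere near a complete compose-to-Swarm converter.

```python
import copy
from typing import Any, Dict

SWARM_FLAG = "--providers.docker.swarmmode=true"


def compose_to_swarm(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a compose-style config adjusted for Traefik on Swarm.

    Hypothetical helper: only handles list-form labels and the Traefik
    swarm-mode flag, nothing else.
    """
    swarm = copy.deepcopy(config)  # don't mutate the caller's dict
    for service in swarm.get("services", {}).values():
        # 1. On Swarm, Traefik reads service labels, which live under deploy.labels.
        labels = service.pop("labels", None)
        if labels:
            deploy = service.setdefault("deploy", {})
            deploy.setdefault("labels", []).extend(labels)
        # 2. Tell Traefik's Docker provider that it is talking to a Swarm.
        command = service.get("command")
        if isinstance(command, list) and "--providers.docker=true" in command:
            if SWARM_FLAG not in command:
                command.append(SWARM_FLAG)
    return swarm
```

One more Swarm-specific detail: Traefik can't detect the port of a Swarm service automatically, which is why the final config also sets `traefik.http.services.<name>.loadbalancer.server.port` explicitly.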

However, it still wouldn't work on my test server (I wanted to use a similar setup for my personal website). For a while I suspected it had something to do with the fact that I'm using Cloudflare for my DNS (even though I had disabled all their caching/MITM features), so I tested it on the Open Summer of Code server just to check, and sure enough, it worked fine there. A lot of painful troubleshooting later, the cause turned out to be that my domain pointed to the server's Scaleway-provided public hostname (randomcharacters.pub.instances.scw.cloud) using a CNAME record, instead of pointing directly to the server's IP address with an A record. I still don't know why Traefik has a problem with this (and honestly I don't want to find out), but apparently other people have run into it too. Yep, it was DNS. Because of course it was DNS.
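If you want to check whether a domain goes through a CNAME, a few lines of Python can tell you. This is a sketch using `socket.gethostbyname_ex`, which on typical resolvers returns the canonical name at the end of any CNAME chain, the list of aliases (including the name you asked for, if it is a CNAME) and the IPv4 addresses. The helper names are hypothetical, and `points_directly_at` is only a heuristic.

```python
import socket
from typing import List, Tuple


def describe_dns(hostname: str) -> Tuple[str, List[str], List[str]]:
    """Return (canonical name, aliases, IPv4 addresses) for hostname.

    If hostname is a CNAME, the canonical name is usually the CNAME target
    and hostname itself shows up in the alias list.
    """
    canonical, aliases, addresses = socket.gethostbyname_ex(hostname)
    return canonical, aliases, addresses


def points_directly_at(hostname: str, expected_ip: str) -> bool:
    """Heuristic: True if hostname is its own canonical name and resolves to expected_ip."""
    canonical, _aliases, addresses = describe_dns(hostname)
    same_name = canonical.rstrip(".").lower() == hostname.rstrip(".").lower()
    return same_name and expected_ip in addresses


if __name__ == "__main__":
    # "localhost" resolves without network access; swap in your own domain.
    print(describe_dns("localhost"))
```

With an A record, something like `points_directly_at("api.example.org", "203.0.113.7")` (both values hypothetical) should return True; with the CNAME setup, the canonical name would typically be the `*.scw.cloud` hostname instead.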

Here's the final Swarm config; I hope it can help people facing similar issues. Note that the labels sit under deploy instead of directly under the api service as they would in a docker-compose config, and note the --providers.docker.swarmmode=true flag. And of course, don't forget to create the folders for the volume mount points: unlike docker-compose, Swarm won't create missing bind-mount directories for you.

version: "3.8"

services:

  api:
    image: solidelections/api
    environment:
      - DEBUG=${DEBUG}
      - PG_HOST=${PG_HOST}
      - PG_DBNAME=${PG_DBNAME}
      - PG_USER=${PG_USER}
      - PG_PASS=${PG_PASS}
      - SPARQL_URL=${SPARQL_URL}
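    # On Swarm, Traefik reads service labels, so they go under deploy
    # instead of directly under the service as in docker-compose.
    deploy:
      labels:
        - "traefik.enable=true"
        # Middleware that redirects plain HTTP requests to HTTPS
        - "traefik.http.middlewares.redirect-to-https.redirectscheme.scheme=https"
        # Swarm services need an explicit port; the API container listens on 8000
        - "traefik.http.services.solid-elections-api.loadbalancer.server.port=8000"
        # HTTP router on :80 that only redirects to HTTPS
        - "traefik.http.routers.solid-elections-api.middlewares=redirect-to-https"
        - "traefik.http.routers.solid-elections-api.rule=Host(`${HOST}`)"
        - "traefik.http.routers.solid-elections-api.entrypoints=web"
        # HTTPS router on :443 using the Let's Encrypt resolver defined below
        - "traefik.http.routers.solid-elections-api-secure.rule=Host(`${HOST}`)"
        - "traefik.http.routers.solid-elections-api-secure.entrypoints=web-secure"
        - "traefik.http.routers.solid-elections-api-secure.tls=true"
        - "traefik.http.routers.solid-elections-api-secure.tls.certresolver=resolver"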

  db:
    image: postgres:alpine
    environment:
      - POSTGRES_PASSWORD=${PG_PASS}
    volumes:
      # Create ./pgdata before deploying; Swarm won't create it for you
      - ./pgdata:/var/lib/postgresql/data

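  traefik:
    image: traefik:2.2
    command:
      - "--providers.docker=true"
      # Required so Traefik discovers Swarm services and their deploy labels
      - "--providers.docker.swarmmode=true"
      # Only route services that opt in with traefik.enable=true
      - "--providers.docker.exposedbydefault=false"
      - "--entrypoints.web.address=:80"
      - "--entrypoints.web-secure.address=:443"
      # Let's Encrypt via the TLS-ALPN-01 challenge, which needs port 443 reachable
      - "--certificatesresolvers.resolver.acme.tlschallenge=true"
      - "--certificatesresolvers.resolver.acme.email=${LETSENCRYPT_EMAIL}"
      # Certificates are persisted in the ./letsencrypt bind mount below
      - "--certificatesresolvers.resolver.acme.storage=/letsencrypt/acme.json"
      - "--log.level=DEBUG"
    ports:
      - 80:80
      - 443:443
    volumes:
      # Create ./letsencrypt before deploying; Swarm won't create it for you
      - ./letsencrypt:/letsencrypt
      # Read-only access to the Docker API so Traefik can discover services
      - /var/run/docker.sock:/var/run/docker.sock:ro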
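Since the stack file leans heavily on ${...} variables, it's worth checking that they are all set before deploying: docker stack deploy substitutes them from the shell environment, and as far as I know it doesn't read a .env file the way docker-compose does. A small hypothetical pre-deploy check (the function names are mine, not part of any Docker tooling):

```python
import os
import re
from typing import Mapping, Set

# Matches plain ${VAR} references, the only form used in the stack file above.
# Forms like ${VAR:-default} are deliberately not matched.
VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def required_variables(stack_text: str) -> Set[str]:
    """Names of all ${VAR} references in a stack file's text."""
    return set(VAR_PATTERN.findall(stack_text))


def missing_variables(stack_text: str, env: Mapping[str, str] = os.environ) -> Set[str]:
    """Referenced variables that are unset or empty in env."""
    return {name for name in required_variables(stack_text) if not env.get(name)}


if __name__ == "__main__":
    example = "- PG_HOST=${PG_HOST}\n- rule=Host(`${HOST}`)\n"
    print(sorted(missing_variables(example)))
```

Running something like `missing_variables(open("docker-compose.yml").read())` before `docker stack deploy -c docker-compose.yml <stack-name>` catches a forgotten export (the file name is just an example). An empty HOST, for instance, would leave Traefik with a `Host(``)` rule that matches nothing useful.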