# Review infrastructure
Right now, the hosting infrastructure is ad hoc and potentially insecure. It may be worthwhile to rethink how we deploy these websites.
## Current situation
- Repositories have three branches: `main`, `publish`, and `publish-html`. When a group wants to make changes visible to the public (students), they merge `main` into `publish`.
- A commit to `publish` triggers a CI pipeline consisting of two jobs:
  - The first job builds the book and uploads the HTML as a job artifact.
  - The second job takes the artifact of the first job and commits the changes to the `publish-html` branch. This is done using an `rsync` command and requires a GitLab access token.
- The `publish-html` branch is cloned to our server (the same server that hosts the GitLab runner). A cronjob checks for changes to this branch every 5 minutes. If a change is detected, it pulls the changes and the website is updated after a couple of minutes.
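For reference, the two-job pipeline described above might look roughly like the following `.gitlab-ci.yml` fragment. This is a sketch, not the actual configuration: the job names, the `jupyter-book` build command, the output path, and the `$ACCESS_TOKEN` variable name are all assumptions.

```yaml
# Hypothetical sketch of the current pipeline; job names, build command,
# paths, and the token variable are assumptions, not the real config.
build-book:
  stage: build
  rules:
    - if: '$CI_COMMIT_BRANCH == "publish"'
  script:
    - jupyter-book build book/            # assumed build command
  artifacts:
    paths:
      - book/_build/html                  # uploaded as a job artifact

publish-html:
  stage: deploy
  rules:
    - if: '$CI_COMMIT_BRANCH == "publish"'
  script:
    # Clone the publish-html branch using a project access token
    # (stored in $ACCESS_TOKEN; variable name is an assumption).
    - git clone -b publish-html
        "https://gitlab-ci-token:${ACCESS_TOKEN}@${CI_SERVER_HOST}/${CI_PROJECT_PATH}.git" out
    # Sync the built HTML into the checkout, then commit and push.
    - rsync -a --delete --exclude .git book/_build/html/ out/
    - cd out && git add -A
    - git commit -m "Publish ${CI_COMMIT_SHORT_SHA}" && git push
```

The `${CI_SERVER_HOST}`, `${CI_PROJECT_PATH}`, and `${CI_COMMIT_SHORT_SHA}` variables are predefined by GitLab CI; everything else above is illustrative.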
## Problems with this setup
- The Apache web server is installed bare-metal, which is apparently not very secure (root access to the server could be obtained through the internet).
- Compute capacity of the server is limited (not a huge problem, yet).
- If given the right permissions, people can still mess up the `publish-html` branch.
- When books become too large, uploading the job artifacts fails. This has already happened with the MUDE archive.
## Alternatives
- Deploy books as Docker containers. We can build off the `httpd` container, which improves security over the bare-metal Apache server.
  - This complicates the whole CI/CD process:
    - How do we automatically deploy Docker containers? Kubernetes? We would then need another server.
    - There is no container registry available on the regular TU Delft GitLab instance.
- Do nothing, and eventually let the library handle deployment.
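The container alternative could be quite small. Below is a minimal sketch layered on the official `httpd` image; the HTML path `book/_build/html` and the image tag in the usage line are assumptions, not part of any existing setup.

```dockerfile
# Sketch only: serve the pre-built book with Apache httpd.
FROM httpd:2.4
# Copy the CI-built HTML (path is an assumed example) into
# Apache's default document root.
COPY book/_build/html/ /usr/local/apache2/htdocs/
```

Usage would be along the lines of `docker build -t book-site . && docker run -p 8080:80 book-site` (the `book-site` tag is hypothetical). The open questions above remain: where the image is stored without a container registry, and what triggers a rebuild and redeploy.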