Here is yet another answer to the perennial question: who is accessing my website, and what pages are they looking at? A long time ago this website ran Google Analytics, but I was put off by its privacy impact on users. I’ve since switched to another client-side tracking solution, GoatCounter, which doesn’t uniquely track users and is open source.
These client-side tracking solutions both slow the website down for users and completely miss the ever-growing segment of visitors who use ad and tracking blockers (as they rightly should!).
Because this is a static site hosted on AWS S3 and served through the CloudFront CDN, I wasn’t sure what ‘server-side’ log data was available. After some research I found that CloudFront logs not only provide the information I’m interested in, but can also be parsed and viewed offline with the open source tool GoAccess.
This post details how to set up CloudFront logging, download and combine the logs, and then use GoAccess to get an interactive dashboard like the one below.
Enable CloudFront Logging
Assuming you’ve already set up a CloudFront distribution for your site, you just need to enable standard logging. CloudFront will write access logs for every user request to a bucket you specify. Logs are written roughly every 20 minutes in gzip-compressed format. This site gets low traffic, and around three months of access logs use about 15 MB of storage.
After enabling the setting, check the bucket in an hour or so to confirm that logs are being written.
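That check can also be done from the terminal. The following is a rough sketch, assuming the aws CLI is installed and configured; the bucket name is a placeholder and the helper name `latest_logs` is made up for illustration.

```shell
# Hypothetical helper: print the most recent objects in the log bucket.
# "aws s3 ls" prints "date time size key" per object, so a plain sort
# orders the listing by timestamp.
latest_logs() {
  local bucket="$1"
  local count="${2:-5}"
  aws s3 ls "s3://${bucket}/" --recursive | sort | tail -n "${count}"
}

# Example (placeholder bucket name):
#   latest_logs your-cloudfront-log-bucket 10
```

If the newest entries have recent timestamps, logging is working.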
Download and Combine Logs
We run GoAccess on a single log file, yet CloudFront produces thousands of log files in ~20 minute increments. We must download all the log files and combine them into a single file. Here’s a simple bash script that does it for us.
#!/usr/bin/env bash
aws s3 sync s3://your-cloudfront-log-bucket/ .
cat *.gz > combined.log.gz
gzip -d combined.log.gz
rm *.gz
This script assumes the aws CLI tool is installed and configured locally. You’ll also need to install GoAccess for the next command.
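The cat step works because a concatenation of gzip files is itself a valid gzip stream. One possible variation, sketched below, decompresses straight into the combined file and leaves the downloaded archives in place, so a later aws s3 sync only has to fetch new logs. The `logs/` directory and the function name `combine_logs` are assumptions for illustration, not part of the script above.

```shell
# Hypothetical variation: decompress every archive in a directory into one file.
combine_logs() {
  local dir="$1"
  local out="$2"
  # gzip -dc writes the decompressed data to stdout and keeps the .gz files.
  gzip -dc "${dir}"/*.gz > "${out}"
}

# Example (assumes the aws CLI is configured; the bucket name is a placeholder):
#   aws s3 sync s3://your-cloudfront-log-bucket/ logs/
#   combine_logs logs combined.log
```

The trade-off is disk space: the archives stay on disk next to the combined log, in exchange for faster repeat runs.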
GoAccess and CloudFront
With our single log file, we can run GoAccess to generate the HTML analytics report.
goaccess combined.log --log-format=CLOUDFRONT -o report.html
Or run it interactively in the terminal. See the GoAccess man page for more detail.
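For the terminal view, the same command without the -o option should open the interactive dashboard. If the combined log grows large, one option is to trim it to a date range first. The sketch below assumes the standard CloudFront log layout, where fields are tab separated and the first field is the date in YYYY-MM-DD form; the function name `logs_since` and the example date are hypothetical.

```shell
# Hypothetical helper: keep only requests on or after a given date.
# Assumes the first tab-separated field is the date (YYYY-MM-DD) and that
# lines starting with "#" are the CloudFront header lines, which are dropped.
logs_since() {
  awk -F '\t' -v since="$1" '!/^#/ && $1 >= since'
}

# Examples (require GoAccess; "-" asks it to read the log from stdin):
#   goaccess combined.log --log-format=CLOUDFRONT
#   logs_since 2024-01-01 < combined.log | goaccess - --log-format=CLOUDFRONT -o report.html
```

Because the dates are in year-month-day order, a plain string comparison is enough to select a range.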