Added automatic prefetching
I have finished writing the code and it seems to work; tests pass and test coverage is around 81%. There are no breaking changes with respect to the original version: with an original version's config file it runs exactly as before, and vice versa. No breaking changes with configs.
The only change that may confuse you is that the function "downloadFile" got renamed to "downloadFileAndSend", because it does both. Now "downloadFile" simply downloads a file (I use it for prefetching).

I have added a sqlite3 database, integrated with GORM, to keep track of the downloaded database links. It can be hard to debug issues with GORM, especially because it is not strongly typed and because queries use strings (which give no hint when some parameters are not being used) and can fail silently (with a "record not found" error message, which also shows up on every insert).
I'll try to explain GORM's (default) naming convention by example:

TableName->table_name
ATTRIBUTE->attribute
AttRiBuTe->att_ri_bu_te
It should be intuitive, but I wasted about half an hour on it, and I don't want you to waste your time too.
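To make the convention concrete, here is a minimal, illustrative GORM sketch (a made-up model, not the actual code of this commit) showing how Go identifiers end up as SQL names:

```go
package main

import (
	"gorm.io/driver/sqlite"
	"gorm.io/gorm"
)

// Package is a hypothetical model used only for this example.
// With GORM's default naming strategy, exported field names become
// snake_case column names: PackageName -> package_name, Arch -> arch.
type Package struct {
	PackageName string
	Version     string
	Arch        string
}

func main() {
	db, err := gorm.Open(sqlite.Open("sqlite-pkg-cache.db"), &gorm.Config{})
	if err != nil {
		panic(err)
	}
	// AutoMigrate creates the table with the snake_case columns shown above;
	// the table name is derived from the struct name in the same snake_case
	// fashion (the naming strategy may also pluralize it) unless a TableName()
	// method overrides it.
	if err := db.AutoMigrate(&Package{}); err != nil {
		panic(err)
	}
}
```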
The database could also be useful for implementing the "stats" feature written as a todo.

I'll try to describe the prefetching mechanism as I have implemented it right now:

When a .db file is requested and downloaded, I save the link and the last time it was requested in the db (MirrorDB table); db links are hard to generate otherwise.
When a package file whose name is structured as name-version-subversion-arch.pkg.tar.zst is requested, I save the name, version (merging version-subversion into version), arch and the local repo name in the db (Package table). I also keep track of when it was prefetched and when it was downloaded.
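For illustration, a rough sketch of splitting such a file name into the fields stored in the Package table (the identifiers and the exact regular expression are made up, not the commit's actual code):

```go
package prefetch

import (
	"fmt"
	"regexp"
)

// pkgNameRegex captures name, version (version-subversion merged into one
// field) and arch from file names like openssh-8.2p1-3-x86_64.pkg.tar.zst.
// Further compressed-package extensions would be added to the alternation.
var pkgNameRegex = regexp.MustCompile(`^(.+)-([^-]+-[^-]+)-([^-]+)\.pkg\.tar\.(?:zst|xz|gz|bz2)$`)

// parsePkgFilename returns the package name, merged version and arch.
func parsePkgFilename(file string) (name, version, arch string, err error) {
	m := pkgNameRegex.FindStringSubmatch(file)
	if m == nil {
		return "", "", "", fmt.Errorf("not a package file name: %s", file)
	}
	return m[1], m[2], m[3], nil
}
```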
When the prefetching routine is called, it first deletes "old" MirrorDB entries (old = not downloaded for ttl_unaccessed_in_days days; such entries get deleted from the table) and dead packages. A package is assumed dead if:
it hasn't been actively requested by a client for ttl_unaccessed_in_days days after it was last updated, or
it has been neither requested nor updated for ttl_unupdated_in_days days. Right now the default value is quite high, set at 300 days (after not updating a package for 300 days with no client asking for it, I consider it dead, so I won't prefetch it anymore).
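A rough sketch of these two rules (the function and its parameters are illustrative; the real implementation works on database rows and the configured TTL values):

```go
package prefetch

import "time"

// isPackageDead reports whether a cached package should no longer be prefetched.
// lastRequested is the last time a client asked for the package, lastUpdated
// the last time a new version of it was downloaded or prefetched.
func isPackageDead(lastRequested, lastUpdated, now time.Time, ttlUnaccessedDays, ttlUnupdatedDays int) bool {
	day := 24 * time.Hour
	// Rule 1: updated more than ttl_unaccessed_in_days days ago and no client
	// has requested it since that update.
	if lastRequested.Before(lastUpdated) && now.Sub(lastUpdated) > time.Duration(ttlUnaccessedDays)*day {
		return true
	}
	// Rule 2: neither requested nor updated for ttl_unupdated_in_days days.
	if now.Sub(lastRequested) > time.Duration(ttlUnupdatedDays)*day &&
		now.Sub(lastUpdated) > time.Duration(ttlUnupdatedDays)*day {
		return true
	}
	return false
}
```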
Then it downloads the db files stored in the MirrorDB table into a "tmp-db" folder in the pacoloco cache directory, extracts them, parses them and, once parsed, stores their relevant entries (package name, arch and version) in the RepoPackage table. Then all the "tmp-db" files get deleted.
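As an illustration of that parsing step, a minimal sketch that assumes the repo db is a gzip-compressed tar archive containing one `<name>-<version>/desc` entry per package (the format used by the official Arch repos); the identifiers are made up, and real mirrors may use other compression formats:

```go
package prefetch

import (
	"archive/tar"
	"compress/gzip"
	"io"
	"os"
	"strings"
)

// repoEntry holds the fields we care about from a repo db "desc" file.
type repoEntry struct {
	Name, Version, Arch string
}

// parseRepoDB reads a downloaded repo db file and returns its package entries.
func parseRepoDB(path string) ([]repoEntry, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()

	gz, err := gzip.NewReader(f)
	if err != nil {
		return nil, err
	}
	defer gz.Close()

	var entries []repoEntry
	tr := tar.NewReader(gz)
	for {
		hdr, err := tr.Next()
		if err == io.EOF {
			break
		}
		if err != nil {
			return nil, err
		}
		// Only the per-package "desc" files are interesting.
		if hdr.Typeflag != tar.TypeReg || !strings.HasSuffix(hdr.Name, "/desc") {
			continue
		}
		data, err := io.ReadAll(tr)
		if err != nil {
			return nil, err
		}
		entries = append(entries, parseDesc(string(data)))
	}
	return entries, nil
}

// parseDesc extracts the %NAME%, %VERSION% and %ARCH% sections of a desc file.
func parseDesc(s string) repoEntry {
	var e repoEntry
	lines := strings.Split(s, "\n")
	for i := 0; i+1 < len(lines); i++ {
		switch strings.TrimSpace(lines[i]) {
		case "%NAME%":
			e.Name = strings.TrimSpace(lines[i+1])
		case "%VERSION%":
			e.Version = strings.TrimSpace(lines[i+1])
		case "%ARCH%":
			e.Arch = strings.TrimSpace(lines[i+1])
		}
	}
	return e
}
```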
Then I do an inner join between the Package table and the RepoPackage table to find only the installed packages which have a different version upstream (in the join condition, I check that package name and arch are equal and that the version differs). Once those have been found, the old package files (with their signature files) get deleted and the new version gets prefetched.
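As a sketch of that join, expressed with GORM (the table and column names below are guesses following GORM's naming convention, not necessarily the commit's actual schema):

```go
package prefetch

import "gorm.io/gorm"

// RepoPackage mirrors the hypothetical upstream-db model used in this example.
type RepoPackage struct {
	PackageName string
	Version     string
	Arch        string
}

// findUpdatedPackages returns upstream entries whose name and arch match a
// locally cached package but whose version differs, i.e. prefetch candidates.
func findUpdatedPackages(db *gorm.DB) ([]RepoPackage, error) {
	var updated []RepoPackage
	err := db.Table("repo_packages").
		Joins("INNER JOIN packages ON packages.package_name = repo_packages.package_name"+
			" AND packages.arch = repo_packages.arch"+
			" AND packages.version <> repo_packages.version").
		Find(&updated).Error
	return updated, err
}
```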
The prefetching function is the same handleRequest function adapted to use a single writer (the file one), and it calls a different function to store package data in the db (because it has to store different data).
It supports all the package extensions supported so far.
As future/possible improvements:

signature checking of prefetched files could be useful
add other heuristics to decide whether to discard or prefetch a package
Finally, many thanks for your work so far.

Closes anatol#27
Closes anatol#14
Focshole authored and anatol committed Aug 27, 2021
1 parent 02cf2c1 commit 7ee29cc
Showing 19 changed files with 2,428 additions and 67 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -7,4 +7,4 @@ dist: focal
language: go

go:
- 1.15
- 1.17
40 changes: 31 additions & 9 deletions README.md
@@ -1,10 +1,13 @@
# Pacoloco - caching proxy server for pacman

Pacoloco is a web server that acts as if it were an Arch Linux pacman repository.
Every time the pacoloco server gets a request from a user, it downloads the file from a
real Arch Linux mirror and passes it on to the user. Additionally, pacoloco
saves the file to a local filesystem cache and serves it to future users.
It also allows prefetching updates of the most recently used packages.

## How does it help?

Fast internet is still a luxury in many parts of the world. There are many places
where access to the internet is expensive and slow due to geographical and economic
reasons.
@@ -18,24 +21,30 @@ _Pacoloco_ does not mirror the whole Arch repository. It only downloads files ne
You can think of pacoloco as a lazy Arch mirror.

## Install

### Arch systems

Install the [pacoloco-git package](https://aur.archlinux.org/packages/pacoloco-git/) from the AUR repository.
Then start its systemd service: `# systemctl start pacoloco`.

### Docker

There is a pacoloco docker image available. It can be used with:
`docker run -p 9129:9129 -v /path/to/config/pacoloco.yaml:/etc/pacoloco.yaml -v /path/to/cache:/var/cache/pacoloco/pkgs pacoloco`. You need to provide a config file and a path to store the package cache.

## Build from sources

Optionally, you can build the binary from source using the `go build` command.

## Configure

The server configuration is located at `/etc/pacoloco.yaml`. Here is an example of what the config file looks like:

```
```yaml
port: 9129
cache_dir: /var/cache/pacoloco
purge_files_after: 360000 # 360000 seconds or 100 hours
download_timeout: 200 # 200 seconds
purge_files_after: 360000 # 360000 seconds or 100 hours, 0 to disable
download_timeout: 3600 # download will timeout after 3600 seconds
repos:
  archlinux:
    urls:
@@ -45,19 +54,27 @@ repos:
    url: http://pkgbuild.com/~anatolik/quarry/x86_64
  sublime:
    url: https://download.sublimetext.com/arch/stable/x86_64
prefetch: # optional section, add it if you want to enable prefetching
  cron: 0 0 3 * * * * # standard cron expression (https://en.wikipedia.org/wiki/Cron#CRON_expression) that defines how frequently to prefetch; see https://github.com/gorhill/cronexpr#implementation for documentation
  ttl_unaccessed_in_days: 30 # defaults to 30; set it to a value higher than the number of consecutive days you don't update your systems
  # Packages (and db links) that have not been downloaded for ttl_unaccessed_in_days days after being updated are deleted and no longer prefetched.
  ttl_unupdated_in_days: 300 # defaults to 300; packages which have been neither updated upstream nor requested for ttl_unupdated_in_days days are deleted and no longer prefetched
```
* `cache_dir` is the cache directory; this location needs to be readable and writable by the server process.
* `purge_files_after` specifies an inactivity duration (in seconds) after which a file is removed from the cache. This functionality uses the unix "AccessTime" field to find inactive files. The default value is `0`, which means purging never runs.
* `port` is the server port.
* `download_timeout` is a timeout (in seconds) for internet->cache downloads. If a remote server is slow and a file download takes longer than this, it is terminated. The default value is `0`, which means no timeout.
* `repos` is a list of repositories to mirror. Each repo needs a `name` and the url of its Arch mirrors. Note that the url can be specified either with the `url` or the `urls` property; one and only one can be used for each repo configuration.
* The `prefetch` section enables package prefetching. Comment it out to disable it.
* To check that the cron value does what you expect, see the cronexpr [implementation](https://github.com/gorhill/cronexpr#implementation) notes or [test it](https://play.golang.org/p/IK2hrIV7tUk), e.g. with the snippet below.
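For a quick local check of the cron value, here is a small, illustrative snippet using the same cronexpr library (it is not part of pacoloco itself):

```go
package main

import (
	"fmt"
	"time"

	"github.com/gorhill/cronexpr"
)

func main() {
	// The cron value from pacoloco.yaml; adjust it to the one you want to test.
	expr, err := cronexpr.Parse("0 0 3 * * * *")
	if err != nil {
		panic(err)
	}
	// Print the next three times the prefetch job would fire.
	for _, t := range expr.NextN(time.Now(), 3) {
		fmt.Println(t)
	}
}
```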

With the example configuration above, `http://YOURSERVER:9129/repo/archlinux` looks exactly like an Arch pacman mirror.
For example a request to `http://YOURSERVER:9129/repo/archlinux/core/os/x86_64/openssh-8.2p1-3-x86_64.pkg.tar.zst` will be served with file content from `http://mirror.lty.me/archlinux/core/os/x86_64/openssh-8.2p1-3-x86_64.pkg.tar.zst`

Once the pacoloco server is up and running, it is time to configure the user host. Modify the user's `/etc/pacman.conf` with

```
```conf
[core]
Include = /etc/pacman.d/mirrorlist
@@ -75,19 +92,21 @@ Server = http://yourpacoloco:9129/repo/sublime
```

And `/etc/pacman.d/mirrorlist` with
```

```yaml
Server = http://yourpacoloco:9129/repo/archlinux/$repo/os/$arch
```

That's it. From now on, pacman requests will be proxied through our pacoloco server.

## Handling multiple architectures

*pacoloco* does not care about the architecture of your repo as it acts as a mere proxy.

Thus it can handle multiple different arches transparently. One way to do it is to add multiple
repositories with names like `foobar_$arch`, e.g.:

```
```yaml
repos:
  archlinux_x86_64:
    urls:
@@ -102,17 +121,20 @@ repos:
Then modify the user's `/etc/pacman.d/mirrorlist` and add

For x86_64:
```

```yaml
Server = http://yourpacoloco:9129/repo/archlinux_$arch/$repo/os/$arch
```

For armv7h:
```

```yaml
Server = http://yourpacoloco:9129/repo/archlinux_$arch/$arch/$repo
```

For x86:
```

```yaml
Server = http://yourpacoloco:9129/repo/archlinux_$arch/$arch/$repo
```

47 changes: 40 additions & 7 deletions config.go
@@ -4,16 +4,26 @@ import (
"log"
"os/user"

"github.com/gorhill/cronexpr"
"golang.org/x/sys/unix"
"gopkg.in/yaml.v3"
)

const DefaultPort = 9129
const DefaultCacheDir = "/var/cache/pacoloco"
const DefaultTTLUnaccessed = 30
const DefaultTTLUnupdated = 200
const DefaultDBName = "sqlite-pkg-cache.db"

type Repo struct {
Url string `yaml:"url"`
Urls []string `yaml:"urls"`
URL string `yaml:"url"`
URLs []string `yaml:"urls"`
}

type RefreshPeriod struct {
Cron string `yaml:"cron"`
TTLUnaccessed int `yaml:"ttl_unaccessed_in_days"`
TTLUnupdated int `yaml:"ttl_unupdated_in_days"`
}

type Config struct {
@@ -22,14 +32,16 @@ type Config struct {
Repos map[string]Repo `yaml:"repos,omitempty"`
PurgeFilesAfter int `yaml:"purge_files_after"`
DownloadTimeout int `yaml:"download_timeout"`
Prefetch *RefreshPeriod `yaml:"prefetch"`
}

var config *Config

func parseConfig(raw []byte) *Config {
var result = &Config{
var result = Config{
CacheDir: DefaultCacheDir,
Port: DefaultPort,
Prefetch: nil,
}

if err := yaml.Unmarshal(raw, &result); err != nil {
@@ -38,16 +50,16 @@ func parseConfig(raw []byte) *Config {

// validate config
for name, repo := range result.Repos {
if repo.Url != "" && len(repo.Urls) > 0 {
if repo.URL != "" && len(repo.URLs) > 0 {
log.Fatalf("repo '%v' specifies both url and urls parameters, please use only one of them", name)
}
if repo.Url == "" && len(repo.Urls) == 0 {
if repo.URL == "" && len(repo.URLs) == 0 {
log.Fatalf("please specify url for repo '%v'", name)
}
}

if result.PurgeFilesAfter < 10*60 && result.PurgeFilesAfter != 0 {
log.Fatalf("purge_files_after period is too low (%v) please specify at least 10 minutes", result.PurgeFilesAfter)
log.Fatalf("'purge_files_after' period is too low (%v) please specify at least 10 minutes", result.PurgeFilesAfter)
}

if unix.Access(result.CacheDir, unix.R_OK|unix.W_OK) != nil {
@@ -57,6 +69,27 @@ func parseConfig(raw []byte) *Config {
}
log.Fatalf("directory %v does not exist or isn't writable for user %v", result.CacheDir, u.Username)
}
// validate Prefetch config

if result.Prefetch != nil {

return result
// set default values
if result.Prefetch.TTLUnaccessed == 0 {
result.Prefetch.TTLUnaccessed = DefaultTTLUnaccessed
}
if result.Prefetch.TTLUnupdated == 0 {
result.Prefetch.TTLUnupdated = DefaultTTLUnupdated
}
// check Prefetch config
if result.Prefetch.TTLUnaccessed < 0 {
log.Fatal("'ttl_unaccessed_in_days' value is too low. Please set it to a value greater than 0")
}
if result.Prefetch.TTLUnupdated < 0 {
log.Fatal("'ttl_unupdated_in_days' value is too low. Please set it to a value greater than 0")
}
if _, err := cronexpr.Parse(result.Prefetch.Cron); err != nil {
log.Fatal("Invalid cron string (if you don't know how to compose them, there are many online utilities for doing so). Please check https://github.com/gorhill/cronexpr#implementation for documentation.")
}
}
return &result
}
49 changes: 46 additions & 3 deletions config_test.go
@@ -3,13 +3,17 @@ package main
import (
"reflect"
"testing"

"github.com/google/go-cmp/cmp"
"github.com/google/go-cmp/cmp/cmpopts"
)

// test that `parseConfig()` can successfully load YAML config
func TestLoadConfig(t *testing.T) {
var temp = t.TempDir()
parseConfig([]byte(`
port: 9129
cache_dir: /tmp
cache_dir: ` + temp + `
purge_files_after: 2592000 # 3600 * 24 * 30days
download_timeout: 200 # 200 seconds
repos:
@@ -24,6 +28,43 @@ repos:
`))
}

// test with prefetch set
func TestLoadConfigWithPrefetch(t *testing.T) {
got := parseConfig([]byte(`
cache_dir: /tmp
purge_files_after: 2592000 # 3600 * 24 * 30days
prefetch:
cron: 0 0 3 * * * *
ttl_unaccessed_in_days: 5
download_timeout: 200
port: 9139
repos:
archlinux:
url: http://mirrors.kernel.org/archlinux
`))
want := &Config{
CacheDir: `/tmp`,
Port: 9139,
Repos: map[string]Repo{
"archlinux": Repo{
URL: "http://mirrors.kernel.org/archlinux",
},
},
PurgeFilesAfter: 2592000,
DownloadTimeout: 200,
Prefetch: &RefreshPeriod{Cron: "0 0 3 * * * *", TTLUnaccessed: 5, TTLUnupdated: 200},
}
if !cmp.Equal(*got, *want, cmpopts.IgnoreFields(Config{}, "Prefetch")) {
t.Errorf("got %v, want %v", *got, *want)
}
gotR := *(*got).Prefetch
wantR := *(*want).Prefetch
if !cmp.Equal(gotR, wantR) {
t.Errorf("got %v, want %v", gotR, wantR)
}
}

// test that `purgeFilesAfter` is being read correctly
func TestPurgeFilesAfter(t *testing.T) {
got := parseConfig([]byte(`
@@ -38,11 +79,12 @@ repos:
Port: 9129,
Repos: map[string]Repo{
"archlinux": Repo{
Url: "http://mirrors.kernel.org/archlinux",
URL: "http://mirrors.kernel.org/archlinux",
},
},
PurgeFilesAfter: 2592000,
DownloadTimeout: 0,
Prefetch: nil,
}

if !reflect.DeepEqual(got, want) {
@@ -63,11 +105,12 @@ repos:
Port: 9129,
Repos: map[string]Repo{
"archlinux": Repo{
Url: "http://mirrors.kernel.org/archlinux",
URL: "http://mirrors.kernel.org/archlinux",
},
},
PurgeFilesAfter: 0,
DownloadTimeout: 0,
Prefetch: nil,
}

if !reflect.DeepEqual(got, want) {
9 changes: 7 additions & 2 deletions go.mod
@@ -3,7 +3,12 @@ module github.com/anatol/pacoloco
go 1.15

require (
github.com/google/go-cmp v0.4.0
golang.org/x/sys v0.0.0-20210113181707-4bcb84eeeb78
github.com/google/go-cmp v0.5.6
github.com/gorhill/cronexpr v0.0.0-20180427100037-88b0669f7d75
github.com/mattn/go-sqlite3 v1.14.8 // indirect
golang.org/x/sys v0.0.0-20210823070655-63515b42dcdf
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 // indirect
gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b
gorm.io/driver/sqlite v1.1.4
gorm.io/gorm v1.21.14
)
26 changes: 21 additions & 5 deletions go.sum
@@ -1,10 +1,26 @@
github.com/google/go-cmp v0.4.0 h1:xsAVV57WRhGj6kEIi8ReJzQlHHqcBYCElAvkovg3B/4=
github.com/google/go-cmp v0.4.0/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
golang.org/x/sys v0.0.0-20210113181707-4bcb84eeeb78 h1:nVuTkr9L6Bq62qpUqKo/RnZCFfzDBL0bYo6w9OJUqZY=
golang.org/x/sys v0.0.0-20210113181707-4bcb84eeeb78/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543 h1:E7g+9GITq07hpfrRu66IVDexMakfv52eLZ2CXBWiKr4=
github.com/google/go-cmp v0.5.6 h1:BKbKCqvP6I+rmFHt06ZmyQtvB8xAkWdhFyr0ZUNZcxQ=
github.com/google/go-cmp v0.5.6/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/gorhill/cronexpr v0.0.0-20180427100037-88b0669f7d75 h1:f0n1xnMSmBLzVfsMMvriDyA75NB/oBgILX2GcHXIQzY=
github.com/gorhill/cronexpr v0.0.0-20180427100037-88b0669f7d75/go.mod h1:g2644b03hfBX9Ov0ZBDgXXens4rxSxmqFBbhvKv2yVA=
github.com/jinzhu/inflection v1.0.0 h1:K317FqzuhWc8YvSVlFMCCUb36O/S9MCKRDI7QkRKD/E=
github.com/jinzhu/inflection v1.0.0/go.mod h1:h+uFLlag+Qp1Va5pdKtLDYj+kHp5pxUVkryuEj+Srlc=
github.com/jinzhu/now v1.1.1/go.mod h1:d3SSVoowX0Lcu0IBviAWJpolVfI5UJVZZ7cO71lE/z8=
github.com/jinzhu/now v1.1.2 h1:eVKgfIdy9b6zbWBMgFpfDPoAMifwSZagU9HmEU6zgiI=
github.com/jinzhu/now v1.1.2/go.mod h1:d3SSVoowX0Lcu0IBviAWJpolVfI5UJVZZ7cO71lE/z8=
github.com/mattn/go-sqlite3 v1.14.5/go.mod h1:WVKg1VTActs4Qso6iwGbiFih2UIHo0ENGwNd0Lj+XmI=
github.com/mattn/go-sqlite3 v1.14.8 h1:gDp86IdQsN/xWjIEmr9MF6o9mpksUgh0fu+9ByFxzIU=
github.com/mattn/go-sqlite3 v1.14.8/go.mod h1:NyWgC/yNuGj7Q9rpYnZvas74GogHl5/Z4A/KQRfk6bU=
golang.org/x/sys v0.0.0-20210823070655-63515b42dcdf h1:2ucpDCmfkl8Bd/FsLtiD653Wf96cW37s+iGx93zsu4k=
golang.org/x/sys v0.0.0-20210823070655-63515b42dcdf/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1 h1:go1bK/D/BFZV2I8cIQd1NKEZ+0owSTG1fDTci4IqFcE=
golang.org/x/xerrors v0.0.0-20200804184101-5ec99f83aff1/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405 h1:yhCVgyC4o1eVCa2tZl7eS0r+SDo693bJlVdllGtEeKM=
gopkg.in/check.v1 v0.0.0-20161208181325-20d25e280405/go.mod h1:Co6ibVJAznAaIkqp8huTwlJQCZ016jof/cbN4VW5Yz0=
gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b h1:h8qDotaEPuJATrMmW04NCwg7v22aHH28wwpauUhK9Oo=
gopkg.in/yaml.v3 v3.0.0-20210107192922-496545a6307b/go.mod h1:K4uyk7z7BCEPqu6E+C64Yfv1cQ7kz7rIZviUmN+EgEM=
gorm.io/driver/sqlite v1.1.4 h1:PDzwYE+sI6De2+mxAneV9Xs11+ZyKV6oxD3wDGkaNvM=
gorm.io/driver/sqlite v1.1.4/go.mod h1:mJCeTFr7+crvS+TRnWc5Z3UvwxUN1BGBLMrf5LA9DYw=
gorm.io/gorm v1.20.7/go.mod h1:0HFTzE/SqkGTzK6TlDPPQbAYCluiVvhzoA1+aVyzenw=
gorm.io/gorm v1.21.14 h1:NAR9A/3SoyiPVHouW/rlpMUZvuQZ6Z6UYGz+2tosSQo=
gorm.io/gorm v1.21.14/go.mod h1:F+OptMscr0P2F2qU97WT1WimdH9GaQPoDW7AYd5i2Y0=