Private Stratum 1 replica server¶
In this section of the tutorial, we will set up a Stratum 1 replica server, which is the next step towards a production-ready CernVM-FS setup.
CernVM-FS as CDN¶
The content of CernVM-FS repositories is served by a set of Stratum 1 replica servers (sometimes also called mirror servers), which together with the central Stratum 0 server and the proxy servers can be seen as a content delivery network (CDN).
A Stratum 1 replica server is a standard web server that uses the CernVM-FS server tools to provide a full mirror of one or more CernVM-FS repositories, which are served and managed through a central Stratum 0 server.
The figure below shows the CernVM-FS network for repositories in the `cern.ch` domain, including the Stratum 1 replica servers that are spread all across the world, and a distributed hierarchy of proxy servers that fetch content from the closest Stratum 1.
Motivation¶
Next to the public Stratum 1 servers that are operated by the maintainers of a CernVM-FS repository, you can also set up your own "private" Stratum 1 replica server in your local network.
In the context of using CernVM-FS on HPC infrastructure this brings the following benefits:
- To improve the overall reliability of the setup, for example in case of (temporary) loss of connectivity to the public Stratum 1 replica servers;
- To reduce the load on the public Stratum 1 servers;
- To mitigate the impact of poor network bandwidth to the closest public Stratum 1 server;
- To improve the latency and hence start-up time of software in situations where the cache of the proxy servers has insufficient capacity;
Recommendations¶
When setting up a Stratum 1 replica server, you should take the following recommendations into account:
- A RAID-protected low-latency storage setup (like SSD or NVMe) should be used, because the CernVM-FS server component will run lots of `stat` system calls against it.
- A standard Apache web server should be installed, which should be close to the low-latency storage. Directory listing is not required.
- HTTP connections to port 80 must be possible.
Recommendations on monitoring (click to expand)
It is strongly recommended to actively monitor a Stratum 1 replica server, in particular:
- CPU usage;
- disk usage;
- I/O load;
- network bandwidth and latency;
- log messages produced in syslog;
The `cvmfs-servermon` package can be used to watch for problems in every repository's `.cvmfs_status.json` status file.
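As a quick manual check, you can also fetch this status file directly over HTTP from the Stratum 1; a minimal sketch, assuming the `software.eessi.io` replica that we set up below, with STRATUM1_IP as a placeholder for the address of your Stratum 1:
# fetch the status file of the replicated repository over HTTP
# (replace STRATUM1_IP with the IP address or hostname of your Stratum 1)
curl -s http://STRATUM1_IP/cvmfs/software.eessi.io/.cvmfs_status.json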
Recommendations for a high-availability setup (click to expand)
To create a high-availability setup, it is not recommended to use two or more separate Stratum 1 replica servers in a single round-robin service.
Since they will be updated at different rates, that would cause errors when a client sees an updated catalog from one Stratum 1, but then tries to read corresponding data files from another that does not yet have the files.
Instead, different Stratum 1 replica servers should either be separately configured on the clients, or a pair can be configured as a high-availability active/standby pair using the `cvmfs-hastratum1` package.
An active/standby pair can also be managed by switching a DNS name between two different servers.
Recommendations for a public Stratum 1 replica server (click to expand)
For a public Stratum 1 replica server, it is recommended to install a Squid frontend in front of the Stratum 1, which should be configured as a reverse proxy and installed on the same system as the web server, to reduce the number of points of failure.
The optimized frontier-squid distribution is recommended.
For more information, see the CernVM-FS configuration documentation.
Alternatively, separate Squid proxy server machines can be configured in a round-robin DNS configuration, each forwarding to the Apache server. Note however that if any of them is down, the entire service will be considered down by CernVM-FS clients. The impact of this can be mitigated with a front-end hardware load balancer that quickly takes a system that is down out of service.
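As a rough illustration only (the exact directives and port numbers below are assumptions to verify against the frontier-squid documentation), a minimal reverse proxy setup in Squid could look something like this, with Apache assumed to listen on port 8080 on the same machine:
# sketch of a reverse proxy configuration in /etc/squid/squid.conf
# listen on port 80 in accelerator (reverse proxy) mode
http_port 80 accel
# forward requests to the local Apache server (assumed to listen on port 8080)
cache_peer 127.0.0.1 parent 8080 0 no-query originserver name=apache
# only accept requests for CernVM-FS repository paths
acl cvmfs_paths urlpath_regex ^/cvmfs/
http_access allow cvmfs_paths
cache_peer_access apache allow cvmfs_paths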
Recommendations on garbage collection (click to expand)
If any CernVM-FS repositories being replicated have garbage collection enabled, the Stratum 1 also needs to run garbage collection in order to prevent the disk space usage from growing rapidly.
See the CernVM-FS documentation for more details.
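For example, garbage collection could be triggered from a daily cron job; a sketch, assuming the `-a` (all garbage-collectable repositories) and `-f` (don't ask for confirmation) options of `cvmfs_server gc`, which you should verify against the documentation of your CernVM-FS version:
# /etc/cron.d/cvmfs_stratum1_gc: run garbage collection once per day at 03:00
# replace OWNER with the account name of the repository owner
0 3 * * * OWNER output=$(/usr/bin/cvmfs_server gc -a -f 2>&1) || echo "$output"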
Using S3-compatible storage (Amazon S3, Azure Blob, Ceph)
CernVM-FS can store data directly to S3-compatible storage systems, such as Amazon S3, Azure Blob, or Ceph.
For more information, see the CernVM-FS documentation.
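As an indication of what this involves, the S3 credentials and bucket are specified in a separate configuration file that is passed to `cvmfs_server`; a sketch with placeholder values (check the CernVM-FS documentation for the exact set of settings and how to pass the file):
# example S3 settings, e.g. in /etc/cvmfs/s3.conf (all values are placeholders)
CVMFS_S3_HOST=s3.example.com
CVMFS_S3_BUCKET=my-cvmfs-bucket
CVMFS_S3_ACCESS_KEY=MY_ACCESS_KEY
CVMFS_S3_SECRET_KEY=MY_SECRET_KEY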
Setup procedure¶
To set up a Stratum 1 replica server and configure it to replicate a particular CernVM-FS repository, you should:
- Install the `cvmfs-server` package;
- Add the public key of the CernVM-FS repository you want to replicate to `/etc/cvmfs/keys/`;
- Create the repository replica;
- Run the initial synchronisation;
- Configure `cron` to perform periodic synchronisation.
In the sections below, we will set up a Stratum 1 replica server for the EESSI CernVM-FS repository `software.eessi.io`.
Installing CernVM-FS server¶
Start with installing the `cvmfs-server` package, which provides the CernVM-FS server tools.
Although we won't actually use the functionality that requires it, we also need to install a package that provides the `mod_wsgi` Apache adapter module.
# install cvmfs-release package to add yum repository
sudo yum install -y https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest.noarch.rpm
# install CernVM-FS server package
sudo yum install -y cvmfs-server
# install mod_wsgi Apache adapter module
# (on distributions older than the RHEL 8 equivalent, install mod_wsgi instead)
sudo yum install -y python3-mod_wsgi
# install cvmfs-release package to add apt repository
sudo apt install lsb-release
curl -OL https://ecsft.cern.ch/dist/cvmfs/cvmfs-release/cvmfs-release-latest_all.deb
sudo dpkg -i cvmfs-release-latest_all.deb
sudo apt update
# install CernVM-FS server package
sudo apt install -y cvmfs-server
# install mod_wsgi Apache adapter module
# (on Ubuntu versions older than 22.04, install libapache2-mod-wsgi instead)
sudo apt install -y libapache2-mod-wsgi-py3
Adding repository public key¶
Add the public key for the repositories in the `eessi.io` domain to `/etc/cvmfs/keys`:
sudo mkdir -p /etc/cvmfs/keys/eessi.io/
sudo cp eessi.io.pub /etc/cvmfs/keys/eessi.io/
You can get the contents of `eessi.io.pub` from the default CernVM-FS config repository on a CernVM-FS client system, at:
/cvmfs/cvmfs-config.cern.ch/etc/cvmfs/keys/eessi.io/eessi.io.pub
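If the Stratum 1 server does not have a CernVM-FS client installed itself, one way to get the key in place is to copy it over from a client system; a sketch, assuming SSH access from the client to the Stratum 1 (STRATUM1_IP is a placeholder):
# on a CernVM-FS client system: copy the public key to the Stratum 1 server
scp /cvmfs/cvmfs-config.cern.ch/etc/cvmfs/keys/eessi.io/eessi.io.pub STRATUM1_IP:/tmp/
# on the Stratum 1 server: move the key into place
sudo mv /tmp/eessi.io.pub /etc/cvmfs/keys/eessi.io/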
Creating repository replica¶
To create the repository replica, we need to run the `cvmfs_server add-replica` command.
Synchronisation server¶
We will need to specify the server that should be used for synchronising the repository contents.
This can either be the Stratum 0 server, or a public Stratum 1 replica server that was set up to be used for repository synchronisation (by having a `.cvmfs_master_replica` file in the HTTP root directory).
For EESSI, we should use `aws-eu-west-s1-sync.eessi.science` as synchronisation server.
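Before continuing, you can verify that this synchronisation server is reachable from the Stratum 1 server, for instance by fetching the HTTP header of the repository's `.cvmfspublished` file (as we will also do for the Stratum 1 itself later on):
# check that the synchronisation server can be reached from the Stratum 1 server
curl --head http://aws-eu-west-s1-sync.eessi.science/cvmfs/software.eessi.io/.cvmfspublished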
Disabling the GeoAPI¶
Before creating the replica, we first need to disable the Geo API service in the CernVM-FS server configuration, to avoid getting this error when creating the replica:
Installing GeoIP Database... CVMFS_GEO_LICENSE_KEY not set
fail
The Geo API service enables client systems to automatically sort Stratum 1 replica servers geographically, so the CernVM-FS client component can prioritize connecting to the closest one.
This is really only relevant for public Stratum 1 replica servers, not a private Stratum 1 replica server that is only accessible from within the local network, like the one we are setting up here.
To disable the Geo API service, set `CVMFS_GEO_DB_FILE` to `NONE` in `/etc/cvmfs/server.local`:
echo 'CVMFS_GEO_DB_FILE=NONE' | sudo tee -a /etc/cvmfs/server.local
Creating replica¶
To actually create the replica, run the `cvmfs_server add-replica` command as follows, specifying that the current user account should be the repository owner via `-o $USER`:
sync_server='aws-eu-west-s1-sync.eessi.science'
repo='software.eessi.io'
key_dir='/etc/cvmfs/keys/eessi.io'
sudo cvmfs_server add-replica -o $USER http://${sync_server}/cvmfs/${repo} ${key_dir}
Starting Apache (click to expand)
If creating the replica fails with:
Apache must be installed and running
try starting the `httpd` service (on Debian/Ubuntu systems: `apache2`) first:
sudo systemctl start httpd.service
sudo systemctl enable httpd.service
Initial synchronisation¶
After creating the replica, we should trigger the initial synchronisation of the repository replica, using the `cvmfs_server snapshot` command:
cvmfs_server snapshot software.eessi.io
Time for a coffee...
Since this will download the full repository contents from the synchronisation server that was specified when creating the repository replica, the initial synchronisation may take a while.
The time required heavily depends on the size of the repository, and on the network bandwidth and latency to the synchronisation server.
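While the snapshot is running, you can keep an eye on the amount of data that has been downloaded so far, for example by checking the size of the replica's storage directory (assuming the default storage location under /srv/cvmfs):
# check how much repository data has been downloaded so far
# (assumes the default storage location /srv/cvmfs)
du -sh /srv/cvmfs/software.eessi.io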
Periodic synchronisation¶
To ensure that updates to the contents of the CernVM-FS repository are synchronised automatically to the Stratum 1 replica server, we should set up a cron job to do periodic synchronisation by running `cvmfs_server snapshot -a`.
Log rotation¶
Before setting up a cron job, we first need to configure log rotation, or running `snapshot -a` will fail with:
/etc/logrotate.d/cvmfs does not exist!
Create `/etc/logrotate.d/cvmfs` with the following contents:
/var/log/cvmfs/*.log {
weekly
missingok
notifempty
}
Cron job¶
To synchronise all active replica repositories every 5 minutes, we can create a cron job `/etc/cron.d/cvmfs_stratum1_snapshot` that runs `cvmfs_server snapshot -a -i`:
*/5 * * * * OWNER output=$(/usr/bin/cvmfs_server snapshot -a -i 2>&1) || echo "$output"
Here, you must replace "`OWNER`" with the account name of the repository owner (cf. the `-o $USER` option used in the `add-replica` command above).
The `-a` option enables synchronisation of all active replica repositories, while `-i` indicates that repositories for which an initial snapshot has not been run yet should be skipped.
To verify that periodic synchronisation is working correctly, check the contents of the log file:
/var/log/cvmfs/snapshots.log
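For example, after the cron job has had a chance to run a couple of times, you could inspect the most recent entries:
# show the most recent snapshot log entries
tail -n 20 /var/log/cvmfs/snapshots.log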
More information¶
For more information on the setup and configuration of a Stratum 1 replica server, see the CernVM-FS documentation.
Using the private Stratum 1¶
To actually use the "private" Stratum 1 replica server that has been set up, we need to change the configuration on each CernVM-FS client system.
Initially, we will use only the private Stratum 1 replica server, without a proxy server.
Remove `CVMFS_HTTP_PROXY` from client configuration
Do make sure that the `CVMFS_HTTP_PROXY` line is removed from the CernVM-FS configuration file `/etc/cvmfs/default.local` on the client system, and that the CernVM-FS configuration was reloaded (with `sudo cvmfs_config reload`), as was instructed here.
After we have verified that the Stratum 1 is used by the client system, we will bring the proxy server back into the picture, and demonstrate how to use both the proxy server and the Stratum 1 replica server.
Only private Stratum 1¶
Client configuration¶
The `CVMFS_SERVER_URL` configuration setting on a client system:
- Is a string value with a semicolon-separated (`;`) list of known Stratum 1 servers;
- Should be enclosed in quotes;
- Specifies each Stratum 1 as a URL that starts with `http://`, and ends with `/cvmfs/@fqrn@`.
For example:
CVMFS_SERVER_URL="http://s1.test.eu/cvmfs/@fqrn@;http://s1.test.us/cvmfs/@fqrn@"
The `@fqrn@` substring is replaced by CernVM-FS with the fully qualified repository name, like `software.eessi.io`.
`CVMFS_SERVER_URL` should be specified in the domain-specific configuration file in `/etc/cvmfs` that is relevant for the CernVM-FS repository we have replicated on our Stratum 1.
For `software.eessi.io`, we should add the following to `/etc/cvmfs/domain.d/eessi.io.local`:
CVMFS_SERVER_URL="http://STRATUM1_IP/cvmfs/@fqrn@"
in which "`STRATUM1_IP`" must be replaced with (you guessed it) the IP address or hostname of the private Stratum 1 replica server.
To apply the configuration change, run `cvmfs_config reload`:
sudo cvmfs_config reload
Testing¶
To verify that the client configuration was changed correctly, use `cvmfs_config stat` (which requires that the repository is mounted):
ls /cvmfs/software.eessi.io
cvmfs_config stat -v software.eessi.io
The output line that starts with `Connection` should mention `online`, like this:
Connection: http://.../cvmfs/software.eessi.io through proxy DIRECT (online)
The `proxy DIRECT` indicates that we are not using a proxy server yet in this setup.
You can also use `curl` to check the connection to the Stratum 1, by letting it print the HTTP header for the `.cvmfspublished` file in the root of the repository:
curl --head http://STRATUM1_IP/cvmfs/software.eessi.io/.cvmfspublished
The first line of the output should be something like:
HTTP/1.1 200 OK
If instead you see `403 Forbidden`, then the proxy server is blocking the connection:
HTTP/1.1 403 Forbidden
Proxy + private Stratum 1¶
To have a more complete view, let's now also bring the proxy server back into the picture.
Reconfigure Squid proxy¶
First we need to make a small but important change to the configuration of the Squid proxy, to ensure that the proxy server is allowed to connect to the private Stratum 1 replica server.
Update the ACL for the Stratum 1 servers in `/etc/squid/squid.conf` on the proxy server, by adding the IP address of the private Stratum 1:
# replace STRATUM1_IP with the IP address of the private Stratum 1
acl stratum_ones dstdomain .eessi.science STRATUM1_IP
Then reload the configuration of the Squid proxy service:
sudo systemctl reload squid
Client configuration¶
We also need to update the client configuration to restore the `CVMFS_HTTP_PROXY` line in `/etc/cvmfs/default.local`, like we did when using the proxy server:
# replace PROXY_IP with the IP address of the proxy server
CVMFS_HTTP_PROXY="http://PROXY_IP:3128"
Don't forget to reload the CernVM-FS client configuration:
sudo cvmfs_config reload
Testing¶
To test whether the setup using both the proxy server and the Stratum 1 replica server works, we can try accessing the EESSI repository, for example by sourcing the initialisation script:
source /cvmfs/software.eessi.io/versions/2023.06/init/bash
The output of `sudo cvmfs_config stat -v software.eessi.io` should include a `Connection` line that ends with `(online)`, like this:
Connection: http://STRATUM1_IP/cvmfs/software.eessi.io through proxy http://PROXY_IP:3128 (online)
You can also use `curl` to check whether the Stratum 1 can be reached via the proxy server:
http_proxy=http://PROXY_IP:3128 curl --head http://STRATUM1_IP/cvmfs/software.eessi.io/.cvmfspublished
Conclusions¶
With a private Stratum 1 replica server, we have a more production-ready setup in place for using CernVM-FS.
Using both a proxy server and a Stratum 1 replica server is another step in that direction, since it further improves the resilience, maintainability, scalability, and performance of the setup (the proxy server can serve requests from its memory cache).
For the sake of demonstration we have used two separate systems for the Stratum 1 replica server and the proxy server, but both services can also be installed and configured on the same server. Setting up multiple proxy servers can also be sensible to improve load balancing, for example to serve different HPC clusters that have significantly different workload mixes.