Enable upload via FTP
Overview
Questions:
- How can I set up FTP to be easy for my users?
- Can I authenticate FTP users with Galaxy credentials?
Objectives:
- Configure Galaxy and install an FTP server.
- Use an Ansible playbook for this.
Requirements:
- Slides: Ansible
- Hands-on: Ansible
- Slides: Galaxy Installation with Ansible
- Hands-on: Galaxy Installation with Ansible
Time estimation: 1 hour
Published: Jun 18, 2021
Last modification: Feb 20, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
PURL: https://gxy.io/GTN:T00008
Revision: 23
This tutorial will guide you through setting up a File Transfer Protocol (FTP) server so Galaxy users can use it to upload large files. As noted on the Galaxy community hub, uploading data directly from the browser can be unreliable and cumbersome. FTP allows users to monitor the upload status and to resume interrupted transfers.
Comment: Galaxy Admin Training Path
The yearly Galaxy Admin Training follows a specific ordering of tutorials. Use this timeline to help keep track of where you are in Galaxy Admin Training.
Step 1: ansible-galaxy, Step 2: backup-cleanup, Step 3: customization, Step 4: tus, Step 5: cvmfs, Step 6: apptainer, Step 7: tool-management, Step 8: reference-genomes, Step 9: data-library, Step 10: dev/bioblend-api, Step 11: connect-to-compute-cluster, Step 12: job-destinations, Step 13: pulsar, Step 14: celery, Step 15: gxadmin, Step 16: reports, Step 17: monitoring, Step 18: tiaas, Step 19: sentry, Step 20: ftp, Step 21: beacon
FTP
FTP is a very old and reliable communication protocol that has been around since 1971 (Bhushan 1971). It requires a server (here our Galaxy server) and a client (the user’s computer). The FTP server needs at least two ports reachable from outside: one for commands and one for the data transfer. The command port is usually 21.
FTP supports two different modes: active and passive. Active mode requires that the user’s computer be reachable from the internet, which in the age of Network Address Translation (NAT) and firewalls is usually not the case. Passive mode is therefore the most commonly used. In passive mode, a client connects to the FTP server and requests a channel for sending files. The server responds with an IP and a port taken from its range of “Passive Ports”.
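To make the passive-mode exchange more concrete, here is a minimal sketch of how a client could work out the data port from the server's reply. It assumes the classic PASV reply format from RFC 959, `227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)`, where the data port is p1 * 256 + p2. The function name `pasv_port` and the sample reply are made up for illustration. Clients that use the extended EPSV command receive a differently formatted reply, which this sketch does not handle.

```shell
# Hypothetical helper: extract the data port from a PASV reply.
pasv_port() {
  # Keep only the text between the parentheses: h1,h2,h3,h4,p1,p2
  fields=$(printf '%s\n' "$1" | sed 's/.*(\(.*\)).*/\1/')
  # The last two numbers encode the port as high byte, low byte.
  p1=$(printf '%s\n' "$fields" | cut -d, -f5)
  p2=$(printf '%s\n' "$fields" | cut -d, -f6)
  echo $(( $p1 * 256 + $p2 ))
}

# Made-up reply from a server whose passive range starts at 56000.
pasv_port "227 Entering Passive Mode (192,0,2,10,218,192)."
```

The port printed for this sample reply is 56000, which is why the whole passive range, and not only port 21, has to be reachable from outside.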
Comment: Requirements for Running This Tutorial
Your VM, or wherever you are installing Galaxy, needs to have the following ports available:
- 21
- Some high range of ports not used by another service, e.g. 56000-60000
You need to know which ports are open so you can use them for the transfer (PassivePorts). In this training we assume that ports 56000 to 60000 are open.
The exact range is not important, and these numbers can differ between sites.
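Before settling on a range, it can help to check that nothing on the machine is already listening inside it. The sketch below is one possible way to do that, not part of the training material. It assumes the `ss` tool from iproute2 is installed and that `ss -tln` prints the local address and port in the fourth column. The helper name `ports_in_range` is hypothetical.

```shell
# Hypothetical helper: read "ss -tln" style lines on stdin and print every
# listening TCP port that falls inside the range given as $1..$2.
ports_in_range() {
  awk -v lo="$1" -v hi="$2" '
    $1 == "LISTEN" {
      # Field 4 is local_address:port; the port follows the last colon.
      n = split($4, parts, ":")
      port = parts[n] + 0
      if (port >= lo + 0 && port <= hi + 0) print port
    }'
}

# Usage on a real machine (prints nothing when the range is free):
#   ss -tln | ports_in_range 56000 60000
```

This only shows what is in use locally. Whether the range is reachable from outside still depends on your firewall or cloud security group.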
FTP and Galaxy
To allow your users to upload via FTP, you will need to:
- configure Galaxy to know where the files are uploaded.
- install an FTP server
- allow your FTP server to read Galaxy’s database so users can log in with their Galaxy credentials and upload to the correct directory.
For secure transmission we will use FTP over SSL/TLS (FTPS), not the SSH File Transfer Protocol (SFTP), as the Galaxy users don’t correspond to users on the machine.
Installing and Configuring
Luckily for us, there is an Ansible role written by the Galaxy Project for this purpose. It will install proftpd. First we need to install the role, then update our playbook to use it.
If the terms “Ansible”, “role” and “playbook” mean nothing to you, please check out the Ansible introduction slides and the Ansible introduction tutorial.
It is possible to have Ansible installed on the remote machine and run it there, not just from your local machine connecting to the remote machine.
Your hosts file will need to use localhost, and whenever you run playbooks with ansible-playbook -i hosts playbook.yml, you will need to add -c local to your command.
Be certain that the playbook that you’re writing on the remote machine is stored somewhere safe, like your user home directory, or backed up on your local machine. The cloud can be unreliable and things can disappear at any time.
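As a rough illustration of the local setup described above, the sketch below prints a minimal inventory that targets the machine itself. The group name `galaxyservers` is an assumption based on the `group_vars/galaxyservers.yml` file name used in this tutorial, and the helper name `local_inventory` is hypothetical.

```shell
# Hypothetical helper: print a minimal inventory for running Ansible on the
# machine it is installed on. The group name is an assumption (see above).
local_inventory() {
  printf '[galaxyservers]\n%s\n' "localhost"
}

# Show what the inventory would look like.
local_inventory

# To use it (note: this overwrites any existing hosts file):
#   local_inventory > hosts
#   ansible-playbook -i hosts galaxy.yml -c local
```

Printing the inventory first, rather than writing it straight to disk, makes it easy to compare against a hosts file you may already have.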
Hands-on: Setting up ftp upload with Ansible
In your playbook directory, add the galaxyproject.proftpd role to your requirements.yml:
--- a/requirements.yml
+++ b/requirements.yml
@@ -57,3 +57,6 @@
 # Sentry
 - name: mvdbeek.sentry_selfhosted
   src: https://github.com/mvdbeek/ansible-role-sentry/archive/main.tar.gz
+# Our FTP Server
+- src: galaxyproject.proftpd
+  version: 0.3.1
Install the role with:
Input: Bash
ansible-galaxy install -p roles -r requirements.yml
As we are using certbot in this training, we will ask it to provide a private key for proftpd. Add the following lines to your group_vars/galaxyservers.yml file:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -210,11 +210,13 @@ certbot_environment: staging
 certbot_well_known_root: /srv/nginx/_well-known_root
 certbot_share_key_users:
   - www-data
+  - proftpd
 certbot_share_key_ids:
   - "999:999"
 certbot_post_renewal: |
     systemctl restart nginx || true
     docker restart rabbit_hole || true
+    systemctl restart proftpd || true
 certbot_domains:
   - "{{ inventory_hostname }}"
 certbot_agree_tos: --agree-tos
This will make a copy of the current letsencrypt key available as /etc/ssl/user/privkey-proftpd.pem, and automatically restart proftpd every time the key is updated.
We will configure Galaxy to enable FTP file upload. Add the following lines to your group_vars/galaxyservers.yml file in the galaxy_config/galaxy section:
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -122,6 +122,9 @@ galaxy_config:
     sentry_dsn: "{{ vault_galaxy_sentry_dsn }}"
     sentry_traces_sample_rate: 0.5
     error_report_file: "{{ galaxy_config_dir }}/error_reports_file.yml"
+    # FTP
+    ftp_upload_dir: /data/uploads
+    ftp_upload_site: "{{ inventory_hostname }}"
   gravity:
     process_manager: systemd
     galaxy_root: "{{ galaxy_root }}/server"
To check the other options for setting up ftp in Galaxy, please check the Galaxy configuration documentation.
Then we will set the different variables for proftpd. Add the following lines to your group_vars/galaxyservers.yml file. Please replace the PassivePorts below with the range of ports that are appropriate for your machine!
--- a/group_vars/galaxyservers.yml
+++ b/group_vars/galaxyservers.yml
@@ -366,3 +366,24 @@ telegraf_plugins_extra:
 tiaas_dir: /srv/tiaas
 tiaas_admin_user: admin
 tiaas_admin_pass: changeme
+
+# Proftpd:
+proftpd_galaxy_auth: yes
+galaxy_ftp_upload_dir: "{{ galaxy_config.galaxy.ftp_upload_dir }}"
+proftpd_display_connect: |
+  {{ inventory_hostname }} FTP server
+
+  Unauthorized access is prohibited
+proftpd_create_ftp_upload_dir: yes
+proftpd_options:
+  - User: galaxy
+  - Group: galaxy
+  - Port: 21
+proftpd_sql_db: galaxy@/var/run/postgresql
+proftpd_sql_user: galaxy
+proftpd_conf_ssl_certificate: /etc/ssl/certs/cert.pem
+proftpd_conf_ssl_certificate_key: /etc/ssl/user/privkey-proftpd.pem
+proftpd_global_options:
+  - PassivePorts: 56000 60000
+proftpd_use_mod_tls_shmcache: false
+proftpd_tls_options: NoSessionReuseRequired
Here is a description of the variables we set:
| Variable | Description |
| --- | --- |
| proftpd_galaxy_auth | Attempt to authenticate users against a Galaxy database. |
| galaxy_ftp_upload_dir | Path to the Galaxy FTP upload directory, should match ftp_upload_dir in your Galaxy config. |
| proftpd_display_connect | Message to display when users connect to the FTP server. This should be the message, not the path to a file. |
| proftpd_create_ftp_upload_dir | Whether to allow the role to create this with owner galaxy_user. |
| proftpd_options | Any option for proftpd, we will just set up the user and group of the galaxy_user. |
| proftpd_sql_db | Database name to connect to for authentication info. |
| proftpd_sql_user | Value of the username parameter to SQLConnectInfo (default: the value of galaxy_user). |
| proftpd_conf_ssl_certificate | Path on the remote host where the SSL certificate file is. |
| proftpd_conf_ssl_certificate_key | Path on the remote host where the SSL private key file is. |
| proftpd_global_options | Set arbitrary options in the <Global> context. We set here the PassivePorts range. |
| proftpd_use_mod_tls_shmcache | By default proftpd uses mod_tls_shmcache, which is not installed on the server, so we just disable it. |
| proftpd_tls_options | Additional options for TLS. We will use NoSessionReuseRequired. As a security measure, mod_tls only accepts SSL/TLS data connections that reuse the SSL session of the control connection. Unfortunately, there are some clients (e.g. curl/Filezilla) which do not reuse SSL sessions. To relax the requirement that the SSL session from the control connection be reused for data connections, we set NoSessionReuseRequired. |
Add the new role to the list of roles under the roles key in your playbook, galaxy.yml:
--- a/galaxy.yml
+++ b/galaxy.yml
@@ -45,6 +45,7 @@
     - geerlingguy.redis
     - usegalaxy_eu.flower
     - galaxyproject.nginx
+    - galaxyproject.proftpd
     - geerlingguy.docker
     - usegalaxy_eu.rabbitmqserver
     - galaxyproject.tiaas2
Run the playbook:
Input: Bash
ansible-playbook galaxy.yml
Congratulations, you’ve set up FTP for Galaxy.
Check it works
Hands-on: Checking proftpd from the server
SSH into your machine
Check the active status of proftpd with systemctl status proftpd.
Check the port has been correctly attributed with sudo lsof -i -P -n.
Question: What do you see?
You should see all the ports used by the server. What interests us is the line with proftpd. You should see TCP *:21 (LISTEN).
Check the directory /data/uploads/ has been created and is empty.
Input: Bash
sudo tree /data/uploads/
Hands-on: Checking galaxy detected the ftp possibility
Open your galaxy in a browser.
Log in with a user (FTP is only possible for authenticated sessions).
Click on the upload button. At the bottom you should now see “Choose FTP files”.
Click on the Choose FTP files button. You should see a message “Your FTP directory does not contain any files.”
It’s working!
Hands-on: Upload your first file
There are three options for uploading files; you can choose whichever is easiest for you.
FileZilla
- Follow the tutorial to upload a file.
- You will see a message asking you to approve the certificate; approve it.
lftp
You can use lftp locally to test the FTP server.
- Install lftp with sudo apt-get install lftp.
- Add the public certificate to the list of known certificates (only for LetsEncrypt Staging Certificates!):
Input: Bash
mkdir .lftp
echo "set ssl:ca-file \"/etc/ssl/certs/cert.pem\"" > .lftp/rc
- Connect to the server with, for example, the admin account:
Input: Bash
lftp admin@example.org@$HOSTNAME
- Enter the password of the admin@example.org Galaxy user.
- Put a random file: put /srv/galaxy/server/CITATION
- Check it is there with ls.
- Leave lftp with quit.
Curl
Input: Bash
curl -T {"/srv/galaxy/server/CITATION"} ftp://localhost --user admin@example.org:password --ssl -k
Here -T says to upload a file, --ssl asks curl to use SSL/TLS for the FTP connection (curl falls back to an unencrypted connection if the server does not support it; --ssl-reqd makes encryption mandatory), and -k ignores any certificate issues, as the hostname localhost will not match the certificate we have.
Hands-on: Check where the file has been uploaded
SSH into your machine
Check the directory /data/uploads/.
Input: Bash
sudo tree /data/uploads/
Question: What do you see?
As I uploaded a file called CITATION with the admin@example.org user I see:
/data/uploads/
└── admin@example.org
    └── CITATION
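The tree output above suggests a simple layout: one directory per Galaxy user, named after the email address, with the uploaded files inside. As a sketch built on that assumption, the hypothetical helper `pending_ftp_uploads` below lists which users have files waiting to be imported. It needs to run as a user who can read the upload directory.

```shell
# Hypothetical helper: list "user<TAB>file" for every file waiting in the
# FTP upload directory, assuming the layout <upload_dir>/<user email>/<file>.
pending_ftp_uploads() {
  upload_dir="${1%/}"   # drop a trailing slash so prefix stripping works
  find "$upload_dir" -mindepth 2 -type f | sort | while IFS= read -r path; do
    rel="${path#"$upload_dir"/}"                      # user@example.org/file
    printf '%s\t%s\n' "${rel%%/*}" "${rel#*/}"        # user, then file path
  done
}

# Usage, with the upload directory configured as ftp_upload_dir:
#   pending_ftp_uploads /data/uploads
```

An empty result means every uploaded file has already been imported into a history, or nothing has been uploaded yet.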
Hands-on: Use it in galaxy
Open your galaxy in a browser.
Log in with the user you used to upload the file.
Click on the upload button.
Click on the Choose FTP files button. You should see your file.
Click on it and click on Start to launch the upload. It should go to your history as a new dataset.
Click again on the Choose FTP files button. Your file has disappeared. By default, files are removed from the FTP directory at import.
If you would rather keep them, add ftp_upload_purge: false to the galaxy_config/galaxy variables (next to ftp_upload_dir).
Congratulations! Let your users know this is an option; many of them will prefer to start large uploads from an FTP client.
Hands-on: Time to git commit
It’s time to commit your work! Check the status with
git status
Add your changed files with
git add ... # any files you see that are changed
And then commit it!
git commit -m 'Finished Enable upload via FTP'
Comment: Got lost along the way?
If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.
If you’re using git to track your progress, remember to add your changes and commit with a good commit message!