Enable upload via FTP

Overview
Questions:
  • How can I set up FTP so it is easy for my users?

  • Can I authenticate FTP users with Galaxy credentials?

Objectives:
  • Configure Galaxy and install an FTP server.

  • Use an Ansible playbook for this.

Requirements:
Time estimation: 1 hour
Supporting Materials:
Published: Jun 18, 2021
Last modification: Jan 31, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00008
Rating: 4.0 (0 recent ratings, 1 all time)
Revision: 65

This tutorial will guide you through setting up a File Transfer Protocol (FTP) server so that Galaxy users can use it to upload large files. As noted on the Galaxy Community Hub, uploading data directly from the browser can be unreliable and cumbersome. FTP allows users to monitor the upload status and to resume interrupted transfers.

Agenda
  1. FTP
  2. FTP and Galaxy
    1. Installing and Configuring
    2. Check it works
Comment: Galaxy Admin Training Path

The yearly Galaxy Admin Training follows a specific ordering of tutorials. Use this timeline to help keep track of where you are in Galaxy Admin Training.

  1. Step 1
    ansible-galaxy
  2. Step 2
    backup-cleanup
  3. Step 3
    customization
  4. Step 4
    tus
  5. Step 5
    cvmfs
  6. Step 6
    apptainer
  7. Step 7
    tool-management
  8. Step 8
    reference-genomes
  9. Step 9
    data-library
  10. Step 10
    dev/bioblend-api
  11. Step 11
    connect-to-compute-cluster
  12. Step 12
    job-destinations
  13. Step 13
    pulsar
  14. Step 14
    celery
  15. Step 15
    gxadmin
  16. Step 16
    reports
  17. Step 17
    monitoring
  18. Step 18
    tiaas
  19. Step 19
    sentry
  20. Step 20
    ftp
  21. Step 21
    beacon

FTP

FTP is a very old and reliable communication protocol that has been around since 1971 (Bhushan 1971). It requires a server (here, our Galaxy server) and a client (the user’s computer). The FTP server needs at least two ports reachable from outside: one for commands and one for data transfer. The command port is usually 21.

FTP supports two modes: active and passive. Active mode requires the user’s computer to be reachable from the internet, which is rarely possible in the age of Network Address Translation (NAT) and firewalls. Passive mode is therefore the most commonly used. In passive mode, the client connects to the FTP server and requests a channel for sending files; the server responds with an IP address and a port taken from its range of “Passive Ports”.
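To make the passive-mode exchange concrete: per RFC 959, the server answers the client's PASV command with a 227 reply carrying six comma-separated numbers, h1,h2,h3,h4,p1,p2. The first four form the IP address, and the data port is p1 * 256 + p2. The sketch below decodes such a reply. `decode_pasv` is a hypothetical helper written for illustration, not part of proftpd or any FTP client, and the address in the example is a documentation address.

```bash
# Hypothetical helper: decode an FTP PASV reply (RFC 959) into the
# address and data port the client should connect to for the transfer.
decode_pasv() {
  local nums IFS=,
  # Keep only the text between the parentheses: h1,h2,h3,h4,p1,p2
  nums=${1#*[(]}
  nums=${nums%%[)]*}
  # Split on commas (IFS is local to this function).
  set -- $nums
  if [ "$#" -ne 6 ]; then
    echo "not a PASV reply" >&2
    return 1
  fi
  # The port is sent as two bytes: p1 (high byte) and p2 (low byte).
  echo "$1.$2.$3.$4:$(( $5 * 256 + $6 ))"
}

decode_pasv "227 Entering Passive Mode (192,0,2,10,218,232)"
# prints 192.0.2.10:56040
```

If the server's PassivePorts range is 56000-60000, every port it advertises this way should fall inside that range.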

Comment: Requirements for Running This Tutorial

Your VM or wherever you are installing Galaxy needs to have the following ports available:

  • 21
  • A high range of ports not used by any other service, e.g. 56000-60000

You need to know which ports are open so you can use them for the data transfer (PassivePorts). In this training we assume that ports 56000 to 60000 are open.

The exact ports are not important, and they can differ between sites.
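Before settling on a range, it can help to confirm that nothing on the machine already listens inside it. A minimal sketch, assuming the listening port numbers arrive on standard input one per line; `ports_in_range` is a hypothetical helper, and the `ss`/`awk` pipeline in the comment is just one possible way to produce that input on a Linux host with iproute2. It does not check your firewall or cloud security groups, which must also allow the range.

```bash
# Hypothetical helper: print every port read from stdin (one per line)
# that falls inside the inclusive range [low, high].
ports_in_range() {
  local low="$1" high="$2" port
  while read -r port; do
    # Ignore blank or non-numeric lines.
    case $port in
      ''|*[!0-9]*) continue ;;
    esac
    if [ "$port" -ge "$low" ] && [ "$port" -le "$high" ]; then
      echo "$port"
    fi
  done
}

# On the server, list TCP ports in LISTEN state and report any that
# collide with the candidate range (no output means no conflict):
#   ss -ltn | awk 'NR > 1 {print $4}' | sed 's/.*://' | sort -un | ports_in_range 56000 60000
```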

FTP and Galaxy

To allow your users to upload via FTP, you will need to:

  • configure Galaxy to know where uploaded files are stored;
  • install an FTP server;
  • allow your FTP server to read Galaxy’s database, so users can log in with their Galaxy credentials and upload into the correct directory.

For secure transmission we will use FTP over SSL/TLS (FTPS) rather than the SSH File Transfer Protocol (SFTP), because Galaxy users do not correspond to user accounts on the machine.

Installing and Configuring

Luckily for us, the Galaxy Project has written an Ansible role for this purpose, which installs proftpd. First we need to install the role, then update our playbook to use it.

If the terms “Ansible”, “role” and “playbook” mean nothing to you, please check out the Ansible introduction slides and the Ansible introduction tutorial.

It is possible to install Ansible on the remote machine and run it there, rather than running it from your local machine and connecting to the remote machine.

Your hosts file will need to use localhost, and whenever you run playbooks with ansible-playbook -i hosts playbook.yml, you will need to add -c local to your command.

Be certain that the playbook that you’re writing on the remote machine is stored somewhere safe, like your user home directory, or backed up on your local machine. The cloud can be unreliable and things can disappear at any time.

Hands-on: Setting up ftp upload with Ansible
  1. In your playbook directory, add the galaxyproject.proftpd role to your requirements.yml:

    --- a/requirements.yml
    +++ b/requirements.yml
    @@ -57,3 +57,6 @@
     # Sentry
     - name: mvdbeek.sentry_selfhosted
       src: https://github.com/mvdbeek/ansible-role-sentry/archive/main.tar.gz
    +# Our FTP Server
    +- src: galaxyproject.proftpd
    +  version: 0.3.1
       
    
  2. Install the role with:

    Input: Bash
    ansible-galaxy install -p roles -r requirements.yml
    
  3. Since this training uses certbot, we will have the certbot role share a copy of the private key with proftpd. Add the following lines to your group_vars/galaxyservers.yml file:

    --- a/group_vars/galaxyservers.yml
    +++ b/group_vars/galaxyservers.yml
    @@ -210,11 +210,13 @@ certbot_environment: staging
     certbot_well_known_root: /srv/nginx/_well-known_root
     certbot_share_key_users:
       - www-data
    +  - proftpd
     certbot_share_key_ids:
       - "999:999"
     certbot_post_renewal: |
         systemctl restart nginx || true
         docker restart rabbit_hole || true
    +    systemctl restart proftpd || true
     certbot_domains:
      - "{{ inventory_hostname }}"
     certbot_agree_tos: --agree-tos
       
    

    This makes a copy of the current Let’s Encrypt key available as /etc/ssl/user/privkey-proftpd.pem, and automatically restarts proftpd every time the key is renewed.

  4. Next, configure Galaxy to enable FTP file upload. Add the following lines to the galaxy_config/galaxy section of your group_vars/galaxyservers.yml file:

    --- a/group_vars/galaxyservers.yml
    +++ b/group_vars/galaxyservers.yml
    @@ -122,6 +122,9 @@ galaxy_config:
         sentry_dsn: "{{ vault_galaxy_sentry_dsn }}"
         sentry_traces_sample_rate: 0.5
         error_report_file: "{{ galaxy_config_dir }}/error_reports_file.yml"
    +    # FTP
    +    ftp_upload_dir: /data/uploads
    +    ftp_upload_site: "{{ inventory_hostname }}"
       gravity:
         process_manager: systemd
         galaxy_root: "{{ galaxy_root }}/server"
       
    

    For the other FTP-related options in Galaxy, see the Galaxy configuration documentation.

  5. Next, set the proftpd variables. Add the following lines to your group_vars/galaxyservers.yml file, replacing the PassivePorts range below with the range of ports that is appropriate for your machine:

    --- a/group_vars/galaxyservers.yml
    +++ b/group_vars/galaxyservers.yml
    @@ -366,3 +366,24 @@ telegraf_plugins_extra:
     tiaas_dir: /srv/tiaas
     tiaas_admin_user: admin
     tiaas_admin_pass: changeme
    +
    +# Proftpd:
    +proftpd_galaxy_auth: yes
    +galaxy_ftp_upload_dir: "{{ galaxy_config.galaxy.ftp_upload_dir }}"
    +proftpd_display_connect: |
    +  {{ inventory_hostname }} FTP server
    +
    +  Unauthorized access is prohibited
    +proftpd_create_ftp_upload_dir: yes
    +proftpd_options:
    +  - User: galaxy
    +  - Group: galaxy
    +  - Port: 21
    +proftpd_sql_db: galaxy@/var/run/postgresql
    +proftpd_sql_user: galaxy
    +proftpd_conf_ssl_certificate: /etc/ssl/certs/cert.pem
    +proftpd_conf_ssl_certificate_key: /etc/ssl/user/privkey-proftpd.pem
    +proftpd_global_options:
    +  - PassivePorts: 56000 60000
    +proftpd_use_mod_tls_shmcache: false
    +proftpd_tls_options: NoSessionReuseRequired
       
    

    Here is a description of the variables set above:

      • proftpd_galaxy_auth: Attempt to authenticate users against the Galaxy database.
      • galaxy_ftp_upload_dir: Path to the Galaxy FTP upload directory; it should match ftp_upload_dir in your Galaxy config.
      • proftpd_display_connect: Message displayed when users connect to the FTP server. This must be the message itself, not the path to a file.
      • proftpd_create_ftp_upload_dir: Whether the role should create the upload directory, owned by galaxy_user.
      • proftpd_options: Arbitrary proftpd options; here we set the user and group to the Galaxy user and the command port to 21.
      • proftpd_sql_db: Database to connect to for authentication information.
      • proftpd_sql_user (default: the value of galaxy_user): Value of the username parameter to SQLConnectInfo.
      • proftpd_conf_ssl_certificate: Path on the remote host to the SSL certificate file.
      • proftpd_conf_ssl_certificate_key: Path on the remote host to the SSL private key file.
      • proftpd_global_options: Arbitrary options set in the global context; here we set the PassivePorts range.
      • proftpd_use_mod_tls_shmcache: By default proftpd uses mod_tls_shmcache, which is not installed on the server, so we disable it.
      • proftpd_tls_options: Additional TLS options; we use NoSessionReuseRequired.

    As a security measure, mod_tls only accepts SSL/TLS data connections that reuse the SSL session of the control connection. Unfortunately, some clients (e.g. curl, FileZilla) do not reuse SSL sessions. Setting NoSessionReuseRequired relaxes this requirement, so data connections no longer have to reuse the control connection’s SSL session.

  6. Add the new role to the list of roles under the roles key in your playbook, galaxy.yml:

    --- a/galaxy.yml
    +++ b/galaxy.yml
    @@ -45,6 +45,7 @@
         - geerlingguy.redis
         - usegalaxy_eu.flower
         - galaxyproject.nginx
    +    - galaxyproject.proftpd
         - geerlingguy.docker
         - usegalaxy_eu.rabbitmqserver
         - galaxyproject.tiaas2
       
    
  7. Run the playbook:

    Input: Bash
    ansible-playbook galaxy.yml
    

Congratulations, you’ve set up FTP for Galaxy.

Check it works

Hands-on: Checking proftpd from the server
  1. SSH into your machine

  2. Check that proftpd is active with systemctl status proftpd.

  3. Check that the port has been correctly assigned with sudo lsof -i -P -n.

    Question

    What do you see?

    You should see all the ports used by the server. The line we are interested in is the one for proftpd, which should show TCP *:21 (LISTEN).

  4. Check that the directory /data/uploads/ has been created and is empty.

    Input: Bash
    sudo tree /data/uploads/
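The checks in steps 2 and 3 can be scripted. A minimal sketch, assuming proftpd listens on all addresses so that lsof prints TCP *:21 (LISTEN) as described above; `proftpd_listens_on_21` is a hypothetical helper that reads `lsof -i -P -n` output on standard input.

```bash
# Hypothetical helper: succeed if `lsof -i -P -n` output read from stdin
# contains a proftpd process listening on TCP port 21 on all addresses.
proftpd_listens_on_21() {
  awk '$1 == "proftpd" && /TCP +\*:21 \(LISTEN\)/ { found = 1 }
       END { exit !found }'
}

# On the server:
#   systemctl is-active --quiet proftpd && echo "proftpd is active"
#   sudo lsof -i -P -n | proftpd_listens_on_21 && echo "proftpd listens on port 21"
```

The pattern requires a space after 21, so a service on another port such as 2121 does not produce a false positive.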
    
Hands-on: Checking that Galaxy detected the FTP option
  1. Open your Galaxy in a browser.

  2. Log in with a user (FTP is only possible for authenticated sessions).

  3. Click on the upload button. You should now see a “Choose FTP files” button at the bottom.

  4. Click on the Choose FTP files button. You should see a message “Your FTP directory does not contain any files.”

It’s working!

Hands-on: Upload your first file

There are three options for uploading files; choose whichever is easiest for you.

  1. FileZilla

    1. Follow the tutorial to upload a file.
    2. When a message asks you to approve the certificate, approve it.
  2. lftp

    You can use lftp locally, on the Galaxy server itself, to test FTP.

    1. Install lftp with sudo apt-get install lftp.
    2. Add the public certificate to the list of known certificates (only needed for Let’s Encrypt staging certificates!):
      Input: Bash
      mkdir -p ~/.lftp
      echo 'set ssl:ca-file "/etc/ssl/certs/cert.pem"' > ~/.lftp/rc
      
    3. Connect to the server, for example with the admin account:
      Input: Bash
      lftp admin@example.org@$HOSTNAME
      
    4. Enter the password of the admin@example.org Galaxy user.
    5. Upload any file, for example:

      put /srv/galaxy/server/CITATION

    6. Check that it is there with ls.
    7. Leave lftp with quit.
  3. Curl

    Input: Bash
    curl -T /srv/galaxy/server/CITATION ftp://localhost/ --user admin@example.org:password --ssl-reqd -k
    

    Here -T uploads the given file, --user passes the Galaxy credentials (replace password with the user’s actual password), --ssl-reqd makes curl refuse to continue unless the FTP connection is SSL/TLS encrypted, and -k ignores certificate errors, since the hostname localhost will not match our certificate.

Hands-on: Check where the file has been uploaded
  1. SSH into your machine

  2. Check the directory /data/uploads/.

    Input: Bash
    sudo tree /data/uploads/
    
    Question

    What do you see?

    After uploading a file called CITATION as the admin@example.org user, you should see:

    /data/uploads/
    └── admin@example.org
        └── CITATION
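On a busy server you may want to see which users have files waiting. A minimal sketch, assuming the layout shown above (one subdirectory per Galaxy user email under the upload directory) and the /data/uploads path configured as ftp_upload_dir in this tutorial; `list_ftp_uploads` is a hypothetical helper.

```bash
# Hypothetical helper: list the files waiting in each user's FTP
# directory as "<user>: <file>". Assumes one subdirectory per user,
# named after the Galaxy user's email.
list_ftp_uploads() {
  local upload_dir="${1:-/data/uploads}" user_dir f
  for user_dir in "$upload_dir"/*/; do
    [ -d "$user_dir" ] || continue   # no user directories yet
    for f in "$user_dir"*; do
      [ -e "$f" ] || continue        # this user's directory is empty
      echo "$(basename "$user_dir"): $(basename "$f")"
    done
  done
}

# On the server, from a shell that can read the directory
# (e.g. a root shell via sudo -i):
#   list_ftp_uploads /data/uploads
```

Because this is a shell function, prefixing it with sudo does not work; run it from a shell that already has read access instead.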
    
Hands-on: Use it in galaxy
  1. Open your Galaxy in a browser.

  2. Log in with the user you used to upload the file.

  3. Click on the upload button.

  4. Click on the Choose FTP files button. You should see your file.

  5. Click on it and click on Start to launch the upload. It should go to your history as a new dataset.

  6. Click again on the Choose FTP files button. Your file has disappeared: by default, files are removed from the FTP directory once they are imported.

    If you would rather keep the files after import, add ftp_upload_purge: false to the galaxy_config/galaxy variables (next to ftp_upload_dir).

Congratulations! Let your users know this option exists; many of them will prefer to start large uploads from an FTP client.

Hands-on: Time to git commit

It’s time to commit your work! Check the status with

git status

Add your changed files with

git add ... # any files you see that are changed

And then commit it!

git commit -m 'Finished Enable upload via FTP'

Comment: Got lost along the way?

If you missed any steps, you can compare against the reference files, or see what changed since the previous tutorial.

If you’re using git to track your progress, remember to add your changes and commit with a good commit message!
