Storage Management
How does Galaxy locate data?
How can I have Galaxy use multiple storage locations?
Set up Galaxy with both the Hierarchical and Distributed Object Stores
Published: Jan 14, 2021
Last Updated: May 15, 2023
Data Libraries
- Provide a convenient way to share datasets with users
- Great for commonly used datasets (e.g. reference data, GTN tutorial data)
Speaker Notes
- Data libraries provide a convenient way for Galaxy administrators to share datasets with users.
- This is ideal for commonly used datasets such as reference data, or data for GTN tutorials.
Data Libraries
- Access to library datasets:
- Shared Data menu, browse data and import into history
- Directly from tool form
Speaker Notes
- Users can browse these data libraries and import datasets directly into their histories.
- These datasets can also be selected directly from a tool form.
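Browsing a library and importing one of its datasets into a history can also be scripted. The following is a minimal sketch. It assumes BioBlend's `GalaxyInstance` (passed in as `gi`) and its `libraries.get_libraries`, `libraries.show_library(..., contents=True)` and `histories.upload_dataset_from_library` methods. The helper name `import_library_file` and the example paths are hypothetical.

```python
from typing import Any


def import_library_file(gi: Any, library_name: str, file_path: str,
                        history_id: str) -> dict:
    """Import one library dataset into a history.

    `gi` is assumed to be a bioblend.galaxy.GalaxyInstance. `file_path` is the
    dataset's path inside the library, e.g. "/genomes/hg38.fa" (hypothetical).
    """
    libraries = gi.libraries.get_libraries(name=library_name)
    if not libraries:
        raise LookupError(f"no library named {library_name!r}")
    # With contents=True, BioBlend returns the library's folders and files.
    contents = gi.libraries.show_library(libraries[0]["id"], contents=True)
    for item in contents:
        # Entries are named by their full path inside the library.
        if item["type"] == "file" and item["name"] == file_path:
            return gi.histories.upload_dataset_from_library(history_id, item["id"])
    raise LookupError(f"{file_path!r} not found in library {library_name!r}")


# Example use (URL, API key and IDs are placeholders):
# from bioblend.galaxy import GalaxyInstance
# gi = GalaxyInstance(url="https://galaxy.example.org", key="YOUR_API_KEY")
# import_library_file(gi, "Reference data", "/genomes/hg38.fa", "HISTORY_ID")
```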
Advantages of data libraries
- Avoid duplication of data
- Does not count towards user’s quota
- Libraries can be shared with all users, or specific groups
- Manage permissions at the library or dataset level using roles and groups
- Admins can create libraries
- Ordinary users can be granted permission to manage libraries
Speaker Notes
- Every dataset in the library is stored only once, no matter how many users are using it in their histories.
- The data in data libraries does not count against user quotas.
- Management of libraries can be delegated to users.
- Lastly, libraries can be public, or restricted to individual users or to groups.
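Library permissions are expressed in terms of Galaxy roles. The following is a minimal sketch. It assumes BioBlend's `roles.get_roles` and `libraries.set_library_permissions`, whose `access_in`, `modify_in`, `add_in` and `manage_in` arguments take lists of role IDs. `restrict_library`, `role_id` and the role names are hypothetical. One way to delegate library management to ordinary users is to grant the add, modify and manage permissions to a role they belong to.

```python
from typing import Any, Optional


def role_id(gi: Any, role_name: str) -> str:
    """Look up a role's encoded ID by its name."""
    for role in gi.roles.get_roles():
        if role["name"] == role_name:
            return role["id"]
    raise LookupError(f"no role named {role_name!r}")


def restrict_library(gi: Any, library_id: str, access_role: str,
                     manager_role: Optional[str] = None) -> dict:
    """Restrict access to members of `access_role`.

    If `manager_role` is given, its members may also add to, modify and
    manage the library (delegated management).
    """
    access = [role_id(gi, access_role)]
    managers = [role_id(gi, manager_role)] if manager_role else None
    return gi.libraries.set_library_permissions(
        library_id,
        access_in=access,
        modify_in=managers,
        add_in=managers,
        manage_in=managers,
    )
```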
Importing Data
- There are multiple ways to add data to libraries:
- From history
- From user directory
- From import directory and/or path (admins only)
- From remote source (via API/BioBlend/ephemeris)
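Importing from a remote source can be scripted with BioBlend. The following is a minimal sketch. It assumes an admin API key and BioBlend's `libraries.create_library` and `libraries.upload_file_from_url`. `create_library_from_urls` is a hypothetical helper. Galaxy fetches each URL itself, so the URLs must be reachable from the Galaxy server.

```python
from typing import Any, Iterable


def create_library_from_urls(gi: Any, name: str, urls: Iterable[str],
                             file_type: str = "auto", dbkey: str = "?") -> str:
    """Create a data library and fetch each URL into its root folder.

    `gi` is assumed to be a bioblend.galaxy.GalaxyInstance with admin rights.
    Returns the new library's ID.
    """
    library = gi.libraries.create_library(name, description="Imported via BioBlend")
    for url in urls:
        # Without folder_id, files land in the library's root folder.
        gi.libraries.upload_file_from_url(library["id"], url,
                                          file_type=file_type, dbkey=dbkey)
    return library["id"]
```

Passing an explicit `file_type` (for example `"fastqsanger"`) avoids relying on Galaxy's format sniffing for tutorial data.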
Speaker Notes
- Galaxy provides many options for importing data.
- You can import data from a history, or from a server directory.
- Importing data from a server directory is convenient, as Galaxy can recreate the folder structure that is on disk.
- Additionally, Galaxy can add library data as a symlink to the original file rather than as a copy.
- This avoids copying large shared datasets into Galaxy’s own data store.
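Whether Galaxy links or copies is chosen per import. The following is a minimal sketch. It assumes BioBlend's `libraries.upload_from_galaxy_filesystem`, which takes a newline-separated string of server paths and a `link_data_only` argument (`"link_to_files"` or `"copy_files"`). This call needs an admin API key and path pasting enabled in galaxy.yml. `link_server_files` and the paths are hypothetical. Galaxy only points at linked files, so the originals must stay in place on disk.

```python
from typing import Any, List, Optional, Sequence


def link_server_files(gi: Any, library_id: str, paths: Sequence[str],
                      folder_id: Optional[str] = None) -> List[dict]:
    """Add files already on the Galaxy server to a library as symlinks.

    `gi` is assumed to be an admin bioblend.galaxy.GalaxyInstance. If path
    pasting is disabled in galaxy.yml, Galaxy is expected to reject the call.
    """
    if not paths:
        raise ValueError("no paths given")
    return gi.libraries.upload_from_galaxy_filesystem(
        library_id,
        "\n".join(paths),                # one server path per line
        folder_id=folder_id,
        link_data_only="link_to_files",  # symlink; "copy_files" would copy the data
    )
```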
In galaxy.yml
- Allows authorized non-administrators to upload a directory of files.
- The directory must contain one sub-directory per user, named after that user’s email address.
- Works well in combination with
- Allows administrators to upload a directory of files.
- Admin-only: allows importing from any path that the Galaxy user has access to.
Speaker Notes
- The server directories from which users and admins can import files are configured in galaxy.yml.
- For admins, it is also possible to enable import from any server path.
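To our knowledge, the galaxy.yml options for these imports are `user_library_import_dir`, `library_import_dir` and `allow_path_paste`. The sketch below only illustrates the per-user layout. It lists the files a user could import, assuming `user_library_import_dir` points at a base directory with one sub-directory per user email. `user_import_files` is a hypothetical helper and is not part of Galaxy.

```python
from pathlib import Path
from typing import List


def user_import_files(user_library_import_dir: str, email: str) -> List[Path]:
    """List the files available to one user for library import.

    Assumes the layout described above: <base>/<user email>/... .
    """
    user_dir = Path(user_library_import_dir) / email
    if not user_dir.is_dir():
        return []  # this user has no import directory
    # rglob("*") also walks nested folders, whose structure Galaxy can recreate.
    return sorted(p for p in user_dir.rglob("*") if p.is_file())
```

For example, a file at `/libraries/user/alice@example.org/run1/reads.fastq` (hypothetical paths) would only be offered to the user whose email is `alice@example.org`.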
Key Points
- The distributed object store configuration allows you to easily expand the storage that is attached to your Galaxy.
- You can move data around without affecting users.
Thank you!
This material is the result of collaborative work. Thanks to the Galaxy Training Network and all the contributors!