name: inverse
layout: true
class: center, middle, inverse
---

# Introduction to Galaxy

---

# What is Galaxy?

---

[](https://galaxyproject.org/)

- **Web-based** platform for computational biomedical research (analysis and data integration)
- Developed at Penn State, Johns Hopkins, OHSU and Cleveland Clinic with substantial outside contributions
- **Open source** under [Academic Free License](https://opensource.org/licenses/AFL-3.0)
- More than 6,500 [citations](https://www.zotero.org/groups/1732893/galaxy)
- More than 125 [public Galaxy resources](https://galaxyproject.org/use/)
  - 100+ public servers, many more non-public
  - Both general-purpose and domain-specific

???

- The Galaxy Team is composed of bioinformaticians and software engineers
- OHSU = Oregon Health & Science University

---

### Core values

- **Accessibility**
  - Users without programming experience can easily upload/retrieve data, run complex tools and workflows, and visualize data
- **Reproducibility**
  - Galaxy captures information so that any user can understand and repeat a complete computational analysis
- **Transparency**
  - Users can share or publish their analyses (histories, workflows, visualizations)
  - Pages: online Methods for your paper

???

**Accessible**, **reproducible**, **transparent** research means *sharing everything*.

If the Galaxy framework makes everything as simple as possible, researchers are able to:
- share their analyses
- track all used tools and versions
- check all parameters
- justify each step in the analysis
- publish the findings with all aforementioned information

Pages: interactive, web-based documents that describe a complete analysis.

---

# User Interface

???

So now that we know what Galaxy and the Galaxy Project are all about, let's look at the Galaxy interface.
---

### Main Galaxy interface

Home page divided into 3 panels

---

### Top menu

Link | Usage
-- | --
*Analyze Data* | go back to the homepage
*Workflow* | access existing workflows or create a new one using the editable diagrammatic pipeline
*Visualize* | create new visualizations and launch Interactive Environments
*Shared data* | access data libraries, histories, workflows, visualizations and pages shared with you
*Help* | links to Galaxy Help Forum (Q&A), Galaxy Community Hub (Wiki), and Interactive Tours
*User* | your preferences and saved histories, datasets, pages and visualizations

---

### Tools

- The tool search helps you find a tool in a crowded toolbox

---

### Tool interface

.image-50[]

- Tools are simply text files with:
  - input datasets, parameters, commands, and outputs
  - help, tests, citations, dependency requirements
- New versions can be installed without removing old ones to ensure reproducibility

???

A tool describes:
- the input datasets and their datatypes
- the tool parameters (numerical, text, boolean, selections, colour)
- how to generate a command to execute the tool with the specified inputs and parameters
- the output datasets the tool should produce and their datatypes

Tools can be viewed as tiny LEGO pieces: each one solves a specific problem, and they can be combined together to build complex analysis pipelines.

---

### Tool Shed

.image-50[]

- Free "app" store: [Galaxy Tool Shed](https://toolshed.g2.bx.psu.edu/)
- Thousands of tools already available
- Most software can be integrated
- If a tool is not available, ask the Galaxy community for help!
- Only a Galaxy admin can install tools

---

### History

- Location of all analyses <img style="float: right;" alt="History" src="../images/history.png" />
  - collects all datasets produced by tools
  - collects all operations performed on the data
- For each dataset (the heart of Galaxy’s reproducibility), the history tracks
  - name, format, size, creation time, datatype-specific metadata
  - tool id, version, inputs, parameters
  - standard output (`stdout`) and error (`stderr`)
  - state (<span style="background-color: grey">waiting</span>, <span style="background-color: yellow">running</span>, <span style="background-color: green">success</span>, <span style="background-color: red">failed</span>)
  - hidden, deleted, purged

???

- We say *datasets* to refer to files as well as databases
- Purged means permanently deleted

---

### Multiple histories

- You can have as many histories as you want
  - each history should correspond to a **different analysis**
  - and should have a meaningful **name**

.image-75[]

???

- Give it a good name so you can find it later. I have around a hundred histories and after a month I can't remember what I was doing in some, so a good name is important.
- You can drag and drop datasets between histories

---

### History options menu

.pull-left[
History behavior is controlled by the *History options* (gear icon)

]

.pull-right[
.image-75[]

- *Create New* history will **not** make your current history disappear
- To see all of your histories, use the history switcher

.image-50[]

- *Copy Datasets* from one history to another and save disk space for your quota
]

???

- Copying datasets between histories does not affect your quota: only a single copy of the file is stored on disk because datasets are never modified after creation.

---

# Loading data

???

So now you know about the tools to manipulate data and the history where you can see your data, your inputs and outputs.
Let's discuss how to get data into Galaxy.

---

### Importing data

- Copy/paste from a file
- Upload data from a local computer
- Upload data from the internet using a URL
- Upload data from online databases: UCSC, BioMart, ENCODE, modENCODE, FlyMine etc.
- Import from Shared Data (libraries, histories, pages)
- Upload data from FTP (>2GB)

See [Getting data into Galaxy](../../galaxy-data-manipulation/tutorials/get-data/slides.html)

---

### Datatypes

- Tools only accept input datasets with the appropriate datatypes
- When uploading a dataset, its datatype can be either:
  - automatically detected
  - assigned by the user
- Dataset produced by a tool: datatype assigned by the tool
- To change the datatype of a dataset:
  - <i class="fa fa-pencil" aria-hidden="true"></i><span class="visually-hidden">galaxy-pencil</span> *Edit Attributes* and *Datatype*
  - <i class="fa fa-pencil" aria-hidden="true"></i><span class="visually-hidden">galaxy-pencil</span> *Edit Attributes* and *Convert Formats*

???

- When you upload data, Galaxy will try to autodetect the format of the data, but can sometimes get it wrong, so you may need to correct it later.
- Edit Attributes → Datatype is used to fix a wrongly assigned datatype
- Edit Attributes → Convert Formats creates a new dataset using a tool that converts the original dataset into the new format
- New datatypes can be added to the Galaxy code base, if missing

---

### Reference genomes

.pull-left[
- Genome build specifies which genome assembly a dataset is associated with
  - e.g. mm10, hg38...
- Can be automatically detected or assigned by the user
- Users can create custom genome builds
- New builds can be added by the admin
]

.pull-right[

]

???

- Just like datatypes, you can specify which genome assembly your dataset is about. Some tools need to know this, and Galaxy can tell the tool for you.

---

# Workflows

???

Now that you've got data into Galaxy, you know you can use tools to manipulate this data, and histories to keep track of what you've done.
You're only missing one key part: workflows. These help you easily reproduce the exact analysis that you ran.

---

### Workflow Editor

- **Extracted** from a history
- **Built manually** by adding and configuring tools using the canvas
- **Imported** using an existing shared workflow

???

Biologist:
- workflows are great
- single button to run all of these 50 different tools
- a lot of work once to figure out analysis, but easy in the future to just rerun, go get coffee and wait for things to be done :)

Bioinf / dev:
- Boxes are workflow steps
  - 2 types: *input* and *tool* steps
- Steps are connected by arrows representing the flow of datasets
- Tool panel on the left with Inputs on top (to add input datasets and collections)
- Small tool form on the right
- Extracting a workflow from a history makes it easy to convert an existing history into an analysis workflow

---

### Why would you want to create workflows?

- **Re-run** the same analysis on different input datasets
- **Change parameters** before re-running a similar analysis
- Make use of the workflow job **scheduling**
  - jobs are submitted as soon as their inputs are ready
- Create **sub**-workflows: a workflow inside another workflow
- **Share** workflows for publication and with the community

???
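
For a more technical audience, the scheduling bullet ("jobs are submitted as soon as their inputs are ready") can be illustrated with a minimal Python sketch. This is not Galaxy's scheduler: the `schedule` function and the step names in `workflow` are hypothetical, and the sketch only shows dependency-driven ordering, one step at a time.

```python
from collections import deque


def schedule(steps):
    """Return the order in which steps become runnable.

    `steps` maps a step name to the set of step names whose outputs it
    needs. A step is submitted once all of its inputs are available.
    """
    remaining = {name: set(deps) for name, deps in steps.items()}
    # Steps without inputs (e.g. uploaded datasets) are ready immediately.
    ready = deque(sorted(name for name, deps in remaining.items() if not deps))
    order = []
    while ready:
        step = ready.popleft()
        order.append(step)
        # Finishing `step` may unblock steps that were waiting for its output.
        for name in sorted(remaining):
            deps = remaining[name]
            if step in deps:
                deps.remove(step)
                if not deps:
                    ready.append(name)
    if len(order) != len(steps):
        raise ValueError("workflow contains a cycle or a missing input")
    return order


# Hypothetical five-step workflow, for illustration only.
workflow = {
    "input_reads": set(),
    "quality_control": {"input_reads"},
    "trimming": {"input_reads"},
    "mapping": {"trimming"},
    "report": {"quality_control", "mapping"},
}

print(schedule(workflow))
# ['input_reads', 'quality_control', 'trimming', 'mapping', 'report']
```

In this sketch `quality_control` and `trimming` become ready at the same moment; a real scheduler could run them in parallel, while `report` waits until both of its inputs exist.
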
Potential information overload for newbies

---

### Visualizations

.image-75[]

- Datatypes know what tools can be used to visualize datasets:
  - Sequencing data has a button for visualizing in IGV
  - Tabular data will prompt you to build charts
  - Protein data can be seen in a 3D viewer
- Interactive environments: Jupyter, RStudio, etc.

---

### Sharing data

- Share everything you do in Galaxy: histories, workflows, and visualizations
  - Directly, using the email address of a Galaxy account on the same instance
  - Using a web link, with anyone who knows the link
  - Using a web link and publishing it to make it accessible to everyone from the *Shared Data* menu

---

### Community

- Support forum: [Galaxy Help](https://help.galaxyproject.org/)
- Community curated documentation: [Galaxy Community Hub](https://www.galaxyproject.org/)
- [Events](https://galaxyproject.org/events/) all around the world
- Galaxy Training for scientists, developers, admins, instructors: [Galaxy Training Community](/training-material/)
- Training questions? Chat with us on [Gitter](https://gitter.im/Galaxy-Training-Network/Lobby)

???

- I know that was a lot
- we'll come back
- slides are always available online
- first real analysis after the coffee?

---

## Related tutorials

---

## Thank you!

This material is the result of a collaborative work. Thanks to the [Galaxy Training Network](https://wiki.galaxyproject.org/Teach/GTN) and all the contributors (Andrea Bagnacani, Bérénice Batut, Saskia Hiltemann, Anne Pajon, Nicola Soranzo, Helena Rasche)!