InterMine integration with Galaxy
Author(s) | Daniela Butano, Yo Yehudi |
Reviewers |
Overview
Questions:
- How to export your query results from your InterMine of choice to Galaxy?
- How to export a list of identifiers from Galaxy to your InterMine of choice?
Objectives:
- Learn how to import/export data from/to InterMine instances
- Understand the InterMine Interchange Dataset
Time estimation: 1 hour
Published: Dec 9, 2020
Last modification: Feb 29, 2024
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT.
PURL: https://gxy.io/GTN:T00152
Revision: 9
InterMine (Smith et al. 2012) is a well-established platform for integrating and accessing life sciences data. It provides the integrated data via a web interface and RESTful web services.
Other organizations download and deploy InterMine on their own servers: there are more than 30 instances around the world (registered at registry.intermine.org), covering many organisms, including human data, model animals, plants and drug targets.
InterMine has been integrated with Galaxy: the InterMine tool server in Galaxy lets you import the data returned by any InterMine search and, vice versa, the InterMine Interchange format lets you export a list of identifiers from Galaxy into any InterMine instance of your choice.
Learn more in this tutorial.
Agenda
In this tutorial, we will cover:
- Import data from InterMine
- Export identifiers into InterMine
- Conclusion
Import data from InterMine
Hands-on: Import
Search Galaxy for InterMine (not case sensitive; intermine is fine too), and click on InterMine Server under Get Data.
This will redirect you to the InterMine registry, which shows a full list of InterMines and the various organisms they support. Find an InterMine that has the organism type you’re working with, and click on it to redirect to that InterMine.
Once you arrive at your InterMine of choice, you can run a query as normal - this could be a search, a list results page, a template, or a query in the query builder. Eventually you’ll be presented with an InterMine results table.
- Click on Export (top right). This will bring up a modal window.
- Select Send to Galaxy and double-check the “Galaxy Location” is correct.
Click on the Send to Galaxy button on the bottom right of the pop-up window.
If you get an error when you click on the Send to Galaxy button, please make sure to allow popups and try again.
You have now exported your query results from InterMine to Galaxy.
Export identifiers into InterMine
Get data
Hands-on: Data upload
Import some fly data from Zenodo or from the data library
https://zenodo.org/record/3407174/files/GenesLocatedOnChromosome4.tsv
- Copy the link location
Click Upload Data at the top of the tool panel
- Select Paste/Fetch Data
Paste the link(s) into the text field
Press Start
- Close the window
As an alternative to uploading the data from a URL or your computer, the files may also have been made available from a shared data library:
- Go into Data (top panel) then Data libraries
- Navigate to the correct folder as indicated by your instructor.
- On most Galaxies, tutorial data will be provided in a folder named GTN - Material -> Topic Name -> Tutorial Name.
- Select the desired files
- Click on Add to History near the top and select as Datasets from the dropdown menu
In the pop-up window, choose
- “Select history”: the history you want to import the data to (or create a new one)
- Click on Import
Rename the dataset to GenesLocatedOnChromosome4
- Click on the pencil icon for the dataset to edit its attributes
- In the central panel, change the Name field
- Click the Save button
Inspect the data
The dataset contains the secondary identifiers and symbols of Drosophila melanogaster genes, together with their location on chromosome 4.
Question
Do the data contain the type, e.g. Protein or Gene?
No, they don’t. So we have to specify it when we create the InterMine Interchange file.
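The check in the question above can also be scripted. The following Python sketch looks for a column whose values are all known type names; if none is found, the type has to be supplied by hand. The function name, the two-entry set of type names, and the sample rows are assumptions made for illustration (the rows are placeholders, not real FlyMine records).

```python
import csv
import io

# Type names to look for; "Gene" and "Protein" are the two examples
# mentioned in the question. A real InterMine model has many more classes.
KNOWN_TYPES = {"Gene", "Protein"}


def find_type_column(tsv_text, known_types=KNOWN_TYPES):
    """Return the 1-based index of the first column whose values are all
    known type names, or None if there is no such column."""
    rows = [row for row in csv.reader(io.StringIO(tsv_text), delimiter="\t") if row]
    if not rows:
        return None
    width = min(len(row) for row in rows)
    for index in range(width):
        if all(row[index] in known_types for row in rows):
            return index + 1  # Galaxy numbers columns starting from 1
    return None


# Placeholder rows (hypothetical values): identifier, symbol, chromosome.
SAMPLE = "ID_0001\tsymA\t4\nID_0002\tsymB\t4\n"

print(find_type_column(SAMPLE))  # prints None: no column holds a type
```

A result of None corresponds to the situation in this tutorial, where the type is typed into the tool form instead of being read from a column.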
Create InterMine Interchange dataset
We will use the Create InterMine Interchange Dataset tool to generate an intermediate file, which is then used to send the identifiers (e.g. gene identifiers) to InterMine. This file requires the identifier’s type (e.g. Gene), the identifier (e.g. WBGene00007063) and, optionally, the organism’s name.
Hands-on: Generate InterMine file
- Create InterMine Interchange Dataset (Galaxy version 0.0.1) with the following parameters:
- “Tabular file”: select the GenesLocatedOnChromosome4 dataset, which contains some fly genes
- “Feature Type Column”: Column: 1
- “Feature Type”: Gene
- “Feature Identifier column”: Column: 2
Comment
- In this example, because the GenesLocatedOnChromosome4 dataset does not contain the type, we have to specify it in “Feature Type”.
- “Feature Type”: this is the type of the identifiers you are exporting to InterMine, in this example Gene. It must be a class in the InterMine data model.
- “Feature Identifier column”: select a column from the input file which contains the identifier. We have selected Column 2, which contains the gene symbol.
- “Feature Identifier”: this could be, for example, a gene symbol like GATA1 or another identifier, e.g. FBGN0000099, or perhaps a protein accession. In our example we do not have to edit anything because the values for this field are contained in the GenesLocatedOnChromosome4 dataset, in Column 2.
- “Organism Name column”: select a column from the input file which contains the organism’s name, if you have multiple organisms in the same dataset.
- “Organism Name”: alternatively, you can provide the organism’s name directly. The organism’s name is not mandatory, but it is good to provide it if it is known. It does not have to be precise.
- Click on Run Tool
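As a rough illustration of what the parameters above select, the following Python sketch builds one line of type, identifier and organism per input row. It is not the tool's implementation: the function name, the tab-separated output layout, and the rule that a typed-in value wins over a column are assumptions made for this example, and the sample rows are placeholders.

```python
import csv
import io


def to_interchange_lines(tsv_text, identifier_column, feature_type=None,
                         type_column=None, organism_name=None,
                         organism_column=None):
    """Build 'type<TAB>identifier<TAB>organism' lines from tabular text.

    Column numbers are 1-based, as in the Galaxy tool form. Assumption of
    this sketch: a fixed value (feature_type, organism_name) takes
    precedence over the corresponding column.
    """
    if not feature_type and not type_column:
        raise ValueError("provide a feature type or a feature type column")
    lines = []
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        identifier = row[identifier_column - 1].strip()
        if not identifier:
            continue  # nothing to export for this row
        kind = feature_type or row[type_column - 1].strip()
        if organism_name:
            organism = organism_name
        elif organism_column:
            organism = row[organism_column - 1].strip()
        else:
            organism = ""  # the organism is optional
        lines.append("\t".join((kind, identifier, organism)))
    return lines


# Placeholder rows (hypothetical values): identifier, symbol, chromosome.
SAMPLE = "ID_0001\tsymA\t4\nID_0002\tsymB\t4\n"

for line in to_interchange_lines(SAMPLE, identifier_column=2,
                                 feature_type="Gene",
                                 organism_name="Drosophila melanogaster"):
    print(line)
```

With these settings every line carries the type Gene and the symbol from Column 2, mirroring the choices made in the hands-on step.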
Send identifiers to InterMine
Once the interchange dataset has been generated, expand the green box for Create InterMine Interchange on data.
Hands-on: Send data
- Click on view intermine at Registry to be redirected to the InterMine registry, which shows a full list of InterMines and the various organisms they support.
- Find an InterMine that has the organism type you’re working with, in our case FlyMine, and click on its green Send to button to export the identifiers to it.
- You are redirected to FlyMine, to the List Analysis page showing the identifiers you have just exported from Galaxy.
Conclusion
You have now exported your identifiers from Galaxy to InterMine.