{ "metadata": { }, "nbformat": 4, "nbformat_minor": 5, "cells": [ { "id": "metadata", "cell_type": "markdown", "source": "
BioBlend ({% cite Sloggett2013 %}) is a Python library to enable simple interaction with Galaxy ({% cite Afgan2018 %}) via the command line or scripts.
\n\n
We are going to use the requests Python library to communicate via HTTP with the Galaxy server. To start, let’s define the connection parameters.
\nYou need to insert the API key for your Galaxy server as the value of the `api_key` variable in the cell below.

We now make a GET request to retrieve all histories owned by a user:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-3", "source": [ "headers = {\"Content-Type\": \"application/json\", \"x-api-key\": api_key}\n", "r = requests.get(base_url + \"/histories\", headers=headers)\n", "print(r.text)\n", "hists = r.json()\n", "pprint(hists)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">As you can see, GET requests in Galaxy API return JSON strings, which need to be deserialized into Python data structures. In particular, GETting a resource collection returns a list of dictionaries.
\nEach dictionary returned when GETting a resource collection gives basic info about a resource, e.g. for a history you have:
\n- `id`: the unique identifier of the history, needed for all specific requests about this resource
- `name`: the name of this history as given by the user
- `deleted`: whether the history has been deleted.

There is no readily-available filtering capability, but it’s not difficult to filter histories by name:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-5", "source": [ "pprint([_ for _ in hists if _['name'] == 'Unnamed history'])" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">If you are interested in more details about a given resource, you just need to append its id
to the previous collection request, e.g. to get more info about a history:
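A minimal sketch of such a request, assuming `base_url` and `headers` are defined as in the cells above. The helper name `get_history` and its `session` parameter are hypothetical, introduced here for illustration only; `session` is assumed to be a `requests.Session` or any object with a compatible `get()` method.

```python
def get_history(session, base_url, headers, history_id):
    '''Return the detailed dictionary for a single history.

    Assumption: `session` behaves like `requests.Session`.
    '''
    # Appending the id to the collection URL addresses one resource.
    r = session.get(base_url + '/histories/' + history_id, headers=headers)
    r.raise_for_status()  # raise on HTTP errors instead of parsing an error page
    return r.json()


# Usage in the notebook (needs a reachable Galaxy server):
# import requests
# hist0 = get_history(requests.Session(), base_url, headers, hists[0]['id'])
# pprint(hist0)
```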
As you can see, there are many more entries in the returned dictionary, e.g.:
\n- `create_time`
- `size`: total disk space used by the history
- `state_ids`: ids of history datasets for each possible state.

To get the list of datasets contained in a history, simply append `/contents` to the previous resource request.
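As a rough sketch, the contents request could look like the following. It assumes `base_url` and `headers` from the cells above; `get_history_contents`, `names_by_state` and the `session` parameter are hypothetical names used only for this illustration.

```python
def get_history_contents(session, base_url, headers, history_id):
    '''Return the list of dictionaries describing the items of a history.

    Assumption: `session` behaves like `requests.Session`.
    '''
    url = base_url + '/histories/' + history_id + '/contents'
    r = session.get(url, headers=headers)
    r.raise_for_status()
    return r.json()


def names_by_state(contents):
    '''Group item names by their state (None if an item reports no state).'''
    grouped = {}
    for item in contents:
        grouped.setdefault(item.get('state'), []).append(item['name'])
    return grouped


# Usage in the notebook (needs a reachable Galaxy server):
# import requests
# contents = get_history_contents(requests.Session(), base_url, headers, hists[0]['id'])
# pprint(names_by_state(contents))
```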
The dictionaries returned when GETting the history content give basic info about each dataset, e.g.: `id`, `name`, `deleted`, `state`, `url`…
To get the details about a specific dataset, you can use the `datasets` controller:
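Since every controller follows the same URL pattern, one way to sketch this is a small generic helper. The name `get_resource` and its parameters are hypothetical, and `session` is again assumed to be a `requests.Session` or a compatible object.

```python
def get_resource(session, base_url, headers, controller, resource_id):
    '''GET one resource, e.g. controller='datasets' or controller='histories'.

    Assumption: `session` behaves like `requests.Session`.
    '''
    url = '/'.join([base_url, controller, resource_id])
    r = session.get(url, headers=headers)
    r.raise_for_status()
    return r.json()


# Usage in the notebook, where `hda_id` is the id of a dataset taken
# from the history contents (needs a reachable Galaxy server):
# import requests
# pprint(get_resource(requests.Session(), base_url, headers, 'datasets', hda_id))
```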
Some of the interesting additional dictionary entries are:
\n- `create_time`
- `creating_job`: id of the job which created this dataset
- `download_url`: URL to download the dataset
- `file_ext`: the Galaxy data type of this dataset
- `file_size`
- `genome_build`: the genome build (dbkey) associated to this dataset.

New resources are created with POST requests. The uploaded data needs to be serialized in a JSON string. For example, to create a new history:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-13", "source": [ "data = {'name': 'New history'}\n", "r = requests.post(base_url + \"/histories\", data=json.dumps(data), headers=headers)\n", "new_hist = r.json()\n", "pprint(new_hist)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">The return value of a POST request is a dictionary with detailed info about the created resource.
\nTo update a resource, make a PUT request, e.g. to change the history name:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-15", "source": [ "data = {'name': 'Updated history'}\n", "r = requests.put(base_url + \"/histories/\" + new_hist[\"id\"], json.dumps(data), headers=headers)\n", "print(r.status_code)\n", "pprint(r.json())" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">The return value of a PUT request is usually a dictionary with detailed info about the updated resource.
\nFinally, to delete a resource, make a DELETE request, e.g.:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-17", "source": [ "r = requests.delete(base_url + \"/histories/\" + new_hist[\"id\"], headers=headers)\n", "print(r.status_code)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Goal: Upload a file to a new history, import a workflow and run it on the uploaded dataset.
\n\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-19", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Initialise\nFirst, define the connection parameters. What variables do you need?
\n\n👁 View solution
\n\n\nimport json\nfrom pprint import pprint\nfrom urllib.parse import urljoin\n\nimport requests\n\nserver = 'https://usegalaxy.eu/'\napi_key = ''\nbase_url = urljoin(server, 'api')\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-21", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: New History\nNext, create a new Galaxy history via POST to the correct API.
\n\n👁 View solution
\n\n\nheaders = {\"Content-Type\": \"application/json\", \"x-api-key\": api_key}\ndata = {\"name\": \"New history\"}\nr = requests.post(base_url + \"/histories\", data=json.dumps(data), headers=headers)\nnew_hist = r.json()\npprint(new_hist)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-23", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Upload a dataset\nUpload the local file
`1.txt` to the new history. You need to run the special `upload1` tool by making a `POST` request to `/api/tools`. You don’t need to pass any inputs to it apart from attaching the file as `files_0|file_data`. Also, note that when attaching a file the payload should not be serialized to a JSON string and you need to drop `Content-Type` from the request headers.

You can obtain the `1.txt` file from the following URL; you’ll need to download it first.\nhttps://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/1.txt\n
\n👁 View solution
\n\n\ndata = {\n \"history_id\": new_hist[\"id\"],\n \"tool_id\": \"upload1\"\n}\nwith open(\"1.txt\", \"rb\") as f:\n files = {\"files_0|file_data\": f}\n r = requests.post(base_url + \"/tools\", data=data, files=files, headers={\"x-api-key\": api_key})\nret = r.json()\npprint(ret)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-25", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Find the dataset in your history\nFind the new uploaded dataset, either from the dict returned by the POST request above or from the history contents.
\n\n👁 View solution
\n\n\nhda = ret['outputs'][0]\npprint(hda)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-27", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Import a workflow\nImport a workflow from the local file
`convert_to_tab.ga` by making a `POST` request to `/api/workflows`. The only needed data is `workflow`, which must be a deserialized JSON representation of the workflow `.ga` file.

You can obtain the `convert_to_tab.ga` file from the following URL; you’ll need to download it first.\nhttps://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/convert_to_tab.ga\n
\n👁 View solution
\n\n\nwith open(\"convert_to_tab.ga\", \"r\") as f:\n workflow_dict = json.load(f)\ndata = {\"workflow\": workflow_dict}\nr = requests.post(base_url + \"/workflows\", data=json.dumps(data), headers=headers)\nwf = r.json()\npprint(wf)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-29", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: View the workflow details\nView the details of the imported workflow by making a GET request to
`/api/workflows/WORKFLOW_ID`.\n👁 View solution
\n\n\nr = requests.get(base_url + \"/workflows/\" + wf[\"id\"], headers=headers)\nwf = r.json()\npprint(wf)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-31", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Invoke the workflow\nRun the imported workflow on the uploaded dataset inside the same history by making a
`POST` request to `/api/workflows/WORKFLOW_ID/invocations`. The only needed data are `history` and `inputs`.\n👁 View solution
\n\n\ninputs = {0: {'id': hda['id'], 'src': 'hda'}}\ndata = {\n 'history': 'hist_id=' + new_hist['id'],\n 'inputs': inputs}\nr = requests.post(base_url + \"/workflows/\" + wf[\"id\"] + \"/invocations\", data=json.dumps(data), headers=headers)\npprint(r.json())\n
\n\nQuestion: View the results\nView the results on the Galaxy server with your web browser. Were you successful? Did it run?
\n
If you need to install BioBlend into your Jupyter environment, you can execute:
\npython\nimport sys\n!{sys.executable} -m pip install bioblend\n
You need to insert the API key for your Galaxy server as the value of the `api_key` variable in the cell below.

The user interacts with a Galaxy server through a `GalaxyInstance` object:
The `GalaxyInstance` object gives you access to the various controllers, i.e. the resources you are dealing with, like `histories`, `tools` and `workflows`.\nTherefore, method calls will have the format `gi.controller.method()`. For example, the call to retrieve all histories owned by the current user is:
As you can see, methods in BioBlend do not return JSON strings, but deserialize them into Python data structures. In particular, `get_` methods return a list of dictionaries.
Each dictionary gives basic info about a resource, e.g. for a history you have:
\n- `id`: the unique identifier of the history, needed for all specific requests about this resource
- `name`: the name of this history as given by the user
- `deleted`: whether the history has been deleted.

New resources are created with `create_` methods, e.g. the call to create a new history is:
As you can see, to make POST requests in BioBlend it is not necessary to serialize data, you just pass them explicitly as parameters. The return value is a dictionary with detailed info about the created resource.
\n`get_` methods usually have filtering capabilities, e.g. it is possible to filter histories by name:
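A minimal sketch of name filtering, assuming `gi` is the `bioblend.galaxy.GalaxyInstance` created above. The wrapper function `find_history_ids` is a hypothetical name used only for illustration; the call it wraps is `gi.histories.get_histories(name=...)`.

```python
def find_history_ids(gi, name):
    '''Return the ids of the current user's histories matching `name`.

    Assumption: `gi` is a `bioblend.galaxy.GalaxyInstance`.
    '''
    # The `name` keyword restricts the returned list of dictionaries.
    histories = gi.histories.get_histories(name=name)
    return [hist['id'] for hist in histories]


# Usage in the notebook:
# pprint(find_history_ids(gi, 'New history'))
```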
To upload the local file `1.txt` to the new history, you can run the special upload tool by calling the `upload_file` method of the `tools` controller.

You can obtain the `1.txt` file from the following URL; you’ll need to download it first.
https://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/1.txt\n
If you are interested in more details about a given resource for which you know the id, you can use the corresponding `show_` method. For example, to get more info about the history we have just populated:
As you can see, there are many more entries in the returned dictionary, e.g.:
\n- `create_time`
- `size`: total disk space used by the history
- `state_ids`: ids of history datasets for each possible state.

To get the list of datasets contained in a history, simply add `contents=True` to the previous call.
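A short sketch of this call, assuming `gi` is the `bioblend.galaxy.GalaxyInstance` created above and that `new_hist` is the history dictionary from the earlier cells. The helper name `dataset_names` is hypothetical.

```python
def dataset_names(gi, history_id):
    '''Return the names of the non-deleted items in a history.

    Assumption: `gi` is a `bioblend.galaxy.GalaxyInstance`.
    '''
    # With contents=True the call returns a list of dictionaries,
    # one per history item, instead of the history details.
    contents = gi.histories.show_history(history_id, contents=True)
    return [item['name'] for item in contents if not item['deleted']]


# Usage in the notebook:
# pprint(dataset_names(gi, new_hist['id']))
```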
The dictionaries returned when showing the history content give basic info about each dataset, e.g.: `id`, `name`, `deleted`, `state`, `url`…
To get the details about a specific dataset, you can use the `datasets` controller:
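For instance, a sketch along these lines, assuming `gi` is the `bioblend.galaxy.GalaxyInstance` created above. The helper `dataset_summary` and its choice of keys are illustrative only; the underlying call is `gi.datasets.show_dataset()`.

```python
def dataset_summary(gi, dataset_id):
    '''Return a few selected entries of a dataset's detail dictionary.

    Assumption: `gi` is a `bioblend.galaxy.GalaxyInstance`.
    '''
    dataset = gi.datasets.show_dataset(dataset_id)
    # The keys picked here are just an illustrative subset; .get() returns
    # None for any key the server does not report.
    keys = ('name', 'file_ext', 'file_size', 'genome_build')
    return {key: dataset.get(key) for key in keys}


# Usage in the notebook, where `hda_id` is the id of a dataset in your history:
# pprint(dataset_summary(gi, hda_id))
```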
Some of the interesting additional dictionary entries are:
\n- `create_time`
- `creating_job`: id of the job which created this dataset
- `download_url`: URL to download the dataset
- `file_ext`: the Galaxy data type of this dataset
- `file_size`
- `genome_build`: the genome build (dbkey) associated to this dataset.

To update a resource, use the `update_` method, e.g. to change the name of the new history:
The return value of `update_` methods is usually a dictionary with detailed info about the updated resource.
Finally, to delete a resource, use the `delete_` method, e.g.:
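The update and delete steps can be sketched together as follows, assuming `gi` is the `bioblend.galaxy.GalaxyInstance` created above and `new_hist` is the history created earlier. The function name `rename_then_delete` is hypothetical.

```python
def rename_then_delete(gi, history_id, new_name):
    '''Rename a history, then delete it; return the update result.

    Assumption: `gi` is a `bioblend.galaxy.GalaxyInstance`.
    '''
    # Fields to change are passed as keyword arguments.
    updated = gi.histories.update_history(history_id, name=new_name)
    # Without purge=True the history is only marked as deleted.
    gi.histories.delete_history(history_id)
    return updated


# Usage in the notebook:
# pprint(rename_then_delete(gi, new_hist['id'], 'Updated history'))
```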
Goal: Upload a file to a new history, import a workflow and run it on the uploaded dataset.
\n\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-53", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Initialise\nCreate a
`GalaxyInstance` object.\n👁 View solution
\n\n\nfrom pprint import pprint\n\nimport bioblend.galaxy\n\nserver = 'https://usegalaxy.eu/'\napi_key = ''\ngi = bioblend.galaxy.GalaxyInstance(url=server, key=api_key)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-55", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: New History\nCreate a new Galaxy history.
\n\n👁 View solution
\n\n\nnew_hist = gi.histories.create_history(name='New history')\npprint(new_hist)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-57", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Upload a dataset\nUpload the local file
`1.txt` to the new history using `tools.upload_file()`.

You can obtain the `1.txt` file from the following URL; you’ll need to download it first.\nhttps://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/1.txt\n
\n👁 View solution
\n\n\nret = gi.tools.upload_file(\"1.txt\", new_hist[\"id\"])\npprint(ret)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-59", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Find the dataset in your history\nFind the new uploaded dataset, either from the dict returned by
`tools.upload_file()` or from the history contents.\n👁 View solution
\n\n\nhda = ret['outputs'][0]\npprint(hda)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-61", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Import a workflow\nImport a workflow from the local file
`convert_to_tab.ga` using `workflows.import_workflow_from_local_path()`.

You can obtain the `convert_to_tab.ga` file from the following URL; you’ll need to download it first.\nhttps://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/convert_to_tab.ga\n
\n👁 View solution
\n\n\nwf = gi.workflows.import_workflow_from_local_path(\"convert_to_tab.ga\")\npprint(wf)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-63", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: View the workflow details\nView the details of the imported workflow using
`workflows.show_workflow()`.\n👁 View solution
\n\n\nwf = gi.workflows.show_workflow(wf['id'])\npprint(wf)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-65", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Invoke the workflow\nRun the imported workflow on the uploaded dataset inside the same history using
`workflows.invoke_workflow()`.\n👁 View solution
\n\n\ninputs = {0: {'id': hda['id'], 'src': 'hda'}}\nret = gi.workflows.invoke_workflow(wf['id'], inputs=inputs, history_id=new_hist['id'])\npprint(ret)\n
\n\nQuestion: View the results\nView the results on the Galaxy server with your web browser. Were you successful? Did it run?
\n
You need to insert the API key for your Galaxy server as the value of the `api_key` variable in the cell below.

The user interacts with a Galaxy server through a `GalaxyInstance` object:
All `GalaxyInstance` method calls have the `client.method()` format, where `client` is the name of the resources you are dealing with. There are 2 methods to get the list of resources:

- `get_previews()`: lightweight (one GET request), retrieves basic resources’ info, returns a list of preview objects
- `list()`: one GET request for each resource, retrieves full resources’ info, returns a list of full objects.

For example, the call to retrieve previews of all histories owned by the current user is:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-69", "source": [ "pprint(gi.histories.get_previews())" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">New resources are created with create()
methods, e.g. to create a new history:
As you can see, the `create()` methods in BioBlend.objects return an object, not a dictionary.
Both `get_previews()` and `list()` methods usually have filtering capabilities, e.g. it is possible to filter histories by name:
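A minimal sketch of the filtering call, assuming `gi` is the `bioblend.galaxy.objects.GalaxyInstance` created above. The helper name `preview_ids` is hypothetical; the same `name` keyword can be passed to `list()` when full objects are needed.

```python
def preview_ids(gi, name):
    '''Return the ids of the history previews matching `name`.

    Assumption: `gi` is a `bioblend.galaxy.objects.GalaxyInstance`.
    '''
    # get_previews() is the lightweight variant: preview objects only.
    previews = gi.histories.get_previews(name=name)
    return [preview.id for preview in previews]


# Usage in the notebook:
# pprint(preview_ids(gi, 'New history'))
```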
To upload the local file `1.txt` to the new history, you can run the special upload tool by calling the `upload_file` method of the `History` object.

You can obtain the `1.txt` file from the following URL; you’ll need to download it first.
https://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/1.txt\n
Please note that with BioBlend.objects there is no need to find the uploaded dataset, since `upload_file()` already returns a `HistoryDatasetAssociation` object.
Both `HistoryPreview` and `History` objects have many of their properties available as attributes, e.g. the id.
If you need to specify the unique id of the resource to retrieve, you can use the `get()` method, e.g. to get back the history we created before:
To get the list of datasets contained in a history, simply look at the `content_infos` attribute of the `History` object.
To get the details about one dataset, you can use the `get_dataset()` method of the `History` object:
You can also filter history datasets by name using the `get_datasets()` method of `History` objects.
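The three ways of inspecting a history described above can be sketched as follows. This assumes `history` is a BioBlend.objects `History` object such as `new_hist`; the helper names `list_contents` and `first_dataset_named` are hypothetical, and the `id` and `name` attributes on the content info objects are assumed to be available as on other wrappers.

```python
def list_contents(history):
    '''Return (id, name) pairs for every item listed in the history.

    Assumption: `history` is a BioBlend.objects `History` object.
    '''
    # content_infos holds one lightweight info object per history item.
    return [(info.id, info.name) for info in history.content_infos]


def first_dataset_named(history, name):
    '''Return the first dataset with this name, or None if there is none.'''
    matches = history.get_datasets(name=name)
    return matches[0] if matches else None


# Usage in the notebook:
# pprint(list_contents(new_hist))
# hda = first_dataset_named(new_hist, '1.txt')
# To fetch one dataset by id instead: new_hist.get_dataset(hda.id)
```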
To update a resource, use the `update()` method of its object, e.g. to change the history name:
The return value of `update()` methods is the updated object.
Finally, to delete a resource, you can use the `delete()` method of the object, e.g.:
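Both steps can be sketched in a few lines, assuming `history` is a BioBlend.objects `History` object such as `new_hist`. The function name `rename_then_delete` is hypothetical.

```python
def rename_then_delete(history, new_name):
    '''Rename a history object, then delete it on the server.

    Assumption: `history` is a BioBlend.objects `History` object.
    '''
    # update() takes the fields to change as keyword arguments.
    history.update(name=new_name)
    # Without purge=True the history is only marked as deleted.
    history.delete()


# Usage in the notebook:
# rename_then_delete(new_hist, 'Updated history')
```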
Goal: Upload a file to a new history, import a workflow and run it on the uploaded dataset.
\n\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-87", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Initialise\nCreate a
`GalaxyInstance` object.\n👁 View solution
\n\n\nfrom pprint import pprint\n\nimport bioblend.galaxy.objects\n\nserver = 'https://usegalaxy.eu/'\napi_key = ''\ngi = bioblend.galaxy.objects.GalaxyInstance(url=server, api_key=api_key)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-89", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: New History\nCreate a new Galaxy history.
\n\n👁 View solution
\n\n\nnew_hist = gi.histories.create(name='New history')\npprint(new_hist)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-91", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Upload a dataset\nUpload the local file
`1.txt` to the new history using the `upload_file()` method of `History` objects.

You can obtain the `1.txt` file from the following URL; you’ll need to download it first.\nhttps://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/1.txt\n
\n👁 View solution
\n\n\nhda = new_hist.upload_file(\"1.txt\")\npprint(hda)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-93", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Import a workflow\nImport a workflow from the local file
`convert_to_tab.ga` using `workflows.import_new()`.

You can obtain the `convert_to_tab.ga` file from the following URL; you’ll need to download it first.\nhttps://raw.githubusercontent.com/nsoranzo/bioblend-tutorial/main/test-data/convert_to_tab.ga\n
\n👁 View solution
\n\n\nwith open(\"convert_to_tab.ga\", \"r\") as f:\n wf_string = f.read()\nwf = gi.workflows.import_new(wf_string)\npprint(wf)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-95", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: View the workflow inputs\n\n👁 View solution
\n\n\npprint(wf.inputs)\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-97", "source": [ "# Try it out here!\n", "" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ ">Question: Invoke the workflow\nRun the imported workflow on the uploaded dataset inside the same history using the
`invoke()` method of `Workflow` objects.\n👁 View solution
\n\n\ninputs = {'0': hda}\nwf.invoke(inputs=inputs, history=new_hist)\n
\n\nQuestion: View the results\nView the results on the Galaxy server with your web browser. Were you successful? Did it run?
\n
If you have completed the exercise, you can try to perform these extra tasks with the help of the online documentation:
\n