Scripting Galaxy using the API and BioBlend
Contributors
Questions
What is a REST API?
How to interact with Galaxy programmatically?
Why and when should I use BioBlend?
Requirements
Galaxy API
- Application Programming Interface (API): the protocol defined by a software for how it can be controlled by an external program
- Galaxy provides a rich API:
- Over the HTTP protocol
- The Galaxy UI is being migrated on top of the Galaxy API
Speaker Notes
- An Application Programming Interface provides software developers with a definition of the methods to interact with a program or a library.
- When the program is a remote web server like Galaxy, the methods are represented by URIs and the communication is through the HTTP protocol.
- Nowadays, most of the Galaxy user interface makes use of the backend API to implement an asynchronous web application.
Interacting with Galaxy: UI vs. API
- The Galaxy UI is good for:
- exploring and visualizing data
- experimenting
- graphically designing workflows
- people not comfortable with the command line
- The Galaxy API is good for:
- interact programmatically with the server
- complex control: branching and looping (not yet possible in workflows)
- automate repetitive tasks
- integration with external resources
- interact programmatically with the server
Speaker Notes
- The Galaxy user interface is a better choice for explorative analysis, visualising data and drawing workflows.
- The API instead allows you to automate tasks using Galaxy’s capabilities programmatically.
- A typical use case is to upload FASTQ files as soon as your sequencer finishes writing them, and running a quality control workflow.
- Importantly, however you interact with Galaxy, you can equally benefit from features like reproducibility of the analysis and data sharing.
- In fact, all the work done via the API is still accessible when you return to the UI.
Galaxy API functionalities
- Users can:
- upload and download data
- run tools and workflows, …
- manage histories and datasets
- Admins can also manage:
- data libraries
- tools and dependencies
- users, quotas, roles…
- Source code lives at https://github.com/galaxyproject/galaxy/tree/dev/lib/galaxy/webapps/galaxy/api/
Speaker Notes
- Most of the operations you would normally perform on the Galaxy UI are available via the API.
- You can for example upload data, run tools and workflows, and manage your histories.
- It is also possible to perform admin tasks, like manage data libraries and install tools.
RESTful API
.left[REpresentational State Transfer (REST) is the architectural style of the World Wide Web:]
- client–server
- API requests:
- standard HTTP request methods (GET, POST, PUT, DELETE,…) and status codes
- Uniform Resource Identifiers (URIs)
- Query and payload for parameters
Speaker Notes
- The Galaxy API follows the REST model typical of web applications.
- A REST API specifies a protocol for the interaction between a client and a server.
- The client sends requests composed by an HTTP method and a resource.
- The main HTTP methods are: GET (to retrieve a resource); POST (to create); PUT (to replace); and DELETE.
- Resources are identified by Uniform Resource Identifiers.
- Examples of resources in the Galaxy API are: datasets, tools, jobs, histories, libraries, users; essentially anything that is recorded in the Galaxy database.
- The client often need to pass additional parameters or data to specify how a request should be carried out.
- The server replies to the request with a status code (to indicate if there was an error) and usually some data.
API requests
-
HTTP method + URI [+ payload]
GET https://usegalaxy.org/api/histories?order=name
-> ordered list of histories
-
URI parameters: IDs in path, others in the query (
?name=value&...
)
Speaker Notes
- Let’s see some examples of possible API requests to a Galaxy server.
- In the first example we use the GET method to retrieve a resource, in this case the list of histories.
- The URI starts with the protocol and address of the server, followed by the resource we are interested in: the histories.
- A URI may then include an optional query, preceded by a question mark, containing a sequence of request parameters specified as key-equal-value and separated by ampersands.
- In this example, the query is used to ask that the list of histories is ordered by name.
API requests
-
HTTP method + URI [+ payload]
GET https://usegalaxy.org/api/histories?order=name
-> ordered list of historiesPOST /api/histories {"name": "New analysis"}
-> create a history named “New analysis”PUT /api/histories/<id> {"published": true}
-> publish a historyDELETE /api/histories/<history_id>/contents/<id>
-> delete a history dataset
- URI parameters: IDs in path, others in the query (
?name=value&...
) - POST/PUT payload as JSON
Speaker Notes
- In the second example, the POST method is used to create a resource on the server, in this case a new history.
- Parameters for the POST request are passed as a payload in JSON format; more on this later.
- In this example the payload contains the new name for the history to be created.
- In the third example, we use the PUT method to update an existing resource, in this case to make a history public.
- The history to modify is indentified by appending its ID to the histories URI.
- The parameters for PUT requests are also passed in a payload.
- In the fourth example, we use the DELETE method to remove a resource from the server.
- In this case, we request the deletion of a particular dataset in a specific history, as indicated in the URI.
JSON format
.left[JavaScript Object Notation https://www.json.org/]
- Lightweight data-interchange text format
- Easy to read/write for both humans and machines
- ECMA-404 open standard (2013), RFC 8259 (2017)
{"history_id": "b5731bb49a17bf50",
"id": "df06cc665d85b6ea",
"inputs": {"0": {"id": "bbd44e69cb8906b51528b5d606d1fdd0",
"src": "hda"}},
"model_class": "WorkflowInvocation",
"outputs": ["bbd44e69cb8906b528819eaaff340ecd",
"0ff30b4e2a4bed9e"],
"state": "scheduled",
"update_time": "2015-07-03T19:28:39.544574",
"workflow_id": "56482e194d798eb6"}
Speaker Notes
- Request payloads passed to Galaxy and data returned by the API are encoded in the JSON format.
- JSON is a standard format used to exchange text data and is supported in all major programming languages.
- In JSON, strings are enclosed by double quotes, dictionaries by curly braces, and arrays by square brackets.
- Dictionaries and arrays can be nested at will, as shown in the example.
Status codes and errors
- HTTP status codes:
- 200 OK, 400 your error, 500 server error, …
https://www.iana.org/assignments/http-status-codes/http-status-codes.xhtml
- 200 OK, 400 your error, 500 server error, …
- Galaxy error codes and messages:
lib/galaxy/exceptions/
- Still a work in progress
Speaker Notes
- In REST APIs, the server should use the standardised HTTP status codes in its responses to indicate either a successful request or the type of error encountered.
- In the Galaxy API, error status codes and messages are returned to the client by raising specific Python exceptions in the backend.
How to access a REST API
.left[With anything that can communicate over HTTP:]
- Command line:
curl
- GUI:
- Browsers: only GET
- RESTClient add-on for Firefox
- Advanced REST Client
- Software libraries:
- General HTTP libraries (e.g. requests for Python)
- Service-specific libraries (e.g. BioBlend to access Galaxy using Python)
Speaker Notes
- A REST API can be accessed via the HTTP protocol. There are 3 main ways to do that.
- On the command line, the curl tool can perform any type of HTTP request; look at its manual for details.
- For graphical user interfaces, you can perform GET requests directly on your browser by simply entering the URI on the address bar.
- For more complex requests, you can use the Advanced REST Client open source app.
- The third and most common way is to write a program.
- All programming languages have some general library to communicate over HTTP.
- For Python, requests is probably the best library to do that.
- Many web services provide dedicated higher-level libraries to access their REST API.
- In particular, BioBlend is a Python library to interact with Galaxy that we will describe later.
Security
- Most API calls require authentication
- When the UI accesses the API, session auth is used
- Other callers needs an API key: alphanumeric string (32 chars) identifying a registered user
Keep it secure, it’s the same as a username+password!
- Always use HTTPS:
- https://example.org/api?key=foo is safer due to the encryption of the transmitted data
Speaker Notes
- A note about security.
- When programmatically performing requests that require authentication, the client need to pass an API key.
- An API key is an alphanumeric string uniquely identifying a user on a server.
- Since it is equivalent to the combination of your username and password, keep it secure!
- In particular, always use the HTTPS protocol to make requests, not HTTP which sends data unencrypted.
Advanced Galaxy API config
.left[Options in config/galaxy.yml
:]
- User impersonation by adding
run_as
in the payload# Optional list of email addresses of API users who can make calls on # behalf of other users. api_allow_run_as: foo@foo.com
-
Bootstrapping Galaxy
# API key that allows performing some admin actions without actually # having a real admin user in the database and config. Only set this # if you need to bootstrap Galaxy, in particular to create a real # admin user account via API. You should probably not set this on a # production server. bootstrap_admin_api_key: MASTERLOCK
This option was called
master_api_key
in Galaxy 22.05 and earlier.
Speaker Notes
- There are 2 options in the Galaxy configuration file that are relevant for the API.
- api_allow_run_as allows the specified user to impersonate any other user on the Galaxy instance.
- bootstrap_admin_api_key instead is an optional special API key that can be used to bootstrap a Galaxy instance, in particular to create an initial admin account.
Galaxy API Modernization
.left[Moving to FastAPI]
- Main advantages:
- Async requests
- Subscriptions via websockets
- Integrated OpenAPI interactive documentation, e.g. https://usegalaxy.org/api/docs
- Reduce maintenance burden
- Simplify client code generation
Speaker Notes
- The Galaxy API is currently in the process of migrating towards FastAPI, which is a modern framework for building REST APIs in Python with really interesting features.
- For example, FastAPI can serve requests with better performances using asynchronous coroutines.
- Another performance advantage is the ability to avoid inefficient polling by using WebSockets subscriptions.
- In addition to the performance benefits, by using FastAPI in combination with type annotations, the Galaxy API can comply with the OpenAPI standard.
- This standard can greatly enhance interoperability with other systems and reduce the maintenance burden of documentation and client code generation.
Galaxy API pros and cons
- Pros:
- Integrated with Galaxy
- Well tested
- Language agnostic
- Cons:
- Very low-level
Speaker Notes
- To summarise this first part, Galaxy has an extensive REST API that allows users and admins to interact programmatically with a server.
- It can be accessed with any programming language, but it’s also very low level.
- For example, you need to construct complex URIs, encode payloads and decode returned data.
BioBlend
- BioBlend is a Python library that wraps the functionality of Galaxy and CloudMan APIs
- Started by Enis Afgan, Nuwan Goonasekera and Clare Sloggett in 2012. Contributions by the Galaxy Team and the community
- Open source (MIT license)
- Available via PyPI and from https://github.com/galaxyproject/bioblend
Speaker Notes
- A Python library called BioBlend was created in 2012 to make it easier to interact with the Galaxy API.
- BioBlend is open source and developed by a community of contributors.
- It is hosted on GitHub and can be installed via pip.
BioBlend features
- Stable procedural API
- Supported under Python >=3.7
- Wraps all main Galaxy API controllers
- Extensive Continuous Integration testing:
- on Galaxy release_17.09 and later
- >240 unit tests
- Well-documented on https://bioblend.readthedocs.io
Speaker Notes
- BioBlend has a very stable procedural API and works on all supported Python versions.
- It provides methods wrapping all the important Galaxy API endpoints.
- The library uses Continuous Integration to perform a large number of tests on a wide range of Galaxy releases.
- The documentation of BioBlend is very well curated and is often more accurate than the corresponding Galaxy API one.
BioBlend limitations
- Python-only (but separate blend4j and blend4php exist)
- Its methods just deserialize the JSON response
- No isolation from changes in the Galaxy API
- Need to extract the entity ID for further processing
- No explicit modeling of Galaxy entities and their relationships
- Complex operations still need many function calls
- Need for higher-level functionality
Speaker Notes
- Although easier to use than the Galaxy API, BioBlend has some limitations.
- First of all, it’s only available for Python.
- There are alternative libraries for Java and PHP, but they are less complete.
- Another limitation of BioBlend is that it doesn’t shield the caller from possible changes in responses from the Galaxy API.
- It can also be annoying to have to constantly keep track of entity IDs.
- This happens because BioBlend does not try to model Galaxy entities and how they are connected.
BioBlend.objects
- BioBlend.objects is an extra layer which adds an object-oriented interface for the Galaxy API
- Started by Simone Leo, Luca Pireddu and Nicola Soranzo at CRS4 in 2013
- Distributed with BioBlend
- Presently limited to datasets, histories, invocations, jobs, libraries, tools and workflows
Speaker Notes
- To implement this modelling of Galaxy entities, some years ago an object-oriented interface was added on top of BioBlend: BioBlend objects
- This is developed and distributed together with BioBlend itself.
- When using this interface, methods will return objects encapsulating the dictionaries returned by the Galaxy API.
- The user can then invoke further methods on these objects, for example the download method for a Dataset object.
- Only a subset of BioBlend is available through the object interface, but most of the common user functionalities are included.
References
- Galaxy API docs: https://docs.galaxyproject.org/en/master/api_doc.html
- BioBlend docs: https://bioblend.readthedocs.io/
- BioBlend chat: https://matrix.to/#/#galaxyproject_bioblend:gitter.im
- C. Sloggett, N. Goonasekera, E. Afgan. BioBlend: automating pipeline analyses within Galaxy and CloudMan. Bioinformatics 29(13), 1685-1686, 2013, doi:10.1093/bioinformatics/btt199
- S. Leo, L. Pireddu, G. Cuccuru, L. Lianas, N. Soranzo, E. Afgan, G. Zanetti. BioBlend.objects: metacomputing with Galaxy. Bioinformatics 30 (19), 2816-2817, 2014, doi:10.1093/bioinformatics/btu386
Speaker Notes
- Here you can find the links to the documentation of the Galaxy API and of BioBlend.
- We have a dedicated Gitter channel to chat about BioBlend.
- If you use BioBlend or BioBlend objects, please cite these papers.
Key Points
- The API allows you to use Galaxy's capabilities programmatically.
- BioBlend makes using the Galaxy API from Python easier.
- BioBlend objects is an object-oriented interface for interacting with Galaxy.