Scanpy Parameter Iterator

Author(s): Julia Jakiela
Tester(s): Wendi Bacon
Reviewers: Helena Rasche, Julia Jakiela, Saskia Hiltemann, Mehmet Tekman, Matthias Bernt
Overview
Creative Commons License: CC-BY
Questions:
  • How can I run a tool with multiple parameter values?

  • Do I have to enter parameter values manually each time I want to check a new value?

  • What tools can take multiple values at once and iterate over them?

Objectives:
  • Execute the Scanpy Parameter Iterator

  • Recognise what tools you can use Parameter Iterator with

  • Operate tools working on dataset collections

  • Compare plots resulting from different parameter values

Time estimation: 2 hours
Published: Jul 19, 2023
Last modification: Dec 5, 2023
License: Tutorial Content is licensed under Creative Commons Attribution 4.0 International License. The GTN Framework is licensed under MIT
PURL: https://gxy.io/GTN:T00357
Revision: 7

The magic of bioinformatic analysis is that we use maths, statistics and complicated algorithms to deal with huge amounts of data to help us investigate biology. However, analysis is not always straightforward: each tool has various parameters to select, and we can end up with very different outcomes depending on the values we choose. When analysing scRNA-seq data, it is almost as if you need to already know about 75% of what your data contains, and make sure your analysis shows it, before you can identify the remaining 25% of new information.

Given the vast array of values that we can specify in the tool parameters, how can we know if the values we choose are optimal, or at least good enough? Well, we can try different values in our workflow and then compare the outputs to see which is consistent with our understanding of the underlying biology. But can we do this efficiently, at scale, to test multiple values?

This is where the Parameter Iterator comes in: it allows us to test different values quickly and easily. This tutorial will show you how to use the Parameter Iterator to generate multiple outputs with different parameter values in one go.

Tools are frequently updated to new versions. Your Galaxy may have multiple versions of the same tool available. By default, you will be shown the latest version of the tool. This may NOT be the same tool used in the tutorial you are accessing. Furthermore, if you use a newer tool in one step, and try using an older tool in the next step… this may fail! To ensure you use the same tool versions of a given tutorial, use the Tutorial mode feature.

  • Open your Galaxy server
  • Click on the curriculum icon on the top menu. This will open the GTN inside Galaxy.
  • Navigate to your tutorial
  • Tool names in tutorials will be blue buttons that open the correct tool for you
  • Note: this does not work for all tutorials (yet)
  • You can click anywhere in the greyed-out area outside of the tutorial box to return to the Galaxy analytical interface
Warning: Not all browsers work!
  • We’ve had some issues with Tutorial mode on Safari for Mac users.
  • Try a different browser if you aren’t seeing the button.

Agenda

In this tutorial, we will cover:

  1. Get Data
  2. Workflow
  3. Inputs
  4. Number of neighbours to derive kNN graph (for Scanpy ComputeGraph tool)
  5. Perplexity (for Scanpy RunTSNE tool)
  6. Resolution (for Scanpy FindCluster tool)
  7. Additional steps
  8. Conclusion

Get Data

The data used in this tutorial is from a mouse dataset of fetal growth restriction (Bacon et al. 2018). You can download the dataset below or import the history with the starting data.

Here are several ways of getting our toy dataset – choose whichever you like!

Hands-on: Option 1: Data upload - Import history
  1. Import history from: example input history

    1. Open the link to the shared history
    2. Click on the Import this history button on the top left
    3. Enter a title for the new history
    4. Click on Copy History

  2. Rename galaxy-pencil the history to your name of choice.

Hands-on: Option 2: Data upload - Add to history
  1. Create a new history for this tutorial

  2. Import the files from Zenodo

    https://zenodo.org/record/8011681/files/Scanpy_RunPCA_AnnData_object.h5ad
    
    • Copy the link location
    • Click galaxy-upload Upload Data at the top of the tool panel

    • Select galaxy-wf-edit Paste/Fetch Data
    • Paste the link(s) into the text field

    • Press Start

    • Close the window

  3. Rename the dataset if you wish: Scanpy RunPCA: AnnData object

    • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
    • In the central panel, change the Name field
    • Click the Save button

  4. Check that the datatype is h5ad

    • Click on the galaxy-pencil pencil icon for the dataset to edit its attributes
    • In the central panel, click galaxy-chart-select-data Datatypes tab on the top
    • In the galaxy-chart-select-data Assign Datatype, select h5ad from the “New type” dropdown
      • Tip: you can start typing the datatype into the field to filter the dropdown menu
    • Click the Save button

Workflow

This tutorial is an extension of the full analysis shown in the Filter, Plot and Explore Single-cell RNA-seq Data tutorial in the Single-cell RNA-seq: Case Study series. If you have been working through it, you can use your dataset from that tutorial here. If you haven’t completed it but are interested in how we got to this point, have a look at that tutorial.

Our starting data will be the output of Scanpy RunPCA tool. It is part of the full analysis tutorial, but we will only focus on a smaller and shortened bit of the full workflow to show the application of the Parameter Iterator. Our workflow consists of the following steps:

Workflow that we are going to use in this tutorial: Scanpy RunPCA, Scanpy ComputeGraph, Scanpy RunTSNE, Scanpy RunUMAP, Scanpy FindCluster, Scanpy PlotEmbed.

Figure 1: Shortened workflow that we are going to use in this tutorial.


For the detailed explanation of the tools presented above, check out this tutorial.

Inputs

The Scanpy ParameterIterator tool currently works only for the following parameters:

  1. Number of neighbours to derive kNN graph (for Scanpy ComputeGraph tool)
  2. Perplexity (for Scanpy RunTSNE tool)
  3. Resolution (for Scanpy FindCluster tool)

There are two formats of the input values:

  1. List of all parameter values to be iterated
  2. Step increase values to be iterated

Number of neighbours to derive kNN graph (for Scanpy ComputeGraph tool)

We will now use Scanpy ComputeGraph tool to derive the k-nearest neighbour (kNN) graph from our PCA values. We can use the Parameter Iterator to check how the different k values of nearest neighbours will affect the final outcome. It is important that k neighbours is an integer.

Warning: Float vs integer

Using ‘Step increase values to be iterated’ as the format of the input values automatically generates float values instead of integers. Floats, or floating point numbers, are values with a ‘floating’ decimal point. To avoid float values, you must choose ‘List of all parameter values to be iterated’ as the format.
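To make the difference concrete, here is a minimal Python sketch of the two input formats. It is not the code of the Scanpy ParameterIterator tool: the function names are made up for illustration, and the sketch assumes that the ending value is included in the step series.

```python
def values_from_list(text):
    """Parse a comma-separated string; whole numbers stay integers."""
    values = []
    for item in text.split(","):
        item = item.strip()
        values.append(float(item) if "." in item else int(item))
    return values


def values_from_steps(start, step, end):
    """Build a series from start to end in equal steps.

    Assumption: the end value is included. Every value is a float,
    mirroring the behaviour described in the warning above.
    """
    count = int(round((end - start) / step)) + 1
    # round() hides floating point noise such as 0.6000000000000001
    return [round(float(start) + i * float(step), 10) for i in range(count)]


as_list = values_from_list("5,10,15,20,25,30,35,40")
as_steps = values_from_steps(5, 5, 40)

print(as_list)   # integers, suitable for n-neighbours
print(as_steps)  # floats, not suitable for n-neighbours
```

Both calls describe the same eight numbers, but only the first one keeps them as integers.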

The kNN graph will be needed for plotting a UMAP. According to the UMAP developers: “Larger neighbor values will result in more global structure being preserved at the loss of detailed local structure. In general this parameter should often be in the range 5 to 50, with a choice of 10 to 15 being a sensible default”. Therefore, let’s pick some values bigger and smaller than 15 to check how it changes the final UMAP. This is where the Parameter Iterator comes in!
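The trade-off described in the quote can be illustrated on a toy example. The following pure-Python sketch is not what Scanpy ComputeGraph runs internally; the coordinates and the helper `knn_graph` are invented for illustration. It builds a kNN graph for two well-separated groups of points and counts how many links cross between the groups as k grows.

```python
import math


def knn_graph(points, k):
    """Return {index: indices of the k nearest other points} (Euclidean distance)."""
    graph = {}
    for i, p in enumerate(points):
        others = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        graph[i] = [j for _, j in others[:k]]
    return graph


def cross_links(graph, group_size):
    """Count links that connect a point in the first group to one in the second."""
    return sum(
        1
        for i, neighbours in graph.items()
        for j in neighbours
        if (i < group_size) != (j < group_size)
    )


# Two toy groups of three points each (made-up coordinates).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

for k in (2, 3):
    print(f"k={k}: {cross_links(knn_graph(points, k), 3)} links between groups")
```

With k=2 every point only connects to members of its own group; with k=3 each point is forced to reach into the other group. Small k emphasises local structure, larger k pulls in more global structure.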

Hands-on: Set your values in Parameter Iterator
  1. Scanpy ParameterIterator ( Galaxy version 0.0.1+galaxy9) with the following parameters:
    • “Parameter type”: n-neighbours
    • “Choose the format of the input values”: List of all parameter values to be iterated
    • “User input values”: 5,10,15,20,25,30,35,40
  2. Rename galaxy-pencil the resulting list of datasets: Parameter iterated - n-neighbours (you have to first click on the collection so that you see the datasets, and then rename it)

  3. Tag galaxy-tags each dataset with its corresponding value:
    • navigate to Show hidden (galaxy-show-hidden icon)
    • add tags accordingly - n-neighbours_10: #n-neighbours_10 etc.
    • If you want to refresh your memory on how to add tags to datasets, have a look here:

    Datasets can be tagged. This simplifies the tracking of datasets across the Galaxy interface. Tags can contain any combination of letters or numbers but cannot contain spaces.

    To tag a dataset:

    1. Click on the dataset to expand it
    2. Click on Add Tags galaxy-tags
    3. Add tag text. Tags starting with # will be automatically propagated to the outputs of tools using this dataset (see below).
    4. Press Enter
    5. Check that the tag appears below the dataset name

    Tags beginning with # are special!

    They are called Name tags. The unique feature of these tags is that they propagate: if a dataset is labelled with a name tag, all derivatives (children) of this dataset will automatically inherit this tag (see below). The figure below explains why this is so useful. Consider the following analysis (numbers in parentheses correspond to dataset numbers in the figure below):

    1. a set of forward and reverse reads (datasets 1 and 2) is mapped against a reference using Bowtie2 generating dataset 3;
    2. dataset 3 is used to calculate read coverage using BedTools Genome Coverage separately for + and - strands. This generates two datasets (4 and 5 for plus and minus, respectively);
    3. datasets 4 and 5 are used as inputs to Macs2 broadCall datasets generating datasets 6 and 8;
    4. datasets 6 and 8 are intersected with coordinates of genes (dataset 9) using BedTools Intersect generating datasets 10 and 11.

    A history without name tags versus history with name tags

    Now consider that this analysis is done without name tags. This is shown on the left side of the figure. It is hard to trace which datasets contain “plus” data versus “minus” data. For example, does dataset 10 contain “plus” data or “minus” data? Probably “minus” but are you sure? In the case of a small history like the one shown here, it is possible to trace this manually but as the size of a history grows it will become very challenging.

    The right side of the figure shows exactly the same analysis, but using name tags. When the analysis was conducted datasets 4 and 5 were tagged with #plus and #minus, respectively. When they were used as inputs to Macs2 resulting datasets 6 and 8 automatically inherited them and so on… As a result it is straightforward to trace both branches (plus and minus) of this analysis.

    More information is in a dedicated #nametag tutorial.
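If you have many values to tag, it can help to write the tag names down first. The sketch below only builds the tag text following the naming pattern used in this tutorial; the helper `name_tag` is hypothetical and does not talk to Galaxy.

```python
def name_tag(parameter, value):
    """Build a name tag such as '#n-neighbours_10'."""
    tag = f"#{parameter}_{value}"
    # Tags cannot contain spaces, so reject anything with whitespace.
    if any(ch.isspace() for ch in tag):
        raise ValueError(f"tag contains whitespace: {tag!r}")
    return tag


tags = [name_tag("n-neighbours", k) for k in [5, 10, 15, 20, 25, 30, 35, 40]]
print(tags)
```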

The output of the Parameter Iterator is the list of datasets. We will be working on dataset collections quite a lot, so if you want to gain more understanding of collection operations, visit the corresponding tutorial.

Hands-on: Derive kNN graph with iterated parameter
  1. Scanpy ComputeGraph ( Galaxy version 1.8.1+galaxy9) with the following parameters:
    • param-file “Input object in AnnData/Loom format”: Scanpy RunPCA: AnnData object
    • “Use programme defaults”: param-toggle No
    • “File with n_neighbours, use with parameter iterator. Overrides the n_neighbors setting”:
      • Click on param-collection (Dataset collection)
      • Choose Parameter iterated - n-neighbours
    • “Use the indicated representation”: X_pca
    • “Number of PCs to use”: 20
  2. Rename galaxy-pencil the resulting list of datasets: Scanpy ComputeGraph on collection X: Graph object AnnData (n-neighbours) (you have to first click on the collection so that you see the datasets, and then rename it)

You should now see that the output is also a collection. If you click on it, you will see AnnData files that differ only by the n-neighbours value.

Now you have two options, either:

  • pick one of the generated output files and proceed to the next tool with another parameter iteration; or
  • continue with the current collection of datasets.

We choose the second option as only then you will be able to see the effect of using different nearest neighbour values. However, the disadvantage of this option is that you have to select only one value for the subsequent parameters in the workflow in order to see the changes in the final plots.

Comment: Why only one Parameter Iteration at a time?

Iterating the parameters within one tool will give you a list with X datasets: each dataset is the output with the given parameter value. However, if you want to use Parameter Iterator again within another tool, specifying Y parameter values, you will not get X x Y datasets as you might expect. Therefore you have to choose just one output file to be passed on to the next tool which will use Parameter Iterator again. Alternatively, you can use Parameter Iterator once and run the rest of the tools on dataset collection with just one parameter value.
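To get a feeling for the numbers involved, the sketch below compares a full grid over all three parameters with the one-parameter-at-a-time approach taken in this tutorial. It uses the value lists from the hands-on boxes and assumes that the ending value of each step series is included.

```python
from itertools import product

n_neighbours = [5, 10, 15, 20, 25, 30, 35, 40]
perplexity = [15, 20, 25, 30, 35, 40, 45]          # assumes the end value is included
resolution = [0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.4]   # same assumption

# Every combination of the three parameters.
full_grid = list(product(n_neighbours, perplexity, resolution))

# One sweep per parameter, fixing the others to a single value.
one_at_a_time = len(n_neighbours) + len(perplexity) + len(resolution)

print(len(full_grid))   # 392 combinations
print(one_at_a_time)    # 22 runs
```

Sweeping one parameter at a time keeps the number of outputs small enough to compare by eye, at the cost of not exploring how the parameters interact.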

Where are we now in our workflow?

Image showing the step we are at: after Scanpy RunPCA, already run Scanpy ComputeGraph, and before Scanpy RunTSNE, Scanpy RunUMAP, Scanpy FindCluster, Scanpy PlotEmbed.

Figure 2: We used the Parameter Iterator for the k nearest neighbours to derive the kNN graph. Now we’ll complete our small workflow to see the differences at the end.
Hands-on: Complete the workflow
  1. Scanpy RunTSNE ( Galaxy version 1.8.1+galaxy9) with the following parameters:
    • param-collection “Input object in AnnData/Loom format” (make sure you choose Dataset collection): Scanpy ComputeGraph on collection X: Graph object AnnData (n-neighbours)
    • “Use the indicated representation”: X_pca
    • “Use programme defaults”: param-toggle No
    • “The perplexity is related to the number of nearest neighbours, select a value between 5 and 50”: 30

1A. Rename galaxy-pencil the Anndata object collection output: Scanpy RunTSNE on collection X: tSNE object AnnData (n-neighbours) (you have to first click on the collection so that you see the datasets, and then rename it)

  2. Scanpy RunUMAP ( Galaxy version 1.8.1+galaxy9) with the following parameters:
    • param-collection “Input object in AnnData/Loom format” (make sure you choose Dataset collection): Scanpy RunTSNE on collection X: tSNE object AnnData (n-neighbours)
    • “Use programme defaults”: param-toggle Yes

2A. Rename galaxy-pencil the Anndata object collection output: Scanpy RunUMAP on collection X: UMAP object AnnData (n-neighbours) (you have to first click on the collection so that you see the datasets, and then rename it)

  3. Scanpy FindCluster ( Galaxy version 1.8.1+galaxy9) with the following parameters:
    • param-collection “Input object in AnnData/Loom format” (make sure you choose Dataset collection): Scanpy RunUMAP on collection X: UMAP object AnnData (n-neighbours)
    • “Use programme defaults”: param-toggle No
    • “Resolution, high value for more and smaller clusters”: 0.6

3A. Rename galaxy-pencil the Anndata object collection output: Scanpy FindCluster on collection X: Clusters AnnData (n-neighbours) (you have to first click on the collection so that you see the datasets, and then rename it)

The differences will only appear in the UMAP embedding, so we will plot only that one. However, when you run your own analysis, you might want to check if there are changes in other embeddings as well.

Hands-on: Plot UMAP embedding
  1. Scanpy PlotEmbed ( Galaxy version 1.8.1+galaxy9) with the following parameters:
    • param-collection “Input object in AnnData/Loom format” (make sure you choose Dataset collection): Scanpy FindCluster on collection X: Clusters AnnData (n-neighbours)
    • “name of the embedding to plot”: umap
    • “color by attributes, comma separated texts”: louvain
    • “Use raw attributes if present”: No

If you click on the resulting collection, you will see several plots. Click on galaxy-eye to see how they differ. Galaxy’s galaxy-scratchbook Window Manager, which you can enable (and disable again) from the menu bar, can be very helpful for comparing multiple datasets.

Eight graphs showing the differences between UMAP embeddings caused by different values of k nearest neighbours.

Figure 3: Comparison of UMAP embedding with different values of k nearest neighbours, perplexity set to 30 and resolution to 0.6.

If you compare the UMAP graphs, you can see the differences that were caused by changing the value of k nearest neighbours. Relying on your biological knowledge, you can now choose which parameter value works best and use it for further analysis.

We will go forward with k value equal to 15. But hang on, we’ve been working on a collection and not a single dataset! How can we access that one dataset with the n-neighbour = 15?

Here is the answer: The datasets that are included in the collections can be accessed separately if you go to your history and click on Show hidden (galaxy-show-hidden icon). You can bring each individual dataset to the visible and active datasets by clicking Unhide.

Hands-on: Unhide the dataset of interest
  1. In Show hidden find the dataset Scanpy ComputeGraph on data X and data Y: Graph object AnnData with the tag #n-neighbours_15 (or any value that you want to go forward with).
  2. Click on Unhide
  3. Your chosen dataset is now visible amongst the active datasets in your history!

Perplexity (for Scanpy RunTSNE tool)

Let’s have another look at our wee workflow.

Image showing the step we are at: after Scanpy RunPCA and Scanpy ComputeGraph from which we will take one dataset to pass on to Scanpy RunTSNE with Parameter Iterator.

Figure 4: We will take one dataset from the Scanpy ComputeGraph as the input to Scanpy RunTSNE and use the Parameter Iterator with the perplexity parameter.

The next tool in our workflow is Scanpy RunTSNE, which contains the perplexity parameter. Although the tool description says that this value should be an integer, we tested it with float values and it works. Therefore, you can use ‘Step increase values to be iterated’. Keep in mind that perplexity should take values between 5 and 50. Let’s run the Parameter Iterator again.
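Before launching the iteration, it can be worth checking that all candidate values fall into the recommended range. The sketch below is a simple stand-alone check with made-up candidate values; `split_by_range` is a hypothetical helper, not part of any Galaxy or Scanpy tool.

```python
def split_by_range(values, low=5, high=50):
    """Separate values inside [low, high] from those outside it."""
    inside = [v for v in values if low <= v <= high]
    outside = [v for v in values if v < low or v > high]
    return inside, outside


candidates = [2.5, 15.0, 30.0, 45.0, 60.0]  # made-up perplexity values
inside, outside = split_by_range(candidates)

print("use:", inside)
print("reconsider:", outside)
```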

Hands-on: Set your values in Parameter Iterator
  1. Scanpy ParameterIterator ( Galaxy version 0.0.1+galaxy9) with the following parameters:
    • “Parameter type”: perplexity
    • “Choose the format of the input values”: Step increase values to be iterated
    • “Starting value”: 15
    • “Step”: 5
    • “Ending value”: 45
  2. Rename galaxy-pencil the resulting list of datasets: Parameter iterated - perplexity (you have to first click on the collection so that you see the datasets, and then rename it)

  3. Tag galaxy-tags each dataset with its corresponding value:
    • navigate to Show hidden (galaxy-show-hidden icon)
    • add tags accordingly - perplexity_15: #perplexity_15 etc.
    • If you want to refresh your memory on how to add tags to datasets, see the tagging instructions in the n-neighbours hands-on above.

Hands-on: RunTSNE with iterated parameter
  1. Scanpy RunTSNE ( Galaxy version 1.8.1+galaxy9) with the following parameters:
    • param-file “Input object in AnnData/Loom format”: Scanpy ComputeGraph on data X and data Y: Graph object AnnData with the tag #n-neighbours_15
    • “Use the indicated representation”: X_pca
    • “Use programme defaults”: param-toggle No
    • “The perplexity is related to the number of nearest neighbours”:
      • Click on param-collection (Dataset collection)
      • Choose Parameter iterated - perplexity
  2. Rename galaxy-pencil the Anndata object collection output: Scanpy RunTSNE on collection X: tSNE object AnnData (perplexity) (you have to first click on the collection so that you see the datasets, and then rename it)

Changing the value of perplexity will only affect the tSNE graphs, so we can complete the workflow and compare the tSNE plots to choose the best value for further analysis.

Hands-on: Complete the workflow
  1. Scanpy RunUMAP ( Galaxy version 1.8.1+galaxy9) with the following parameters:
    • param-collection “Input object in AnnData/Loom format” (make sure you choose Dataset collection): Scanpy RunTSNE on collection X: tSNE object AnnData (perplexity)
    • “Use programme defaults”: param-toggle Yes

1A. Rename galaxy-pencil the Anndata object collection output: Scanpy RunUMAP on collection X: UMAP object AnnData (perplexity) (you have to first click on the collection so that you see the datasets, and then rename it)

  2. Scanpy FindCluster ( Galaxy version 1.8.1+galaxy9) with the following parameters:
    • param-collection “Input object in AnnData/Loom format” (make sure you choose Dataset collection): Scanpy RunUMAP on collection X: UMAP object AnnData (perplexity)
    • “Use programme defaults”: param-toggle No
    • “Resolution, high value for more and smaller clusters”: 0.6

2A. Rename galaxy-pencil the Anndata object collection output: Scanpy FindCluster on collection X: Clusters AnnData (perplexity) (you have to first click on the collection so that you see the datasets, and then rename it)

Warning

In Use programme defaults you can specify Additional suffix to the name of the slot to save the embedding – if it’s included, PERPLEXITY will be substituted with the value of the perplexity setting. However, in that case you will get an error when using Scanpy PlotEmbed: due to the value included in the entry name, the tool will not recognise the correct embedding. Therefore, we leave that box unfilled even though it would make it easier to differentiate between datasets.

Hands-on: Plot tSNE embedding
  1. Scanpy PlotEmbed ( Galaxy version 1.8.1+galaxy9) with the following parameters:
    • param-collection “Input object in AnnData/Loom format” (make sure you choose Dataset collection): Scanpy FindCluster on collection X: Clusters AnnData (perplexity)
    • “name of the embedding to plot”: tsne
    • “color by attributes, comma separated texts”: louvain
    • “Use raw attributes if present”: No

Can you see those differences?

Graphs showing the differences between tSNE embeddings caused by different values of perplexity.

Figure 5: Comparison of tSNE embedding with different values of perplexity, k nearest neighbours set to 15 and resolution to 0.6.

What do you think about those plots? Which value would you choose? Perplexity is probably the least intuitive or consistently functioning parameter, so feel free to learn more from this nifty blog post. We will go forward with a perplexity value equal to 30.

Resolution (for Scanpy FindCluster tool)

Where is Scanpy FindCluster tool in our workflow?

Image showing the step we are at: we have chosen values for Scanpy ComputeGraph and Scanpy RunTSNE, then proceeded to Scanpy RunUMAP to get to Scanpy FindCluster.

Figure 6: We have chosen values for Scanpy ComputeGraph and Scanpy RunTSNE, then proceeded to Scanpy RunUMAP to get to Scanpy FindCluster where we can iterate the resolution parameter.

The input data for the Scanpy FindCluster tool is the output of Scanpy RunUMAP tool, so let’s get this single dataset with the parameter values (from previous tools) of our choice, that is n-neighbours=15 and perplexity=30.

Hands-on: Unhide the dataset of interest
  1. In Show hidden find the dataset Scanpy RunUMAP on data X: UMAP object AnnData with the tags #n-neighbours_15 and #perplexity_30 (or any value that you want to go forward with).
  2. Click on Unhide
  3. Your chosen dataset is now visible amongst the active datasets in your history!

The last tool that we can use Parameter Iterator for is Scanpy FindCluster tool. We will iterate over the resolution values. In this case, those values can be floats, so you can use either ‘List of all parameter values to be iterated’ or ‘Step increase values to be iterated’. Keep in mind that when it comes to the resolution, a high value means more and smaller clusters.

Hands-on: Set your values in Parameter Iterator
  1. Scanpy ParameterIterator ( Galaxy version 0.0.1+galaxy9) with the following parameters:
    • “Parameter type”: resolution
    • “Choose the format of the input values”: Step increase values to be iterated
    • “Starting value”: 0.2
    • “Step”: 0.2
    • “Ending value”: 1.4
  2. Rename galaxy-pencil the resulting list of datasets: Parameter iterated - resolution (you have to first click on the collection so that you see the datasets, and then rename it)

  3. Tag galaxy-tags each dataset with its corresponding value:
    • navigate to Show hidden (galaxy-show-hidden icon)
    • add tags accordingly - resolution_0.2: #resolution_0.2 etc.
    • If you want to refresh your memory on how to add tags to datasets, see the tagging instructions in the n-neighbours hands-on above.

Hands-on: FindCluster with iterated parameter
  1. Scanpy FindCluster ( Galaxy version 1.8.1+galaxy9) with the following parameters:
    • param-file “Input object in AnnData/Loom format”: Scanpy RunUMAP on data X: UMAP object AnnData with the tags #n-neighbours_15 and #perplexity_30
    • “Use programme defaults”: param-toggle No
    • “File with resolution, use with parameter iterator. Overrides the resolution setting”:
      • Click on param-collection (Dataset collection)
      • Choose Parameter iterated - resolution
  2. Rename galaxy-pencil the Anndata object collection output: Scanpy FindCluster on collection X: Clusters AnnData (resolution) (you have to first click on the collection so that you see the datasets, and then rename it)

You can see the effect of the resolution parameter on all embeddings (UMAP, tSNE and PCA), so you can plot them all and compare the granularity of the clusters. It is also useful to see how equal increments affect the clustering and how quickly the granularity changes.

Hands-on: Plot the embeddings
  1. Scanpy PlotEmbed ( Galaxy version 1.8.1+galaxy9) with the following parameters:
    • param-collection “Input object in AnnData/Loom format” (make sure you choose Dataset collection): Scanpy FindCluster on collection X: Clusters AnnData (resolution)
    • “name of the embedding to plot”: pca
    • “color by attributes, comma separated texts”: louvain
    • “Use raw attributes if present”: No

    Instead of following the next two steps, you can re-run galaxy-refresh the same tool, changing pca to tsne and then to umap.
  2. Scanpy PlotEmbed ( Galaxy version 1.8.1+galaxy9) with the following parameters:
    • param-collection “Input object in AnnData/Loom format” (make sure you choose Dataset collection): Scanpy FindCluster on collection X: Clusters AnnData (resolution)
    • “name of the embedding to plot”: tsne
    • “color by attributes, comma separated texts”: louvain
    • “Use raw attributes if present”: No
  3. Scanpy PlotEmbed ( Galaxy version 1.8.1+galaxy9) with the following parameters:
    • param-collection “Input object in AnnData/Loom format” (make sure you choose Dataset collection): Scanpy FindCluster on collection X: Clusters AnnData (resolution)
    • “name of the embedding to plot”: umap
    • “color by attributes, comma separated texts”: louvain
    • “Use raw attributes if present”: No

That’s the last tool in our workflow which uses Parameter Iterator! Let’s have a final look at the generated plots.

Graphs showing the differences between PCA embeddings caused by different values of resolution.

Figure 7: Comparison of PCA embedding with different values of resolution, with n-neighbours set to 15 and perplexity to 30.
Graphs showing the differences between tSNE embeddings caused by different values of resolution.

Figure 8: Comparison of tSNE embedding with different values of resolution, with n-neighbours set to 15 and perplexity to 30.
Graphs showing the differences between UMAP embeddings caused by different values of resolution.

Figure 9: Comparison of UMAP embedding with different values of resolution, with n-neighbours set to 15 and perplexity to 30.
Comment

Still not sure which value works best? For more explanation and in-depth analysis, please read through this tutorial. Hopefully it will give you more insight into interpretation of the resulting plots.

Additional steps

  • It may happen that some of the values you choose give an error while others work fine. In that case, you can use the Filter failed datasets tool to remove the datasets with errors from a collection.

  • If you still haven’t found an answer that helps with parameter iteration in your own analysis, check out another workflow. It has some extra steps that are not directly related to our analysis but might be helpful for you.
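The idea behind filtering failed datasets, mentioned in the first point above, can be sketched in a few lines of Python. The list of dictionaries is a hypothetical, simplified stand-in for a collection and is not Galaxy's real data model; in Galaxy itself you would simply run the tool.

```python
# Hypothetical, simplified view of a collection: one dict per element.
collection = [
    {"element": "resolution_0.2", "state": "ok"},
    {"element": "resolution_0.4", "state": "error"},
    {"element": "resolution_0.6", "state": "ok"},
]


def drop_failed(elements):
    """Keep only the elements whose state is not 'error'."""
    return [e for e in elements if e["state"] != "error"]


kept = drop_failed(collection)
print([e["element"] for e in kept])
```

The remaining elements keep their order, so downstream tools can run on the filtered collection without the failed entries.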

Conclusion

You might want to compare your results with this control history, or check out the workflow for this tutorial. You can also continue to analyse this data by returning to the Filter, Plot and Explore tutorial.

You have finished the tutorial! You have learned how to use the Parameter Iterator with the nearest neighbours, perplexity and resolution parameters. You also compared multiple outputs resulting from the analysis using different values at three different steps (Scanpy ComputeGraph, Scanpy RunTSNE and Scanpy FindCluster). Hopefully this tool will help you assess parameter values more quickly, ultimately helping you choose values that both confirm prior knowledge and offer new insights into biological data.

To discuss with like-minded scientists, join our Galaxy Training Network chatspace in Slack and talk with fellow users of Galaxy single cell analysis tools on #single-cell-users

We also post new tutorials / workflows there from time to time, as well as any other news.

If you’d like to contribute ideas, requests or feedback as part of the wider community building single-cell and spatial resources within Galaxy, you can also join our Single cell & sPatial Omics Community of Practice.

You can request tools here on our Single Cell and Spatial Omics Community Tool Request Spreadsheet