{ "metadata": { }, "nbformat": 4, "nbformat_minor": 5, "cells": [ { "id": "metadata", "cell_type": "markdown", "source": "
\n\n# Python - Lists & Strings & Dictionaries\n\nby [The Carpentries](https://training.galaxyproject.org/hall-of-fame/carpentries/), [Helena Rasche](https://training.galaxyproject.org/hall-of-fame/hexylena/), [Donny Vrins](https://training.galaxyproject.org/hall-of-fame/dirowa/), [Bazante Sanders](https://training.galaxyproject.org/hall-of-fame/bazante1/)\n\nCC-BY licensed content from the [Galaxy Training Network](https://training.galaxyproject.org/)\n\n**Questions**\n\n- How can I store multiple values?\n\n**Objectives**\n\n- Explain why programs need collections of values.\n- Write programs that create flat lists, index them, slice them, and modify them through assignment and method calls.\n\n**Time Estimation: 1H**\n
\n", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-0", "source": "
\n
Agenda
\n

In this tutorial, we will cover:

\n
    \n
  1. Lists
\n
\n

Lists

\n

Doing calculations with a hundred variables called pressure_001, pressure_002, etc. would be at least as slow as doing them by hand. Using a list to store many values together solves that problem. Lists are surrounded by square brackets, [ and ], with values separated by commas:

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-1", "source": [ "pressures = [0.273, 0.275, 0.277, 0.275, 0.276]\n", "print(f'pressures: {pressures}')\n", "print(f'length: {len(pressures)}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-2", "source": "

Indexing

\n

You can use an item’s index to fetch it from a list.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-3", "source": [ "print(f'zeroth item of pressures: {pressures[0]}')\n", "print(f'fourth item of pressures: {pressures[4]}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-4", "source": "

Replacement

\n

Lists’ values can be changed or replaced by assigning a new value to the position in the list.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-5", "source": [ "pressures[0] = 0.265\n", "print(f'pressures is now: {pressures}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-6", "source": "

Note how the first item has changed from 0.273 to 0.265.

\n

Appending

\n

Appending items to a list lengthens it. Use list_name.append() to add an item to the end of a list.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-7", "source": [ "primes = [2, 3, 5]\n", "print(f'primes is initially: {primes}')\n", "primes.append(7)\n", "print(f'primes has become: {primes}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-8", "source": "

.append() is a method of lists. It’s like a function, but tied to a particular object. You use object_name.method_name to call methods, which deliberately resembles the way we refer to things in a library.

\n

We will meet other methods of lists as we go along. Use help(list) for a preview. extend is similar to append, but it allows you to combine two lists. For example:

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-9", "source": [ "teen_primes = [11, 13, 17, 19]\n", "middle_aged_primes = [37, 41, 43, 47]\n", "print(f'primes is currently: {primes}')\n", "primes.extend(teen_primes)\n", "print(f'primes has now become: {primes}')\n", "primes.append(middle_aged_primes)\n", "print(f'primes has finally become: {primes}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-10", "source": "

Note that while extend maintains the “flat” structure of the list, appending a list to a list makes the result nested: the last element in primes is a list, not an integer.

\n

This starts to become a more complicated data structure, and we’ll use more of these later. A list containing both integers and a list can be called a “heterogeneous” list, since it has multiple different data types. This is relatively uncommon; most of the lists you’ll encounter will have a single data type inside of them. Sometimes you’ll see a list of lists, which can be used to store positions, like a chessboard.
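
The difference between extend and append, and the idea of a list of lists, can be sketched as follows. This is only an illustration: the names flat, nested and board, and the values in them, are made up for this example.

```python
# Hypothetical example values, chosen only for illustration.
flat = [2, 3, 5]
flat.extend([7, 11])      # adds each item, so the list stays flat
print(flat)               # [2, 3, 5, 7, 11]

nested = [2, 3, 5]
nested.append([7, 11])    # adds the whole list as a single item
print(nested)             # [2, 3, 5, [7, 11]]
print(len(nested))        # 4
print(nested[3][0])       # 7

# A list of lists can act like a small grid, such as rows of a board.
board = [
    ['R', 'N', 'B'],
    ['P', 'P', 'P'],
]
print(board[0][1])        # N
```

The first index picks the inner list, and the second index picks an item within it.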

\n

List Indices

\n

In computer science and programming we number the positions within a list starting from 0, rather than from 1.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-11", "source": [ "# Position 0 1 2 3 4\n", "weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']\n", "print(weekdays[0])\n", "print(weekdays[4])\n", "print(weekdays[3])" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-12", "source": "

But if you try to access a position that is outside of the list, you’ll get an error.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-13", "source": [ "print(weekdays[9])" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-14", "source": "

This returns an IndexError: list index out of range.

\n
\n
\n

So how do you read this?

\n
1 | ---------------------------------------------------------------------------\n2 | IndexError                                Traceback (most recent call last)\n3 | /tmp/ipykernel_648319/137030145.py in <module>\n4 | ----> 1 print(weekdays[9])\n5 |\n6 | IndexError: list index out of range\n
\n
    \n
  1. This is just a line of -s as a separator.
  2. IndexError: here Jupyter/CoCalc/etc. are trying to be helpful and highlight the type of error for us. This is the important bit of information!
  3. This is the path to where the code is; Jupyter/CoCalc/etc. create temporary files to execute your code.
  4. Here an arrow points to the line number where something has broken. The 1 shows that it’s the first line within the cell, and it points to the print statement. Really it’s pointing at the weekdays[9] within the print statement.
  5. Blank.
  6. This is where we normally look for the most important part of the traceback: the error message. It is an IndexError, namely that the list index (9) is out of the range of possible values (0 up to one less than the length of the list).
\n
\n

However, sometimes you want to access the very end of a list! You can either start at the beginning and count along to find the last item or second to last item, or you can use negative indices.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-15", "source": [ "# Position 0 1 2 3 4\n", "weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']\n", "# Position -5 -4 -3 -2 -1\n", "\n", "print(weekdays[-1])\n", "print(weekdays[-2])\n", "print(weekdays[-4])" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-16", "source": "

If you wanted to find the last value in a list, you could also use len(weekdays) and then subtract back to find the index you want.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-17", "source": [ "weekdays[len(weekdays)-1]" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-18", "source": "

This is essentially how negative indexes work, except you don’t have to use len(weekdays); that’s done for you automatically.
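
A small sketch of that equivalence, reusing the weekdays list from above. The variable n is introduced here only for readability.

```python
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']
n = len(weekdays)

# Each negative index -k refers to the same item as index n - k.
print(weekdays[-1] == weekdays[n - 1])   # True
print(weekdays[-2] == weekdays[n - 2])   # True
print(weekdays[-n] == weekdays[0])       # True
```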

\n

Removing Items

\n

You can use del to remove items from a list entirely. We use del list_name[index] to remove an element from a list (in the example, 9 is not a prime number) and thus shorten it. del is not a function or a method, but a statement in the language.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-19", "source": [ "primes = [2, 3, 5, 7, 9]\n", "print(f'primes before removing last item: {primes}')\n", "del primes[4]\n", "print(f'primes after removing last item: {primes}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-20", "source": "

Empty Lists

\n

The empty list contains no values. When you want to make a new list, use [] on its own to represent a list that doesn’t contain any values. This is helpful as a starting point for collecting values, which we’ll see soon.
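
As a minimal sketch of that starting point, the example below begins with an empty list and collects values with append. The name readings and its numbers are made up for illustration.

```python
# Start with an empty list and collect values one at a time.
# The readings are made-up numbers for illustration.
readings = []
print(len(readings))    # 0

readings.append(0.273)
readings.append(0.275)
print(readings)         # [0.273, 0.275]
print(len(readings))    # 2
```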

\n

Heterogeneous Lists

\n

Lists may contain values of different types. A single list may contain numbers, strings, and anything else.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-21", "source": [ "goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-22", "source": "

Strings are like Lists

\n

Text is often called a “string” in the programming world. Strings of text like name = \"Helena\" or patient_id = \"19237zud830\" are very similar conceptually to lists. Except instead of being lists of numbers, they’re lists of characters.

\n

In a number of older programming languages, strings are indeed arrays of numbers internally. However, Python hides a lot of that complexity from us, so we can just work with text.

\n

Still, many of the operations you use on lists can also be used on strings! Strings can be indexed like lists, so you can get single characters from strings.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-23", "source": [ "element = 'carbon'\n", "print(f'zeroth character: {element[0]}')\n", "print(f'third character: {element[3]}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-24", "source": "

Strings, however, cannot be modified: you can’t change a single letter in a string. Things that cannot be modified after creation are called immutable or sometimes frozen, compared to things that can be modified, which are called mutable.\nPython considers the string to be a single value with parts, not a collection of values.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-25", "source": [ "element[0] = 'C'" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-26", "source": "

Bounds

\n

You cannot access values beyond the end of a list or string; this will result in an error. Python reports an IndexError if we attempt to access a value that doesn’t exist. This is a kind of runtime error, as it cannot be detected as the code is parsed. Imagine you had a script that read in a file: whether index 90 was valid or invalid would depend on how many lines were in the file.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-27", "source": [ "print(f'99th element of element is: {element[99]}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-28", "source": "

Exercises

\n
\n
Question: Checking suffixes
\n
    \n
  1. How could you check that the extension of a filename is .csv?
  2. Can you find another way? Maybe check the help page for str.
\n
👁 View solution\n
\n
    \n
  1. a[-4:] == \".csv\" (Here we use == for comparing two values)
  2. a.endswith('.csv')
\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-29", "source": [ "# Test code here!\n", "a = \"1234.csv\"\n", "b = \"1273.tsv\"\n", "c = \"9382.csv\"\n", "d = \"1239.csv\"" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-30", "source": "
\n
Question: Say it loud!
\n
    \n
  1. Can you find a method in the help for str that converts a string to upper case?
  2. Or to lower case?
  3. Can you use it to fix a mixed-case DNA sequence?
\n
👁 View solution\n
\n
    \n
  1. \"shout it out\".upper()
  2. \"WHISPER THIS\".lower()
  3. terrible_sequence.upper()
\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-31", "source": [ "# Test answers here!\n", "print(\"shout it out\")\n", "print(\"WHISPER THIS\")\n", "# Fix this mess to be all capital\n", "terrible_sequence = \"AcTGAGccGGTt\"\n", "print(terrible_sequence)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-32", "source": "
\n
Question: Splitting
\n
    \n
  1. We use .split() to split a string by some character. Here we have a comma-separated list of values. Try splitting it up by a comma. But suppose we actually wanted it separated by | characters: can you split it up, and then re-join it with that new character?
  2. Does help(str) give you another option for replacing a character like that?
  3. What happens if you split by another value like 3?
\n
👁 View solution\n
\n
    \n
  1. \"|\".join(data.split(\",\"))
  2. data.replace(\",\", \"|\")
  3. Those characters will disappear from the pieces! If you want to reconstruct the same string, you need to join the pieces again with that same character.
\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-33", "source": [ "# Split me\n", "data = \"0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0\"" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-34", "source": "

Slicing & Dicing

\n

All of the data types we’ve talked about today can be sliced, and this will be a key part of using lists.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-35", "source": [ "elements = ['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F']\n", "# Instead of accessing a single element\n", "print(elements[0])\n", "# We'll access a range\n", "print(elements[0:4])" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-36", "source": "

Accessing only a portion of a list is common. Say you have a list of FastQ files from paired-end sequencing; perhaps you want two of them at a time. You could access the first pair with [0:2].

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-37", "source": [ "# You don't need to start at 0\n", "print(elements[6:8])" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-38", "source": "\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-39", "source": [ "# But your end should be bigger than your start.\n", "# What do you think this will return?\n", "# Make a guess before you run it\n", "print(elements[6:5])" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-40", "source": "

If you don’t supply an end value, Python will default to going to the end of the list. Likewise, if you don’t provide a start value, Python will use 0 as the start by default, until whatever end value you provide.

\n
\n
Question: Valid and Invalid Slices
\n

Which of these do you think will be valid? Which are invalid? Predict what they will return:

\n
# 1\nelements = ['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F']\n# 2\nelements[0:3]\n# 3\nelements[:3]\n# 4\nelements[-3:3]\n# 5\nelements[-8:-3]\n# 6\nelements[:]\n# 7\nelements[0:20]\n# 8\nelements['H':'Li']\n# 9\nelements[1.5:]\n
\n
👁 View solution\n
\n

All of these are valid except the last two.

\n
    \n
  1. If you don’t fill in a position, Python will use the default: 0 for the left-hand side of the :, and len(elements) for the right-hand side.
  2. You can request a slice longer than your list (e.g. up to 20), but Python will only give you back the items that exist.
  3. List slicing can only be done with integers, not strings or floats.
\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-41", "source": [ "# Check your answers here!" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-42", "source": "

Stride

\n

However, list slicing can be more complicated. You can additionally use a ‘stride’ parameter, which tells Python how to step through the list. To take every other element from a list:

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-43", "source": [ "values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]\n", "print(values[0:12:2]) # every other value\n", "print(values[1:12:2]) # every other value from the second value\n", "print(values[::2]) # the start and end are optional\n", "print(values[::3]) # every third value in the list." ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-44", "source": "

So, all together, list slicing is written either list[low:high] or list[low:high:stride], where low and high are optional: leave out low to start at the beginning of the list, and leave out high to go to the end.
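
The defaults can be checked directly. This is a small sketch that reuses the values list from the stride example; the comparisons simply confirm that leaving a part out is the same as writing its default.

```python
values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

# Omitted parts fall back to defaults: the start of the list,
# the end of the list, and a stride of 1.
print(values[:3] == values[0:3])                # True
print(values[9:] == values[9:len(values)])      # True
print(values[::2] == values[0:len(values):2])   # True

# Start at index 2, stop before index 10, take every fourth item.
print(values[2:10:4])                           # [2, 6]
```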

\n

Sorting

\n

Lists occasionally need to be sorted. For example, you might have a list of students that you want to alphabetise; here you can use the function sorted to help you.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-45", "source": [ "students = [\n", " 'Koos Christabella',\n", " 'Zackary Habiba',\n", " 'Jumana Rostam',\n", " 'Sorina Gaia',\n", " 'Kalyani Bessarion',\n", " 'Enéas Nirmala',\n", " '王奕辰',\n", " '刘依诺',\n", "]\n", "students = sorted(students)\n", "print(students)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-46", "source": "
\n
\n

Some people have 1 name, some have 4 or more! Some cultures have surnames first, some not. Sorting names is a complex situation, so be sure you consider your data before sorting and assuming it’s correct. Test with multiple values to make sure your code works!

\n
\n
\n
\n

Some analyses (especially simulations) can be dependent on data input order or data sorting. This was recently seen in Bhandari Neupane et al. (2019), where the data files used were sorted one way on Windows, and another on Linux, resulting in different results for the same code and the same datasets! Yikes!

\n

If you know your analyses are dependent on file ordering, then you can use sorted() to make sure the data is provided in a uniform way every time.

\n

If you’re not sure if your results will be dependent, you can try sorting anyway. Or better yet, randomising the list of inputs to make sure your code behaves properly in any scenario.
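
A minimal sketch of both ideas, assuming a hypothetical list of file names. It uses random.shuffle from the standard library random module, which this tutorial has not introduced, to randomise the order of a copy of the list.

```python
import random

# Hypothetical file names, listed in an arbitrary order.
files = ['sample_b.txt', 'sample_c.txt', 'sample_a.txt']

# sorted() returns a new list in a consistent order.
ordered = sorted(files)
print(ordered)   # ['sample_a.txt', 'sample_b.txt', 'sample_c.txt']

# Shuffle a copy to check whether an analysis depends on input order.
shuffled = list(files)
random.shuffle(shuffled)
print(sorted(shuffled) == ordered)   # True
```

If an analysis gives the same answer for ordered and for several shuffled copies, it probably does not depend on input order.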

\n
\n

Type Conversion

\n

Just like converting \"1.5\" to a float with the float() function, or 3.1 to a string with str(), we can convert values to lists using the list() function, and to sets with set():

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-47", "source": [ "# Convert text to a list\n", "print(list(\"sometext\"))" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-48", "source": "

Converting a list back into text is likewise possible, but you need to use the special method join. join is a method of str that accepts a list of strings.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-49", "source": [ "word = ['c', 'a', 'f', 'e']\n", "print(\"-\".join(word))" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-50", "source": "

It takes the string you called it on, and uses that as a separator. Then for the list that you provide, it joins that together with the separator.
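
To illustrate, here is a short sketch with a few different separators. The list name letters is just an example, and the last line uses split (seen in the exercises above) to show that the operation can be reversed.

```python
letters = ['c', 'a', 'f', 'e']

print(''.join(letters))     # cafe
print('-'.join(letters))    # c-a-f-e
print(', '.join(letters))   # c, a, f, e

# Splitting on the same separator gives the original list back.
print('-'.join(letters).split('-') == letters)   # True
```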

\n

Exercise Time

\n
\n
Question: Fill in the Blanks
\n

Fill in the blanks so that the program below produces the output shown.

\n
values = ____\nvalues.____(1)\nvalues.____(3)\nvalues.____(5)\nprint(f'first time: {values}')\nvalues = values[____]\nprint(f'second time: {values}')\n
\n
first time: [1, 3, 5]\nsecond time: [3, 5]\n
\n
👁 View solution\n
\n
values = []\nvalues.append(1)\nvalues.append(3)\nvalues.append(5)\nprint(f'first time: {values}')\nvalues = values[1:]\nprint(f'second time: {values}')\n
\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-51", "source": [ "# Fill in the blanks here!\n", "\n", "values = ____\n", "values.____(1)\n", "values.____(3)\n", "values.____(5)\n", "print(f'first time: {values}') # Should print [1, 3, 5]\n", "values = values[____]\n", "print(f'second time: {values}') # should print [3, 5]" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-52", "source": "
\n

How Large is a Slice?

\n

If start and stop are both non-negative integers,\nhow long is the list values[start:stop]?

\n
👁 View solution\n

Solution

\n

The list values[start:stop] has up to stop - start elements. For example,\nvalues[1:4] has the 3 elements values[1], values[2], and values[3].\nWhy ‘up to’?\nIf stop is greater than the total length of the list values,\nwe will still get a list back but it will be shorter than expected.

\n
\n
\n
\n

From Strings to Lists and Back

\n

Given this:

\n
print(f\"string to list: {list('tin')}\")\nprint(f\"list to string: {''.join(['g', 'o', 'l', 'd'])}\")\n
\n
    \n
  1. What does list('some string') do?
  2. What does '-'.join(['x', 'y', 'z']) generate?
\n
👁 View solution\n

Solution

\n
    \n
  1. list('some string') converts a string into a list containing all of its characters.
  2. join returns a string that is the concatenation of each string element in the list, with the separator added between each pair of elements. This results in x-y-z. The separator is the string on which the method is called.
\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-53", "source": [ "# Test code here" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-54", "source": "
\n

Working With the End

\n

What does the following program print?

\n
element = 'helium'\nprint(element[-1])\n
\n
    \n
  1. How does Python interpret a negative index?
  2. If a list or string has N elements, what is the most negative index that can safely be used with it, and what location does that index represent?
  3. If values is a list, what does del values[-1] do?
  4. How can you display all elements but the last one without changing values? (Hint: you will need to combine slicing and negative indexing.)
\n
👁 View solution\n

Solution

\n

The program prints m.

\n
    \n
  1. Python interprets a negative index as counting from the end (as opposed to counting from the beginning). The last element is -1.
  2. The most negative index that can safely be used with a list of N elements is -N, which represents the first element.
  3. del values[-1] removes the last element from the list.
  4. values[:-1]
\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-55", "source": [ "# Test code here" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-56", "source": "
\n

Stepping Through a List

\n

What does the following program print?

\n
element = 'fluorine'\nprint(element[::2])\nprint(element[::-1])\n
\n
    \n
  1. If we write a slice as low:high:stride, what does stride do?
  2. What expression would select all of the even-numbered items from a collection?
\n
👁 View solution\n

Solution

\n

The program prints

\n
furn\neniroulf\n
\n
    \n
  1. stride is the step size of the slice.
  2. The slice 1::2 selects all even-numbered items from a collection: it starts with element 1 (which is the second element, since indexing starts at 0), goes on until the end (since no end is given), and uses a step size of 2 (i.e., selects every second element).
\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-57", "source": [ "# Test code here" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-58", "source": "
\n

Slice Bounds

\n

What does the following program print?

\n
element = 'lithium'\nprint(element[0:20])\nprint(element[-1:3])\n
\n
👁 View solution\n

Solution

\n
lithium\n
\n

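The first statement prints the whole string, since a slice that extends beyond the end of the string is simply cut off at the end.\nThe second statement prints an empty string, because the slice starts at the last character (index -1) and its stop position (3) lies before that start, so no characters are selected.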

\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-59", "source": [ "# Test code here" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-60", "source": "

Dictionaries

\n

When you think of a dictionary, you should think of a real-life dictionary: it maps some key to a value, like a term to its definition.

\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
| Key | Value |
|---|---|
| Eichhörnchen | Squirrel |
| 火锅 | Hot Pot |
\n

Or a country to its population:

\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n\n
| Key | Value |
|---|---|
| South Sudan | 492,970 |
| Australia | 411,667 |
| Guinea | 1,660,973 |
| Morocco | 573,895 |
| Maldives | 221,678 |
| Wallis and Futuna | 1,126 |
| Eswatini | 94,874 |
| Namibia | 325,858 |
| Turkmenistan | 1,031,992 |
\n

In Python we create a dictionary with {} and use : to separate keys and values. Turning the above table into a Python dictionary, it would look like:

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-61", "source": [ "populations = {\n", " \"South Sudan\": 492970,\n", " \"Australia\": 411667,\n", " \"Guinea\": 1660973,\n", " \"Morocco\": 573895,\n", " \"Maldives\": 221678,\n", " \"Wallis and Futuna\": 1126,\n", " \"Eswatini\": 94874,\n", " \"Namibia\": 325858,\n", " \"Turkmenistan\": 1031992,\n", "}" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-62", "source": "

You can see a string (the country name) being used for the key, and then the number (an integer) as the value. (Would a float make sense? Why or why not?)

\n
\n
\n

They’re also sometimes called associative arrays (because they’re an array or list of values that associate a key to a value) or maps (because they map a key to a value), depending on what you’re reading.

\n
\n

Methods

\n

You can access both the keys and the values:

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-63", "source": [ "print(populations.keys())\n", "print(populations.values())" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-64", "source": "

These will print out two list-like objects. They will become more useful in the future when we talk about looping over dictionaries and processing all of the values within.
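
If you need a real list, for example to index into it, you can pass these objects to list(). The sketch below uses a small hypothetical dictionary called translations; the second entry is made up for illustration.

```python
# A small hypothetical dictionary, used only for illustration.
translations = {'Eichhörnchen': 'Squirrel', 'Hund': 'Dog'}

# keys() and values() are list-like; list() turns them into real lists.
keys = list(translations.keys())
values = list(translations.values())

print(keys)                # ['Eichhörnchen', 'Hund']
print(values[0])           # Squirrel
print(len(translations))   # 2
```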

\n

Accessing Values

\n

Just as you access items in a list by their position, you access values in a dictionary by their key:

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-65", "source": [ "print(populations[\"Namibia\"])" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-66", "source": "

And just like with lists, where using an index outside of the range of the list gives an error, trying to access a key that isn’t there gives an error too (a KeyError):

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-67", "source": [ "print(populations[\"Mars\"])" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-68", "source": "
\n
\n

Just like in real life, searching a dictionary for a specific term is quite fast. Often a lot faster than searching a list for a specific value.

\n

For those of you old enough to remember the paper version of a dictionary, you knew that As would be at the start and Zs at the end, and probably Ms around the middle. And if you were looking for a word like “Squirrel”, you’d open the dictionary in the middle, maybe decide it was in the second half of the book, randomly choose a page in the second half, and you could keep deciding if it was “before” or “after” the current page, never even bothering to search the first half.

\n

With a list, conceptually, you can’t make this guess about whether the item is in the first or second half. You need to search item by item; it would be like reading page by page until you get to Squirrel in the dictionary.

\n
\n
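
You can check whether a key exists before looking it up, which avoids the error shown above. This is a small sketch using two entries from the populations table; the list countries and the fallback value 0 are assumptions made for this example.

```python
# Two entries copied from the table above, plus a made-up list.
populations = {'Namibia': 325858, 'Maldives': 221678}
countries = ['Namibia', 'Maldives']

# in checks the keys of a dictionary and the items of a list.
print('Namibia' in populations)   # True
print('Mars' in populations)      # False
print('Mars' in countries)        # False

# get returns a fallback value instead of raising a KeyError.
print(populations.get('Mars', 0))  # 0
```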

Modifying Dictionaries

\n

Adding new values to a dictionary is easy; it’s very similar to replacing a value in a list.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-69", "source": [ "# For lists we did\n", "x = ['x', 'y', 'z']\n", "x[0] = 'a'\n", "print(x)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-70", "source": "

For dictionaries, it’s essentially the same: we access the ‘place’ in the dictionary just like we did with a list, and set it to a value.

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-71", "source": [ "populations[\"Mars\"] = 6 # robots\n", "print(populations)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-72", "source": "

And similarly, removing items is the same as it was for lists:

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-73", "source": [ "print(x)\n", "del x[0] # Removes the first item\n", "print(x)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-74", "source": "

And with dictionaries, you delete by specifying the key you want to remove:

\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-75", "source": [ "del populations['Australia']\n", "print(populations)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-76", "source": "
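The modifications above can be tried together on one small dictionary. This is a sketch using a hypothetical `samples` dictionary. It also shows `pop`, a dictionary method not covered above, which removes a key and hands back its value.

```python
# Hypothetical dictionary of sample counts
samples = {'control': 10, 'treated': 12}

samples['blank'] = 2      # a new key is added
samples['control'] = 11   # an existing key is overwritten
del samples['treated']    # the key and its value are removed

# pop removes the key and returns the value it held
removed = samples.pop('blank')

print(samples)  # {'control': 11}
print(removed)  # 2
```

Note that assigning to a key never fails: if the key exists its value is replaced, otherwise the key is created.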

Exercises

\n
\n
Question: DNA Complement
\n

DNA is usually in the form of dsDNA, a paired strand, where A maps to T and C maps to G, and vice versa.\nBut when we’re working with DNA sequences in bioinformatics, we often store only one strand, because we can calculate the complement on the fly when we need it.

\n

Write a dictionary that lets you look up the letters A, C, T, and G and find their complements.

\n
👁 View solution\n
\n

You need to have the complements of every base. If you just defined ‘A’ and ‘C’, how would you look up the complement when you want to translate a ‘T’ or a ‘G’? It’s not easily possible to look up a key by its value, only to look up a key and find its value.

\n
translation = {\n'A': 'T',\n'T': 'A',\n'C': 'G',\n'G': 'C',\n}\n
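As a sketch of how this dictionary could be used: the single lookup uses only what this tutorial has covered, while the whole-sequence version relies on a generator expression and `str.join`, which go beyond it. The sequence `ACCTG` is an arbitrary example.

```python
translation = {
    'A': 'T',
    'T': 'A',
    'C': 'G',
    'G': 'C',
}

# Look up the complement of a single base
first = translation['A']
print(first)  # T

# Complement every base of a short, arbitrary example sequence
sequence = 'ACCTG'
complement = ''.join(translation[base] for base in sequence)
print(complement)  # TGGAC
```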
\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-77", "source": [ "# Test code here!\n", "translation = {\n", "\n", "}\n", "print(translation)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-78", "source": "
\n
Question: Modifying a dictionary
\n

Fill in the blanks to make the execution correct:

\n
variants = {\n  'B.1.1.7': 26267,\n  'B.1.351': 439,\n}\nvariants[_____] =  _____\nprint(variants) # Should print {'B.1.1.7': 26267, 'B.1.351': 439, 'P.1': 384}\n__________\nprint(variants) # Should print {'B.1.1.7': 26267, 'B.1.351': 439, 'P.1': 384, 'B.1.617.2': 43486}\n# Maybe we've exterminated B.1.1.7 and B.1.351, remove their numbers.\ndel _______\ndel _______\nprint(variants[______]) # Should print 384\nprint(variants[______]) # Should print 43486\n
\n
👁 View solution\n
\n

variants = {\n  'B.1.1.7': 26267,\n  'B.1.351': 439,\n}\nvariants['P.1'] = 384\nprint(variants) # Should print {'B.1.1.7': 26267, 'B.1.351': 439, 'P.1': 384}\nvariants['B.1.617.2'] = 43486\nprint(variants) # Should print {'B.1.1.7': 26267, 'B.1.351': 439, 'P.1': 384, 'B.1.617.2': 43486}\n# Maybe we've exterminated B.1.1.7 and B.1.351, remove their numbers.\ndel variants['B.1.1.7']\ndel variants['B.1.351']\nprint(variants['P.1']) # Should print 384\nprint(variants['B.1.617.2']) # Should print 43486\n

\n
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-79", "source": [ "# Test code here!\n", "variants = {\n", " 'B.1.1.7': 26267,\n", " 'B.1.351': 439,\n", "}\n", "variants[_____] = _____\n", "print(variants) # Should print {'B.1.1.7': 26267, 'B.1.351': 439, 'P.1': 384}\n", "__________\n", "print(variants) # Should print {'B.1.1.7': 26267, 'B.1.351': 439, 'P.1': 384, 'B.1.617.2': 43486}\n", "# Maybe we've exterminated B.1.1.7 and B.1.351, remove their numbers.\n", "del _______\n", "del _______\n", "print(variants[______]) # Should print 384\n", "print(variants[______]) # Should print 43486" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-80", "source": "

Choosing the Right Data Type

\n

Choosing the correct data type can sometimes require some thought, and even discussion with colleagues. And don’t be afraid to search the internet for how other people have done it!

\n
| Data type | Examples | When to use it | When not to use it |
|---|---|---|---|
| Boolean (`bool`) | `True`, `False` | If there are only two possible states, true or false | If your data is not binary |
| Integer (`int`) | `1`, `0`, `-1023`, `42` | Countable, singular items. How many patients are there, how many events did you record, how many variants are there in the sequence | If doubling or halving the value would not make sense: do not use for e.g. patient IDs, or phone numbers. If these are integers you might accidentally do math on the value. |
| Float (`float`) | `123.49`, `3.14159`, `-3.33334` | If you need more precision or partial values. Recording distance between places, height, mass, etc. | |
| Strings (`str`) | `'patient_12312'`, `'Jane Doe'`, `'火锅'` | To store free text, identifiers, sequence IDs, etc. | If it’s truly a numeric value you can do calculations with, like adding or subtracting or doing statistics. |
| List / Array (`list`) | `['A', 1, 3.4, ['Nested']]` | If you need to store a list of items, like sequences from a file. Especially if you’re reading in a table of data from a file. | If you want to retrieve individual values, and there are clear identifiers it might be better as a dict. |
| Dictionary / Associative Array / map (`dict`) | `{\"weight\": 3.4, \"age\": 12, \"name\": \"Fluffy\"}` | When you have identifiers for your data, and want to look them up by that value. E.g. looking up sequences by an identifier, or data about students based on their name. Counting values. | If you just have a list of items without identifiers, it makes more sense to just use a list. |
\n
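To make the table concrete, here is a small sketch of one record that mixes several of these types. All names and values are hypothetical. It also shows why identifiers are safer as strings: converting one to an integer silently drops its leading zeros.

```python
# Hypothetical record; every value here is made up for illustration
pet = {
    'name': 'Fluffy',        # str: free text
    'age': 12,               # int: a countable value
    'weight': 3.4,           # float: a measurement with partial values
    'vaccinated': True,      # bool: only two possible states
    'visits': ['January', 'June'],  # list: items without identifiers
}
print(type(pet['age']))     # <class 'int'>
print(type(pet['weight']))  # <class 'float'>

# An identifier that looks numeric should stay a string
patient_id = '00123'
as_number = int(patient_id)
print(as_number)                     # 123
print(str(as_number) == patient_id)  # False
```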

Exercises

\n
\n
Question: Which Datatype
\n
\n
1. Chromosome Length
2. Name
3. Weight
4. Sex
5. Hair Colour
6. Money/Currency
\n
👁 View solution\n
\n
\n
1. Here you need to use an integer; a fractional or float value would not make sense. You cannot have half an A/C/T/G.
2. Here a string would be a good choice. (And probably just a single name string, rather than a first and last name, as not all humans have two names! And some have more than two.)
3. An integer is a good type for storing weight, if you are using a small unit (e.g. grams). Otherwise you might consider a float, but you’d need to be careful to format it properly (e.g. {value:0.2f}) when printing it out. It depends on the application.
4. This is a case where you should carefully consider the application, but bool is usually the wrong answer. Are you recording patient data? Is their expressed gender the correct variable, or did you need sex? {% cite Miyagi_2021 %} goes into detail on this multifaceted issue in a medical research context. For example, chromosomal sex is also more complicated and cannot be stored with a true/false value, as people with Klinefelter syndrome exist. A string can be an OK choice here.
5. There is a limited vocabulary humans use to describe hair colour, so a string can be used, or a data type we haven’t discussed! An enum is an enumeration, and when you have a limited set of values that are possible, you can use an enum to double check that whatever value is being used (or read from a file, or entered by a user) matches one of the “approved” values.
6. A float is a good guess, but with floats come weird rounding issues. Often people choose to use an integer storing the value in cents (or fractional cents, to whatever the desired precision is).
\n
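The rounding issue mentioned for money, and the `0.2f` formatting mentioned for weight, can be seen in a short sketch. The amounts are arbitrary examples.

```python
# Floats cannot represent 0.1 and 0.2 exactly, so the sum is slightly off
total_float = 0.1 + 0.2
print(total_float == 0.3)  # False

# Storing the same amounts as whole cents keeps the arithmetic exact
total_cents = 10 + 20
print(total_cents == 30)  # True

# Convert to a display string only when printing
display = f'{total_cents / 100:0.2f}'
print(display)  # 0.30

# The same format specifier rounds a float to two decimals for display
weight = 3.14159
formatted = f'{weight:0.2f}'
print(formatted)  # 3.14
```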
\n
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "cell_type": "markdown", "id": "final-ending-cell", "metadata": { "editable": false, "collapsed": false }, "source": [ "# Key Points\n\n", "- A list stores many values in a single structure.\n", "- Use an item's index to fetch it from a list.\n", "- Lists' values can be replaced by assigning to them.\n", "- Appending items to a list lengthens it.\n", "- Use `del` to remove items from a list entirely.\n", "- The empty list contains no values.\n", "- Lists may contain values of different types.\n", "- Character strings can be indexed like lists.\n", "- Character strings are immutable.\n", "- Indexing beyond the end of the collection is an error.\n", "\n# Congratulations on successfully completing this tutorial!\n\n", "Please [fill out the feedback on the GTN website](https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-iterables/tutorial.html#feedback) and check there for further resources!\n" ] } ] }