{ "metadata": { }, "nbformat": 4, "nbformat_minor": 5, "cells": [ { "id": "metadata", "cell_type": "markdown", "source": "
\n\nAgenda\nIn this tutorial, we will cover:
\n\n
\n- Lists
\n
Doing calculations with a hundred variables called pressure_001
, pressure_002
, etc. would be at least as slow as doing them by hand. Using a list to store many values together solves that problem. Lists are surrounded by square brackets: [
, ]
, with values separated by commas:
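As a minimal sketch, a list of pressure readings could be created like this. Only the first value (0.273) is mentioned elsewhere in this tutorial; the other values are assumptions chosen for illustration.

```python
# A list of pressure readings. Only the first value (0.273) comes from the
# tutorial text; the remaining values are illustrative assumptions.
pressures = [0.273, 0.275, 0.277, 0.275, 0.276]

print(f'pressures: {pressures}')
print(f'length: {len(pressures)}')
```

`len()` reports how many values the list holds.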
You can use an item’s index to fetch it from a list.
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-3", "source": [ "print(f'zeroth item of pressures: {pressures[0]}')\n", "print(f'fourth item of pressures: {pressures[4]}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-4", "source": "Lists’ values can be changed or replaced by assigning a new value to the position in the list.
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-5", "source": [ "pressures[0] = 0.265\n", "print(f'pressures is now: {pressures}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-6", "source": "Note how the first item has changed from 0.273
Appending items to a list lengthens it. You can do list_name.append()
to add items to the end of a list.
.append()
is a method of lists. It’s like a function, but tied to a particular object. You use object_name.method_name
to call methods, which deliberately resembles the way we refer to things in a library.
We will meet other methods of lists as we go along. Use help(list)
for a preview. extend
is similar to append
, but it allows you to combine two lists. For example:
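A short sketch of the difference between the two methods. The list contents and the names `teen_primes` and `middle_aged_primes` are illustrative assumptions.

```python
primes = [2, 3, 5]
primes.append(7)                    # add a single item to the end
print(f'after append: {primes}')

# Illustrative lists to combine with primes
teen_primes = [11, 13, 17, 19]
middle_aged_primes = [37, 41, 43, 47]

primes.extend(teen_primes)          # add each item of the other list
print(f'after extend: {primes}')

primes.append(middle_aged_primes)   # add the whole list as ONE item
print(f'after appending a list: {primes}')
```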
Note that while extend
maintains the “flat” structure of the list, appending a list to a list makes the result two-dimensional - the last element in primes
is a list, not an integer.
This starts to become a more complicated data structure, and we’ll use more of these later. A list containing both integers and a list can be called a “heterogeneous” list, since it has multiple different data types. This is relatively uncommon; most of the lists you’ll encounter will have a single data type inside of them. Sometimes you’ll see a list of lists, which can be used to store positions, like a chessboard.
\nIn computer science and programming we number the positions within a list starting from 0
, rather than from 1
.
But if you try to access a position that is outside of the list, you’ll get an error:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-13", "source": [ "print(weekdays[9])" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-14", "source": "returns a IndexError: list index out of range
.
\n\n\nSo how do you read this?
\n\n1 | ---------------------------------------------------------------------------\n2 | IndexError Traceback (most recent call last)\n3 | /tmp/ipykernel_648319/137030145.py in <module>\n4 | ----> 1 print(weekdays[9])\n5 |\n6 | IndexError: list index out of range\n
\n
\n1. This is just a line of `-`s as a separator.
\n2. `IndexError`: here Jupyter/CoCalc/etc. are trying to be helpful and highlight the error for us. This is the important bit of information!
\n3. This is the path to where the code is; Jupyter/CoCalc/etc. create temporary files to execute your code.
\n4. Here an arrow points to the line number where something has broken. The 1 shows that it’s the first line within the cell, and it points to the print statement. Really it’s pointing at the `weekdays[9]` within the print statement.
\n5. Blank.
\n6. This is where we normally look for the most important part of the traceback: the error message. Here it is an `IndexError`, namely that the list index (9) is out of the range of possible values (the length of the list).
However, sometimes you want to access the very end of a list! You can either start at the beginning and count along to find the last item or second to last item, or you can use negative indices:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-15", "source": [ "# Position 0 1 2 3 4\n", "weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']\n", "# Position -5 -4 -3 -2 -1\n", "\n", "print(weekdays[-1])\n", "print(weekdays[-2])\n", "print(weekdays[-4])" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-16", "source": "If you wanted to find the last value in a list, you could also use len(elements)
and then subtract back to find the index you want
This is essentially how negative indexes work, except you don’t have to use len(elements)
, that’s done for you automatically.
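A small sketch of that equivalence, using an illustrative list named `elements`:

```python
elements = ['H', 'He', 'Li', 'Be', 'B']

# Counting from the start: the last index is the length minus one
last_by_length = elements[len(elements) - 1]
# Counting from the end: Python does the subtraction for us
last_by_negative = elements[-1]
print(last_by_length, last_by_negative)

second_to_last = elements[len(elements) - 2]
print(second_to_last == elements[-2])
```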
You can use del
to remove items from a list entirely. We use del list_name[index]
to remove an element from a list (in the example, 9 is not a prime number) and thus shorten it. del
is not a function or a method, but a statement in the language.
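A minimal sketch of such an example, assuming a list called `primes` that mistakenly contains 9:

```python
primes = [2, 3, 5, 7, 9]
print(f'primes before: {primes}')

del primes[4]   # remove the item at index 4 (the value 9)
print(f'primes after: {primes}')
```

Note that `del` is written without parentheses, because it is a statement rather than a function call.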
The empty list contains no values. When you want to make a new list, use []
on its own to represent a list that doesn’t contain any values. This is helpful as a starting point for collecting values, which we’ll see soon.
Lists may contain values of different types. A single list may contain numbers, strings, and anything else.
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-21", "source": [ "goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-22", "source": "Text is often called a “string” in the programming world. Strings of text like name = \"Helena\"
or patient_id = \"19237zud830\"
are conceptually very similar to lists, except that instead of being lists of numbers, they’re lists of characters.
In a number of older programming languages, strings are indeed arrays of numbers internally. However, Python hides a lot of that complexity from us, so we can just work with text.
\nStill, many of the operations you use on lists can also be used on strings! Strings can be indexed like lists, so you can get single characters from strings.
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-23", "source": [ "element = 'carbon'\n", "print(f'zeroth character: {element[0]}')\n", "print(f'third character: {element[3]}')" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-24", "source": "Strings, however, cannot be modified, you can’t change a single letter in a string. Things that cannot be modified after creation are called immutable or sometimes frozen, compared to things which can be modified which are called mutable.\nPython considers the string to be a single value with parts, not a collection of values.
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-25", "source": [ "element[0] = 'C'" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-26", "source": "You cannot access values beyond the end of the list, this will result in an error. Python reports an IndexError
if we attempt to access a value that doesn’t exist. This is a kind of runtime error, as it cannot be detected when the code is parsed. Imagine a script that reads in a file: whether index 90 is valid or invalid would depend on how many lines the file has.
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-29", "source": [ "# Test code here!\n", "a = \"1234.csv\"\n", "b = \"1273.tsv\"\n", "c = \"9382.csv\"\n", "d = \"1239.csv\"" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-30", "source": "Question: Checking suffixes\n\n
\n- How could you check that the extension of a filename is `.csv`?
\n- Can you find another way? Maybe check the help page for `str`.
\n👁 View solution
\n\n\n
\n- `a[-4:] == \".csv\"` (here we use `==` for comparing two values)
\n- `a.endswith('.csv')`
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-31", "source": [ "# Test answers here!\n", "print(\"shout it out\")\n", "print(\"WHISPER THIS\")\n", "# Fix this mess to be all capital\n", "terrible_sequence = \"AcTGAGccGGTt\"\n", "print(terrible_sequence)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-32", "source": "Question: Say it loud!\n\n
\n- Can you find a method in `str`’s help that converts the string to upper case?
\n- Or to lower case?
\n- Can you use it to fix the mixed-case DNA sequence?
\n\n👁 View solution
\n\n\n
\n- `\"shout it out\".upper()`
\n- `\"WHISPER THIS\".lower()`
\n- `terrible_sequence.upper()`
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-33", "source": [ "# Split me\n", "data = \"0,0,1,3,1,2,4,7,8,3,3,3,10,5,7,4,7,7,12,18,6,13,11,11,7,7,4,6,8,8,4,4,5,7,3,4,2,3,0,0\"" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-34", "source": "Question: Splitting\n\n
\n- We use `.split()` to split a string by some character. Here we have a comma-separated list of values. Try splitting it up by commas. But we actually wanted it separated by `|` characters: can you split it up, and then re-join it with that new character?
\n- Does `help(str)` give you another option for replacing a character like that?
\n- What happens if you split by another value like `3`
?\n👁 View solution
\n\n\n
\n- `\"|\".join(data.split(\",\"))`
\n- `data.replace(\",\", \"|\")`
\n- Those characters will disappear! If you want to reconstruct the same string
\n
All of the data types we’ve talked about today can be sliced, and this will be a key part of using lists.
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-35", "source": [ "elements = ['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F']\n", "# Instead of accessing a single element\n", "print(elements[0])\n", "# We'll access a range\n", "print(elements[0:4])" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-36", "source": "Accessing only a portion of a list is commonly used, say if you have a list of FastQ files from paired end sequencing, perhaps you want two of them at a time. You could access those with [0:2]
.
If you don’t supply an end value, Python will default to going to the end of the list. Likewise, if you don’t provide a start value, Python will use 0
as the start by default, until whatever end value you provide.
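These defaults can be sketched with a short list of weekdays; the variable names are illustrative:

```python
weekdays = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

first_two = weekdays[:2]     # no start given: begins at index 0
from_thursday = weekdays[3:] # no end given: runs to the end of the list
everything = weekdays[:]     # neither given: a copy of the whole list

print(first_two)
print(from_thursday)
print(everything)
```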
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-41", "source": [ "# Check your answers here!" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-42", "source": "Question: Valid and Invalid Slices\nWhich of these do you think will be valid? Which are invalid? Predict what they will return:
\n\n# 1\nelements = ['H', 'He', 'Li', 'Be', 'B', 'C', 'N', 'O', 'F']\n# 2\nelements[0:3]\n# 3\nelements[:3]\n# 4\nelements[-3:3]\n# 5\nelements[-8:-3]\n# 6\nelements[:]\n# 7\nelements[0:20]\n# 8\nelements['H':'Li']\n# 9\nelements[1.5:]\n
\n👁 View solution
\n\nAll of these are valid except the last two.
\n\n
\n- If you don’t fill in a position, Python will use the default: 0 for the left-hand side of the `:`, and `len(elements)` for the right-hand side.
\n- You can request a slice longer than your list (e.g. up to 20), but Python may not give you that many items back.
\n- List slicing can only be done with integers, not strings or floats.
\n
However, list slicing can be more complicated. You can additionally use a ‘stride’ parameter, which is how Python should step through the list. To take every other element from a list:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-43", "source": [ "values = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]\n", "print(values[0:12:2]) # every other value\n", "print(values[1:12:2]) # every other value from the second value\n", "print(values[::2]) # the start and end are optional\n", "print(values[::3]) # every third value in the list." ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-44", "source": "So list slicing together is either list[low:high]
or list[low:high:stride]
, where low and high are optional if you want to start at the beginning or go to the end of the list.
Lists occasionally need to be sorted. For example, you have a list of students you might want to alphabetise, and here you can use the function sorted
to help you.
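A brief sketch with a hypothetical list of student names:

```python
# Hypothetical student names, used only for illustration
students = ['Noor', 'Alex', 'Mei', 'Bram']

alphabetical = sorted(students)             # returns a NEW sorted list
reverse_order = sorted(students, reverse=True)

print(f'sorted: {alphabetical}')
print(f'reversed: {reverse_order}')
print(f'original is unchanged: {students}')
```

`sorted` leaves the original list untouched and returns a new one, so both orderings remain available afterwards.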
\n\n\nSome people have 1 name, some have 4 or more! Some cultures have surnames first, some not. Sorting names is a complex situation, so be sure you consider your data before sorting and assuming it’s correct. Test with multiple values to make sure your code works!
\n
\n\n\nSome analyses (especially simulations) can be dependent on data input order or data sorting. This was recently seen in {% cite Bhandari_Neupane_2019 %}, where the data files used were sorted one way on Windows and another on Linux, resulting in different results for the same code and the same datasets! Yikes!
\nIf you know your analyses are dependent on file ordering, then you can use `sorted()` to make sure the data is provided in a uniform way every time. If you’re not sure whether your results will be dependent, you can try sorting anyway. Or better yet, randomise the list of inputs to make sure your code behaves properly in any scenario.
\n
Just like converting \"1.5\"
to a float with the float()
function, or 3.1
to a string with str()
, we can do the same with lists using the list()
function, and sets with set()
:
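A minimal sketch of these conversions; the input values are illustrative:

```python
number = float('1.5')             # string -> float
text = str(3.1)                   # float -> string
bases = list('ACGT')              # string -> list of characters
unique = set([1, 2, 2, 3, 3, 3])  # list -> set (duplicates are dropped)

print(number, text)
print(bases)
print(unique)
```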
Converting a list back into text is likewise possible, but you need to use the special method `join`. `join` is a method of `str` which accepts a list. It takes the string you called it on and uses that as a separator, then joins the items of the list you provide together with that separator.
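A small sketch of `join` with illustrative values. Note that every item in the list must already be a string, otherwise Python raises a `TypeError`.

```python
date_parts = ['2021', '06', '15']

with_dashes = '-'.join(date_parts)  # separator goes between the items
no_separator = ''.join(date_parts)  # an empty separator glues them together

print(with_dashes)
print(no_separator)
```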
\n\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-51", "source": [ "# Fill in the blanks here!\n", "\n", "values = ____\n", "values.____(1)\n", "values.____(3)\n", "values.____(5)\n", "print(f'first time: {values}') # Should print [1, 3, 5]\n", "values = values[____]\n", "print(f'second time: {values}') # should print [3, 5]" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-52", "source": "Question: Fill in the Blanks\nFill in the blanks so that the program below produces the output shown.
\n\nvalues = ____\nvalues.____(1)\nvalues.____(3)\nvalues.____(5)\nprint(f'first time: {values}')\nvalues = values[____]\nprint(f'second time: {values}')\n
\nfirst time: [1, 3, 5]\nsecond time: [3, 5]\n
\n👁 View solution
\n\n\nvalues = []\nvalues.append(1)\nvalues.append(3)\nvalues.append(5)\nprint(f'first time: {values}')\nvalues = values[1:]\nprint(f'second time: {values}')\n
\n\nHow Large is a Slice?
\nIf `start` and `stop` are both non-negative integers,\nhow long is the list `values[start:stop]`?\n👁 View solution
\nSolution
\nThe list `values[start:stop]` has up to `stop - start` elements. For example,\n`values[1:4]` has the 3 elements `values[1]`, `values[2]`, and `values[3]`.\nWhy ‘up to’?\nIf `stop` is greater than the total length of the list `values`,\nwe will still get a list back but it will be shorter than expected.
\n\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-53", "source": [ "# Test code here" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-54", "source": "From Strings to Lists and Back
\nGiven this:
\n\nprint(f\"string to list: {list('tin')}\")\nprint(f\"list to string: {''.join(['g', 'o', 'l', 'd'])}\")\n
\n
\n- What does `list('some string')` do?
\n- What does `'-'.join(['x', 'y', 'z'])` generate?
\n👁 View solution
\nSolution
\n\n
\n- `list('some string')` converts a string into a list containing all of its characters.
\n- `join` returns a string that is the concatenation of each string element in the list, with the separator added between the elements. This results in `x-y-z`. The separator between the elements is the string that provides this method.
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-55", "source": [ "# Test code here" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-56", "source": "Working With the End
\nWhat does the following program print?
\n\nelement = 'helium'\nprint(element[-1])\n
\n
\n- How does Python interpret a negative index?
\n- If a list or string has N elements,\nwhat is the most negative index that can safely be used with it,\nand what location does that index represent?
\n- If `values` is a list, what does `del values[-1]` do?
\n- How can you display all elements but the last one without changing `values`?\n(Hint: you will need to combine slicing and negative indexing.)\n👁 View solution
\nSolution
\nThe program prints `m`.
\n- Python interprets a negative index as starting from the end (as opposed to\nstarting from the beginning). The last element is `-1`.
\n- The most negative index that can safely be used with a list of N elements is `-N`, which represents the first element.
\n- `del values[-1]` removes the last element from the list.
\n- `values[:-1]`
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-57", "source": [ "# Test code here" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-58", "source": "Stepping Through a List
\nWhat does the following program print?
\n\nelement = 'fluorine'\nprint(element[::2])\nprint(element[::-1])\n
\n
\n- If we write a slice as `low:high:stride`, what does `stride` do?
\n- What expression would select all of the even-numbered items from a collection?
\n\n👁 View solution
\nSolution
\nThe program prints
\n\nfurn\neniroulf\n
\n
\n- `stride` is the step size of the slice.
\n- The slice `1::2` selects all even-numbered items from a collection: it starts\nwith element `1` (which is the second element, since indexing starts at `0`),\ngoes on until the end (since no `end` is given), and uses a step size of `2`\n(i.e., selects every second element).
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-59", "source": [ "# Test code here" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-60", "source": "Slice Bounds
\nWhat does the following program print?
\n\nelement = 'lithium'\nprint(element[0:20])\nprint(element[-1:3])\n
\n👁 View solution
\nSolution
\n\nlithium\n
The first statement prints the whole string, since the slice goes beyond the total length of the string.\nThe second statement returns an empty string, because the slice starts at the last character (index -1) and its stop position, 3, comes before that start, so nothing is selected.
\n
When you think of a dictionary, you should think of a real-life dictionary: it maps some key to a value. Like a term to its definition:

| Key | Value |
|---|---|
| Eichhörnchen | Squirrel |
| 火锅 | Hot Pot |

Or a country to its population:

| Key | Value |
|---|---|
| South Sudan | 492,970 |
| Australia | 411,667 |
| Guinea | 1,660,973 |
| Morocco | 573,895 |
| Maldives | 221,678 |
| Wallis and Futuna | 1,126 |
| Eswatini | 94,874 |
| Namibia | 325,858 |
| Turkmenistan | 1,031,992 |

In Python we create a dictionary with {}
and use :
to separate keys and values. Turning the above list into a Python dictionary, it would look like:
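A sketch of that dictionary, using the keys and values from the table above. The variable name `populations` is the one the later code cells refer to.

```python
# Keys are country names (strings); values are the numbers from the table.
populations = {
    'South Sudan': 492970,
    'Australia': 411667,
    'Guinea': 1660973,
    'Morocco': 573895,
    'Maldives': 221678,
    'Wallis and Futuna': 1126,
    'Eswatini': 94874,
    'Namibia': 325858,
    'Turkmenistan': 1031992,
}
print(populations)
```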
You can see a string (the country name) being used for the key, and then the number (an integer) as the value. (Would a float make sense? Why or why not?)
\n\n\n\nThey’re also sometimes called associative arrays (because they’re an array or list of values that associate a key to a value) or maps (because they map a key to a value), depending on what you’re reading.
\n
You can access both the keys and the values:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-63", "source": [ "print(populations.keys())\n", "print(populations.values())" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-64", "source": "These will print out two list-like objects. They will become more useful in the future when we talk about looping over dictionaries and processing all of the values within.
\nJust like lists, where you access a value by its position, with dictionaries you access a value by its key:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-65", "source": [ "print(populations[\"Namibia\"])" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-66", "source": "And just like lists, if you try an access a key that isn’t there or an index outside of the range of the list:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-67", "source": [ "print(populations[\"Mars\"])" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-68", "source": "\n\n\nJust like in real life, searching a dictionary for a specific term is quite fast. Often a lot faster than searching a list for a specific value.
\nFor those of you old enough to remember the paper version of a dictionary, you knew that As would be at the start and Zs at the end, and probably Ms around the middle. And if you were looking for a word like “Squirrel”, you’d open the dictionary in the middle, maybe decide it was in the second half of the book, randomly choose a page in the second half, and you could keep deciding if it was “before” or “after” the current page, never even bothering to search the first half.
\nConceptually, with a list you can’t make this guess about whether the item is in the first or second half. You need to search item by item; it would be like reading page by page until you get to Squirrel in the dictionary.
\n
Adding new values to a dictionary is easy; it’s very similar to replacing a value in a list.
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-69", "source": [ "# For lists we did\n", "x = ['x', 'y', 'z']\n", "x[0] = 'a'\n", "print(x)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-70", "source": "For dictionaries, it’s essentially the same, we access the ‘place’ in the dictionary just like we did with a list, and set it to a value
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-71", "source": [ "populations[\"Mars\"] = 6 # robots\n", "print(populations)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-72", "source": "And similarly, removing items is the same as it was for lists:
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-73", "source": [ "print(x)\n", "del x[0] # Removes the first item\n", "print(x)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-74", "source": "And with dictionaries you delete by specifying which position/key you want to remove
\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-75", "source": [ "del populations['Australia']\n", "print(populations)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-76", "source": "\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-77", "source": [ "# Test code here!\n", "translation = {\n", "\n", "}\n", "print(translation)" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-78", "source": "Question: DNA Complement\nDNA is usually in the form of dsDNA, a paired strand, where A maps to T and C maps to G and vice versa.\nBut when we’re working with DNA sequences in bioinformatics, we often only store one strand, because we can calculate the complement on the fly, when we need.
\nWrite a dictionary that lets you look up the letters A, C, T, and G and find their complements.
\n\n👁 View solution
\n\nYou need to have the complements of every base. If you just defined ‘A’ and ‘C’, how would you look up the complement when you want to translate a ‘T’ or a ‘G’? It’s not easily possible to look up a key by a value, only to search a key and find a value.
\n\ntranslation = {\n'A': 'T',\n'T': 'A',\n'C': 'G',\n'G': 'C',\n}\n
\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "id": "cell-79", "source": [ "# Test code here!\n", "variants = {\n", " 'B.1.1.7': 26267,\n", " 'B.1.351': 439,\n", "}\n", "variants[_____] = _____\n", "print(variants) # Should print {'B.1.1.7': 26267, 'B.1.351': 439, 'P.1': 384}\n", "__________\n", "print(variants) # Should print {'B.1.1.7': 26267, 'B.1.351': 439, 'P.1': 384, 'B.1.617.2': 43486}\n", "# Maybe we've exterminated B.1.1.7 and B.1.351, remove their numbers.\n", "del _______\n", "del _______\n", "print(variants[______]) # Should print 384\n", "print(variants[______]) # Should print 43486" ], "cell_type": "code", "execution_count": null, "outputs": [ ], "metadata": { "attributes": { "classes": [ "> In this tutorial, we will cover:" ], "id": "" } } }, { "id": "cell-80", "source": "Question: Modifying an array\nFill in the blanks to make the execution correct:
\n\nvariants = {\n 'B.1.1.7': 26267,\n 'B.1.351': 439,\n}\nvariants[_____] = _____\nprint(variants) # Should print {'B.1.1.7': 26267, 'B.1.351': 439, 'P.1': 384}\n__________\nprint(variants) # Should print {'B.1.1.7': 26267, 'B.1.351': 439, 'P.1': 384, 'B.1.617.2': 43486}\n# Maybe we've exterminated B.1.1.7 and B.1.351, remove their numbers.\ndel _______\ndel _______\nprint(variants[______]) # Should print 384\nprint(variants[______]) # Should print 43486\n
\n👁 View solution
\n\nvariants = {\n    'B.1.1.7': 26267,\n    'B.1.351': 439,\n}\nvariants['P.1'] = 384\nprint(variants) # Should print {'B.1.1.7': 26267, 'B.1.351': 439, 'P.1': 384}\nvariants['B.1.617.2'] = 43486\nprint(variants) # Should print {'B.1.1.7': 26267, 'B.1.351': 439, 'P.1': 384, 'B.1.617.2': 43486}\n# Maybe we've exterminated B.1.1.7 and B.1.351, remove their numbers.\ndel variants['B.1.1.7']\ndel variants['B.1.351']\nprint(variants['P.1']) # Should print 384\nprint(variants['B.1.617.2']) # Should print 43486\n
\n
Choosing the correct data type can sometimes require some thought, and even discussion with colleagues. And don’t be afraid to search the internet for how other people have done it!

| Data type | Examples | When to use it | When not to use it |
|---|---|---|---|
| Boolean (`bool`) | `True`, `False` | If there are only two possible states, true or false | If your data is not binary |
| Integer (`int`) | 1, 0, -1023, 42 | Countable, singular items. How many patients are there, how many events did you record, how many variants are there in the sequence | If doubling or halving the value would not make sense: do not use for e.g. patient IDs, or phone numbers. If these are integers you might accidentally do math on the value. |
| Float (`float`) | 123.49, 3.14159, -3.33334 | If you need more precision or partial values. Recording distance between places, height, mass, etc. | |
| Strings (`str`) | ‘patient_12312’, ‘Jane Doe’, ‘火锅’ | To store free text, identifiers, sequence IDs, etc. | If it’s truly a numeric value you can do calculations with, like adding or subtracting or doing statistics. |
| List / Array (`list`) | `['A', 1, 3.4, ['Nested']]` | If you need to store a list of items, like sequences from a file. Especially if you’re reading in a table of data from a file. | If you want to retrieve individual values, and there are clear identifiers it might be better as a dict. |
| Dictionary / Associative Array / map (`dict`) | `{\"weight\": 3.4, \"age\": 12, \"name\": \"Fluffy\"}` | When you have identifiers for your data, and want to look them up by that value. E.g. looking up sequences by an identifier, or data about students based on their name. Counting values. | If you just have a list of items without identifiers, it makes more sense to just use a list. |

\n\n", "cell_type": "markdown", "metadata": { "editable": false, "collapsed": false } }, { "cell_type": "markdown", "id": "final-ending-cell", "metadata": { "editable": false, "collapsed": false }, "source": [ "# Key Points\n\n", "- A list stores many values in a single structure.\n", "- Use an item's index to fetch it from a list.\n", "- Lists' values can be replaced by assigning to them.\n", "- Appending items to a list lengthens it.\n", "- Use `del` to remove items from a list entirely.\n", "- The empty list contains no values.\n", "- Lists may contain values of different types.\n", "- Character strings can be indexed like lists.\n", "- Character strings are immutable.\n", "- Indexing beyond the end of the collection is an error.\n", "\n# Congratulations on successfully completing this tutorial!\n\n", "Please [fill out the feedback on the GTN website](https://training.galaxyproject.org/training-material/topics/data-science/tutorials/python-iterables/tutorial.html#feedback) and check there for further resources!\n" ] } ] }Question: Which Datatype\n\n
\n- Chromosome Length
\n- Name
\n- Weight
\n- Sex
\n- Hair Colour
\n- Money/Currency
\n\n👁 View solution
\n\n\n
\n- Here you need to use an integer; a fractional or float value would not make sense. You cannot have half an A/C/T/G.
\n- Here a string would be a good choice. (And probably just a single `name` string, rather than a `first` and `last` name, as not all humans have two names! And some have more than two.)
\n- An integer is a good type for storing weight, if you are using a small unit (e.g. grams). Otherwise you might consider a float, but you’d need to be careful to format it properly (e.g. `{value:0.2f}`) when printing it out. It depends on the application.
\n- This is a case where you should consider the application carefully, but `bool` is usually the wrong answer. Are you recording patient data? Is their expressed gender the correct variable or did you need sex? {% cite Miyagi_2021 %} goes into detail on this multifaceted issue in a medical research context. For example, chromosomal sex is also more complicated and cannot be stored with a true/false value, as people with Klinefelter syndrome exist. A string can be an OK choice here.
\n- There is a limited vocabulary humans use to describe hair colour, so a string can be used, or a data type we haven’t discussed! An `enum` is an enumeration, and when you have a limited set of values that are possible, you can use an `enum` to double check that whatever value is being used (or read from a file, or entered by a user) matches one of the “approved” values.
\n- A float is a good guess, but with floats come weird rounding issues. Oftentimes people choose to use an integer storing the value in cents (or fractional cents, to whatever the desired precision is).
\n
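The enumeration idea from the hair colour answer can be sketched with the `enum` module from Python's standard library. The class name `HairColour` and its list of approved values are assumptions made up for this illustration, not part of the tutorial.

```python
from enum import Enum


class HairColour(Enum):
    # Hypothetical set of approved values
    BLACK = 'black'
    BROWN = 'brown'
    BLONDE = 'blonde'
    RED = 'red'
    GREY = 'grey'


# Looking up an approved value succeeds
colour = HairColour('brown')
print(colour)

# Looking up anything else raises a ValueError
try:
    HairColour('purple')
    rejected = False
except ValueError as error:
    rejected = True
    print(f'not an approved value: {error}')
```

Checking input against the enumeration this way catches typos and unexpected values as soon as the data is read, rather than later in the analysis.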