Introduction to Data Driven Life Sciences
PURL: https://gxy.io/GTN:P00017
What is a Learning Pathway? We recommend you follow the tutorials in the order presented on this page. They have been selected to fit together and build up your knowledge step by step. If a lesson has both slides and a tutorial, we recommend you start with the slides, then proceed with the tutorial.
This learning path starts with the history of biology and takes you on a journey through fundamental data analysis techniques and their applications.
Module 1: History
Knowing history is essential for understanding how we arrived at the current state of affairs in our field.
Time estimation: 1 hour
Learning Objectives
- Have a basic understanding of the history of biology from Darwin to today.
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
Module 2: Data Processing Tooling
Before jumping into biology, we need to review basic data-processing machinery.
Time estimation: 13 hours
Learning Objectives
- Explain how the shell relates to the keyboard, the screen, the operating system, and users' programs.
- Explain when and why command-line interfaces should be used instead of graphical interfaces.
- Explain the similarities and differences between a file and a directory.
- Translate an absolute path into a relative path and vice versa.
- Construct absolute and relative paths that identify specific files and directories.
- Use options and arguments to change the behaviour of a shell command.
- Demonstrate the use of tab completion and explain its advantages.
- Create a directory hierarchy that matches a given diagram.
- Create files in that hierarchy using an editor or by copying and renaming existing files.
- Delete, copy and move specified files and/or directories.
- Redirect a command's output to a file.
- Process a file instead of keyboard input using redirection.
- Construct command pipelines with two or more stages.
- Explain what usually happens if a program or pipeline isn't given any input to process.
- Explain Unix's 'small pieces, loosely joined' philosophy.
- Write a loop that applies one or more commands separately to each file in a set of files.
- Trace the values taken on by a loop variable during execution of the loop.
- Explain the difference between a variable's name and its value.
- Explain why spaces and some punctuation characters shouldn't be used in file names.
- Demonstrate how to see what commands have recently been executed.
- Re-run recently executed commands without retyping them.
- Use `grep` to select lines from text files that match simple patterns.
- Use `find` to find files and directories whose names match simple patterns.
- Use the output of one command as the command-line argument(s) to another command.
- Explain what is meant by 'text' and 'binary' files, and why many common tools don't handle the latter well.
- Learn the fundamentals of programming in Python.
- Have a basic understanding of the history of sequencing.
- Understand Python basics.
- Understand manipulation of FASTQ data in Python.
- Understand quality metrics.
- Understand lists and dictionaries.
- Learn about dynamic programming.
- Learn how to translate DNA in Python.
- Understand manipulation of files in Python.
- Understand data manipulation in Pandas.
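The FASTQ and quality-metric objectives above can be previewed with a minimal Python sketch. It assumes the common Phred+33 encoding (Sanger / Illumina 1.8+), in which each quality character's ASCII code minus 33 gives the Phred score Q, and Q relates to the base-call error probability as 10^(-Q/10). The function name `parse_fastq_record` and the example read are hypothetical illustrations, not code taken from the tutorials.

```python
# Minimal sketch: decode Phred+33 quality scores from one FASTQ record.
# Assumption: Sanger / Illumina 1.8+ encoding (ASCII offset 33).
# parse_fastq_record is a hypothetical helper, not a library function.

def parse_fastq_record(lines):
    """Split a 4-line FASTQ record into (read id, sequence, Phred scores)."""
    header, seq, plus, qual = (line.rstrip("\n") for line in lines)
    if not header.startswith("@") or not plus.startswith("+"):
        raise ValueError("not a valid FASTQ record")
    if len(seq) != len(qual):
        raise ValueError("sequence and quality lengths differ")
    # Each quality character encodes Q as (ASCII code - 33).
    scores = [ord(ch) - 33 for ch in qual]
    return header[1:], seq, scores


# Hypothetical example read: 'I' -> Q40, '#' -> Q2, '!' -> Q0.
record = ["@read1\n", "GATTACA\n", "+\n", "IIIII#!\n"]
read_id, seq, scores = parse_fastq_record(record)
mean_q = sum(scores) / len(scores)
print(read_id, seq, scores, round(mean_q, 1))
# prints: read1 GATTACA [40, 40, 40, 40, 40, 2, 0] 28.9
```

Real FASTQ files are usually read four lines at a time from a file handle, and libraries such as Biopython handle this parsing for you; the hand-rolled version here only shows what the quality string encodes.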
Lesson | Slides | Hands-on | Recordings |
---|---|---|---|
CLI basics | | | |
Advanced CLI in Galaxy | | | |
Introduction to Python | | | |
Editorial Board
This material is reviewed by our Editorial Board:
Anton Nekrutenko