Introduction to Machine Learning
Contributors
Questions
What is machine learning?
Why is it useful?
What are its different approaches?
Objectives
Provide the basics of machine learning and its variants.
Learn how to perform classification using training and test data.
Learn how to use Galaxy’s machine learning tools.
Published: Mar 21, 2025
Last Updated: Mar 21, 2025
Contents
- What is machine learning?
- Types of machine learning
- Techniques for
- Hyperparameter optimisation
- Learning and evaluation of models
- Various applications of machine learning
Machine learning
.pull-left[
- Learns patterns from data
- Draws on several fields
- Linear algebra, statistics and probability
- Programming
- Data analysis
- Visualization
- Applicable to data from multiple fields - protein and DNA sequences, weather data, stock and house prices, images …
]
.pull-right[ ]
Variants of ML
.center[ ]
Classification
.pull-left[
- Supervised learning
- Learn/predict classes or targets
- Find decision boundary
- Linear and non-linear boundaries
- Algorithms are classifiers
- Examples
- Tumor or no tumor
- Rain or no rain
- … ]
.pull-right[ ]
Classification dataset
- Breast tumor dataset - Features and target
.center[ ]
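A minimal sketch of classification, assuming a tiny made-up dataset (not the breast tumor dataset) and a nearest-centroid rule chosen only for illustration. It shows features, targets, and how a classifier assigns a class to a new sample. For two classes, this rule gives a linear decision boundary. The names `fit_centroids` and `predict` are hypothetical.

```python
from statistics import mean

# Toy data, made up for illustration: each row holds two features,
# and each target is the class label of that row.
features = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [3.0, 3.2], [3.1, 2.9], [2.9, 3.0]]
targets = ["no tumor", "no tumor", "no tumor", "tumor", "tumor", "tumor"]


def fit_centroids(X, y):
    """Compute the mean feature vector (centroid) of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return min(centroids, key=lambda label: sq_dist(centroids[label], x))


centroids = fit_centroids(features, targets)
prediction_a = predict(centroids, [1.0, 1.0])
prediction_b = predict(centroids, [3.0, 3.0])
print(prediction_a, "|", prediction_b)
```

Real classifiers (and the Galaxy tools) handle many more features and non-linear boundaries. The sketch only illustrates the learn-then-predict pattern.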
Regression
.pull-left[
- Supervised learning
- Targets are real numbers
- Find fitting curve
- Linear or non-linear curves
- Algorithms are regressors
- Examples:
- Temperature forecast
- Stock/house prices
- …
]
.pull-right[ ]
Regression dataset
- Body fat dataset - features and target
.center[ ]
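A minimal sketch of regression, assuming a made-up one-feature dataset (not the body fat dataset) and an ordinary least-squares straight line as the fitting curve. The helper name `fit_line` is hypothetical.

```python
from statistics import mean

# Toy data, made up for illustration: one feature x and a real-valued target y.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 9.0, 10.8]


def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


slope, intercept = fit_line(xs, ys)

# Predict the target for an unseen feature value.
predicted = slope * 6.0 + intercept
print(slope, intercept, predicted)
```

A non-linear curve would need a different model. The learn-then-predict pattern stays the same.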
Hyperparameter optimisation
.pull-left[
- Grid search
- Random search
]
.pull-right[ ]
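A minimal sketch contrasting the two strategies. It assumes a hypothetical `validation_score` function that stands in for "train a model with these hyperparameters and score it on validation data". The hyperparameter names `learning_rate` and `depth` are also illustrative assumptions.

```python
import itertools
import random


def validation_score(learning_rate, depth):
    """Hypothetical stand-in for training and scoring a model.

    Higher is better; the best value is at learning_rate=0.1, depth=4.
    """
    return -((learning_rate - 0.1) ** 2) * 100 - ((depth - 4) ** 2) * 0.1


def grid_search(grid):
    """Evaluate every combination of the listed values."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = validation_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


def random_search(ranges, n_trials, seed=0):
    """Evaluate n_trials combinations sampled at random from the given ranges."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(*ranges["learning_rate"]),
            "depth": rng.randint(*ranges["depth"]),
        }
        score = validation_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best


grid_best = grid_search({"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]})
random_best = random_search({"learning_rate": (0.01, 1.0), "depth": (2, 8)}, n_trials=20)
print(grid_best)
print(random_best)
```

Grid search only tries the listed values, so its cost grows with every added value. Random search uses a fixed number of trials and can land on values between the grid points.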
Learning and evaluation
.pull-left[
- K-fold cross-validation
- Split the dataset into K equal parts (folds)
- In each round, hold out one fold as the validation set
- Learn on the remaining K-1 folds (training set)
- Evaluate on the validation set
]
.pull-right[ ]
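A minimal sketch of how K-fold cross-validation could split sample indices, assuming no shuffling (in practice the data is usually shuffled first). The function name `k_fold_indices` is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) once per fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(n_samples=10, k=5))
for train, validation in folds:
    print("train:", train, "validation:", validation)
```

Each sample is used for validation exactly once and for training in the other K-1 rounds. The K validation scores are typically averaged.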
Learning and evaluation
.pull-left[
- Training and test sets
- Learn on training set
- Evaluate on test set ]
.pull-right[ ]
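A minimal sketch of the training/test workflow, assuming a made-up one-feature dataset and a simple threshold classifier chosen only for illustration. The helper names `train_test_split`, `fit_threshold` and `accuracy` are defined here and are not taken from any library.

```python
import random
from statistics import mean

# Toy data, made up for illustration: (feature, label) pairs.
samples = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0), (2.5, 0),
           (5.5, 1), (6.0, 1), (6.5, 1), (7.0, 1), (7.5, 1)]


def train_test_split(data, test_fraction, seed=0):
    """Shuffle a copy of the data and split it into training and test sets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(train):
    """Learn a cut-off halfway between the two class means, using training data only."""
    mean_0 = mean(x for x, y in train if y == 0)
    mean_1 = mean(x for x, y in train if y == 1)
    return (mean_0 + mean_1) / 2


def accuracy(threshold, data):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum((1 if x > threshold else 0) == y for x, y in data)
    return correct / len(data)


train, test = train_test_split(samples, test_fraction=0.3)
threshold = fit_threshold(train)
test_accuracy = accuracy(threshold, test)
print(len(train), len(test), test_accuracy)
```

The test set is never used while learning the threshold, so the test accuracy estimates performance on unseen data.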
Applications of machine learning
.pull-left[
- Bioinformatics
- Protein structure prediction
- Drug response prediction
- Biological age prediction
- Biomedical image analysis
- …
- Computer vision/image recognition
- Natural language processing
- Speech recognition
- …
]
.pull-right[ ]
References
- Machine learning for everyone - https://vas3k.com/blog/machine_learning/
- Breast cancer dataset - https://archive.ics.uci.edu/dataset/15/breast+cancer+wisconsin+original
- Body fat dataset - https://rstudio-pubs-static.s3.amazonaws.com/65314_c0d1e5696cdd4e93a3784ea67f9e3d34.html
For additional references, please see the tutorial’s References section.
- Galaxy Training Materials (training.galaxyproject.org)
Speaker Notes
- If you would like to learn more about Galaxy, there are a large number of tutorials available.
- These tutorials cover a wide range of scientific domains.
Getting Help
- Help Forum (help.galaxyproject.org)
- Gitter Chat
- Main Chat
- Galaxy Training Chat
- Many more channels (scientific domains, developers, admins)
Speaker Notes
- If you get stuck, there are ways to get help.
- You can ask your questions on the help forum.
- Or you can chat with the community on Gitter.
Join an event
- Many Galaxy events across the globe
- Event Horizon: galaxyproject.org/events
Speaker Notes
- There are frequent Galaxy events all around the world.
- You can find upcoming events on the Galaxy Event Horizon.
Key Points
- Machine learning algorithms learn features from data.
- Machine learning is used for tasks such as classification, regression and clustering.
- Multiple learning tasks can be performed using Galaxy's machine learning tools.
- For the classification and regression tasks, data is divided into training and test sets.
- Each sample/record in the training data has a category/class/label.
- A machine learning algorithm learns features from the training data and makes predictions on the test data.
Thank you!
This material is the result of collaborative work. Thanks to the Galaxy Training Network and all the contributors!