
CS4103
Geometric Foundations of Data Analysis II
Module Content
Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering and communicating useful information, informing conclusions, and supporting decision-making. Data analysis involves many overlapping areas of study, such as descriptive statistics (concerned with summarizing information in a sample), statistical inference (concerned with inferences about a population based on properties of a sample), machine learning (concerned with performing tasks using algorithms that rely on patterns and inference), data mining (sometimes considered a subfield of machine learning and concerned with discovering patterns in large data sets), information visualization (concerned with the visual representations of abstract data to reinforce human cognition), and several other areas. See this interesting review for comments about the related term data science.
Geometry is concerned with questions of shape, size, relative position of figures, isometry, and the properties of space. It underlies much of data analysis, as can be seen from textbooks such as those by [Kendall], [Le Roux], [Kirby], [Tossdorff], [Hartmann], [Outot], [Tierny], [Edelsbrunner], [Patrangenaru], [Biau], [Wichura], [Dryden]. The recent online textbook Mathematical Foundations for Data Analysis by Jeff M. Phillips again emphasizes the importance of a geometric understanding of techniques when applying them to data analysis.
This module focuses on some geometric methods used in data analysis. It covers the geometric and algorithmic aspects of these methods, as well as their implementation as Python code on Linux computers, and their application to a range of different types of data. The first half of the course emphasizes geometric aspects of classical techniques such as least squares fitting, principal component analysis, hierarchical clustering, nearest neighbour searching, and the Johnson–Lindenstrauss Theorem for dimensionality reduction. The second half of the course covers more recent techniques that have been developed over the last two or three decades, and emphasizes topological aspects as well as geometric aspects. The second half of the course makes use of R interfaces to Python Mapper and to efficient C++ routines for persistent homology.
Part I = CS4102: Classical Techniques (5 ECTS, first 24 lectures)
 Least Squares Fitting
 Principal Component Analysis
 Hierarchical Clustering and Persistence
 Nearest Neighbours and the Johnson–Lindenstrauss Theorem
 Topological Preliminaries
 Mapper Clustering
 Persistent Homology
Module Coordinates
 Lecturers: Graham Ellis & Emil Sköldberg
 Lectures:
Mon 10.00am, GE, ADB1020
Tue 12.00pm, GE, ADB1020
Wed 10.00am, ES, ADB1019 (or ADB1020)
Fri 2.00pm, ES, ADB1019 (or IT206)
 Tutorials: Wednesday and Friday lectures will often take the format of a tutorial, so no formal tutorials are scheduled.
 Recommended text: Part I is based on chapters from the textbook Multivariate Analysis by Sir Maurice Kendall and chapters from the online textbook Mathematical Foundations for Data Analysis by Jeff M. Phillips. Part II is based on the survey An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists by Frédéric Chazal and Bertrand Michel, and on the recent book Topological Data Analysis for Genomics and Evolution: Topology in Biology by Rabadán and Blumberg.
 Problem sheet: available here. (A list of exam-type problems for self-study is available here.)
 Module Website: Information and module documents will be posted to this site, which is linked from the Blackboard CS4103 Geometric Foundations of Data Analysis II pages. Blackboard will also be used for announcements and for posting grades.
Module Assessment
Part I will be assessed by a 2-hour written exam (52%) and three continuous assessment assignments (16% each).
Part II will be assessed by a 2-hour written exam (50%) and two continuous assessment assignments (25% each).
Each exam will consist of four questions, with full marks for four correct answers.
Each assignment will consist of a data analysis problem that needs to be tackled using the Python programming language or the R programming environment, and submitted (by email to both lecturers) as a PDF document.
Supplementary Material and News
Emil's Lecture Notes
These will be posted here.
Exam Details
Exam hints/help for MA500/CS4102/CS4103 exams can be found here and here.
Lecture Notes
Lecture Notes (Click number to download notes for the lecture.)
Lecture Summaries
1 
Video of the lecture. Talked about "dimension reduction" and, as an example, the paper "Topology based data analysis identifies a subgroup of breast cancers with a unique mutational profile and excellent survival" by Nicolau, Levine and Carlsson. Then gave the definition and examples of a simplicial complex. 
2 
Video of the lecture. Explained what is meant by the geometric realization |K| of a simplicial complex K. Then explained how, from any n×n matrix of pairwise distances between n objects, and any ε>0, we can construct a geometric realization |K_{ε}| of a simplicial complex K_{ε}. We considered a sample S of 250 readings x_{i}=(h_{i,0}, h_{i,2}, h_{i,4}) of water levels at Galway harbour, where h_{i,0} is the initial level when person i reaches the harbour, h_{i,2} is the level 2 hours later, and h_{i,4} is the level 4 hours after the initial reading. We constructed a 250×250 matrix of distances d_{i,j}=||x_{i}−x_{j}|| and used this to construct a simplicial complex K_{ε} for various thresholds ε>0. The graph of one of these simplicial complexes was shown (figure not reproduced here).
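The construction of K_{ε} from a distance matrix can be sketched in Python as follows. This is only an illustrative sketch, not the code used in the lecture: the function name clique_complex, the convention that two objects are joined when d_{i,j} ≤ ε (rather than < ε), and the small toy data set are all assumptions made here.

```python
import numpy as np

def clique_complex(dist, eps, max_dim=2):
    """Simplices (sorted tuples of indices) of the clique complex K_eps.

    Assumption: vertices i and j span an edge when dist[i, j] <= eps, and a
    set of vertices spans a simplex when all of its pairs span edges.
    Only simplices of dimension <= max_dim are returned.
    """
    n = dist.shape[0]
    # adj[i] = set of vertices joined to i by an edge
    adj = [{j for j in range(n) if j != i and dist[i, j] <= eps} for i in range(n)]
    current = [(i,) for i in range(n)]
    simplices = list(current)
    for _ in range(max_dim):
        nxt = []
        for s in current:
            # vertices adjacent to every vertex of s
            common = set.intersection(*(adj[v] for v in s))
            # extend only by larger indices so each simplex is listed once
            nxt.extend(s + (w,) for w in sorted(common) if w > s[-1])
        simplices.extend(nxt)
        current = nxt
    return simplices

# Toy data (hypothetical): three nearby points and one far away point.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
print(clique_complex(dist, eps=1.5))
```

For the harbour data one would replace pts by the 250 readings x_{i}; the vertices and edges returned are the graph of K_{ε}.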
3 
Video of the lecture. Began with 750 points randomly selected from two disjoint quarters of a torus in R^{3}. Any linear transformation R^{3} → R^{2} (obtained say from PCA or the Johnson–Lindenstrauss theorem) would lose geometric information. However, we saw that the geometric information seems to be retained when the points are mapped to the vertices of the clique complex K_{ε} for various values of ε. Introduced the notion of homotopy between two maps f,g:X→Y. Introduced the notion of homotopy equivalence between two topological spaces. Proved that the circle is homotopy equivalent to the projective plane minus the origin.
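A rough Python sketch of this kind of experiment is given below. It is not the lecture's code: the torus radii R=2 and r=1, the choice of quarters (major angle in [0, π/2) or [π, 3π/2)), the random seed and the threshold ε=1.0 are all assumptions made for illustration. The sketch counts the connected components of the graph of K_{ε}, which is one piece of geometric information that the clique complex retains.

```python
import numpy as np

def sample_torus_quarters(n, R=2.0, r=1.0, seed=0):
    """Sample n points from two disjoint quarters of a torus in R^3.

    Assumed parametrisation: major angle theta, minor angle phi, with
    theta restricted to [0, pi/2) or [pi, 3*pi/2).
    """
    rng = np.random.default_rng(seed)
    theta = rng.uniform(0.0, np.pi / 2, size=n) + np.pi * rng.integers(0, 2, size=n)
    phi = rng.uniform(0.0, 2 * np.pi, size=n)
    x = (R + r * np.cos(phi)) * np.cos(theta)
    y = (R + r * np.cos(phi)) * np.sin(theta)
    z = r * np.sin(phi)
    return np.column_stack([x, y, z])

def count_components(points, eps):
    """Number of connected components of the graph of K_eps (union-find)."""
    n = len(points)
    dist = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # join the two endpoints of every edge (pairs i < j with distance <= eps)
    for i, j in zip(*np.nonzero(np.triu(dist <= eps, k=1))):
        ri, rj = find(int(i)), find(int(j))
        if ri != rj:
            parent[ri] = rj
    return len({find(i) for i in range(n)})

points = sample_torus_quarters(750)
for eps in (0.05, 1.0, 3.0):
    print(eps, count_components(points, eps))
```

With these assumed radii the two quarters are at least √2 apart, so a threshold below that value cannot join them, while a threshold that is too small leaves many isolated points.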
4 
Video of the lecture. Stated and illustrated Leray's Nerve Theorem. This provides theoretical motivation for studying clique complexes in data analysis. 
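To make the notion of a nerve concrete, here is a small Python sketch that computes the nerve of a finite cover given as a list of sets. The function name nerve and the toy cover of 12 points on a circle by three arcs are assumptions made here for illustration. The Nerve Theorem itself concerns covers of a topological space whose sets and non-empty intersections are contractible; a cover of a finite sample only serves as a discrete stand-in.

```python
from itertools import combinations

def nerve(cover, max_dim=2):
    """Simplices of the nerve of a cover, up to dimension max_dim.

    cover is a list of sets; a tuple of indices is a simplex of the nerve
    exactly when the corresponding sets have a common element.
    """
    simplices = []
    for k in range(1, max_dim + 2):
        for idx in combinations(range(len(cover)), k):
            if set.intersection(*(cover[i] for i in idx)):
                simplices.append(idx)
    return simplices

# Hypothetical example: points 0..11 arranged around a circle, covered by
# three arcs that overlap pairwise but have no common point.
cover = [set(range(0, 5)), set(range(4, 9)), set(range(8, 12)) | {0}]
print(nerve(cover))
```

In this example the nerve consists of three vertices and three edges with no triangle, so it is the boundary of a triangle, which has the shape of the circle that the arcs cover.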
5 
Video of the lecture. Explained how Leray's nerve theorem motivates the use of the clique complex K_{ε} in data analysis. Next lecture we'll see how these ideas lead to the Mapper clustering procedure for representing a matrix of distances between data points as a simplicial complex. The procedure was illustrated in this lecture by taking 1000 points in the plane, sampled at random from the image of the starfish. The following code produced a 1-dimensional simplicial complex as a representation of the data (figure not reproduced here).
gap> HapExample("1.3.5");

6 
Video of the lecture. Described the Mapper Algorithm for visualizing data. 
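A heavily simplified Python sketch of a Mapper-style procedure is given below. It is not the Python Mapper software or the HAP code used in the module: the function names, the use of a one-dimensional filter covered by equal overlapping intervals, and the single-linkage clustering at a fixed threshold eps are all assumptions made here to show the general shape of the algorithm (cover the filter range, cluster each preimage, join clusters that share points).

```python
import numpy as np
from itertools import combinations

def components(points, idx, eps):
    """Single-linkage clusters of points[idx] at threshold eps,
    returned as frozensets of the original point indices."""
    idx = list(idx)
    unvisited = set(range(len(idx)))
    clusters = []
    while unvisited:
        start = unvisited.pop()
        stack, comp = [start], {start}
        while stack:
            a = stack.pop()
            near = [b for b in unvisited
                    if np.linalg.norm(points[idx[a]] - points[idx[b]]) <= eps]
            for b in near:
                unvisited.remove(b)
                comp.add(b)
                stack.append(b)
        clusters.append(frozenset(idx[a] for a in comp))
    return clusters

def mapper(points, filter_values, n_intervals, overlap, eps):
    """Return (nodes, edges) of a 1-dimensional Mapper complex.

    nodes are clusters of points; two nodes are joined by an edge
    when the clusters have a point in common.
    """
    lo, hi = filter_values.min(), filter_values.max()
    length = (hi - lo) / n_intervals
    nodes = []
    for k in range(n_intervals):
        # k-th interval, widened on both sides by overlap * length
        a = lo + (k - overlap) * length
        b = lo + (k + 1 + overlap) * length
        idx = np.nonzero((filter_values >= a) & (filter_values <= b))[0]
        nodes.extend(components(points, idx, eps))
    edges = [(i, j) for i, j in combinations(range(len(nodes)), 2)
             if nodes[i] & nodes[j]]
    return nodes, edges

# Hypothetical example: 40 evenly spaced points on the unit circle,
# filtered by their x-coordinate.
angles = 2 * np.pi * np.arange(40) / 40
circle = np.column_stack([np.cos(angles), np.sin(angles)])
nodes, edges = mapper(circle, circle[:, 0], n_intervals=4, overlap=0.25, eps=0.3)
print(len(nodes), len(edges))
```

In this example the output graph is a single cycle, reflecting the shape of the circle from which the points were taken.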
7 
Video of the lecture. Recapped what we've done so far in topological data analysis, and then defined the nth Betti number of a simplicial complex. Ended up with an example in which I calculated the 0th and 1st Betti numbers of a simplicial complex. 
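A calculation of this kind can be checked with a short Python sketch. The code below is an illustration only, not the method presented in the lecture: it assumes coefficients in the rational numbers and computes each Betti number as (number of n-simplices) − rank ∂_{n} − rank ∂_{n+1}, using numerical matrix rank. The function names and the two toy complexes are assumptions made here.

```python
import numpy as np

def boundary_matrix(k_simplices, km1_simplices):
    """Matrix of the boundary map from k-chains to (k-1)-chains.

    Simplices are sorted tuples of vertices; the p-th face of a simplex
    (omit the p-th vertex) appears with sign (-1)**p.
    """
    index = {s: i for i, s in enumerate(km1_simplices)}
    D = np.zeros((len(km1_simplices), len(k_simplices)))
    for j, s in enumerate(k_simplices):
        for p in range(len(s)):
            D[index[s[:p] + s[p + 1:]], j] = (-1) ** p
    return D

def betti_numbers(simplices):
    """Betti numbers [b_0, b_1, ...] of a simplicial complex given as a
    list of all its simplices (every face of a simplex must be listed)."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    top = max(by_dim)
    rank = {k: int(np.linalg.matrix_rank(boundary_matrix(by_dim[k], by_dim[k - 1])))
            for k in range(1, top + 1)}
    return [len(by_dim[k]) - rank.get(k, 0) - rank.get(k + 1, 0)
            for k in range(top + 1)]

# Hypothetical examples: a hollow triangle and a filled triangle.
hollow = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
filled = hollow + [(0, 1, 2)]
print(betti_numbers(hollow))  # [1, 1]
print(betti_numbers(filled))  # [1, 0, 0]
```

The hollow triangle has one connected component and one 1-dimensional hole; filling in the triangle removes the hole.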
8 
Video of the lecture. Introduced, in an informal way, the notion of persistence of 1-dimensional holes in a data set, and explained how this persistence can be represented using barcodes. Ended up with the definition of the degree n homology vector space H_{n}(K) of a simplicial complex K. For finite K this vector space has finite dimension equal to the Betti number β_{n}(K).
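For reference, the standard textbook form of this definition is written out below in LaTeX. The notation C_{n}(K) and ∂_{n} is an assumption made here (the lecture may use different symbols), and the coefficient field is left unspecified.

```latex
% C_n(K): vector space with basis the n-simplices of K.
% Boundary map on a simplex with ordered vertices v_0 < ... < v_n:
\[
  \partial_n \colon C_n(K) \to C_{n-1}(K), \qquad
  \partial_n [v_0,\dots,v_n] = \sum_{i=0}^{n} (-1)^i\, [v_0,\dots,\widehat{v_i},\dots,v_n].
\]
% Since \partial_n \circ \partial_{n+1} = 0, the image of \partial_{n+1} lies in the kernel of \partial_n:
\[
  H_n(K) = \ker \partial_n \,/\, \operatorname{im} \partial_{n+1}.
\]
% For finite K, rank-nullity gives the Betti number:
\[
  \beta_n(K) = \dim H_n(K)
             = \dim C_n(K) - \operatorname{rank} \partial_n - \operatorname{rank} \partial_{n+1}.
\]
```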
9 
Video of the lecture. Started with a computer illustration of how persistent homology can be used to place some topological structure on a data set which might help an expert in the application domain interpret the data. Then gave the precise definition of persistent homology and persistent Betti numbers. Finished by stating that the computation of persistent Betti numbers boils down to nothing more than column reduction of a matrix to semi-echelon form. Warning: I think I kept saying "row reduction" and "row echelon form" at the end of the lecture where I meant to say "column reduction" and "column echelon form". Apologies for that!
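The column reduction can be sketched in Python as follows. This is a minimal version of the commonly described reduction of a filtered boundary matrix, assuming coefficients in the field with two elements and a filtration given as a list of simplices in which every face precedes its cofaces; it is not necessarily the exact variant presented in the lecture, and the function name persistence_pairs is an assumption made here.

```python
def persistence_pairs(filtration):
    """Column-reduce the boundary matrix of a filtered simplicial complex.

    filtration: list of simplices (tuples of vertices), each face listed
    before any simplex containing it. Works over the field with two elements,
    storing each column as the set of row indices of its non-zero entries.

    Returns (pairs, essential): pairs holds (birth, death) filtration indices,
    essential holds indices of simplices whose homology class never dies.
    """
    index = {tuple(sorted(s)): i for i, s in enumerate(filtration)}
    columns = []
    for s in filtration:
        s = tuple(sorted(s))
        faces = [s[:p] + s[p + 1:] for p in range(len(s))] if len(s) > 1 else []
        columns.append({index[f] for f in faces})

    low_to_col = {}  # lowest non-zero row of a reduced column -> column index
    pairs = []
    for j, col in enumerate(columns):
        # add earlier reduced columns until the lowest entry is new or col is zero
        while col and max(col) in low_to_col:
            col ^= columns[low_to_col[max(col)]]
        if col:
            low_to_col[max(col)] = j
            pairs.append((max(col), j))
    paired = {i for pair in pairs for i in pair}
    essential = [i for i in range(len(filtration)) if i not in paired]
    return pairs, essential

# Hypothetical filtration: three vertices, three edges, then the triangle.
filtration = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2), (0, 1, 2)]
print(persistence_pairs(filtration))  # ([(1, 3), (2, 4), (5, 6)], [0])
```

In the example, the pair (5, 6) records a 1-dimensional hole created by the edge at position 5 and filled by the triangle at position 6; each pair corresponds to a bar in the barcode, and the unpaired vertex at position 0 corresponds to the connected component that persists forever.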
10 
Video of the lecture. Went through an example of the use of persistent homology to analyze the space of natural images. Finished off with a statement of a stability theorem. A rough, non-mathematical, statement is: if data sets are changed just a small amount then the resulting barcodes only change a small amount.
11 
Tutorial session 