## Module Content

Data analysis is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering and communicating useful information, informing conclusions, and supporting decision-making. Data analysis involves many overlapping areas of study, such as descriptive statistics (concerned with summarizing information in a sample), statistical inference (concerned with inferences about a population based on properties of a sample), machine learning (concerned with performing tasks using algorithms that rely on patterns and inference), data mining (sometimes considered a subfield of machine learning and concerned with discovering patterns in large data sets), information visualization (concerned with the visual representations of abstract data to reinforce human cognition), and several other areas. See this interesting review for comments about the related term data science.

Geometry is concerned with questions of shape, size, relative position of figures, isometry, and the properties of space. It underlies much of data analysis, as can be seen from textbooks such as those by [Kendall], [Le Roux], [Kirby], [Tossdorff], [Hartmann], [Outot], [Tierny], [Edelsbrunner], [Patrangenaru], [Biau], [Wichura], [Dryden]. The recent online textbook Mathematical Foundations for Data Analysis by Jeff M. Phillips again emphasizes the importance of a geometric understanding of techniques when applying them to data analysis.

This module focuses on some geometric methods used in data analysis. It covers the geometric and algorithmic aspects of these methods, as well as their implementation as Python code on Linux computers, and their application to a range of different types of data. The first half of the course emphasizes geometric aspects of classical techniques such as least squares fitting, principal component analysis, hierarchical clustering, nearest neighbour searching, and the Johnson–Lindenstrauss Theorem for dimensionality reduction. The second half covers more recent techniques, developed over the last two or three decades, and emphasizes topological as well as geometric aspects. It makes use of R interfaces to Python Mapper and to efficient C++ routines for persistent homology.
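To give a flavour of the geometric viewpoint on one of the classical techniques named above, here is a minimal sketch of principal component analysis computed from a singular value decomposition with NumPy. It is an illustration only, not taken from the module materials: the function name `principal_components` and the toy data are assumptions made for this example. Geometrically, the first principal direction is the direction of the line through the mean that best fits the point cloud in the orthogonal least squares sense.

```python
import numpy as np


def principal_components(points):
    """Return (mean, directions, variances) of a point cloud.

    points: array-like of shape (n_samples, n_features).
    directions: rows are unit vectors, ordered by decreasing variance.
    variances: sample variance of the data along each direction.
    """
    points = np.asarray(points, dtype=float)
    mean = points.mean(axis=0)
    centred = points - mean
    # SVD of the centred data: the rows of vt are the principal directions.
    _, singular_values, vt = np.linalg.svd(centred, full_matrices=False)
    variances = singular_values**2 / (len(points) - 1)
    return mean, vt, variances


# Toy data (made up for illustration): noisy points near the line y = 2x.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
data = np.column_stack([x, 2 * x + 0.1 * rng.normal(size=200)])

mean, directions, variances = principal_components(data)
print("first principal direction:", directions[0])
print("variances:", variances)
```

For this toy data the first direction should be close to a unit vector along (1, 2), up to sign, and the second variance should be much smaller than the first, reflecting that the cloud is essentially one-dimensional.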

Part I: Classical Techniques (5 ECTS)
• Least Squares Fitting
• Principal Component Analysis
• Hierarchical Clustering and Persistence
• Nearest Neighbours and the Johnson–Lindenstrauss Theorem
Part II: Topological Data Analysis (5 ECTS)
• Topological Preliminaries
• Mapper Clustering
• Persistent Homology
• Fundamental Group

## Module Coordinates

• Lecturers: Graham Ellis & Emil Sköldberg
• Lectures:
Fri 14.00, ES, ADB1019 (or IT206)
• Tutorials: Wednesday and Friday lectures will often take the format of a tutorial and so no formal tutorials are scheduled.
• Recommended text: Part I is based on chapters from the textbook Multivariate Analysis by Sir Maurice Kendall and chapters from the online textbook Mathematical Foundations for Data Analysis by Jeff M. Phillips. Part II is based on the survey An introduction to Topological Data Analysis: fundamental and practical aspects for data scientists by Frédéric Chazal and Bertrand Michel.
• Problem sheet: available here. (A list of exam-type problems for self-study is available here.)
• Module Website: Information and module documents will be posted to this site, which is linked from the Blackboard MA500 Geometric Foundations of Data Analysis pages. Blackboard will also be used for announcements and for posting grades.

## Module Assessment

Part I will be assessed by a 2-hour written exam (52%) and three continuous assessment assignments (16% each).

Part II will be assessed by a 2-hour written exam (50%) and two continuous assessment assignments (25% each).

Each exam will consist of four questions, with full marks for four correct answers.

Each assignment will consist of a data analysis problem that needs to be tackled using the Python programming language, and submitted (by email to both lecturers) as a PDF document.

None so far.

## Emil's Lecture Notes

These will be posted here.

## Exam Details

A guide to what to expect on the MA500 exams can be found here.