Books : reviews

Colleen M. Farrelly, Yaé Ulrich Gaba.
The Shape of Data: geometry-based machine learning and data analysis in R.
No Starch Press. 2023

rating : 2.5 : great stuff
review : 17 July 2025

Whether you’re a mathematician, seasoned data scientist, or marketing professional, you’ll find The Shape of Data to be the perfect introduction to the critical interplay between the geometry of data structures and machine learning.

This book’s extensive collection of case studies (drawn from medicine, education, sociology, linguistics, and more) and gentle explanations of the math behind dozens of algorithms provide a comprehensive yet accessible look at how geometry shapes the algorithms that drive data analysis. In addition to gaining a deeper understanding of how to implement geometry-based algorithms with code, you’ll explore:

• Supervised and unsupervised learning algorithms and their application to network data analysis
• The way distance metrics and dimensionality reduction impact machine learning
• How to visualize, embed, and analyze survey and text data with topology-based algorithms
• New approaches to computational solutions, including distributed computing and quantum algorithms

This clearly written book provides an excellent introduction to a variety of aspects of Topological Data Analysis: using the mathematics of topology to analyse the "shape" of data sets, in order to gain insights into their structure and properties. Each chapter covers a specific concept, with the underlying principles clearly explained, and with small examples of their application, using R.

We get the move from graphs to simplicial complexes (a name much scarier than the underlying concept, like much in maths), how to construct and filter them, how to measure distance between points of high dimensional data, how to apply them to data analysis, and more.
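
To show how un-scary the idea is, here is a minimal sketch of my own (not taken from the book, and in Python rather than the book's R) of building a Vietoris-Rips complex, one common kind of simplicial complex. It assumes Euclidean distance and stops at triangles; the function name `rips_complex` and the toy square data are made up for illustration. Raising the threshold `epsilon` step by step is the "filtering": more edges and triangles appear as the scale grows.

```python
from itertools import combinations
from math import dist


def rips_complex(points, epsilon):
    """Return the vertices, edges, and triangles of a Vietoris-Rips complex.

    Hypothetical helper for illustration: Euclidean distance, simplices up to
    dimension 2 only.
    """
    n = len(points)
    vertices = [(i,) for i in range(n)]
    # An edge joins two points that are no more than epsilon apart.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if dist(points[i], points[j]) <= epsilon]
    edge_set = set(edges)
    # A triangle is filled in when all three of its edges are present.
    triangles = [(i, j, k) for i, j, k in combinations(range(n), 3)
                 if {(i, j), (i, k), (j, k)} <= edge_set]
    return vertices, edges, triangles


# Toy data: the four corners of a unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# A crude filtration: rebuild the complex at increasing distance thresholds.
for eps in (0.5, 1.0, 1.5):
    v, e, t = rips_complex(square, eps)
    print(f"epsilon={eps}: {len(v)} vertices, {len(e)} edges, {len(t)} triangles")
```

At the smallest threshold the points are isolated; at 1.0 the four sides appear and the square has a hole in the middle; at 1.5 the diagonals join in and the hole is filled by triangles. Tracking when such holes appear and disappear is the kind of "shape" information the book is about.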

There is not enough here to start applying the techniques in anger. But this is definitely an excellent place to start. Reading this has enabled me to build a good mental framework of the underlying concepts, and I’m now well-prepared to read a much weightier book.