Daniel West
Using self-organising maps to cluster complex biological data
MSc thesis, University of York, 2021


Cancer is a common disease in the modern age, and its development must be accurately detected and predicted. Prostate cancer is an interesting case: it is rarely fatal, yet its removal requires surgical excision, which may itself have adverse effects. It is therefore important to assess each patient correctly, to minimise the risk from both cancer progression and treatment side effects. Raman spectroscopy is an analytical technique that has gained interest for the analysis of biological specimens, as it is robust and produces distinct molecular signals that can be used to identify biomolecules. The sheer volume and dimensionality of spectral data necessitate computational analysis; this work covers the use of self-organising maps (SOMs) for investigating such data. Self-organising maps are an unsupervised machine learning technique that identifies patterns in high-dimensional datasets and reduces their dimensionality. Their use can help to discern clusters within a dataset that may not be readily apparent. The use of self-organising maps to analyse Raman spectral data from human cell samples is an underexplored area of research. This work forms a feasibility study for such an application, and shows that, with optimum parameters, self-organising maps are able to correctly cluster cancer and non-cancer samples from a blinded dataset. Moreover, the optimised SOM shows delineation into three clusters: one of normal prostate data and two of prostate cancer data. Analysis of these clusters shows spectral differences related to lipid composition, an observation which has been linked to more aggressive cancer progression.
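To illustrate the general idea of a self-organising map described above, the following is a minimal sketch in plain Python. It is not the implementation used in the thesis. The grid size, learning-rate and neighbourhood schedules, and the function names (train_som, best_matching_unit, toy_spectra) are all assumptions chosen for illustration, and the "spectra" are synthetic vectors rather than Raman data.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the node whose weight vector is closest to x."""
    best, best_dist = None, float("inf")
    for r, row in enumerate(weights):
        for c, w in enumerate(row):
            dist = sum((wi - xi) ** 2 for wi, xi in zip(w, x))
            if dist < best_dist:
                best, best_dist = (r, c), dist
    return best


def train_som(data, rows=3, cols=3, epochs=30, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small rectangular SOM with a Gaussian neighbourhood.

    All hyperparameters here are illustrative assumptions, not tuned values.
    """
    rng = random.Random(seed)
    dim = len(data[0])
    # One weight vector per grid node, initialised uniformly in [0, 1).
    weights = [[[rng.random() for _ in range(dim)] for _ in range(cols)]
               for _ in range(rows)]
    samples = list(data)
    total_steps = epochs * len(samples)
    step = 0
    for _ in range(epochs):
        rng.shuffle(samples)
        for x in samples:
            frac = step / total_steps
            # Linearly decay the learning rate and neighbourhood width.
            lr = lr0 * (1.0 - frac) + 0.01 * frac
            sigma = sigma0 * (1.0 - frac) + 0.3 * frac
            br, bc = best_matching_unit(weights, x)
            for r in range(rows):
                for c in range(cols):
                    # Nodes near the winner on the grid are pulled harder.
                    grid_d2 = (r - br) ** 2 + (c - bc) ** 2
                    h = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                    w = weights[r][c]
                    for i in range(dim):
                        w[i] += lr * h * (x[i] - w[i])
            step += 1
    return weights


def toy_spectra(centre, n, dim=6, noise=0.05, seed=0):
    """Generate n synthetic 'spectra' scattered around a constant intensity."""
    rng = random.Random(seed)
    return [[centre + rng.gauss(0.0, noise) for _ in range(dim)]
            for _ in range(n)]


if __name__ == "__main__":
    group_a = toy_spectra(0.2, 20, seed=1)  # stand-in for one sample class
    group_b = toy_spectra(0.8, 20, seed=2)  # stand-in for another class
    som = train_som(group_a + group_b)
    print("BMU for group A centre:", best_matching_unit(som, [0.2] * 6))
    print("BMU for group B centre:", best_matching_unit(som, [0.8] * 6))
```

Each input vector is assigned to its best-matching node on a two-dimensional grid, which is how the map reduces dimensionality. Inputs that land on the same or neighbouring nodes can then be inspected as candidate clusters, without any class labels being used during training.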
