Alessandra (Sandy) Cabassi

About

I recently completed my PhD and joined Google full time as a Data Scientist.

As a PhD student, I spent 3+ years at the MRC Biostatistics Unit of the University of Cambridge, where I was part of the Statistical Genomics (SOMX) and Precision Medicine (PREM) research groups. My PhD supervisor was Dr Paul Kirk.

During my PhD, I also spent three months working at The Alan Turing Institute, under the supervision of Dr Anthony Lee and Dr Ioannis Kosmidis, and six months at Google, hosted by Dr Matthew Pearce and Dr Georg Goerg.

Prior to starting my PhD, I completed a double degree in Engineering Mathematics at Politecnico di Milano and École Centrale de Nantes. I also had the chance to do my Master’s thesis at the Statistical Laboratory of the University of Cambridge, co-supervised by Dr Davide Pigoli and Prof Piercesare Secchi, and my undergraduate thesis at INRIA, where I worked with Dr Paola Goatin.

Research

My PhD research was concerned with the development of statistical methodology for the integration of multiple ‘omic datasets (e.g. genomic, transcriptomic, proteomic) in personalised medicine.

My goal was to tackle some of the challenges presented by the identification of relevant patient subgroups (e.g. patients that might be expected to respond similarly to treatments) on the basis of those datasets.

First, when combining different types of ‘omic datasets, it is crucial to take into account the different nature of each dataset. For this reason, I developed integrative clustering methods that explicitly weigh the contribution of each dataset to the final clustering according to the amount of information that it contains, and that allow datasets of different types (e.g. continuous or categorical) to be combined. These methods are based on the idea that the output of classical statistical techniques such as model-based Bayesian clustering can be used in combination with kernel methods from the machine learning literature to find a meaningful global clustering that summarises all the information available.
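As a rough illustration of this idea, and not the implementation from the papers listed below, the sketch that follows combines one similarity (kernel) matrix per dataset into a single weighted kernel and then clusters it with plain kernel k-means. Several things here are assumptions made for the example: the function names `combine_kernels` and `kernel_kmeans` are hypothetical, the weights are fixed by hand rather than learned from the data, and the toy matrices stand in for real co-clustering matrices.

```python
import numpy as np


def combine_kernels(kernels, weights):
    """Convex combination of per-dataset kernel matrices (hypothetical helper)."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 0) or weights.sum() == 0:
        raise ValueError("weights must be non-negative and not all zero")
    weights = weights / weights.sum()  # normalise so the weights sum to one
    # Weighted sum over the first axis of the stacked (M, n, n) array.
    return np.tensordot(weights, np.stack(kernels), axes=1)


def kernel_kmeans(K, n_clusters, n_iter=50):
    """Plain kernel k-means on a precomputed kernel matrix K (n x n)."""
    n = K.shape[0]
    # Deterministic round-robin start, chosen only to keep the example reproducible.
    labels = np.arange(n) % n_clusters
    diag = np.diag(K)
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            members = labels == c
            size = members.sum()
            if size == 0:
                continue  # an empty cluster stays unreachable
            # Squared feature-space distance of every point to the cluster mean.
            dist[:, c] = (
                diag
                - 2.0 * K[:, members].sum(axis=1) / size
                + K[np.ix_(members, members)].sum() / size**2
            )
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


# Toy example: six patients, two datasets.
# The first kernel has a clear two-group structure, the second is uninformative.
block = np.ones((3, 3))
zeros = np.zeros((3, 3))
informative = np.block([[block, zeros], [zeros, block]])
uninformative = np.eye(6)

# The more informative dataset receives the larger (hand-picked) weight.
K = combine_kernels([informative, uninformative], weights=[0.8, 0.2])
labels = kernel_kmeans(K, n_clusters=2)
print(labels)
```

On this toy input the first three patients end up in one cluster and the last three in the other. In a multiple kernel learning setting the weights would be estimated together with the clustering instead of being supplied by the user.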

Second, because ‘omic datasets comprise measurements taken on a very large number of variables, many different patient subgroups can usually be identified, depending on which variables we include in our analysis. For this reason, I also worked on integrating genetic information with data on specific patient outcomes, to ensure that we identify truly relevant patient subgroups. To do so, I generalised the method above to the supervised case. An alternative would be to implement a variational inference algorithm for outcome-guided model-based Bayesian clustering.

On a more applied note, I participated in a study on cardiovascular disease. My role in the project was to analyse data collected at the Cambridge Blood Donor Centre with the statistical methods mentioned above, to define a personalised cardiovascular disease risk score.

References:

Cabassi, A., Kirk, P. D. W., 2020. Multiple kernel learning for integrative consensus clustering of genomic datasets. Bioinformatics, btaa593. doi:10.1093/bioinformatics/btaa593.
Seyres, D., Cabassi, A., …, Frontini, M., 2020. Extreme phenotypes define epigenetic and metabolic signatures in cardiometabolic diseases. bioRxiv preprint, bioRxiv:2020.03.06.961805.
Cabassi, A., Seyres, D., Frontini, M., Kirk, P. D. W., 2020. Two-step penalised logistic regression for multi-omic data with an application to cardiometabolic syndrome. arXiv preprint, arXiv:2008.00235.
Cabassi, A., Richardson, S., Kirk, P. D. W., 2020. Kernel learning approaches for summarising and combining posterior similarity matrices. arXiv preprint, arXiv:2009.12852.

Previous research

High-performance, large-scale regression

During my internship at The Alan Turing Institute, I explored different methods and libraries to perform high-performance, large-scale regression on a supercomputer, with particular focus on Apache Spark and TensorFlow. The internship was funded by Cray Inc and carried out in close collaboration with the Cray EMEA Research Lab. You can find more details about our findings on the blog and the official webpage of the project.

Permutation tests for functional and network data

Cabassi, A., Casa, A., Fontana, M., Russo, M., Farcomeni, A., 2018. Three Testing Perspectives on Connectome Data. In: Canale, A., Durante, D., Paci, L., Scarpa, B. (eds) Studies in Neural Data Science. Start Up Research 2017. Springer Proceedings in Mathematics & Statistics, vol 257. doi:10.1007/978-3-030-00039-4_3.
Cabassi, A., Pigoli, D., Secchi, P., Carter, P. A., 2017. Permutation tests for the equality of covariance operators of functional data with applications to evolutionary biology. Electron. J. Statist. 11, no. 2, 3815–3840. doi:10.1214/17-EJS1347.

Macroscopic traffic flow models

Cabassi, A., Goatin, P., 2013. Validation of traffic flow models on processed GPS data. Research report RR-8382, INRIA. hal-00876311.

Team OPALE (now ACUMES), INRIA — Sophia Antipolis, France — Summer 2013.

Presentations

I presented my work at:

2021 ISBA World Meeting, virtual — 28 June – 2 July 2021 — Invited talk [slides]
Google’s PhD Intern Research Conference, Sunnyvale, USA — 31 July 2019 — Poster
Statistical Omics discussion group, Cambridge, UK — 24 April 2019 [slides]
ERCIM CMStatistics, Pisa, Italy — 14-16 December 2018 — Invited talk
Armitage week, Cambridge, UK — 14-15 November 2018 — PhD talk
The Alan Turing Institute, London, UK — 10 September 2018 — Joint talk with Junyang Wang [slides, recording]
Start Up Research, Palermo, Italy — 19 June 2018 — Joint talk with Alessandro Casa [slides]
ISNPS, Salerno, Italy — 11-15 June 2018 — Invited talk [slides]
University of Oslo, Norway — 1-2 March 2018 — Visiting
SMPGD, Montpellier, France — 11-12 January 2018 — Contributed talk
Biometrika workshop, Cambridge, UK — 10 November 2017 — Contributed talk
SMPGD, London, UK — 12-13 January 2017 — Poster

Laurea — Master’s thesis defence — Milan, Italy — July 2016.
Photo by Paolo Cabassi.

Events

Past events

Start Up Research was a collaborative project organised by y-SIS on Statistics for the Neurosciences that brought together 28 other young academics and seven professors from some of the most prestigious universities worldwide. The resulting research was subsequently published in a Springer volume entitled “Studies in Neural Data Science”. To learn more about the event, have a look at this article in the statistics magazine Significance.

Start Up Research — Certosa di Pontignano, Italy — 25-27 June 2017.
From right to left: Alessandro Casa, Massimiliano Russo, me, and Matteo Fontana.

Stats Under the Stars is a statistical hackathon organised by the Italian Statistical Society. I took part in the competition with the Celtic team in 2016, when we won the prize for the best report, and in 2017.

Celtic team, SUS2 — Vietri sul Mare, Italy — 7-8 June 2016.
From right to left: Anna Calissano, Giorgio Paulon, Tobia Boschi, Jacopo Di Iorio, and me.

More recently, I was selected for the Women As Tech Leaders Day at QuantumBlack, which took place on 16 November 2018 at their London office.

Societies

Mentoring

I have been a mentor at Lead The Future, a nonprofit providing 1:1 mentoring to high-potential Italian students, since 2019. I have been mentoring students at top European and American universities and research institutes. Forbes talked about Lead The Future in this article (in Italian).

I also mentor students through Central Nantes Alumni and A.I.M. (details below).

Scientific societies

SIS (Società Italiana di Statistica), the Italian society for promoting the development of statistical sciences, and y-SIS (young-SIS), the sub-group of SIS dedicated to young researchers;
ISNPS (International Society of Nonparametric Statistics);
ISBA (International Society for Bayesian Analysis);
AISUK (Association of Italian Scientists in the UK).

A.I.M.

Illustration by Marta Muscelli for A.I.M.

A.I.M. (Associazione Ingegneri Matematici) is a student society with the goal of promoting the concept of Engineering Mathematics and creating a network between students and alumni from Politecnico di Milano, to facilitate the exchange of ideas, opinions and advice about academic and professional life. Find out more about what we do with this amazing presentation!

I was the president of the committee in 2014 and have continued contributing to the Association ever since.

A.I.M. Online party — Milan, Italy — May 2014.
Photo by Giorgio Paulon.