NC3Rs | 20 Years: Pioneering Better Science
Strategic grant

DAR3T: Data Analysis of 3Rs Tools

At a glance

In progress
Award date
October 2024 - April 2025
Grant amount
£562,442
Principal investigator
Professor Crispin Miller

Co-investigator(s)

Institute
University of Glasgow

R

  • Replacement

Overview

The award will allow Crispin to launch a computational platform that supports large-scale analysis of cancer model data. The platform will integrate data from a range of model systems, including animal, in vitro and in silico approaches, to help oncology researchers select the non-animal models that best represent patient cohorts and phenotypes.

This award was made as part of the 2024 non-animal methods infrastructure grants supported with funding from the Department for Science, Innovation and Technology (DSIT).

Application abstract

Technological advances, including high-throughput sequencing, imaging and spatial technologies, have led to the establishment of large public-domain cancer data collections, derived both from human tumours and from pre-clinical models.

These datasets provide a significant opportunity to:

  • Reduce the use of animal models by referencing existing data,
  • develop computational platforms that support Replacement, by identifying which models best represent which patient cohorts,
  • and use data tools to Refine models by generating new insights and in silico models.

Our group has been using artificial intelligence and machine learning to stratify cancer patients into subsets according to the molecular and phenotypic characteristics of their tumours, and then to ‘disease position’ existing preclinical datasets against those patient subgroups. This leads to better forward- and back-translation between mechanism, models, and clinic. Importantly, preliminary data from positioning models drawn from the Sequence Read Archive (SRA) show that Genetically Engineered Mouse Models (GEMMs), cell lines, single-cell RNA-seq, and organoid collections are all amenable to these analyses, and our expectation is that the same will be true for other modalities, including spatial transcriptomics data. These approaches are computationally intensive: they require, amongst other things, the systematic re-processing, normalisation, dimensionality reduction, modelling, and clustering of multimodal datasets.
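As an illustration only, the idea of disease positioning can be sketched in a few lines of Python. This is not the group's pipeline: it assumes patient subgroups have already been assigned, replaces dimensionality reduction and clustering with a simple nearest-centroid rule, and uses made-up gene names, sample names and expression values. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def log_transform(profile):
    """Apply log2(x + 1) to a dict of gene -> expression value."""
    return {gene: math.log2(value + 1.0) for gene, value in profile.items()}


def fit_scaler(profiles):
    """Per-gene mean and standard deviation, estimated from patient samples only."""
    scaler = {}
    for gene in profiles[0]:
        values = [p[gene] for p in profiles]
        sd = pstdev(values)
        scaler[gene] = (mean(values), sd if sd > 0 else 1.0)
    return scaler


def scale(profile, scaler):
    """Z-score a profile with the patient-derived scaler, in a fixed gene order."""
    return [(profile[gene] - mu) / sd for gene, (mu, sd) in scaler.items()]


def subgroup_centroids(vectors, labels):
    """Mean vector of each patient subgroup."""
    groups = {}
    for vector, label in zip(vectors, labels):
        groups.setdefault(label, []).append(vector)
    return {label: [mean(col) for col in zip(*vecs)] for label, vecs in groups.items()}


def position_models(patients, models):
    """Assign each model sample to the patient subgroup with the nearest centroid."""
    patient_profiles = [log_transform(profile) for profile, _ in patients]
    labels = [label for _, label in patients]
    scaler = fit_scaler(patient_profiles)
    centroids = subgroup_centroids([scale(p, scaler) for p in patient_profiles], labels)
    positions = {}
    for name, profile in models.items():
        vector = scale(log_transform(profile), scaler)
        positions[name] = min(centroids, key=lambda label: math.dist(vector, centroids[label]))
    return positions


# Toy data: two hypothetical genes, two hypothetical patient subgroups.
patients = [
    ({"GENE_A": 100, "GENE_B": 5}, "S1"),
    ({"GENE_A": 120, "GENE_B": 8}, "S1"),
    ({"GENE_A": 6, "GENE_B": 90}, "S2"),
    ({"GENE_A": 4, "GENE_B": 110}, "S2"),
]
models = {
    "organoid_1": {"GENE_A": 95, "GENE_B": 7},
    "cell_line_1": {"GENE_A": 5, "GENE_B": 130},
}

positions = position_models(patients, models)
print(positions)
```

Fitting the scaler on patient samples only keeps the patient cohort as the reference frame; model samples are then projected into it rather than influencing it. A real analysis would also need batch correction and a measure of how confidently each model is positioned.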

This work has also identified both the varying levels of annotation and the non-uniformity of the metadata used to describe samples in existing archiving resources, underlining the need to establish curated collections built with common metadata standards and uniform data dictionaries. In silico modelling of human tissue also shows great promise, particularly in terms of refinement, but as with in vivo and in vitro models, in silico models must be effectively disease positioned against the models and patients they are intended to represent. This requires high-quality, granular data (including metadata).
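To make the metadata problem concrete, the following sketch shows how a uniform data dictionary might be applied to inconsistently annotated sample records. The field names, aliases and controlled vocabulary are invented for illustration and are not taken from any existing archive or from the project itself.

```python
# Hypothetical data dictionary: maps field-name variants to one canonical field.
FIELD_ALIASES = {
    "tissue": "tissue",
    "tissue_type": "tissue",
    "organ": "tissue",
    "model": "model_type",
    "model_type": "model_type",
    "sample_type": "model_type",
}

# Hypothetical controlled vocabulary for each canonical field.
VOCABULARY = {
    "tissue": {"colon": "colon", "large intestine": "colon", "pancreas": "pancreas"},
    "model_type": {
        "organoid": "organoid",
        "organoids": "organoid",
        "cell line": "cell_line",
        "cell_line": "cell_line",
        "gemm": "GEMM",
    },
}

REQUIRED_FIELDS = ("tissue", "model_type")


def harmonise(record):
    """Map a raw metadata record onto the dictionary; return (clean_record, issues)."""
    clean, issues, seen = {}, [], set()
    for key, value in record.items():
        field = FIELD_ALIASES.get(key.strip().lower().replace(" ", "_"))
        if field is None:
            issues.append(f"unknown field: {key}")
            continue
        seen.add(field)
        term = VOCABULARY[field].get(str(value).strip().lower())
        if term is None:
            issues.append(f"unrecognised {field} value: {value}")
            continue
        clean[field] = term
    for field in REQUIRED_FIELDS:
        if field not in seen:
            issues.append(f"missing required field: {field}")
    return clean, issues


clean, issues = harmonise({"Organ": "Large intestine", "Sample Type": "Organoids"})
print(clean, issues)
```

Returning the issues alongside the cleaned record, rather than silently dropping unrecognised entries, leaves a curator able to decide whether to extend the dictionary or correct the source annotation.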

The goal of this infrastructure grant is to provide a computational platform to support disease positioning and modelling, at scale, of cancer model data, with a focus on in silico, organoid, and cell line collections. Supported outputs will include the establishment of a well-curated, Findable, Accessible, Interoperable and Reusable (FAIR) collection of non-animal methods data, aligned to existing GEMMs and human tumours. By doing this we will promote the 3Rs, not only by demonstrating which non-animal models accurately recapitulate human disease, but also by making their data available in a form that supports the development of new in silico models, and downstream analysis by bioinformaticians, computational biologists and machine learning researchers.