Metrics that matter: quality over quantity
Introduction
Let us start by saying something obvious: models that predict effects in humans really, really matter for drug discovery. After all, that’s the whole point of the industry. Yet, we’re still not very good at it.
This should by no means be controversial. Drug discovery is an exceedingly expensive and risky proposition. Estimates vary on the actual numbers, but at a minimum, we're talking about tens to hundreds of millions of dollars to take a de novo therapeutic intervention to market. A big part of the reason is that we are not very efficient at figuring out whether a drug will (or won't) work before putting it in humans. Countless hypotheses enter the funnel, and few make it out the other end, ready to be tested in a clinical setting. Even for those that do pass the gauntlet of preclinical testing, fewer than 10% of candidates make it to market — not to mention the false negatives screened out before getting that far. Flaws with existing animal and cell models are a known problem that's been discussed for decades. And yet, we think the magnitude of this problem – and the opportunity for those who solve it – remains under-appreciated today.
Quantifying the importance of predictive validity
In a recent article, “Predictive validity in drug discovery: what it is, why it matters and how to improve it,” Jack Scannell et al. describe the influence of predictive validity on R&D productivity in the context of drug discovery. They define predictive validity as “the degree to which the ordering of measures from a decision tool would match, across a population of therapeutic candidates, the ordering in terms of clinical utility in people…[the authors] operationalize predictive validity as the notional Pearson correlation coefficient between the decision tool output and the relevant measure of clinical utility.” There are many simplifying assumptions in the analysis, but it provides a rigorous perspective on the order of magnitude of the problem we're discussing. The bottom line is that the quality of models matters, and likely more than you would expect.
Small changes in the predictive validity of the decision tools used in the R&D process can translate into significant differences in the NPV of a program; by the authors' calculations, a 0.1 change in predictive validity drives almost $100M in NPV. The authors also compare this to simply increasing the number of candidates tested, and the outcome is somewhat surprising: you would need 40x more compounds to reach the same positive predictive value as you would get from increasing predictive validity by 0.1. Again, the specific numbers are likely imprecise but directionally right; quality matters enormously, and investments here (in technologies, process, and culture) are worth it.
Scannell et al. describe successful drug discovery as “finding oases of safety and efficacy in chemical and biological deserts.” Put another way, when wading through the desert blind, it is helpful to have tools that help you discern an oasis from a mirage.
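To build intuition for why a small change in the correlation between tool output and clinical utility matters so much, here is a minimal simulation sketch of our own (the parameters are illustrative assumptions, not the authors' model): a candidate's true clinical utility and the decision tool's score are jointly normal with correlation rho, genuine hits (the oases) are the rare top ~1% of the candidate universe, and we advance only the handful of top-scoring candidates.

```python
import numpy as np

rng = np.random.default_rng(0)

def ppv(rho, n_candidates=1_000, n_advance=10, n_trials=1_000):
    """Fraction of advanced candidates that are true hits (utility in the top ~1%)."""
    hit_threshold = 2.326  # ~99th percentile of a standard normal: hits are rare oases
    fractions = []
    for _ in range(n_trials):
        utility = rng.standard_normal(n_candidates)           # true clinical utility
        noise = rng.standard_normal(n_candidates)
        score = rho * utility + np.sqrt(1 - rho**2) * noise   # decision-tool output, correlation = rho
        advanced = np.argsort(score)[-n_advance:]             # advance only the top scorers
        fractions.append(np.mean(utility[advanced] >= hit_threshold))
    return float(np.mean(fractions))

for rho in (0.3, 0.4, 0.5, 0.6):
    print(f"predictive validity {rho:.1f} -> PPV ≈ {ppv(rho):.0%}")
```

Under these toy assumptions, each additional 0.1 of correlation lifts the positive predictive value by a large relative margin; the paper's specific figures (including the 40x equivalence above) come from the authors' own parameterization, which this sketch does not attempt to reproduce.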
It turns out the importance of predictive model systems has been less obvious in practice than in theory
The decline in pharma R&D productivity between about 1950 and 2010 is well-documented and, as the authors rightly point out, striking given the transformative technical advances also made during that time. The authors propose a few hypotheses for what’s behind this decline in productivity:
First, we have often lost sight of the importance of data quality in the pursuit of higher quantity. The explosion of high-throughput screening technologies in the late ‘80s permeated the entire industry and became a mainstay of many discovery programs. In a new flavor of the same trend over the past few years, we’ve seen increasing pushes to quantify platform metrics (e.g., dataset sizes, perturbations screened). And yet, in all the talk about large combinatorial chemistry libraries or massively parallel perturbational experiments, we think a few things often get lost — how much conviction you have in your targets, how confident you are in your therapeutic approach, and how much trust you have in your preclinical models. As Scannell et al. so aptly point out (note: the whole article is a fantastic read and well worth it), it is frustrating that we’ve seen such an explosion of discovery tools and novel therapeutic modalities, yet marginal to no improvement in the efficiency of the discovery process.
Second, some of the most predictive models have been victims of their own success, while some of the least predictive have remained in use far longer. For example, Scannell and Bosley hypothesize that the best assays and disease models identified good drugs, which ultimately became generic and raised the commercial barriers to further R&D (e.g., models of stomach acid secretion and the effective proton pump inhibitors identified as a result). In contrast, poor assays and models remain in use until we find solutions for the relevant diseases.
Third, it is much harder to capture value by developing more predictive model systems or assays than by developing assets.
As a result, solving this problem - and transforming the productivity of pharmaceutical R&D - will require technical innovation, commercial creativity, and a thoughtful approach to process and culture. We discuss each of these in turn.
Technical innovation
From a technical perspective, how might we go about improving the odds with which we discover and develop high-quality agents? We tend to think about three main categories:
Understanding human disease biology: developing a biological hypothesis about a target that, if modulated, drives a therapeutic effect
Decision tools (terminology credit to Jack Scannell et al.): stringing together relevant analyses and assays to better predict whether a target and intervention are likely to work
Engineering new medicines: while most of this post focuses on ways to get better at predicting whether a drug will work, we can also improve the odds by designing drugs in ways we know give them a higher likelihood of success.
Understanding human disease biology
Disease biology in humans is often too complicated to understand perfectly - but understanding biology well enough to be confident in the targets you’re going after really, really improves your chances of success. To do this, there is no substitute for real human data.
Human genetics is a great starting point for target validation but not a silver bullet
Human genetics provides a rich starting point. It has been well documented for a few years now that drug targets with genetic support are twice as likely to be successful in phases II and III. In 2021, 33 of the 50 FDA-approved drugs had supporting human genetic evidence. In simple cases, a patient may have a loss-of-function variant of a particular gene that causes a disease. Inherited retinal diseases (like biallelic RPE65-mediated inherited retinal disease, now treated by LUXTURNA) provide an example where supplying a working copy of the missing gene can restore function. We’ve similarly seen a strong focus on genetics in targeted oncology. In a broader sense, human genetics can be thought of as a set of natural experiments for deriving causal relationships between targets and outcomes. The random assortment of genetic variants across the human population is the basis of Mendelian randomization; by interrogating the effects of those variants across populations (as in genome-wide association studies), it is possible to estimate dose-response relationships from a purely observational vantage point.
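To make that last point a bit more concrete, here is a toy Mendelian randomization sketch of our own (the simulated data and effect sizes are illustrative assumptions, not drawn from any study): the simplest estimator, the Wald ratio, divides a variant's effect on the outcome by its effect on the exposure and, under the usual instrumental-variable assumptions, recovers the causal slope even when a confounder biases the naive observational estimate.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Toy population: a variant shifts an exposure (say, a circulating biomarker),
# the exposure causally affects an outcome, and a confounder distorts both.
genotype = rng.binomial(2, 0.3, n).astype(float)   # 0/1/2 copies of the allele
confounder = rng.standard_normal(n)
exposure = 0.5 * genotype + confounder + rng.standard_normal(n)
outcome = 0.8 * exposure + confounder + rng.standard_normal(n)   # true causal slope = 0.8

beta_obs = np.polyfit(exposure, outcome, 1)[0]   # naive regression, biased by the confounder

beta_gy = np.polyfit(genotype, outcome, 1)[0]    # variant -> outcome effect
beta_gx = np.polyfit(genotype, exposure, 1)[0]   # variant -> exposure effect
beta_mr = beta_gy / beta_gx                      # Wald ratio: recovers ~0.8

print(f"observational slope ≈ {beta_obs:.2f}, Wald ratio ≈ {beta_mr:.2f} (true effect 0.8)")
```

Real analyses, of course, grapple with pleiotropy, weak instruments, and population structure, which is part of why the triage discussed below matters.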
Leveraging human genetics for target validation is far from a silver bullet, however. It is hard to collect the depth of phenotypic data required to determine causality for complex diseases, though there are numerous examples speaking to the power of observational research and natural human experiments, such as the work undertaken by companies like Character Biosciences (more of our thoughts on the subject here). Gene-drug pairs also need significant triage and have limitations; these are covered well in the article by Plenge et al., “Validating therapeutic targets through human genetics.” Still, human genetics provides strong starting evidence and can significantly increase your conviction in the targets you’re pursuing.
Deeper multi-omic profiling and novel validated ex vivo / in vitro systems
Beyond genetics, there are additional tools that can be used to characterize detailed elements of human biology. Over the past decade, our tools for dissecting these features have improved dramatically. We now have a wide range of high-throughput omics and imaging technologies that yield insights at resolutions ranging from single molecules to entire multicellular systems. Grounded in human samples, these can drive insights for therapeutic development.
An example in the context of cell therapies comes from the work of Cartography Biosciences. Cell therapies are an extremely powerful modality that has come to market in recent years; however, one of their main drawbacks has been serious (and sometimes deadly) toxicity associated with treatment. Through academic work in Ansu Satpathy’s lab, the team made a foundational discovery: CD19 (the most commonly targeted antigen in approved cell therapies) is also expressed in brain mural cells, which likely contributes to the significant neurotoxicity seen in a subset of patients who receive these treatments. Cartography was born to find novel, safer antigens to target. This approach is made possible by new single-cell technologies and direct profiling of human samples.
Additionally, deep profiling and novel in vitro or ex vivo systems often go hand-in-hand. Ochre Bio, for example, is taking the concept of deep phenotyping and human-relevant model systems to the next level. They work on liver disease and broader metabolic conditions like NASH, where existing animal models are poor and clinical failure has been consistent. Their pipeline combines deep molecular profiling (genomics, transcriptomics, single-cell, and spatial technologies) of human samples with novel ex vivo model systems. One of those ex vivo systems is donated livers that were too diseased to transplant: they hook these up to perfusion machines that can run for days and use them to test potential drug candidates. This is a truly creative solution for generating human disease-relevant model systems; in the spirit of this post, we’d love to see the data on how they characterize and validate these systems.
Unfortunately for science, testing hypotheses directly on humans is not usually possible. We’ve leveraged cell and animal models for decades, but as we’ve seen with the clinical failure rate, many of our existing models don’t do a good enough job of representing the physiological conditions seen in humans. Cell lines, for example, have shown clonal variation and genetic instability that drive differential sensitivity to treatments. We need better model systems that readily pair biological realism with scale and reproducibility.
In the context of complex models, we’ll discuss two examples we’re excited about: organoids and organ-on-chips.
Organoids are 3D, self-assembling, human stem-cell-derived culture systems. In specific cases, these have the potential to be more realistic substrates, and they are significantly more scalable than animal models. Many biological phenomena are unique to humans and need to be studied in a human context. We are excited by the potential of these systems, but for discovery organizations leveraging them, robust characterization is needed to ensure the models have high predictive validity. One example we particularly like comes from this work by Raghavan et al., Microenvironment drives cell state, plasticity, and drug response in pancreatic cancer. The authors leveraged single-cell RNA sequencing to characterize the cell states found in patient samples and then used that phenotype as a reference for characterizing matched organoid models. In PDAC, the microenvironment supports cell states ranging from basal to classical, and the mix of states present drives sensitivity to therapy. Not surprisingly, the culture conditions influence the dynamics of the model system. Importantly, the authors characterized the system across different conditions and compared the ex vivo model with human tissue samples, and were thus able to tease apart cell-intrinsic vs. TME-induced contributions to the malignant cell state. What they found was “remarkable phenotypic and functional plasticity inherent in tumor models.” This is a perfect representation of the type of work required to improve our understanding and treatment of human disease. Leveraging human samples and characterizing genetics is not enough; deep characterization is required to ensure insights will translate to a clinical setting.
Organ-on-chips are microfabricated cell culture devices designed to mimic the functional units of human organs and represent a step up in complexity from organoids. Compared to organoids, these approaches offer a number of advantages: they can be vascularized; they enable biophysical cues that are often relevant for cell maturation; they allow for better control of nutrient supply to the entire system; and they enable the addition of immune cells. Some of these characteristics are included in next-generation organoids but are largely absent from most models. Organ-on-chips thus hold significant potential for mimicking complex physiological systems in a way no other ex vivo system can. A review by Don Ingber, one of the founders of this field, provides a great overview of the ways these systems can be used. Additionally, Organoids-on-a-chip, co-authored by another founder of the field, Dan Huh, provides a great perspective on where organoids and organs-on-chips can come together synergistically.
One specific example of the power of organ-chips comes from Emulate, built on work out of Don Ingber’s lab. Preclinical models often miss toxicity signals that only become apparent in clinical trials, wasting precious time and resources. To understand how well the chip predicts observed toxicity, the team used the Emulate Liver-Chip to predict toxicity across a panel of 27 drugs with known toxicity profiles. Using the chip alone, they correctly identified the hepatotoxic drugs with a sensitivity of 87% and a specificity of 100%. Given that roughly a quarter of drugs fail in the clinic due to toxicity (though that is broader than just liver toxicity), you can begin to see why these technologies are so powerful.
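For readers less used to these metrics, the underlying arithmetic is simple; the toxic/non-toxic split below is our own illustrative assumption (chosen to be consistent with a 27-drug panel and the reported figures), not the published breakdown.

```python
# Hypothetical confusion matrix for a 27-drug hepatotoxicity panel (illustrative split, not the study's data)
true_positives, false_negatives = 13, 2    # known-toxic drugs flagged vs. missed by the chip
true_negatives, false_positives = 12, 0    # known-safe drugs correctly cleared vs. wrongly flagged

sensitivity = true_positives / (true_positives + false_negatives)   # 13 / 15 ≈ 0.87
specificity = true_negatives / (true_negatives + false_positives)   # 12 / 12 = 1.00
print(f"sensitivity ≈ {sensitivity:.0%}, specificity = {specificity:.0%}")
```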
However, despite the promise of organ-chips, there remain significant hurdles to overcome. We need more validation studies that outline the appropriate contexts in which to use these, and we need to improve the usability beyond the academic lab.
Making good decisions (about which model system to use and then which candidates to advance)
Model systems are valuable because they help us make decisions - for example, estimating the likelihood that a particular candidate will lead to a useful human drug and prioritizing R&D investments accordingly. Scannell et al. consider model systems part of a larger set of “decision tools” - everything used to help decide which therapeutic candidates to optimize and advance and which to abandon. A decision tool could be a test in a disease model or the gut feel of an experienced scientist.
As we discussed earlier, small changes in the predictive validity of decision tools can have large effects on the likelihood of success. To drive meaningful characterization of these tools, Scannell et al. highlight four criteria for evaluation:
Biological recapitulation: To what extent does the decision tool resemble the human clinical state in terms of epidemiology, symptoms and natural history, genetics, biochemistry, etiology, histology, biomarkers, and response to known human pharmacology (including positive and negative controls)?
Tests and endpoints: To what extent is the experimental protocol similar to the likely clinical treatment regimen; does drug dosing and tissue exposure match the likely clinical situation; are the endpoints used in the preclinical studies translatable to the likely clinical endpoints; are the methods used to measure preclinical endpoints comparable to the likely clinical measures; is there confidence in the go/no-go thresholds that we will apply to the measures that the decision tool yields?
Experimental and statistical hygiene:
To what extent is testing implemented with animals derived from trusted sources, confirmed genotype, randomized and blinded animal allocation and assessment, and appropriate sample size?
To what extent are the results repeatable, consistent with historical results derived from the same animal strain, and robust to modest changes in experimental conditions (for example, animal strain)?
To what extent is the statistical treatment pre-planned, methodologically appropriate, sufficiently powerful, and considering false discovery rates?
Domains of validity: For which disease states, treatment regimens, and clinical endpoints is decision tool output likely to correlate well with drug performance in people?
A few important points stand out here. First, with biological recapitulation, complexity is not always the right answer. What matters is deep characterization of the system and a clear understanding of the causal biology you are trying to capture. For example:
On the simple end of the spectrum, a cell-based assay may be the best way to interrogate a ligand-target interaction. Even here, though, characterization is important. Traditional biochemical assays often fail to accurately represent target engagement in vivo. Cell-based target engagement assays are a step forward: the target can operate and interact in the cellular milieu in much the same way it does in vivo. Through single-particle tracking, Eikon Therapeutics is able to observe these complex dynamics directly in live cells. Promega’s NanoBRET assays present another useful tool for studying these interactions in a cellular context. Similarly, Think Bioscience and SyntheX use synthetic biology to develop different but related assays that yield cell-based readouts.
In other cases, a more complex organ-on-chip may be the right answer. We discussed the toxicity example above. Another example might be studying cell motility and extravasation in response to a particular stimulus; these functions are not as easily interrogated with other tools.
There are many ways complex models still need to improve before they can be leveraged broadly, but they certainly show significant potential to improve the efficiency and effectiveness of industrial drug discovery. One area of great collaboration over the past few years has been the work of the IQ Microphysiological Systems Affiliate, a group of industry and academic researchers doing the hard work of defining the requirements of these systems for studying particular physiologies. These “contexts of use” are the domains of validity for a particular complex in vitro microphysiological system (a category that includes organoids, organ-on-chips, and related systems).
As for experimental and statistical hygiene, we see automation playing a major role. It’s been discussed previously in the context of organ-on-chips, but we need higher-throughput systems capable of practically running well-powered studies with many replicates. For the most part, today’s systems are much too slow, manual, and low-throughput to do much beyond very late-stage preclinical validation, which eliminates the prospect of improving predictive validity earlier in the discovery process.
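To see why replicate counts get out of hand quickly, here is a back-of-the-envelope power calculation (a standard two-sample normal approximation with illustrative effect sizes; the numbers are not from the post): halving the effect size you want to detect roughly quadruples the replicates required per arm.

```python
from math import ceil

# Two-sample, two-sided comparison at alpha = 0.05 with 80% power (normal approximation).
z_alpha, z_beta = 1.96, 0.84
for effect_size in (1.0, 0.5, 0.25):   # group difference in units of the assay's standard deviation
    n_per_arm = ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)
    print(f"effect of {effect_size} SD -> roughly {n_per_arm} replicates per arm")
```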
In all cases, what is most important is tying together a holistic discovery and development platform that clearly articulates the value and limitations of each model system. One could imagine a system that starts with human genetics to identify high-conviction targets, modulates those targets in a few uncorrelated model systems (with complexity chosen based on the specifics of the disease biology of interest), moves to simpler assays for triaging hits that demonstrate the relevant effect, and then ratchets complexity back up to ensure the drug functions as it should in as near a human environment as possible. Ideally, you can do this with a long list of interventions in a cost-effective manner. As we’ve said, quality beats quantity; but quality with quantity wins out.
Engineering new medicines
While most of this post focuses on ways to get better at predicting whether a drug will work, we can also improve the odds by designing drugs in ways we know give them a higher likelihood of success. This post is much too short to detail all the interventions we find compelling, but we wanted to briefly touch on two approaches we’re excited to see more of.
First, leveraging the decision-theoretic framing above, we can increase the likelihood of success by increasing the quality of the starting chemical matter. As such, we appreciate approaches that wade into pools of advantaged chemical matter, such as natural products. These are compounds shaped by evolutionary processes; they exist because they have a biological function. As a class, they have made significant contributions to the treatment of diseases like cancer and infectious diseases. However, their utilization in pharma has slowly declined over the past two decades due to challenges with synthesis and with deconvolution using conventional assays. On the back of new technologies, companies like Enveda Biosciences are pioneering a new world of natural product drug discovery (and we recently wrote about our excitement in the context of our work with Think Bio). We look forward to seeing folks pair advantaged chemical matter with the right assays and encourage anyone working on this problem to reach out.
Second, toxicity remains a challenging property to test for in existing models, and unique methods for avoiding off-target activity can be powerful. For example, we’ve seen a resurgence in recent years in antibody-drug conjugates, built on the premise that specific delivery of a potent agent can expand the therapeutic window - including Merck’s potential (though still in flux) $40B acquisition of Seagen. However, as Derek Lowe points out, ADCs are often not as straightforward as they seem. In another approach, Good Therapeutics (and spinout Bonum Therapeutics) built a platform for developing conditionally active biologics that are only active in the proper context; this led to an acquisition by Roche worth $250M in cash and significantly more in downstream milestones. We imagine that those who come up with creative ways to solve toxicity for well-known targets have a meaningful shot at success in the clinic, and we look forward to seeing more folks innovate in the space.
Commercial creativity and new forms of collaboration
It should be clear by now that relevant model systems matter a lot in determining the success of a discovery program. And yet, as an investment community, we make it clear that we prefer to invest in drug companies over tools companies. As Scannell et al. point out, “the market capitalization of the top ten global contract research organizations was around 7.5% of the market capitalization of the top ten global pharmaceutical companies; investors expect companies that produce or acquire novel chemical matter to be able to capture far larger future profits than companies that supply the services that help decide whether the chemical matter is useful or not.”
So, many of the companies we see with novel model systems decide to vertically integrate, intentionally trying to capture more of the economic value their tools provide. However, many of these models are good for a single part of the preclinical pipeline, requiring companies to make large additional investments to get assets to market. In some cases, with a unique platform technology applicable across a wide variety of models, we believe there are paths to building large, venture-backable businesses without fully vertically integrating - it just takes some creativity. For example:
Capturing more value in partnerships - There are newer methods of engagement that enable more value capture for companies selling tools and services to biotech and pharma. The work of Alloy Therapeutics provides a good example, and Twist is attempting a similar model with its antibody discovery services. In both cases, they license tools in exchange for a share of the downstream economics associated with assets discovered using the platforms. Admittedly, this is easier for tools that directly touch the IP associated with the therapeutic asset; for tools like novel model systems, there aren’t many good examples. We would argue, however, that given the importance of these systems, it would be in companies’ best interest to exchange a share of an asset’s value for access to better predictive model systems.
Complementing equity with other types of financing - With revenue, early-stage companies can access financing with a different cost of capital. Equity financings are heavily dilutive and really expensive; many Series A rounds for biotechs are over 50% dilutive. This makes sense given the risk profile of drug companies: there is a somewhat binary risk associated with a single therapeutic agent. Yes, some of that risk can be mitigated with a pipeline, but many companies succeed or fail with their first and furthest-along asset. Tools companies, on the other hand, can access debt and other non-dilutive mechanisms for financing growth.
More generally, we see significant opportunities to improve collaboration and streamline regulation in ways that accelerate the industry at large. For example,
The IQ MPS consortium is a great starting point focused on evidence generation. We wonder whether that collaboration model could also be extended to funding and value-sharing. For example, industry groups could fund model development for particular physiologies of interest that would benefit the whole industry. This should begin with areas like predictive toxicology, where sponsors could also provide reference compounds to serve as positive or negative controls. Meaningful incentives would drive meaningful investment, and as we’ve outlined here, focusing on these systems is directly relevant to better outcomes across the industry.
Congress and the FDA have similarly been thinking hard about which model systems are relevant for the approval of therapies. The recent FDA Modernization Act lays the groundwork for removing arbitrary requirements with low predictive validity from the drug approval process; for example, in some cases animal models should not be used for efficacy testing at all. We hope we’ll see more forward-looking legislation introduced to accelerate the industry toward decision tools that make sense.
A thoughtful approach to culture and process
We all know, rationally, that it makes sense to develop an approach that acknowledges the limitations of preclinical models and follows the data. However, the industry at large struggles to put this into practice. As a result, for companies in the space – especially young companies! – taking a thoughtful approach to process and culture is critical. We’ll mention two frameworks that offer starting points for improving translational success: AZ’s five-dimensional framework and Robert Plenge’s four-component “disciplined approach.”
AstraZeneca focuses on what it describes as the five R’s: right target, right tissue, right safety, right patients, and right commercial potential. Robert Plenge breaks his framework into four categories: causal human biology, therapeutic modulation, biomarkers of target modulation, and proof-of-concept clinical trials. Both share the view that confidence in the target correlates well with positive outcomes: AZ found that a lack of solid disease understanding (and thus of efficacy) was the most important driver of project failure in clinical trials, and Plenge notes that “targets should be selected on the basis of a deep understanding of causal human biology.”
In conclusion
To generate and test these therapeutic hypotheses, as we’ve discussed, we need strong translational models that recapitulate the relevant human physiology we intend to modulate, and creative approaches to company building to bring those models to bear at scale. If you’re building in the space — or if you just have questions, comments, or ideas — please reach out to us at bio@innovationendeavors.com. We would love to hear from you.
Until next time!
What we’re reading and listening to
For the translational issue
💊Speaking of translational models, oxygen is a critical regulator of cellular metabolism and function in cell culture → Controlling oxygen availability can also improve the translatability of model systems
💊#Unshackled: The evolving definition of asset-centricity | Drug Baron → Being thoughtful about the data driving our decisions is easier said than done
💊What's the target anyhow? Understanding true MoA → Sometimes, biology finds mechanisms of action beyond what we originally designed for
💊Why are clinical trials so expensive? → Turns out that not all productivity barriers can be addressed by better drugs and model systems
Also on our minds
💊Another big approval for gene therapy - $3.5M treatment for haemophilia B
🧬A putative design for electromagnetic activation of split proteins for molecular and cellular manipulation → Turns out you can reconstitute split proteins with magnetic fields
🧬Modeling genetic stability in engineered cell populations → Interesting framework to shed light on ways that construct design can impact evolutionary stability
🌎Frontier’s carbon removal knowledge gaps → Lots of room for innovation around biological carbon removal methods
📜UPSIDE foods clears regulatory hurdle → First FDA approval for cultivated meat
🌎Funneling mixed waste with microbes → Working with mixed plastic waste is hard; here, researchers pair a chemical process with microbial degradation to break down a mixture of three common plastics
🐬Just for fun: it turns out dolphins can tolerate a good amount of spice