Super evolution x biology - 2023 edition
A refresh on our investment thesis and some problems we’re excited to see tackled
Hello, friends, and happy not-so-new year! Rather than dive deeply into one topic, we thought we’d unpack our high-level investment thesis in more detail — and share a selection of some of the problems we’re excited to see folks tackle in the months ahead.
The super evolution, engineering biology
It is our foundational belief that data and compute will transform every part of the economy. After all, one of our partners, Eric Schmidt, pioneered this revolution in the world of bits at Google. Software increasingly drives how we work, communicate, consume, and create. Building in the digital world is increasingly frictionless. It is easy to experiment quickly, spin up complex software products from a laptop, and avoid large capital expenditures. Compared to the world of bits, building in the physical world of atoms and molecules is much harder. However, there is no way around it if we want to feed 10 billion people, turn the tide on climate change, cure and eradicate disease, and extend our healthspan.
At our core, we are data-at-scale investors. We are focused on new methods to generate data, make sense of that data, and ultimately use that information to build solutions in the physical world.
We believe this thesis to be especially salient in biological systems, where the world is non-linear, dynamic, dense, stochastic, and operates across spatial and temporal scales that vary by many orders of magnitude. Changes at the single-molecule level (e.g., a single atom in a single protein) can propagate through multiscale networks and hierarchies to have an outsized effect at the organism level. The effects and outcomes of biological perturbations are often varied and unpredictable. A biological perturbation may lead to disease in one person but health in another. A biological mutation might drive a certain function in one context but have a completely opposite effect in another. What’s clear is that biology is endlessly complex.
As a result of this complexity, we firmly believe that breakthroughs in the life sciences emerge at the intersection of technologies that drive more data, captured in a relevant context, that can be used to build new solutions. Cutting-edge computational methods serve as a foundational enabler to all parts of this flywheel.
Put another way, we believe super-evolution systems that drive smarter design-build-test cycles can transform what we can discover and create. Because while biology may be endlessly complex, it is representable. We just need the right methods.
Below, we briefly expand on what we believe to be the most important elements.
More data
Measuring the entirety of a biological system (i.e., the states and dynamics of all its molecules) is intractable, so we often rely on proxies — such as a half dozen cell surface markers for flow cytometry, or even a single chemical biomarker in a clinical trial. These measurements were (and still are) the workhorses of biological discovery. Next-generation sequencing (NGS) has unleashed a wave of data-driven biology over the past 15 years. NGS has also enabled experimental approaches to evolve from targeted (e.g., single gene knockouts) to vastly scaled (e.g., whole genome CRISPR screens). Additionally, with DNA synthesis and sequencing costs massively declining, DNA barcoding has become a favored approach for enabling pooled assays with a much wider array of readouts.
Yet, the DNA sequence is just one piece of the system — chromatin accessibility, 3D structure, and histone modifications are just some of the additional DNA features we can now begin to interrogate. And there is a vast universe well beyond DNA — protein isoforms, abundances, post-translational modifications, locations, movements, and interactions; the endless array of metabolites; cellular morphology and mechanical forces; the list goes on and on. Wide-ranging tools are rapidly emerging, and we’re still just beginning to scratch the surface.
So, when we say more data, we are thinking of two primary axes:
Scaled data — increased throughput and broader exploration of potential solution space.
e.g., automation; tools like droplet microfluidics and optical tagging that enable pooled assays; methods such as Perturb-Seq, CaRPool-Seq, and ENTER-seq that enable scaled experimental read-outs
Novel data — methods that enable interrogation of a new dimension of biology.
e.g., tools like super-resolution microscopy to enable the visualization of protein kinetics; FIB-SEM that enables high-resolution imaging of large cellular volumes; cryo-EM to interrogate protein dynamics and complexes; fine-tuned mass spectrometry to scalably dissect post-translational modifications
Data in context
Data quality and data quantity are often flip sides of the same coin. Certainly, more data is better in most cases, but as we argued in a prior post on the topic, data quality is more often the critical bottleneck. Continuing to develop tools that interrogate biology at high resolution is critical, but those tools are only as useful as their ability to faithfully recapitulate the whole system. So, when we say better data, we mean not just novel data but data that’s actually predictive. We look for methods that embrace complexity and that are:
Contextualized — methods that enable measurement in the most native environment possible (or, more likely, one that captures the most salient features of the in vivo context); native environments are constantly subject to many different dynamic contexts (e.g., anti- and pro-inflammatory conditions), so these external conditions should be considered in any experimental approach.
e.g., proteins are rarely static, yet many high-throughput screening campaigns (including virtual ones) try to hit an immobilized target isolated from the biochemical jello found in vivo
Functional — methods that directly measure activity rather than a proxy.
e.g., antibody campaigns seek the highest-affinity binders, yet it is clear that lower affinity is often sufficient for functional activity
Multi-scale — methods that integrate data modalities to derive causality from small biochemical changes that cascade to function at the highest level of the system.
e.g., multi-modal clinical-genomic datasets allow us to interrogate the deep biology that drives therapeutic efficacy in the real world
Translational — methods that are aware of their own limitations and seek translational relevance to the ultimate design goal, whether that is drug activity or a scaled-up industrial process.
e.g., while humans are our best test tubes for translational relevance (in vivo veritas), complex in vitro models are increasingly predictive
Ability to build
For all of this information to make a difference, we need to be able to “close the loop” and write back into the physical world. We touch on this in two parts: 1) the ability to synthesize and test at a molecular level and 2) the ability to manufacture drugs or industrial products at scale.
Ability to synthesize & test: Over the past few decades, we’ve begun to characterize the unimaginable vastness of the chemical universe we could explore. Chemists estimate that >10⁶⁰ compounds with drug-like characteristics could be made. For context, that’s more molecules than there are atoms in the solar system! To date, five major reactions represent more than 80% of reactions used for drug discovery purposes, meaning that current screening libraries are populated by compounds that only represent a tiny fraction of this cosmos. We’re confident there are gold mines in these unexplored galaxies and are excited by the diverse ways folks are innovating.
For example, Levin et al. are leveraging generative methods to design more efficient synthesis routes to compounds of interest, and a company in our portfolio, Think Bioscience, is using biological machinery to simultaneously synthesize and test novel compounds. The same logic applies to other molecular building blocks (e.g., DNA, RNA, proteins); our creativity is usually bounded by synthesizability, feasibility, and speed.
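As a rough, back-of-the-envelope sanity check on the solar-system comparison above (our own estimate, assuming the solar system’s atom count is dominated by the Sun, roughly 2×10³⁰ kg of mostly hydrogen):

$$
N_{\text{atoms}} \approx \frac{2 \times 10^{30}\ \text{kg}}{1.7 \times 10^{-27}\ \text{kg/atom}} \approx 10^{57} \ll 10^{60}\ \text{drug-like compounds}
$$

Even this generous count leaves drug-like chemical space ahead by roughly three orders of magnitude.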
Manufacturing at scale: Today, 50% of new drugs are being produced biologically, with increasing complexity often driving astronomical costs. And the need to manufacture biological products extends far beyond pharma; if we are going to make 60% of the inputs to our physical economy using biology, we need to get much better at making huge volumes of product much more cheaply. We’ve written more, in a previous blog post here, about the manufacturing innovations we hope to see.
Compute as the central enabler
As biological data grows in complexity and scale, we need computational approaches to manage, query, and learn across biological data streams. We hope it is becoming clear from the previous sections that we are pro-complexity. We believe the emerging intersection of technologies will enable a transition away from reductionist approaches that often fail to translate to the real world. Embracing that complexity is among the strengths of the statistical learning approaches that have evolved over the last decade. More broadly, we are excited about computational strategies that leverage machine learning and other statistical learning methods (under the umbrella of AI) that:
Deconvolve signal from noise — biological data are noisy, convoluted, and include many overlapping signals; AI can help deconvolute true signals from the noise. A good example of this comes from the work of Lotfollahi et al. to enable single-cell reference mapping.
Make predictions and traverse search spaces — a core function of drug discovery is to make predictions; given a readout from a screen, what are the next best experiments to run? Machine learning approaches like Bayesian optimization can efficiently navigate these search spaces, as in this work on antibody design from Khan et al. (we include a toy sketch of such a loop below). One of our portfolio companies, Harmonic Discovery, is leveraging machine learning to predict ligand-target interactions and design kinase inhibitors with rational polypharmacologic activity.
Generate something new — as we’ve seen with much of the recent excitement around ChatGPT and other large language models, these methods can be enormously generative. In the context of drug discovery or engineering biology, these tools can learn general biological rules and imagine completely new constructs that might prove useful. A brilliant example of this comes from Yeh et al. in the Baker lab, who designed novel luciferases. Ultimately, in an industrial setting, we’re keen to see predictions tied to high-velocity experimental feedback loops.
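To make the prediction-driven loop concrete, here is a minimal, hypothetical sketch in Python: a toy Gaussian-process surrogate over a 1-D design space, with an upper-confidence-bound acquisition choosing the next “experiment” to run. Everything here (the assay function, the design space, the kernel and its parameters) is an illustrative stand-in of our own; it is not how Khan et al., Harmonic Discovery, or any other group actually implements their methods.

```python
# Toy sketch: Bayesian-optimization-style "which experiment next?" loop.
import numpy as np

rng = np.random.default_rng(0)

def assay(x):
    # Hypothetical stand-in for a noisy wet-lab readout (e.g., a binding signal).
    return float(np.exp(-(x - 0.7) ** 2 / 0.02) + 0.05 * rng.normal())

def rbf_kernel(a, b, length_scale=0.1):
    # Squared-exponential similarity between 1-D design points.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

candidates = np.linspace(0.0, 1.0, 200)   # discrete design space to explore
X = list(rng.uniform(0.0, 1.0, 3))        # a few random seed experiments
y = [assay(x) for x in X]

for _ in range(10):                       # ten rounds of design-build-test
    Xa, ya = np.array(X), np.array(y)
    K = rbf_kernel(Xa, Xa) + 1e-4 * np.eye(len(Xa))       # kernel + noise
    Ks = rbf_kernel(candidates, Xa)
    mean = Ks @ np.linalg.solve(K, ya)                    # posterior mean
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    ucb = mean + 2.0 * np.sqrt(np.clip(var, 0.0, None))   # acquisition score
    x_next = float(candidates[np.argmax(ucb)])            # most promising design
    X.append(x_next)
    y.append(assay(x_next))

best = int(np.argmax(y))
print(f"best design so far: x={X[best]:.3f}, readout={y[best]:.3f}")
```

In a real campaign the surrogate would be trained on actual assay data over a learned molecular representation, and the acquisition would also weigh synthesis cost and diversity, but the shape of the loop (predict, pick, measure, update) is the same.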
Ultimately, the goal of these approaches is to >10x scientific productivity and open new paths to discovery. We have already seen human workflows transformed by ML and automation over the last decade, but the advent of ever-improving foundation models (see an awesome primer here from Davis Treybig on our team) will transform the infrastructure we use to do science. Looking ahead, LLMs will change how we review the literature and extract insights from adjacent disciplines, program synthesis will democratize what people can build, and new tools paired with robotic automation could even generate hypotheses and run experiments at vastly higher throughput than previously possible. What will be critical, as one of our colleagues wrote elegantly about, is nailing the UX to integrate the strengths of both humans and machines. This will be the essential role of biotech data teams (check out Jesse Johnson’s and Jacob Oppenheim’s work on the subject).
What to build
So… what’s the takeaway? Invest in assets or platforms? Hardware? Software? Wetware? Full-stack drug discovery, next-gen diagnostics, or enabling tools?
Short answer: We’re excited to back companies that combine many of the above to enable scientists to traverse design spaces and solve problems more efficiently. We love platforms but don’t invest in platforms for platforms’ sake. Rather, we’re excited to see new tools and methods targeted towards impactful problems — whether that be a new drug for an unmet clinical need, plants that allow us to feed a growing population amidst a changing climate, or a tool that enables scientists to work more efficiently and effectively.
This may still sound like an ocean of possibility (and we think it is!), but of course, some types of companies aren’t a fit for this thesis. For example:
We are unlikely to invest in companies that don't place a strong emphasis on enabling data generation; we believe predictive models will get there one day, but we haven’t seen convincing evidence that biological systems are predictable enough solely in silico.
In drug discovery, we don’t invest in single (or few) asset plays; we invest in platforms that build differentiated pipelines and have unique technical defensibility.
We are unlikely to invest in companies whose core innovation is the discovery or optimization of a single production strain, e.g., for food or industrial biology applications.
There is a new generation of founders who view complex biological data in relevant contexts as their currency and who are building organizations with significant investments in engineering infrastructure to support these efforts. We think our portfolio companies are incredible examples of this (Eikon Tx, GROBio, Dewpoint Tx, Character Bio, Think Bio, Harmonic Discovery, and BioLoomics).
We can’t wait to see what tomorrow’s founders can build. We have quite a long wishlist (including the subset below) which we hope resonates with some of you. If you’re building in any of these spaces, please reach out. We’d love to hear from you.
Improved targets
We are always on the lookout for platforms that identify and validate novel target biology. Only one drug in 10 makes it through the clinical trial gauntlet, often due to an incorrect or insufficiently vetted understanding of disease biology. In particular, we are interested in genetics-informed discovery platforms, better translational model systems, broader perturbational tools in a relevant context, novel assays that functionally interrogate in vivo biology, new sensing tools, and methods to help decode the wonderful complexity of our immune system. We need better tools that can take on the challenge of complex, polygenic diseases.
Better clinical translation tools
Clinical trials are the major cost driver for biotech, as well as the most important translational point. So, how do we ensure that each data point collected delivers the greatest marginal value? Real-world data, synthetic control arms, adaptive clinical trials: all of these tools should become increasingly relevant to discovery and clinical development in the years ahead.
Drug delivery
Getting drugs to the right place at the right time is a massive challenge, and doing so enables significantly safer and more efficacious treatments. New delivery vehicles and approaches hold significant promise for precise modulation of target biology — from novel vectors to conditionally active and logic-gated biologics to targeting moiety-drug conjugates.
Programmable medicines
As we’ve seen with the COVID-19 mRNA vaccines, certain therapeutic modalities are uniquely programmable. In particular, genetic medicine platforms enable significantly more efficient development paths, as they are highly plug-and-play if designed appropriately. These approaches hold significant promise for decreasing the time and cost associated with drug discovery and expanding the aperture of accessible mechanisms of action (MoAs); we’re excited by approaches that move beyond monogenic disorders and that can edit, replace, or tune protein expression specifically and durably.
Next-generation effectors
Targeted inhibition via small molecules, antibodies, and ASOs has been the workhorse of drug discovery. We’re excited by the emergence of new tools with broader function: stabilizing proteins, agonizing cell surface receptors, synthesizing macromolecules in vivo, editing faulty sequences, enabling polypharmacological activity, and “medium-sized” molecules that punch above their weight, to name a few.
Intelligent software & automation for scientists
We’ve invested in data tooling in the broader tech ecosystem, where we’ve seen significant value accrue to companies that supercharge developers. It is clear that scientists in the life sciences are significantly under-tooled. We want products that allow scientists to work hand-in-hand with machines while automating and abstracting away the complexity that slows them down.
Computational diagnostics
Diagnosis drives decision-making. Diagnostics are the most essential piece of patient care, yet we lack the requisite resolution in a majority of diseases. This is partly due to the difficulty of building, scaling, and funding diagnostics businesses. We’re excited by teams that integrate innovative assay and computational techniques to drive earlier and finer-grained diagnoses with a cost structure that can support a scaled clinical assay. In a clinical context, technology must be paired with innovative business models that integrate deeply into patient, provider, and payer workflows to drive significant value.
Biology for the planet
2023 is, and will continue to be, a tough year for industrial synthetic biology, as companies face both a higher cost of capital and the tough reality that many early products have struggled to scale and become financially competitive. However, for teams that keep the hard-won lessons of the last decade in mind, we believe there’s an opportunity to build generational companies tackling some of our biggest planetary challenges — more on this topic to come in a future post (and in prior posts here, here and here).
What we’re reading and listening to
40,000 recipes for murder — Podcast & Paper — sometimes, we are grateful synthesis is a challenge
Functional biology in its natural context: A search for emergent simplicity — this awesome piece speaks to many of the concepts we draw inspiration from for our thesis
Dan Goodwin: Frontiers of AI-Powered Experimentation
Note: We are huge fans of the foundational work that Dan and Paul are doing at Homeworld Collective on behalf of climate biotech. They are hiring, so if you’re a biologist interested in working on climate and need somewhere to start, please reach out!
Sam Rodriques: Tasks & benchmarks of an AI Scientist
Andrew White: Assessment of chemistry knowledge in LLMs that generate code
Elliot Hershberg: Century of Biology on Solugen
Ruan et al.: Population-based heteropolymer design to mimic protein mixtures
Questions? Comments? Ideas?
As always, whether you have ideas, questions, or fiercely disagree with everything we’ve shared, we’d love to hear from you! Please feel free to drop any of the authors of this post a note (emails below).
Until next time!
Nick (nolsen@innovationendeavors.com),
Carrie (cvonmuench@innovationendeavors.com), and
Galen (gxing@innovationendeavors.com), on behalf of the Innovation Endeavors team.
Portfolio jobs
Our companies are hiring! Peruse our job board if you’re interested in dynamic roles with teams doing incredible work in the life sciences and technology. And if you’re passionate about drug discovery, check out this VP of Drug Discovery and Development role at Harmonic Discovery.