From Prediction to Proof: Why AI/ML Drug Discovery Still Needs Experimental Data
Artificial intelligence, machine learning, and large language models (collectively referred to here as “AI/ML”) are genuinely reshaping how scientists imagine, prioritize, and advance new therapies. The promise of AI/ML is ubiquitous. We constantly see headlines touting how AI/ML will compress years of drug discovery into months and reduce the costs of identifying new therapies. However, there’s a fundamental truth that is not stressed enough: predictive models are only as powerful as the experimental data behind them. And right now, the data is the bottleneck.
Bridging the Gap Between Prediction and Biology
A predictive model can identify promising targets, rank candidate compounds, or predict binding interactions with a certain confidence. What the model cannot do is confirm that prediction against the noisy reality of biology. Validation requires experiments: assays designed with the computational question in mind and executed with enough rigor that the data can be fed back confidently into the next model iteration.
This is where many AI/ML-driven programs quietly stall. The computational side is sophisticated, but the experimental infrastructure supporting it often isn't built with the same intentionality. Bridging that gap should be treated as a strategic imperative, not an afterthought to the computational output.
Data Quality Is the Limiting Factor for AI/ML
Computational power is rarely the bottleneck. Data quality almost always is. Part of the reason is the need for effective training sets that include sufficient true positives and true negatives, along with the contextual metadata that can enhance model predictions. Subtle variables such as reagent lot numbers, instrument settings, environmental conditions (temperature, humidity), compound origin, and experiment operator can all influence assay performance, and therefore any model trained on those datasets. When inconsistencies arise, this level of traceability allows for rapid troubleshooting and keeps experimental signals interpretable for model development and optimization. For this reason, AI/ML teams increasingly treat datasets as strategic assets that appreciate in value with every iteration, rather than as downstream considerations.
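To make this concrete, here is a minimal sketch of how such contextual metadata might be captured alongside each measurement. The field names and values below are illustrative assumptions, not a prescribed schema:

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical, minimal schema: the idea is that every assay readout
# travels with the context that could explain variability later.
@dataclass
class AssayRecord:
    compound_id: str            # compound identity / origin
    readout: float              # raw assay signal
    reagent_lot: str            # reagent lot number
    instrument_settings: dict   # e.g., gain, exposure time
    temperature_c: float        # environmental conditions
    humidity_pct: float
    operator: str               # who ran the experiment
    recorded_at: datetime = field(default_factory=datetime.now)

# Illustrative usage with made-up values
record = AssayRecord(
    compound_id="CMPD-00042",
    readout=0.87,
    reagent_lot="LOT-2026-118",
    instrument_settings={"gain": 2, "exposure_ms": 50},
    temperature_c=22.5,
    humidity_pct=45.0,
    operator="analyst_01",
)
```

The design point is that these fields are captured at the moment of measurement rather than reconstructed later; that is what makes the traceability described above possible.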
Speed and Rigor Are Not a Trade-off
When scientists are asked when they need their data, the most common answer is "yesterday." For AI/ML teams refining predictive models, speed matters even more: the pace of experimental feedback can determine whether a model iteration takes days, weeks, or months. To support AI/ML teams, experimental workflows must be designed for adaptability and rapid iteration, enabling partners to move quickly from computational hypothesis to experimental validation. Rather than relying on rigid, one-size-fits-all screening templates, each project should be tailored to the biological question being asked. That flexibility generates the right data efficiently, without sacrificing the reproducibility and data fidelity that make the results meaningful and valuable for modeling. The goal is not speed alone, but a balance of speed, flexibility, and meticulous attention to data fidelity that makes partnerships productive and results trusted.
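One widely used way to quantify that balance is the Z′-factor, a standard HTS statistic computed from the means and standard deviations of positive- and negative-control wells; values above roughly 0.5 are conventionally considered excellent. A minimal sketch in Python, with simulated control values used purely for illustration:

```python
import numpy as np

def z_prime(pos: np.ndarray, neg: np.ndarray) -> float:
    """Z'-factor: 1 - 3*(sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(pos.mean() - neg.mean())
    return 1.0 - 3.0 * (pos.std(ddof=1) + neg.std(ddof=1)) / separation

# Simulated control wells (illustrative numbers, not real assay data)
rng = np.random.default_rng(0)
pos = rng.normal(100.0, 5.0, 32)   # positive-control wells
neg = rng.normal(10.0, 4.0, 32)    # negative-control wells
print(f"Z' = {z_prime(pos, neg):.2f}")
```

Tracking a metric like this per plate, per run, is one simple way a fast-moving screening workflow can demonstrate that speed has not come at the cost of data fidelity.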
The Future: Closed Loops Between Models and Experiments
The next generation of drug discovery will belong to teams that master both sides of the equation: smart algorithms and smart experiments running in tight, continuous integration. We are moving towards closed-loop discovery, where computational predictions drive experimental design and experimental results immediately sharpen model performance, accelerating the path towards viable drug candidates. The organizations that will lead this shift are building the infrastructure now: experimental platforms designed for speed and adaptability, metadata frameworks that make datasets computable assets, and scientific teams who understand that they are not just running assays; they are training the models that will define the next generation of medicine.
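Schematically, such a loop can be sketched in a few lines. Everything below is an assumption for illustration, not any specific platform: the descriptors are random, the "assay" is a noisy simulated oracle, and the model is a stand-in for whatever predictor a team actually uses.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy closed loop: the model proposes the most promising untested
# compounds, a simulated assay measures them, and the results are
# folded back into the training data for the next iteration.
rng = np.random.default_rng(42)
features = rng.random((500, 16))            # toy compound descriptors
true_activity = features @ rng.random(16)   # hidden ground truth ("biology")

def run_assay(idx: np.ndarray) -> np.ndarray:
    """Simulated wet-lab measurement with experimental noise."""
    return true_activity[idx] + rng.normal(0.0, 0.1, size=len(idx))

tested = list(rng.choice(500, 20, replace=False))   # initial random screen
results = list(run_assay(np.array(tested)))

for iteration in range(5):
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(features[tested], results)             # retrain on all data so far
    untested = np.setdiff1d(np.arange(500), tested)
    preds = model.predict(features[untested])
    batch = untested[np.argsort(preds)[-10:]]        # top-ranked candidates
    tested.extend(batch)                             # "design" the next experiment
    results.extend(run_assay(batch))                 # feed results back into the loop
    print(f"iteration {iteration}: best measured activity = {max(results):.3f}")
```

The mechanics are trivial; the hard part in practice is everything this sketch abstracts away: assay turnaround time, data fidelity, and metadata capture, which is exactly why the experimental infrastructure matters as much as the model.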
The therapies of the future will not emerge from code alone. They will come from the disciplined, iterative integration of computation and biology, where every prediction is tested, every result informs the next hypothesis, and the distance between insight and impact continues to shrink.
Three Trends from SLAS2026 Shaping Assay Development and HTS
While the life sciences community continues to process what we learned at the record-breaking SLAS2026 conference in Boston, it's clear that assay development and high-throughput screening (HTS) are at an inflection point. Novel technologies, emerging biological questions, and new expectations for speed and decision-making are reshaping how we think about screening for small-molecule drug discovery. Interestingly, while speed continues to dominate discussions, discovery strategy is also a core driver. Throughout the conference and exhibition, three themes consistently emerged in conversations with scientists, engineers, platform leaders, and drug hunters.
1. AI/ML Is Only as Good as the Biology Beneath It
Artificial intelligence and machine learning (AI/ML) dominated many discussions at SLAS2026. AI is transforming how we design experiments, analyze screening data, prioritize hits, and identify patterns across modalities. But as enthusiasm for AI accelerates, it is increasingly recognized that AI cannot compensate for weak biology. AI has not reduced the importance of assay development; rather, it has raised the bar. Predictive models rely on high-quality, reproducible, and biologically meaningful datasets, so there must be an emphasis on assay design, data quality, and data presentation (e.g., how well can the data be reused, reanalyzed, and integrated across other programs?). As a result, teams are thinking earlier about assay robustness, controls, and data structure as strategic assets.
2. Novel Targets Demand Creative Assay Thinking
The second major trend is the growing focus on novel and biologically complex targets. Discovery teams are increasingly pursuing protein–protein interactions (both creating and disrupting them), transient complexes, intrinsically disordered proteins (IDPs), and pathway-level phenotypes. These targets and systems often resist traditional assay formats, so off-the-shelf solutions rarely apply. Instead, progress depends on creativity at the interface of biology, chemistry, and technology. Many novel targets require innovative assay design, including approaches accessible through outsourcing models. High-throughput screening in this context is less about brute force and more about carefully engineered experiments that ask and answer the right biological question at scale.
3. Multi-Dimensional Readouts Are Becoming the Norm
The third trend is an emphasis on designing data-rich assays. Cell-based assays are shifting further towards high-content screening, high-dimensional screening, and multi-parameter phenotypic screens. Multiplexing biochemical assays is more challenging, although continued innovation in the label-free space, particularly with mass spectrometry, offers a path to screening two or more targets simultaneously. Finally, binding assays using DNA-encoded libraries (DELs) and affinity selection mass spectrometry (ASMS) platforms can screen multiple targets in parallel across many compounds simultaneously. This evolution reflects a broader industry recognition that richer data enables deeper mechanistic insight, better triage of false positives, and earlier differentiation between compounds that merely modulate a signal and those that meaningfully impact biology. One challenge is that complex readouts magnify variability, sensitivity to experimental conditions, and the consequences of poor assay design, which intensifies the need for reproducibility and straightforward interpretability. Finding the balance between data-rich and overly complex is critical.
Bringing It All Together
What ties these trends together is the nexus of technology and biology. AI, novel targets, and high-content and/or high-throughput screening all depend on well-designed assays that faithfully represent biological reality while remaining scalable and reproducible. As we continue our discussions from SLAS2026, the conversation should not be framed as "old versus new" approaches. Instead, it's about recognizing that the future of screening is built on a deeper integration of innovative technology, biological insight, assay craftsmanship, and scalable execution.