Combocat: How to Install and Use the New AI Tool That Cuts Drug Screening Costs by 90%


1. Introduction: The End Of Trial And Error In Drug Combinations?

Every drug-combo project starts with optimism and ends with math. Two drugs? Easy. Ten doses each? Still manageable. You get a tidy 10 × 10 grid, a heatmap, and the comforting feeling that biology is a surface you can map. Then someone asks for “just a few more drugs.”

At that point, the problem stops scaling like a screen and starts scaling like a tax. The number of possible 2-drug pairs grows quadratically as you add agents, which makes exhaustive drug combination screening impractical fast.
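To make the scaling concrete, here is a minimal base R sketch. It counts unordered drug pairs with `choose()` and multiplies by the 100 wells of a full 10 × 10 grid. The helper names are illustrative only, and the count ignores replicates and controls.

```r
# Number of unique 2-drug pairs among n drugs: n * (n - 1) / 2
n_pairs <- function(n_drugs) choose(n_drugs, 2)

# Dose-pair measurements for an exhaustive screen, assuming one full
# doses x doses grid per pair (no replicates, no control wells)
n_dense_measurements <- function(n_drugs, doses = 10) {
  n_pairs(n_drugs) * doses^2
}

drugs <- c(2, 10, 50, 100)
data.frame(
  drugs = drugs,
  pairs = n_pairs(drugs),
  dense_measurements = n_dense_measurements(drugs)
)
```

Going from 10 to 100 drugs multiplies the drug count by 10 but the pair count by 110 (45 to 4,950), which is the quadratic growth described above.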

Combocat is a very practical response to that pain. It stitches together two things labs already have, or wish they had: precise acoustic dispensing in the wet lab and a machine learning inference step in the dry lab. The paper frames the idea clearly: combine acoustic liquid handling protocols with machine learning-based inference so you can screen far more combinations without turning dose resolution into a blurry sketch.

This post is a straight path through the system: what it is, how “sparse mode” works, what hardware you truly need, and how to install the R package and run the workflow.

2. What Is Combocat? The Framework Explained

A close-up photograph of an Echo 655 liquid handler dispensing a nanoliter droplet into a 1536-well plate.

Combocat is an open source drug discovery software stack that treats screening as an end-to-end system, not a pile of scripts and plate maps taped to a fridge. It has two modes:

  • Dense mode, which measures every dose pair in a full 10 × 10 matrix, with replicates and controls baked into the 384-well template.
  • Sparse mode, which measures a much smaller set of points and uses predictive modeling to infer the rest.

The famous “90% fewer measurements” claim is not marketing fluff. Sparse mode measures only the 10 diagonal dose pairs instead of all 100 in a 10 × 10 matrix, a direct 90% reduction in measured dose pairs per drug pair.

If you follow AI in drug discovery, you’ve seen two kinds of “AI” stories. One is glossy and vague. The other is boring in the best way, tied to pipelines, constraints, and failure modes. Combocat sits firmly in the second camp: high throughput screening designed around real plates, real volumes, and a model trained on real dense matrices.

Table 1. Dense Vs Sparse Screening Modes

Dense Mode vs Sparse Mode, key differences that affect scale and cost.

Feature                     | Dense Mode                     | Sparse Mode
Plate format                | 384-well                       | 1536-well
Per-well compound transfer  | 200 nL total (Echo 655)        | 20 nL total (Echo 655)
Measurements per drug pair  | Full 10 × 10 (100 dose pairs)  | Diagonal only (10 dose pairs)
Core purpose                | Reference-quality maps         | Fast prioritization at scale

3. Understanding Sparse Mode: How The AI Predicts Synergy

A holographic visualization showing a diagonal line of measured data points extending to predict a full drug combination matrix using AI.

Sparse mode sounds like you’re skipping the hard part. You are, but in a way that keeps the hard part recoverable. In sparse mode, Combocat does two things:

  • It measures single-agent dose responses across ten doses.
  • It measures only the diagonal of each combination matrix, those 10 dose pairs at a relative 1:1 ratio.

3.1 The Diagonal Method

The diagonal rule is simple: dose 6 of Drug 1 pairs with dose 6 of Drug 2, dose 7 with dose 7, and so on. That choice is what makes large screens feasible. Ten observed combo points instead of one hundred, multiplied across thousands of pairs.
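The diagonal layout is easy to picture as a mask over the 10 × 10 matrix. The sketch below uses base R only; the variable names are illustrative and are not taken from the Combocat package.

```r
doses <- 10

# Logical mask over the doses x doses matrix: TRUE where a well is measured.
# Rows index Drug 1 dose levels, columns index Drug 2 dose levels.
measured <- diag(doses) == 1

# Row/column indices of the measured wells (dose i paired with dose i)
diagonal_pairs <- which(measured, arr.ind = TRUE)

n_measured <- sum(measured)             # wells measured in sparse mode
n_inferred <- sum(!measured)            # wells left for the model to fill in
reduction  <- 1 - n_measured / doses^2  # fraction of dose pairs not measured
```

With ten doses this gives 10 measured wells, 90 inferred wells, and a reduction of 0.9, which is where the 90% figure comes from.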

3.2 The Inference Engine

The “AI” part of Combocat is not one giant model. It’s an ensemble of 90 regression models, each trained to predict one of the 90 non-measured matrix entries. Training is supervised, grounded, and frankly sensible:

  • Start with fully measured dense matrices.
  • Downsample them to match the sparse features, including 30 measured values used as features.
  • Fit models (XGBoost via tidymodels), validate on held-out matrices, then serialize the ensemble so it can be deployed without retraining.
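The downsampling step can be sketched in a few lines of base R. This is an illustration under a stated assumption, not the authors' code: it assumes the 30 features are the 10 single-agent values for each drug plus the 10 diagonal combination values. The function name and the toy data are hypothetical.

```r
# Hypothetical helper: reduce one dense matrix to a sparse-style feature
# vector plus the 90 off-diagonal targets.
# ASSUMPTION: 30 features = 10 single-agent values per drug + 10 diagonal values.
downsample_dense <- function(combo, single_a, single_b) {
  stopifnot(all(dim(combo) == c(10, 10)),
            length(single_a) == 10, length(single_b) == 10)
  off_diag <- row(combo) != col(combo)
  list(
    features = c(single_a, single_b, diag(combo)),  # 30 model inputs
    targets  = combo[off_diag]                      # 90 values, one per model
  )
}

# Toy values standing in for a measured dense matrix (not real results)
single_a <- seq(0, 45, length.out = 10)
single_b <- seq(0, 60, length.out = 10)
combo    <- outer(single_a, single_b, function(a, b) a + b - a * b / 100)

example <- downsample_dense(combo, single_a, single_b)
length(example$features)
length(example$targets)
```

Repeating this over every dense matrix would give one training table per off-diagonal position, each with the same 30 inputs and a different target. The model fitting itself (XGBoost via tidymodels, per the text) is not shown here.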

Model performance is reported as strong, with a median R² of 0.947 across the 90 models in cross-validation, and most models clustered tightly around that median.

Once the full response surface is reconstructed, Combocat quantifies synergy using Bliss independence in the main pipeline and ranks pairs so you can focus validation work where it matters.
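For readers who want the arithmetic, Bliss independence predicts the combined fractional effect as E_a + E_b - E_a × E_b, and the excess over that expectation is the synergy signal. The base R sketch below shows the calculation on made-up numbers. The function names are illustrative, and Combocat's own scoring functions and summary statistics may differ.

```r
# Bliss expected effect for fractional effects in [0, 1]:
# E_ab = E_a + E_b - E_a * E_b
bliss_expected <- function(effect_a, effect_b) {
  outer(effect_a, effect_b, function(a, b) a + b - a * b)
}

# Excess over Bliss: positive values suggest synergy, negative antagonism.
# `observed` is a matrix with rows = Drug A doses, columns = Drug B doses.
bliss_excess <- function(observed, effect_a, effect_b) {
  observed - bliss_expected(effect_a, effect_b)
}

# Toy example with 3 doses per drug (fraction of cells killed; made up)
effect_a <- c(0.1, 0.3, 0.5)
effect_b <- c(0.2, 0.4, 0.6)
observed <- bliss_expected(effect_a, effect_b) + 0.05  # 5 points above expectation

excess <- bliss_excess(observed, effect_a, effect_b)
mean(excess)
```

If your data are in percent cell death rather than fractions, divide by 100 before applying the formula.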

If you do computational drug discovery for a living, this is imputation with an experimental backbone. If you work in virtual screening, treat it as a pragmatic ranking layer for combinations, except the inputs are measured biology, not docking scores.

4. Hardware Requirements: The Role Of Automated Liquid Handling

This is the part people try to hand-wave. Don’t. Combocat is built around automated liquid handling with an Echo 655 acoustic liquid handler, and the paper explicitly ties both dense and sparse workflows to Echo protocol files.

Dense mode transfers 200 nL total compound per well using the Echo 655, then adds 40 μL of cells. Sparse mode shrinks that to 20 nL compound and 4 μL cells in a 1536-well format. That miniaturization is the economic engine behind scaling. So yes, “automated liquid handling” belongs in the same sentence as “budget,” because it is the budget.
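A quick check of those volumes, as a base R sketch using only the numbers quoted above: both modes keep the same compound-to-cell volume ratio, while sparse mode uses a tenth of the volume per well.

```r
# Volumes from the text, expressed in nanoliters
modes <- data.frame(
  mode        = c("dense", "sparse"),
  compound_nl = c(200, 20),
  cells_nl    = c(40000, 4000)   # 40 uL and 4 uL
)

# Fraction of the final well volume contributed by the compound transfer
modes$compound_fraction <- modes$compound_nl /
  (modes$compound_nl + modes$cells_nl)

# Per-well volume ratio, dense relative to sparse
volume_ratio <- modes$cells_nl[1] / modes$cells_nl[2]
modes
```

The compound fraction comes out just under 0.5% of the well volume in both modes, so the miniaturization scales the assay down without changing the dilution.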

One more practical note: dense and sparse measurements largely agree, but they can differ because sparse mode uses smaller volumes and a different plate format, which can shift potency or dynamic range.

5. Step-By-Step Guide: How To Install Combocat

On the software side, Combocat is refreshingly straightforward. It’s an R package plus a pre-trained model file you can download.

5.1 Prerequisites

You need:

  • R (recent version)
  • RStudio (optional, helpful)
  • An internet connection that doesn’t sabotage GitHub downloads
  • The devtools package

5.2 Step 1: Get The Code

The paper states the source code and R package are open-source on GitHub under the Apache 2.0 license.

https://github.com/wcwr/combocat

5.3 Step 2: Install The R Package

install.packages("devtools")

devtools::install_github("wcwr/combocat")

5.4 Step 3: Download The Deployable Model And Echo Protocols

The deployable model file, Echo acoustic liquid handler protocols, and full documentation are hosted on the project site.

https://combocat.stjude.org

That “deployable model” detail matters. It turns Combocat from “cool methods section” into something you can actually run without building the ensemble yourself.

6. The Workflow: From Single Agents To Synergy Scores

Think of the workflow as three phases that line up cleanly with how people already work.

6.1 Phase 1: Single Agent Screening

Run each drug across ten doses on single-agent plates, with replicates appropriate to your assay. Sparse mode is designed so single agents are not re-measured for every pair; the same single-agent data can be mapped onto multiple combination matrices.

6.2 Phase 2: The Sparse Combination Screen

For each pair, measure the diagonal only, those ten 1:1 dose pairs.

This is where large-scale drug combination screening stops being theoretical. The methods even give a plate-count formula for planning screens, anchored to the fact that 135 unique drugs can fit in the usable wells of a 1536-well plate design.

6.3 Phase 3: Computational Prediction And Ranking

Feed the measured single-agent and diagonal data into the R package, use the serialized ensemble to infer missing matrix entries, then compute synergy (Bliss by default) and rank pairs.

This is where computational drug discovery meets wet lab reality. You get full surfaces and synergy summaries without paying the full measurement cost.

Table 2. Practical Inputs And Outputs For A Typical Run

From raw reads to inferred surfaces and ranked synergy, in three steps.

Step               | Input                        | Output
QC + normalization | Raw readouts + controls      | Cleaned % cell death values
Sparse inference   | 30 measured features + model | Full 10 × 10 response surface
Synergy scoring    | Reconstructed surface        | Bliss synergy matrices and rankings

7. Validation: How Do We Know It Works?

Combocat earns trust the old-fashioned way: it measures a lot, then checks itself. Dense mode produced a reference dataset of 806 combinations with over 290,000 measurements, and the paper reports strong assay quality with a mean Z′ of 0.747. That dataset is what makes sparse inference plausible.
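For context on that Z′ number, the Z′-factor compares the spread of the positive and negative controls to the gap between their means. Here is a minimal base R sketch of the standard formula. The control values are invented for illustration and the function name is not from the Combocat package.

```r
# Z'-factor for a plate's control wells:
# Z' = 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|
z_prime <- function(pos, neg) {
  1 - 3 * (sd(pos) + sd(neg)) / abs(mean(pos) - mean(neg))
}

# Made-up control readouts (e.g. raw luminescence), not data from the paper
neg_ctrl <- c(980, 1010, 1005, 995, 1020, 990)   # untreated wells
pos_ctrl <- c(52, 48, 55, 45, 50, 50)            # full-kill wells

z_prime(pos_ctrl, neg_ctrl)
```

Values above 0.5 are conventionally read as a well-separated assay, which puts a mean of 0.747 comfortably in usable territory.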

The authors also built a QC pipeline specifically to handle spurious measurements, and they define concrete thresholds:

  • Standard deviation threshold of 29 for disqualifying noisy doses
  • Residual threshold of 15% from fitted dose-response curves
  • Monotonicity threshold of 16% decreases between consecutive doses, including checks along the diagonal in sparse mode
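The three checks can be sketched as simple flags on one dose series. This is a rough base R illustration, not the authors' QC pipeline: the thresholds come from the list above, but the exact way each check is computed, the units (assumed here to be % cell death), and the function name are all assumptions.

```r
# Thresholds quoted in the text (units assumed to be % cell death)
SD_MAX        <- 29
RESIDUAL_MAX  <- 15
MONOTONIC_MAX <- 16

# Hypothetical QC helper for one dose series (lowest to highest dose).
# `replicates`: matrix, rows = doses, columns = replicate wells
# `fitted`:     fitted dose-response values, one per dose
flag_doses <- function(replicates, fitted) {
  observed <- rowMeans(replicates)
  noisy    <- apply(replicates, 1, sd) > SD_MAX
  off_fit  <- abs(observed - fitted) > RESIDUAL_MAX
  # Drop in response relative to the previous (lower) dose
  drop     <- c(0, -diff(observed))
  non_mono <- drop > MONOTONIC_MAX
  data.frame(observed, noisy, off_fit, non_mono,
             flagged = noisy | off_fit | non_mono)
}

# Made-up example: dose 3 has wild replicates, dose 5 drops sharply
reps <- rbind(c(5, 7, 6), c(20, 22, 21), c(10, 80, 45),
              c(60, 62, 61), c(30, 32, 31))
fit  <- c(6, 21, 40, 61, 75)
qc   <- flag_doses(reps, fit)
qc$flagged
```

In this toy series, dose 3 is flagged for replicate noise and dose 5 for both a large residual and a non-monotonic drop.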

Flagged values can be excluded to compute an adjusted Bliss score that ignores disqualified measurements.

They also sanity-check against known biology. In their dense screen, the top-ranked combo includes sulfamethoxazole and trimethoprim in E. coli, a well-established synergistic pair, and the paper shows the expected strong synergy pattern.

Then comes the key test for any inference method: re-screen predicted hits in dense mode. They re-screened 40 combinations, the top 30 predicted hits plus 10 random excluded pairs, and report that most top-ranked pairs retained strong synergy patterns in dense mode.

8. Real-World Application: Screening 9,045 Combinations

Scientists in a massive automated lab review a large-scale heatmap of 9,045 drug combinations detected by Combocat.

The headline result is scale. Using sparse mode, Combocat screened 9,045 drug combinations in the neuroblastoma cell line CHP-134, described as the largest number of unique combinations tested in a single cell line to date.

They used 135 small molecules, and the applied screen description includes six replicates of each single-agent plate.
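The 9,045 figure is consistent with testing every unordered pair among 135 compounds. A short base R check of the combination count and the number of combination wells each mode would need (excluding replicates, controls, and single-agent plates, which this sketch does not model):

```r
n_drugs <- 135
doses   <- 10

n_combinations <- choose(n_drugs, 2)         # unique unordered pairs
dense_wells    <- n_combinations * doses^2   # full 10 x 10 per pair
sparse_wells   <- n_combinations * doses     # diagonal only

c(combinations = n_combinations,
  dense_wells  = dense_wells,
  sparse_wells = sparse_wells,
  wells_saved  = dense_wells - sparse_wells)
```

That works out to 90,450 diagonal wells instead of 904,500, a difference of more than 800,000 combination measurements for this one screen.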

This is why the method matters for high throughput screening. It changes what’s feasible for a single lab, and it changes how you spend your validation budget. Screen wide, then validate deep.

9. Limitations And A Quick Hype Check

Combocat comes with constraints, and the paper is direct about them.

  • Training data quality sets the ceiling. As new data get added, continuous hold-out testing matters to prevent leakage and keep generalization real.
  • Plate physics can shift biology. Miniaturized volumes and plate layout differences can occasionally shift potency or dynamic range.
  • Synergy models have limits. Combocat supports Bliss and Loewe, but Bliss can misestimate synergy for drugs targeting similar pathways, and Loewe can be undefined without dose-equivalence, which the paper reports occurred in 61.5% of sparse mode dose combinations tested.

Here’s the hype check I use: does the method make you better at asking questions, or does it just make prettier plots?

Combocat makes you better at asking questions because it buys you scale without forcing you to give up dose resolution. It’s also not generative AI, and that’s fine. If you want generative systems, use them upstream to propose compounds. Let virtual screening prioritize candidates computationally. Then let this workflow test combinations with fewer wet-lab measurements.

10. Conclusion: Where Combocat Fits In The Future Of Pharma

Combocat breaks a long-standing tradeoff: test few pairs with high resolution, or many pairs with low resolution. It keeps the resolution, then reduces experimental load with diagonal measurements plus an ensemble inference model trained on dense reference data.

The most “future-proof” part is that it’s designed for community improvement. The discussion explicitly points to anonymized dense datasets being contributed back for retraining and improving the ensemble model over time.

If you want to get value from Combocat this week, do one of these:

  • Wet-lab teams with Echo access: install the package, download the deployable model and protocols, then run a small sparse screen on a set of pairs you already know, and confirm the rankings make sense.
  • Teams with dense datasets: contribute anonymized matrices to strengthen the reference distribution and improve inference for everyone.
  • Computational teams: treat Combocat as a new interface layer. It turns sparse assays into full response surfaces you can analyze, model, and sanity-check.

It won’t replace good experimental design. It scales it. If that sounds useful, install it, run the example pipeline, and start turning “we should test this combo someday” into an actual ranked hit list.

Automated liquid handling: Robotic dispensing of liquids to improve speed, precision, and repeatability in assays.
Acoustic dispensing: Moving tiny droplets using sound waves instead of pipette tips, enabling nanoliter-scale transfers.
Echo 655: A specific acoustic liquid handler often referenced for high-precision, non-contact dispensing in miniaturized screens.
1536-well plate: A microplate format with 1,536 tiny wells, used to run very large experiments with very small volumes.
High throughput screening (HTS): Methods that test many conditions quickly, usually with automation and plate-based assays.
Drug combination screening: Testing pairs (or sets) of drugs across doses to find synergy, additivity, or antagonism.
Dose-response matrix: A grid of measured effects across two drugs’ dose ranges (often 10×10), used to see interaction patterns.
Sparse Mode: A strategy that measures only a small fraction of dose pairs (often the diagonal) and predicts the rest computationally.
Dense Mode: The “measure everything” approach, usually a full 2D matrix across doses, used for highest fidelity and model training.
Diagonal sampling: Measuring matched dose pairs (1:1 progression) across the matrix rather than every combination of doses.
Inference model: A model that estimates unmeasured values based on measured inputs, typically to fill gaps in experimental data.
Ensemble model: Multiple models working together, often improving accuracy and robustness versus a single model.
Synergy score: A numeric measure that estimates whether a combo outperforms what you’d expect from each drug alone.
Bliss independence: A common synergy framework that assumes drugs act independently, then compares observed effects to that expectation.
Loewe additivity: A synergy framework based on “a drug combined with itself,” often used when drugs share mechanisms or overlap.

What is the Combocat platform for drug discovery?

Combocat is an open source drug discovery software framework that combines acoustic dispensing protocols with machine learning to predict drug synergy. In practice, it connects automated liquid handling in the lab to an inference model that reconstructs full combination response maps from limited measurements.

How does AI reduce the cost of drug combination screening?

Combocat uses “Sparse Mode,” where you test only 10% of the usual dose pairs by measuring the diagonal (a matched 1:1 dose series). The machine learning step then infers the missing points, so you spend less on reagents, plates, and instrument time while still getting a full matrix.

What hardware is required to run Combocat?

To run the full experimental workflow as designed, you typically need an Echo 655 acoustic liquid handler plus standard plate workflow gear (1536-well plates, incubation, and a viability readout). The computational side is separate: the software is an open-source R package you can install and run anywhere R runs.

Is Combocat a generative AI or an inference model?

Combocat is not generative AI in the “make new molecules” sense. It’s an inference engine built on regression models that predict missing experimental response values from real measured data. It does not “hallucinate” chemistry; it completes matrices from sparse physical screens.

Can Combocat be used with FDA-approved drugs?

Yes. Combocat is designed for drug combination screening with real compounds, including FDA-approved drugs, which makes it useful for repurposing and finding new synergies. The platform’s large screening design is explicitly aimed at scaling combinations across many existing molecules.
