Scientific methodology, automated.

Formulate falsifiable hypotheses. Design controlled experiments. Test against 28 real databases. Grade the evidence. 91 computational tools across 4 scientific domains. Grounded in Popper, Fisher, and GRADE.

Open source (AGPL-3.0) · Self-hostable · COSS

91 tools · 28 data sources · 4 domains · 17 visualizations

How It Works

Six phases. Each grounded in established science.

Every investigation follows a structured scientific protocol. Ehrlich doesn't just search the internet — it formulates hypotheses, designs experiments, tests them against real data, validates with controls, and grades the evidence using peer-reviewed frameworks.

01

Classification & PICO

Sackett (1996)

Decompose your question into Population, Intervention, Comparison, Outcome. Auto-detect domains. Multi-domain questions merge configs automatically.

02

Literature Survey

GRADE + AMSTAR-2

Systematic search with citation chasing. GRADE-adapted evidence grading. AMSTAR-2 quality self-assessment. Haiku compresses and classifies.

03

Hypothesis Formulation

Popper + Platt + Bayes

Falsifiable hypotheses with predictions, null predictions, success/failure criteria, scope, type, and prior confidence. You approve before testing starts.

04

Experiment Execution

Fisher (1935)

Experiments with independent/dependent variables, controls, confounders, and analysis plans. Two experiments run in parallel. 91 tools across 4 domains.

05

Validation & Controls

Zhang (1999) + Y-scrambling

Negative controls with known-inactive compounds. Z'-factor assay quality. Permutation significance testing. Scaffold-split vs random-split comparison.
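The two statistics named above can be sketched in a few lines of Python. This is a minimal illustration, not Ehrlich's implementation. `z_prime` follows the Z'-factor definition from Zhang et al. (1999), 1 - 3(σ_pos + σ_neg) / |μ_pos - μ_neg|. `permutation_p_value` is a generic label-permutation test on a difference in means, in the spirit of Y-scrambling (which refits a model on shuffled labels instead). The function names, quality bands, and default permutation count are assumptions.

```python
import random
import statistics
from typing import Sequence


def z_prime(positives: Sequence[float], negatives: Sequence[float]) -> float:
    """Z'-factor (Zhang et al., 1999): 1 - 3 * (sd_pos + sd_neg) / |mean_pos - mean_neg|."""
    separation = abs(statistics.mean(positives) - statistics.mean(negatives))
    if separation == 0:
        raise ValueError("positive and negative controls have identical means")
    spread = statistics.stdev(positives) + statistics.stdev(negatives)
    return 1.0 - 3.0 * spread / separation


def z_prime_quality(z: float) -> str:
    # Bands after Zhang et al.: >= 0.5 excellent, (0, 0.5) marginal, <= 0 unusable for screening.
    if z >= 0.5:
        return "excellent"
    if z > 0:
        return "marginal"
    return "unusable"


def permutation_p_value(
    a: Sequence[float], b: Sequence[float], n_perm: int = 2000, seed: int = 0
) -> float:
    """Two-sided permutation test: how often do shuffled labels give a gap this large?"""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # break any real link between values and group labels
        diff = abs(statistics.mean(pooled[: len(a)]) - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    # The +1 terms count the observed labeling, so the p-value is never exactly zero.
    return (count + 1) / (n_perm + 1)
```

For example, positive controls `[100, 102, 98, 101, 99]` and negative controls `[10, 12, 8, 11, 9]` give a Z' of about 0.89, which falls in the excellent band.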

06

Synthesis

GRADE synthesis

Certainty grading (5 downgrading + 3 upgrading domains). Priority tiers. Limitations taxonomy. Knowledge gap analysis. Follow-up recommendations.
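The level arithmetic behind GRADE certainty can be sketched as follows, under a simplifying assumption: each domain judgment is expressed as a whole number of levels. Randomized evidence starts at high and observational evidence at low; the five downgrading domains subtract levels, the three upgrading domains add them (in GRADE practice, upgrades are normally reserved for observational evidence), and the result is clamped to the four certainty levels. Real GRADE judgments are qualitative, and the domain keys and function name here are illustrative, not Ehrlich's code.

```python
from typing import Mapping, Optional

DOWNGRADE_DOMAINS = {"risk_of_bias", "inconsistency", "indirectness", "imprecision", "publication_bias"}
UPGRADE_DOMAINS = {"large_effect", "dose_response", "plausible_confounding"}
LEVELS = ("very low", "low", "moderate", "high")


def grade_certainty(
    randomized: bool,
    downgrades: Mapping[str, int],
    upgrades: Optional[Mapping[str, int]] = None,
) -> str:
    """Start high (randomized) or low (observational), then move by whole levels."""
    upgrades = upgrades or {}
    unknown = (set(downgrades) - DOWNGRADE_DOMAINS) | (set(upgrades) - UPGRADE_DOMAINS)
    if unknown:
        raise ValueError(f"unknown GRADE domains: {sorted(unknown)}")
    level = 3 if randomized else 1  # index into LEVELS
    level -= sum(downgrades.values())
    level += sum(upgrades.values())
    # Clamp so heavy downgrading bottoms out at "very low" and upgrading tops out at "high".
    return LEVELS[max(0, min(level, len(LEVELS) - 1))]
```

For instance, randomized trials downgraded one level for imprecision land at "moderate", the same certainty shown in the console example below.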

Every Hypothesis Carries

statement: The core claim
prediction: What should be true if correct
null_prediction: What to expect if wrong
success_criteria: Measurable threshold for support
failure_criteria: Measurable threshold for refutation
prior_confidence: Bayesian prior (0-1)
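As a data structure, a hypothesis with these fields might look like the frozen dataclass below. The class is a hypothetical sketch that mirrors the listed fields; it is not Ehrlich's actual model. The `posterior` helper is also an assumption. It shows one standard way a 0-1 prior can be updated: multiply the prior odds by a Bayes factor and convert back to a probability. The example values reuse H1 from the console mockup, and its criteria strings are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    """Illustrative record mirroring the hypothesis fields listed above."""

    statement: str
    prediction: str
    null_prediction: str
    success_criteria: str
    failure_criteria: str
    prior_confidence: float  # Bayesian prior in [0, 1]

    def __post_init__(self) -> None:
        if not 0.0 <= self.prior_confidence <= 1.0:
            raise ValueError("prior_confidence must be between 0 and 1")

    def posterior(self, bayes_factor: float) -> float:
        """Hypothetical helper: posterior odds = prior odds * Bayes factor."""
        p = self.prior_confidence
        if p in (0.0, 1.0):
            return p  # a prior of exactly 0 or 1 cannot be moved by evidence
        odds = p / (1.0 - p) * bayes_factor
        return odds / (1.0 + odds)


h1 = Hypothesis(
    statement="Compound X MIC < 4 µg/mL against MRSA",
    prediction="Compound X inhibits MRSA growth below 4 µg/mL",
    null_prediction="No inhibition of MRSA below 4 µg/mL",
    success_criteria="Measured MIC < 4 µg/mL",
    failure_criteria="Measured MIC >= 4 µg/mL",
    prior_confidence=0.5,
)
```

With a prior of 0.5, evidence carrying a Bayes factor of 4 moves the posterior to 0.8.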

Every Experiment Carries

independent_var: What is being manipulated
dependent_var: What is being measured
controls: Positive and negative controls
confounders: Known confounding variables
analysis_plan: Statistical approach + thresholds
sensitivity: How robust to parameter changes
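The experiment fields can be sketched the same way. In this hypothetical dataclass, `controls` is assumed to be a mapping with `"positive"` and `"negative"` arms, and `missing_controls` shows the kind of structural check such a record enables. The MIC-assay values are illustrative; vancomycin is used only because it is a common reference antibiotic for MRSA.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Experiment:
    """Illustrative record mirroring the experiment fields listed above."""

    independent_var: str
    dependent_var: str
    controls: Dict[str, List[str]]  # assumed shape: {"positive": [...], "negative": [...]}
    confounders: List[str] = field(default_factory=list)
    analysis_plan: str = ""
    sensitivity: str = ""

    def missing_controls(self) -> List[str]:
        """Return the control arms (positive/negative) that are absent or empty."""
        return [arm for arm in ("positive", "negative") if not self.controls.get(arm)]


mic_assay = Experiment(
    independent_var="compound concentration (µg/mL)",
    dependent_var="MRSA growth after incubation",
    controls={"positive": ["vancomycin"], "negative": ["known-inactive compound"]},
    confounders=["inoculum density"],
    analysis_plan="Report MIC; compare against the 4 µg/mL threshold",
    sensitivity="Repeat at adjacent two-fold dilutions",
)
```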

Console

What you see while it runs.

SSE events stream into the console in real time. Hypotheses update live. Candidates rank as experiments complete. Charts render when visualization tools fire. You approve hypotheses before testing begins.
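For readers unfamiliar with Server-Sent Events (SSE): the server keeps an HTTP response open and writes text frames. Each frame consists of `event:` and `data:` lines and ends with a blank line. The helpers below sketch that wire format with JSON payloads. The event names (`phase`, `hypothesis`) and payload fields are hypothetical, not Ehrlich's actual event schema, and the parser deliberately ignores `id:`, `retry:`, and comment lines.

```python
import json
from typing import Any, Dict, Iterator, Tuple


def sse_frame(event: str, payload: Dict[str, Any]) -> str:
    """Serialize one SSE message: an event line, data line(s), then a blank line."""
    lines = [f"event: {event}"]
    # Every payload line needs its own "data:" prefix; compact JSON has only one line.
    lines += [f"data: {line}" for line in json.dumps(payload).splitlines()]
    return "\n".join(lines) + "\n\n"


def parse_sse(stream: str) -> Iterator[Tuple[str, Any]]:
    """Minimal parser for frames like the ones above."""
    for block in stream.split("\n\n"):
        if not block.strip():
            continue
        event, data_lines = "message", []  # "message" is the SSE default event type
        for line in block.split("\n"):
            name, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # the spec strips a single leading space
            if name == "event":
                event = value
            elif name == "data":
                data_lines.append(value)
        if data_lines:  # frames without data are not dispatched
            yield event, json.loads("\n".join(data_lines))
```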

localhost:5173/investigation/inv_8f3a2b
Investigation
Research Question

Find antimicrobial compounds effective against MRSA with low resistance risk and favorable ADMET profiles

Investigation Timeline (~30s total)
PICO · LIT · FORM · TEST · CTRL · SYNTH
0.8s [PICO] Domain detected: Molecular Science
3.2s [LIT] 23 papers found via Semantic Scholar
4.1s [LIT] GRADE assessment: moderate certainty
8.4s [FORM] 3 hypotheses formulated (Opus)
8.5s [FORM] Awaiting approval...
12.3s [TEST] ChEMBL screen → 47 candidates
18.7s [TEST] AutoDock Vina → 12 binding hits
22.1s [TEST] XGBoost trained (scaffold-split AUC: 0.84)
25.6s [CTRL] Z'-factor: 0.72 (excellent)
30.2s [SYNTH] GRADE synthesis: ⊕⊕⊕⊖ moderate
Hypothesis Board
H1 · SUPPORTED · 0.89
Compound X MIC < 4 µg/mL against MRSA

H2 · REFUTED · 0.23
Resistance risk via efflux pump mutations is low

H3 · TESTING · 0.65
ADMET profile permits oral bioavailability
localhost:5173/investigation/inv_8f3a2b#candidates
Candidates
Candidate Ranking
ID | Score | Docking (kcal/mol) | ADMET | Lipinski
CMP-1247 | 0.94 | -8.7 | Pass | 5/5
CMP-0893 | 0.87 | -7.9 | Pass | 5/5
CMP-2156 | 0.81 | -7.2 | Warn | 5/5
CMP-0412 | 0.74 | -6.8 | Pass | 4/5
localhost:5173/investigation/inv_8f3a2b#admet
ADMET Profile
ADMET Radar
Axes: Absorption, Metabolism, Toxicity, Solubility, Permeability, Stability

Multi-Model Architecture

Choose your team, match the task.

Every investigation assembles a team of three specialized models. Pick the tier that fits your question -- from fast exploration to maximum reasoning power.

Director
Opus 4.6
Formulates hypotheses with predictions, criteria, and Bayesian priors. Designs experiments with controls and confounders. Evaluates evidence and synthesizes findings with GRADE certainty. Streams its output with a 10K-token extended-thinking budget.
WHY: Hypothesis quality requires deep reasoning. No tool access -- pure scientific thinking.
Parallel Execution · 2 experiments per batch
Researcher A
Sonnet 4.5
Executes experiment protocol. Queries databases, trains models, runs statistical tests, screens candidates. Max 10 tool calls per experiment.
WHY: Fast tool execution. Domain-filtered to only relevant tools.
Researcher B
Sonnet 4.5
Independent experiment on a different hypothesis. Cross-references literature, validates controls, computes metrics.
WHY: Running two experiments at once roughly halves wall-clock time compared with running them one after another.
Summarizer
Haiku 4.5
Compresses tool outputs over 2000 characters. PICO decomposition. Domain classification. GRADE evidence grading. Keeps the Director focused on reasoning, not parsing.
WHY: Compression is mechanical, not creative. Haiku costs a fraction of what Opus does per token.
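The batching and compression rules above can be sketched with `asyncio`. This is a minimal illustration under stated assumptions, not Ehrlich's orchestrator: `run_one` stands in for a Researcher call, `summarize` for a Summarizer call, and the two constants restate the numbers above (two experiments per batch, a 2000-character threshold).

```python
import asyncio
from typing import Awaitable, Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")

BATCH_SIZE = 2  # experiments run concurrently per batch
SUMMARIZE_THRESHOLD = 2000  # characters; longer tool outputs get compressed


async def run_in_batches(
    experiments: Sequence[T],
    run_one: Callable[[T], Awaitable[R]],
    batch_size: int = BATCH_SIZE,
) -> List[R]:
    """Run batches one after another; experiments inside a batch run concurrently."""
    results: List[R] = []
    for start in range(0, len(experiments), batch_size):
        batch = experiments[start:start + batch_size]
        # gather() waits for the whole batch and returns results in input order.
        results.extend(await asyncio.gather(*(run_one(e) for e in batch)))
    return results


def maybe_compress(tool_output: str, summarize: Callable[[str], str]) -> str:
    """Send only long outputs through the cheaper summarizer model."""
    if len(tool_output) > SUMMARIZE_THRESHOLD:
        return summarize(tool_output)
    return tool_output
```

Waiting for each batch to finish before starting the next keeps at most two Researcher calls in flight, which matches the "2 experiments per batch" description. The trade-off is that a slow experiment holds up its batch partner's successor.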

Scientific Domains

Four domains. Domain-agnostic engine.

Each domain brings its own tools, scoring definitions, and prompt examples. The orchestrator, methodology, and persistence work identically across all. Multi-domain questions are auto-detected and merged.

MOLECULAR SCIENCE

22 TOOLS

Drug discovery, antimicrobial resistance, environmental toxicology, agricultural biocontrol.

Molecular property analysis (structure, 3D shape, fingerprints)
Drug effectiveness screening across thousands of assays
3D binding simulation (how molecules fit into proteins)
Drug safety and absorption prediction (ADMET profiling)
Environmental toxicity data (EPA CompTox)
Protein target discovery (200K+ structures)
Visualizations: Binding scatter, ADMET radar, Forest plot, Evidence matrix
Example: Find drug candidates effective against antibiotic-resistant Klebsiella

TRAINING SCIENCE

11 TOOLS

Exercise physiology, protocol optimization, injury risk assessment, clinical trial evidence.

Combine multiple studies to measure overall training impact
Side-by-side protocol comparison with evidence ranking
Injury risk scoring (sport, load, history, age)
Training load monitoring (cumulative stress, recovery, fatigue)
Clinical trial + PubMed literature search
Performance modeling (predict fatigue dips and peak readiness)
Visualizations: Training timeline, Muscle heatmap, Performance chart, Dose-response
Example: Compare periodized vs non-periodized resistance training in trained athletes
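Training-load monitoring often uses the acute:chronic workload ratio (ACWR). One common definition divides the 7-day average daily load by the 28-day average daily load. The sketch below assumes that rolling-average definition (exponentially weighted variants also exist) and the frequently cited bands from Gabbett (2016): roughly 0.8-1.3 as a "sweet spot" and 1.5 or above as a danger zone. The function names and exact cut-off handling are hypothetical.

```python
from statistics import mean
from typing import Sequence


def acwr(daily_loads: Sequence[float], acute_days: int = 7, chronic_days: int = 28) -> float:
    """Rolling-average acute:chronic workload ratio (most recent day last)."""
    if len(daily_loads) < chronic_days:
        raise ValueError(f"need at least {chronic_days} days of load data")
    acute = mean(daily_loads[-acute_days:])
    chronic = mean(daily_loads[-chronic_days:])
    if chronic == 0:
        raise ValueError("chronic load is zero")
    return acute / chronic


def acwr_zone(ratio: float) -> str:
    # Commonly cited bands (Gabbett, 2016); treat the exact cut-offs as illustrative.
    if ratio < 0.8:
        return "low"
    if ratio <= 1.3:
        return "sweet spot"
    if ratio < 1.5:
        return "elevated"
    return "danger"
```

For example, three steady weeks at 100 units per day followed by a week at 200 gives an ACWR of 1.6, which falls in the danger zone.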

NUTRITION SCIENCE

10 TOOLS

Supplement evidence, nutrient adequacy, drug interactions, inflammatory scoring, safety monitoring.

Supplement label ingredient lookup (120K+ products)
Nutrient profiling across 1.1M+ foods
Intake adequacy vs recommended targets (minimum, safe upper limit)
Drug-supplement interaction screening
Adverse event reports from FDA database
Inflammatory index scoring for dietary patterns
Visualizations: Nutrient comparison, Nutrient adequacy, Therapeutic window, Funnel plot
Example: Assess safety and efficacy of vitamin D3 + K2 supplementation at high doses
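Intake adequacy against recommended targets can be sketched with the standard reference values: the Estimated Average Requirement (EAR), the Recommended Dietary Allowance (RDA), and the Tolerable Upper Intake Level (UL). The mean adequacy ratio (MAR) is the average of per-nutrient intake/RDA ratios, each capped at 1. The functions below are hypothetical illustrations, not Ehrlich's tools. The examples use the US adult (19-70 years) values for vitamin D (EAR 10 µg, RDA 15 µg, UL 100 µg per day) and the adult (19-50 years) RDA for calcium (1000 mg per day).

```python
from typing import Dict, Optional


def intake_zone(intake: float, ear: float, rda: float, ul: Optional[float] = None) -> str:
    """Place a daily intake relative to EAR, RDA, and optional UL (all in the same unit)."""
    if ul is not None and intake > ul:
        return "above UL"
    if intake < ear:
        return "below EAR"
    if intake < rda:
        return "between EAR and RDA"
    return "at or above RDA"


def mean_adequacy_ratio(intakes: Dict[str, float], rdas: Dict[str, float]) -> float:
    """MAR: average of per-nutrient adequacy ratios (intake / RDA), each capped at 1."""
    ratios = [min(intakes.get(nutrient, 0.0) / rda, 1.0) for nutrient, rda in rdas.items()]
    return sum(ratios) / len(ratios)
```

A daily vitamin D intake of 125 µg lands "above UL". Intakes of 7.5 µg vitamin D and 1200 mg calcium give a MAR of 0.75, because the calcium surplus is capped and cannot offset the vitamin D shortfall.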

IMPACT EVALUATION

9 TOOLS

Causal analysis of social programs: education, health, employment, housing, sports. Four causal methods (DiD, PSM, RDD, Synthetic Control), 13 data sources across US and Mexico.

4 causal methods to measure real program impact (not just correlation)
Automated checks for hidden biases in study design
US federal data: spending, education, housing, health, labor
Mexico data: INEGI, Banxico, datos.gob.mx + indicator quality validation
World Bank + WHO global indicators (190+ countries)
Cross-program comparison + cost-effectiveness analysis
Visualizations: Program dashboard, Geographic comparison, Parallel trends
Example: What is the causal effect of conditional cash transfers on school enrollment in Latin America?
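Of the four causal methods, difference-in-differences (DiD) is the simplest to illustrate. In the canonical two-group, two-period case, the estimate is the treated group's change minus the control group's change. That value equals the interaction coefficient δ in the regression y = α + β·treated + γ·post + δ·treated·post. The sketch below assumes that 2x2 setup and uses made-up enrollment rates; it is not Ehrlich's causal-inference tool. The parallel-trends chart exists to check the key assumption: absent the program, both groups would have moved in parallel.

```python
from statistics import mean
from typing import Sequence


def did_estimate(
    treated_pre: Sequence[float],
    treated_post: Sequence[float],
    control_pre: Sequence[float],
    control_post: Sequence[float],
) -> float:
    """Canonical 2x2 difference-in-differences on group means."""
    treated_change = mean(treated_post) - mean(treated_pre)
    # The control group's change stands in for what the treated group would have
    # experienced without the program (the parallel-trends assumption).
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change
```

With illustrative enrollment rates (in %), treated regions move from 71 to 86 on average and control regions from 72 to 78. The DiD estimate is therefore 15 - 6 = 9 percentage points.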

Multi-Domain Investigations

Ask a question that spans multiple domains and Ehrlich detects it automatically. DomainRegistry.detect() returns all matching domains. merge_domain_configs() creates a synthetic config with the union of tool tags, concatenated scoring definitions, and joined prompt examples. The researcher sees tools from all relevant domains.

Example: “Evaluate creatine supplementation for resistance training performance and renal safety” → Nutrition + Training domains merged
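The merge rule described above (union of tool tags, concatenated scoring definitions, joined prompt examples) can be sketched with a simplified stand-in for `DomainConfig`. The real class has more fields and uses `ScoreDefinition` objects rather than strings. The merged name format, the tag names, and this `merge_domain_configs` body are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import FrozenSet, Sequence, Tuple


@dataclass(frozen=True)
class DomainConfig:
    """Simplified stand-in for Ehrlich's DomainConfig (hypothetical fields)."""

    name: str
    tool_tags: FrozenSet[str]
    score_definitions: Tuple[str, ...] = ()
    prompt_examples: Tuple[str, ...] = ()


def merge_domain_configs(configs: Sequence[DomainConfig]) -> DomainConfig:
    """Combine every detected domain into one synthetic config."""
    if not configs:
        raise ValueError("no domains detected")
    if len(configs) == 1:
        return configs[0]
    return DomainConfig(
        name=" + ".join(c.name for c in configs),
        # Union of tags: the researcher sees tools from every matched domain.
        tool_tags=frozenset().union(*(c.tool_tags for c in configs)),
        # Concatenate scoring definitions and prompt examples in domain order.
        score_definitions=tuple(s for c in configs for s in c.score_definitions),
        prompt_examples=tuple(p for c in configs for p in c.prompt_examples),
    )


NUTRITION = DomainConfig(
    name="Nutrition Science",
    tool_tags=frozenset({"nutrition", "supplements"}),
    score_definitions=("safety",),
    prompt_examples=("Assess safety and efficacy of vitamin D3 + K2 supplementation at high doses",),
)
TRAINING = DomainConfig(
    name="Training Science",
    tool_tags=frozenset({"training"}),
    score_definitions=("performance",),
    prompt_examples=("Compare periodized vs non-periodized resistance training in trained athletes",),
)
```

Because the config is frozen and the merge builds a new object, the registered single-domain configs stay untouched when a multi-domain question arrives.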

Add Your Domain

Register a DomainConfig with tool tags, data sources, scoring definitions, and prompt examples. The engine handles orchestration, persistence, visualization, and reporting. Connect external tools via MCP servers — community-built domains plug in without modifying the core engine.

Contributing Guide

Visualizations

The system picks the right visualization.

The orchestrator intercepts tool results and renders the matching visualization automatically. 3D molecular viewers for docking results. Statistical plots for meta-analysis. Anatomy diagrams for training. Node graphs for hypothesis tracking. No configuration needed.

3D Molecular Viewers

3Dmol.js WebGL
  • Live Lab Viewer: SSE-driven scene where protein targets load, ligands dock, and candidates are colored by score
  • 3D Conformer Viewer: MMFF94-optimized 3D structures with interactive rotate/zoom
  • Docking Viewer: Protein + ligand overlay showing binding pocket and interactions

Statistical Charts

Recharts + Visx
  • Forest Plot: Meta-analysis effect sizes with confidence intervals
  • Funnel Plot: Publication bias assessment across studies
  • Dose-Response Curve: Dose-response with confidence band (Visx)
  • Evidence Matrix: Hypothesis-by-evidence heatmap (Visx)
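A forest plot's pooled estimate is often computed with inverse-variance weighting, where each study is weighted by 1/SE². The sketch below shows only the fixed-effect version with a normal-approximation 95% interval. A random-effects model (for example DerSimonian-Laird) would add a between-study variance term. This page does not say which model Ehrlich uses, so treat the function as illustrative.

```python
import math
from typing import Sequence, Tuple


def fixed_effect_pool(
    effects: Sequence[float], std_errors: Sequence[float], z: float = 1.96
) -> Tuple[float, Tuple[float, float]]:
    """Inverse-variance fixed-effect pooled estimate with a normal-approximation CI."""
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need matching, non-empty effects and standard errors")
    weights = [1.0 / se**2 for se in std_errors]  # more precise studies weigh more
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, (pooled - z * pooled_se, pooled + z * pooled_se)
```

Two studies with effects 0.2 (SE 0.1) and 0.6 (SE 0.2) pool to 0.28. The estimate sits close to the more precise study because its weight is four times larger.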

Domain-Specific Charts

Recharts + Custom SVG
  • Binding Scatter: Compound binding affinities across targets
  • ADMET Radar: Drug-likeness property profiles (6 axes)
  • Training Timeline: Training load with ACWR danger zones + brush
  • Performance Chart: Banister fitness-fatigue model (CTL/ATL/TSB)
  • Muscle Heatmap: Anatomical front/back body diagram with activation intensity
  • Nutrient Comparison: Grouped bar chart comparing foods
  • Nutrient Adequacy: Horizontal bars showing % RDA with MAR score
  • Therapeutic Window: EAR/RDA/UL safety zones per nutrient
  • Program Dashboard: Multi-indicator KPI view with target tracking
  • Geographic Comparison: Region bar chart with benchmark line
  • Parallel Trends: DiD treatment vs control over time
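The CTL/ATL/TSB form of the fitness-fatigue model can be sketched with exponentially weighted averages of daily load. In one common discretization, each average moves by 1/τ of the gap to today's load. τ is 42 days for chronic training load (CTL, "fitness") and 7 days for acute training load (ATL, "fatigue"). Training stress balance (TSB, "form") is CTL minus ATL. These time constants are widely used defaults and are assumed here; Ehrlich's parameters, and any fitted Banister gain factors, are not given on this page.

```python
from typing import List, Sequence, Tuple


def fitness_fatigue(
    daily_loads: Sequence[float], ctl_days: float = 42.0, atl_days: float = 7.0
) -> List[Tuple[float, float, float]]:
    """Return (CTL, ATL, TSB) for each day, starting from zero fitness and fatigue."""
    ctl = atl = 0.0
    out: List[Tuple[float, float, float]] = []
    for load in daily_loads:
        ctl += (load - ctl) / ctl_days  # slow-moving fitness
        atl += (load - atl) / atl_days  # fast-moving fatigue
        out.append((ctl, atl, ctl - atl))  # form = fitness minus fatigue
    return out
```

Under a constant daily load, ATL converges within a few weeks while CTL takes months. TSB therefore stays slightly negative until training is reduced, and a taper is what pushes it positive.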

Investigation UI

React Flow + Custom
  • Investigation Diagram: Hypothesis/experiment/finding node graph with status colors and revision edges
  • Hypothesis Board: Kanban grid with expandable confidence bars and approval cards
  • Candidate Table: Thumbnail grid with 2D SVG + expandable 3D viewer + Lipinski badge
  • Candidate Comparison: Side-by-side scoring view for 2-4 candidates with best-in-group highlighting
  • Investigation Report: 8-section structured report with full audit trail and markdown export

Add Your Own

When you register a new domain, you can create custom visualization components using any rendering library: Recharts, Visx, D3, custom SVG, WebGL, maps, network graphs. Register them in the VizRegistry by viz_type string. The orchestrator auto-intercepts any tool result containing that type and renders it inline. Suspense boundaries, grid layout, and error fallbacks are handled for you.
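The interception step can be sketched from the orchestrator's side. A tool result carries a `viz_type`, and a registry lookup decides whether to emit a visualization event for the frontend to render. The registry shape, the `visualization` event name, and the payload fields below are assumptions for illustration. Because the VizRegistry described above maps types to rendering components, it presumably lives in the frontend; this server-side mirror is hypothetical.

```python
from typing import Any, Callable, Dict

# Hypothetical server-side mirror: viz_type string -> frontend component name.
VIZ_REGISTRY: Dict[str, str] = {}


def register_viz(viz_type: str, component: str) -> None:
    VIZ_REGISTRY[viz_type] = component


def intercept_tool_result(
    tool_name: str, result: Any, emit: Callable[[str, Dict[str, Any]], None]
) -> bool:
    """Emit a visualization event if the result declares a registered viz_type."""
    if not isinstance(result, dict):
        return False
    viz_type = result.get("viz_type")
    if viz_type not in VIZ_REGISTRY:
        return False  # plain tool output: nothing to render
    emit("visualization", {
        "viz_type": viz_type,
        "component": VIZ_REGISTRY[viz_type],
        "tool": tool_name,
        "data": result.get("data"),
    })
    return True
```

Keying on a plain string keeps tools decoupled from rendering. A tool only needs to tag its output, and any domain can register a new type without changes to the orchestrator.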

Ground Truth

Every claim
has a source.

Ehrlich queries trusted global databases in real time. Findings link to ChEMBL compound IDs, PDB structure codes, DOIs, and PubChem CIDs. No hallucinated citations. No invented data points.

28
External APIs
+1
Institutional Memory

Self-Referential Research

Every investigation's findings are indexed in a full-text search database. Future investigations query past findings via search_prior_research. Knowledge compounds over time.
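The data-source list names this store "Ehrlich tsvector", which suggests a PostgreSQL full-text index. In SQL, such a lookup would typically match `to_tsvector(...) @@ plainto_tsquery(...)`. The pure-Python class below is only a stand-in for the same idea, an inverted index with all-terms matching, and is not how `search_prior_research` is implemented. It has no stemming or ranking, and its class, method, and finding-ID names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def _tokens(text: str) -> Set[str]:
    # Lowercase alphanumeric tokens; real full-text search also stems and drops stop words.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class PriorResearchIndex:
    """Toy inverted index over past findings (stand-in for a tsvector column)."""

    def __init__(self) -> None:
        self._postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, finding_id: str, text: str) -> None:
        for token in _tokens(text):
            self._postings[token].add(finding_id)

    def search(self, query: str) -> List[str]:
        """Return IDs of findings that contain every query term (AND semantics)."""
        terms = _tokens(query)
        if not terms:
            return []
        matches = set.intersection(*(self._postings.get(t, set()) for t in terms))
        return sorted(matches)
```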

ChEMBL
Bioactivity data for any assay type
2.5M compounds·Free
Semantic Scholar
Literature search + citation chasing
200M+ papers·Free
RCSB PDB
Protein target discovery
200K+ structures·Free
PubChem
Compound search by target/activity
100M+ compounds·Free
EPA CompTox
Environmental toxicity + bioaccumulation
1M+ chemicals·API Key
UniProt
Protein function + disease associations
250M+ sequences·Free
Open Targets
Disease-target associations (scored)
12K+ targets·Free
GtoPdb
Expert pharmacology (pKi, pIC50)
Curated·Free
ClinicalTrials.gov
Exercise/training RCT evidence
500K+ studies·Free
PubMed
Biomedical literature with MeSH
36M+ articles·Free
wger
Exercise database (muscles, equipment)
800+ exercises·Free
NIH DSLD
Supplement label ingredients
180K+ products·Free
USDA FoodData
Nutrient profiles (macro + micro)
300K+ foods·API Key
OpenFDA CAERS
Supplement adverse event reports
Ongoing·Free
RxNav
Drug-nutrient interaction screening
RxNorm DB·Free
World Bank
Development indicators by country (GDP, poverty, education)
16K+ indicators·Free
WHO GHO
Global health statistics (mortality, disease, life expectancy)
2K+ indicators·Free
FRED
US economic time series (GDP, unemployment, CPI)
800K+ series·API Key
Census Bureau
US demographics, poverty, education
ACS 5-year·Free
BLS
US labor statistics (unemployment, CPI, wages)
130K+ series·Free
USAspending
Federal spending awards and grants
All agencies·Free
College Scorecard
US higher education outcomes
6K+ schools·API Key
HUD
Fair Market Rents, income limits
All counties·Free
CDC WONDER
US mortality, natality, public health
National·Free
data.gov
US federal open dataset discovery
300K+ datasets·Free
INEGI
Mexico economic/demographic time series
400K+ series·API Key
Banxico
Mexico central bank series
Financial·API Key
datos.gob.mx
Mexico federal open datasets
1000+ datasets·Free
Ehrlich tsvector (self-referential)
Past findings (institutional memory)
Growing·Internal

Who It's For

Same product at every level.

All 91 tools, all 28 data sources, and the full 6-phase methodology at every tier. The only variable is the Director model quality.

Student

Free Haiku. 3 investigations/month.

Learn scientific methodology by doing it. Every investigation teaches hypothesis design, experimental controls, and evidence evaluation. Same tools the professionals use.

Academic Researcher

Monthly credits. Sonnet for routine, Opus for publications.

Run systematic reviews, test hypotheses across domains, build on prior findings through self-referential search. Full audit trail for reproducibility.

Industry / Government

BYOK. Your Anthropic key, our methodology + tools.

91 computational tools, 28 data sources, structured reporting. Commercial license for private modifications. Self-host or use the hosted instance with your own Anthropic key.

Why Ehrlich

What makes this different

The AI implements the scientific methodology. It doesn't invent it. Tools execute on real data. Findings link to real sources.

Real Computation

91 tools that compute, not summarize.

Ehrlich trains ML models, runs causal inference, executes statistical tests, and validates with controls. Every tool returns structured data from real computation or real APIs -- not summaries.

  • Molecular docking + drug-likeness profiling
  • ML classifiers on any structured data (train, predict, cluster)
  • Causal inference: Difference-in-Differences, Propensity Score Matching, Regression Discontinuity (RDD), Synthetic Control
  • Statistical testing (t-test, Mann-Whitney, Fisher, chi-squared)
  • Nutrient interaction screening + adverse event monitoring

Open Source, Self-Hostable

COSS. Same code, two paths.

Self-host with your own API key for free -- no limits, no credits, no account. Or use the hosted instance where credits cover Anthropic API costs. A student in Mexico and a pharma company in Boston get the same 91 tools, the same 28 data sources, the same methodology.

  • Self-host: clone, bring your API key, no limits
  • Hosted: credits cover Anthropic costs (Opus is expensive)
  • Credits per investigation: Haiku (1), Sonnet (3), Opus (5)
  • AGPL-3.0: inspect, modify, extend, contribute
  • Commercial license for private modifications

Structured Methodology

Popper, Fisher, GRADE. Not conversation.

Every investigation follows a 6-phase protocol with falsifiable hypotheses, controlled experiments, evidence hierarchies, and GRADE certainty grading. Findings link to real source IDs. You approve hypotheses before testing begins.

  • Falsifiable hypotheses with predictions + criteria
  • Controlled experiments with confounders + analysis plans
  • 8-tier evidence hierarchy traced to original sources
  • GRADE certainty grading on final synthesis
  • User approval gate before experiment execution

Open Source

Ehrlich is COSS -- Commercial Open-Source Software. The same model used by Supabase, PostHog, Cal.com, and GitLab. The entire codebase is open source under AGPL-3.0. There is no proprietary version.

domain_config.py
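# DomainConfig, ScoreDefinition, and registry come from Ehrlich's domain package (imports omitted).
MATERIALS_SCIENCE = DomainConfig(
    name="Materials Science",
    # Tool tags select which registered tools the Researcher models can call.
    tool_tags=frozenset({"materials", "simulation"}),
    score_definitions=[
        ScoreDefinition(
            name="hardness",
            label="Vickers Hardness",
            unit="HV",
        ),
    ],
    prompt_examples=[
        "Discover alloys with high-temperature stability..."
    ],
)

# Registering makes the domain available to detection and multi-domain merging.
registry.register(MATERIALS_SCIENCE)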

AGPL-3.0 (Free Use)

Students, academics, and individual researchers use Ehrlich freely. Self-host internally without restrictions. If you offer Ehrlich as a network service, modifications must be open-sourced.

Commercial License

Companies that want private modifications purchase an AGPL exemption. Includes commercial support, SLA, and custom domain development. Precedent: MongoDB, Confluent, GitLab, Spree Commerce.

91 Tools, 4 Domains

Molecular, training, nutrition, and impact evaluation. Each domain brings its own tools, scoring, and visualization. Add a DomainConfig and the engine handles the rest.

github.com/Sequela02/ehrlich

Roadmap

Four domains today. Any domain tomorrow.

The engine is domain-agnostic. Register a DomainConfig with tools, data sources, and scoring definitions. The orchestrator, methodology, and visualization pipeline work identically across all domains.

Planned Domains

Materials Science

Alloy design, polymer properties, crystal structure prediction. ICSD, Materials Project, AFLOW databases.

Genomics

Gene expression analysis, variant interpretation, pathway enrichment. NCBI, Ensembl, UniProt cross-referencing.

Environmental Science

Pollution monitoring, climate data analysis, biodiversity assessment. EPA, NOAA, GBIF integration.

Platform Features

MCP Ecosystem

Connect external MCP servers as tool providers. Community-built domains plug in without code changes to the core engine.

REST API

Programmatic access to investigations. Start, monitor, and retrieve results via API. Webhook notifications on completion.

Multi-Provider

Swap the Director, Researcher, or Summarizer to any LLM provider. OpenAI, Google, open-weight models. Mix providers per role for cost or capability.

Team Collaboration

Shared investigations, commenting, branching hypotheses. Build on each other's findings across your research group.

Public Beta

Hosted instance pricing.

Self-hosting is free with your own API key. The hosted instance uses pay-as-you-go credits (Haiku = 1, Sonnet = 3, Opus = 5 per investigation) to cover Anthropic costs. Alternatively, use Bring Your Own Key (BYOK) for free, unlimited hosted access (subject only to your Anthropic API limits).

Credits

Pay-as-you-go
Haiku (1), Sonnet (3), Opus (5)

Hosted infrastructure with no setup. Credits cover Anthropic API costs.

  • Haiku investigation = 1 credit
  • Sonnet investigation = 3 credits
  • Opus investigation = 5 credits
  • Full 6-phase methodology
  • Hosted high-performance infrastructure
Buy Credits

BYOK

Free during beta
Unlimited (subject to Anthropic limits)

Bring Your Own Key. Use your Anthropic API key directly. Ideal for judges and heavy testing.

  • Your own Anthropic API key
  • No Ehrlich credit limits
  • We cover the compute/hosting cost
  • Full 91 tool access
  • Perfect for hackathon evaluation
Use Own Key

Or self-host.

Clone the repo, add your API key, run the server. No account needed. Full AGPL-3.0 access to everything.

terminal
$ git clone https://github.com/Sequela02/ehrlich
$ cd ehrlich/server && uv sync
$ export ANTHROPIC_API_KEY=sk-...
$ uv run uvicorn ehrlich.api.app:create_app --factory --port 8000

Run your first investigation.

Free tier. No credit card. 3 Haiku investigations per month. Full methodology. All tools.