Evidence Grading Methodology

Design Philosophy

ClinEvident grades the quality of published evidence for longevity interventions. It does not grade efficacy, make treatment recommendations, or endorse products. The distinction matters: a compound can have weak evidence and still work, or strong evidence for a limited indication. Our grades tell you how much you can trust what the research says — not what the research says.

The methodology is published openly because transparency is the foundation of credibility. A grading system that hides how it reasons is asking you to trust the grader. We publish the framework (the dimensions we evaluate, what each one means, and how we think about evidence) so you can evaluate the method itself. The precise scoring weights and thresholds remain proprietary, much as a rating agency publishes its rating methodology but not its internal model coefficients.

The Six Dimensions

The framework has six dimensions. Dimensions 1 through 4 and Dimension 6 are scored per compound and combine into the composite signal grade. Dimension 5 is scored per interaction pair and reported separately.

Dimension 1: Human Outcome Weight

Does the compound change hard clinical endpoints in humans? This dimension rewards randomized controlled trials with mortality, morbidity, or functional outcomes. Observational data, animal studies, and surrogate biomarker endpoints score progressively lower. This is deliberately the most heavily weighted dimension, because human outcomes are what ultimately matter.

A compound with Phase 3 RCT data showing reduced all-cause mortality sits at the top of this dimension. A compound with only preclinical lifespan extension in mice sits near the bottom, regardless of how impressive the animal data is.

Evidence Hierarchy: Why Rankings Differ

Not all evidence is interchangeable. The pharmaceutical industry has long assessed how well a finding is likely to translate to humans — whether a model reproduces the relevant human biology, and whether a result in that model predicts the human response. ClinEvident applies the same logic: human clinical outcomes carry the most weight, and evidence from animal or cell models carries progressively less.

This is not a stylistic choice. Published work in aging research has shown that lifespan effects translate poorly across species — a compound that extends life in worms or flies is, on its own, a weak predictor of benefit in mammals, and results vary even between genetic strains of the same species. Programs that test compounds across genetically diverse animals exist precisely because single-lab results so often fail to replicate.

This is why a compound widely promoted on the strength of animal data may receive a modest grade here. Popular longevity rankings often elevate compounds on animal lifespan results; ClinEvident leads with human evidence. Grading against that standard — rather than against popularity — is the more scientifically conservative position, and occasionally an unflattering one.

Dimension 2: Biomarker Relevance

Are the biomarkers the compound moves actually predictive of aging or disease? Validated surrogate endpoints (eGFR for kidney disease, HbA1c for diabetes) score higher than exploratory markers (NAD+ blood levels, telomere length). This dimension distinguishes between measuring something real and measuring something convenient.

Dimension 3: Signal Source Quality

Where does the evidence come from? Peer-reviewed publications in high-impact journals (NEJM, Lancet, Nature) score highest. Preprints, conference abstracts, and company press releases score lowest. This dimension captures the reproducibility and rigor of the evidence source, not just the finding itself.

Dimension 4: Organ Specificity

How precisely does the evidence map to specific organ systems? The FLOW trial showing semaglutide's kidney benefit scores high on organ specificity because the endpoint, population, and mechanism are kidney-specific. A general "anti-inflammatory" claim with no organ-level data scores low.

Dimension 5: Interaction Certainty (Reported Separately)

How well characterized are the interactions between this compound and other compounds or behaviors? This dimension grades pairs (compound-compound or compound-behavior), not individual compounds. It powers the Interaction Checker tool and is critical for safe protocol design.

The metformin-exercise AMPK antagonism (validated by the MASTERS RCT) is an example of a high-certainty interaction. The protein-rapamycin mTOR tension is an example of a moderate-certainty interaction where the dose-governance question remains unresolved.

Grading Interactions

Interactions are graded in both directions. An interaction grade can be favorable, neutral, or unfavorable — and an unfavorable interaction is a first-class result, not an omission. Some of the most valuable findings are combinations the evidence suggests you should avoid, and the framework is built to surface those as prominently as beneficial ones.

This includes adjunct combinations — an intervention used alongside an approved therapy. Where clinical evidence exists for such a combination, ClinEvident grades that evidence, including where it points to harm or reduced efficacy. The framework does not position any supplement or behavior as a therapy-enhancer; it grades what the combination evidence actually shows.

And when no combination evidence exists, that is itself reported. An interaction the framework has not graded returns an explicit "no evidence" result — never a default assumption of safety. Absence of evidence is not evidence of safety.

None of this is medical guidance. Interaction grades describe what the combination evidence shows; whether to combine anything is a decision for you and your clinician.
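
The lookup behavior described in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the Interaction Checker's actual implementation: the names InteractionGrade, GRADED_PAIRS, and lookup_interaction are hypothetical, and the single registry entry is a placeholder rather than a published grade. The sketch shows two properties from the text: a pair is graded regardless of the order in which it is entered, and an ungraded pair returns an explicit "no evidence" result instead of defaulting to safe.

```python
from enum import Enum


class InteractionGrade(Enum):
    """Possible results of an interaction lookup (hypothetical names)."""
    FAVORABLE = "favorable"
    NEUTRAL = "neutral"
    UNFAVORABLE = "unfavorable"
    NO_EVIDENCE = "no evidence"  # explicit result, never treated as "safe"


# Placeholder registry keyed by an unordered pair. The entry below is
# illustrative only and does not reflect any real ClinEvident grade.
GRADED_PAIRS = {
    frozenset({"compound_a", "compound_b"}): InteractionGrade.UNFAVORABLE,
}


def lookup_interaction(first: str, second: str) -> InteractionGrade:
    """Return the grade for a pair, or NO_EVIDENCE if the pair is ungraded."""
    # frozenset makes the key order-independent: (a, b) and (b, a) match.
    key = frozenset({first.strip().lower(), second.strip().lower()})
    return GRADED_PAIRS.get(key, InteractionGrade.NO_EVIDENCE)


if __name__ == "__main__":
    print(lookup_interaction("compound_b", "compound_a").value)
    print(lookup_interaction("compound_a", "compound_x").value)
```

Making NO_EVIDENCE a member of the same enumeration as the graded results means a caller cannot mistake a missing grade for a neutral one; the two are distinct values that must be handled separately.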

Dimension 6: Translational Maturity

How close is the evidence to clinical actionability? A compound with completed Phase 3 trials and FDA approval for a related indication sits at the top of this dimension. A compound with only in-vitro studies sits near the bottom. This dimension captures the distance between "interesting science" and "something a physician can act on."

Composite Grade Calculation

The composite signal grade combines Dimensions 1 through 4 and Dimension 6 into a single letter grade (A through E, with +/− modifiers). Dimension 5 is excluded because it grades interaction pairs rather than individual compounds. The dimensions are weighted, with Human Outcome Weight carrying the most influence because human clinical evidence matters most, and the weighted result maps to the letter grade you see on each compound. The specific weights and score thresholds are proprietary.
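
The general shape of a weighted composite can be illustrated with a short Python sketch. Everything numeric here is an assumption made for illustration: the weights, the 0 to 100 scoring scale, the letter thresholds, and the names COMPOSITE_WEIGHTS, GRADE_THRESHOLDS, and composite_grade are all hypothetical, since the real values are proprietary. The sketch also omits the +/− modifiers. It preserves only what the text states: five dimensions feed the composite, Human Outcome carries the largest weight, and Interaction Certainty is not part of it.

```python
# Hypothetical weights in percent; they sum to 100. Human Outcome is the
# largest, as the methodology states. The actual values are proprietary.
COMPOSITE_WEIGHTS = {
    "human_outcome": 40,
    "biomarker_relevance": 15,
    "signal_source_quality": 15,
    "organ_specificity": 10,
    "translational_maturity": 20,
}

# Hypothetical (minimum composite score, letter) pairs, highest first.
GRADE_THRESHOLDS = [(80, "A"), (60, "B"), (40, "C"), (20, "D"), (0, "E")]


def composite_grade(scores: dict[str, float]) -> str:
    """Map per-dimension scores (assumed 0-100) to a letter grade."""
    missing = COMPOSITE_WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    # Only the five composite dimensions are read; any other key, such as
    # an interaction score, is ignored.
    composite = sum(
        weight * scores[name] for name, weight in COMPOSITE_WEIGHTS.items()
    ) / 100
    for floor, letter in GRADE_THRESHOLDS:
        if composite >= floor:
            return letter
    return "E"  # reached only if the composite is negative


if __name__ == "__main__":
    example = {
        "human_outcome": 90,
        "biomarker_relevance": 70,
        "signal_source_quality": 80,
        "organ_specificity": 60,
        "translational_maturity": 85,
    }
    print(composite_grade(example))
```

A linear weighted sum is only one way to combine dimensions; the published methodology does not say whether the real composite is linear, so treat this as a reading aid rather than a reconstruction.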

The per-dimension bands shown on each compound page are summaries. Each band reflects multiple underlying levels of analysis — the type and phase of study, the endpoint, the source, and the strength of the signal — distilled into a single strength rating. The band tells you where the evidence stands on that dimension; the full evidence trail behind it is what the analysis is built on.

How to Read a Grade — and How Not To

The grading scale is easy to misread, so it is worth being explicit. A ClinEvident Evidence Grade is not a school grade. It does not tell you how good, safe, effective, or worthwhile something is. It tells you one thing: the strength and maturity of the published clinical evidence for a specific intervention and use.

That distinction matters in both directions:

A low grade does not mean "bad" or "unsafe." A D or E means the published clinical evidence is currently limited or absent, not that the intervention is harmful or useless. Nor does it mean the intervention is safe: absence of evidence is never evidence of safety, and a low or missing grade simply means the data does not yet exist. Something with weak evidence today may prove valuable later; the grade reflects what has been demonstrated, not what is ultimately true.

A high grade does not mean "you should take this." An A means the evidence is strong. Whether an intervention is right, safe, or advisable for you is a decision for you and your healthcare provider, not something a grade can answer.

In short: we grade the evidence, not the product, and not your decision.

Independence from Commercial Assessment

Evidence grades are editorially independent from the Scientari Business Case Score (BCS), which evaluates commercial viability separately. A compound with strong evidence and no commercial path (e.g., generic metformin for aging) is graded honestly on its evidence — the BCS captures the commercial reality without contaminating the evidence assessment.

No Sponsorship, No Paid Placement

ClinEvident accepts no sponsorship, paid placement, or advertising from any party we evaluate or list — no compound manufacturer, supplement brand, diagnostic company, or clinic can pay to be graded, graded favorably, listed, featured, or ranked. Grades reflect only the published clinical evidence. Directory listings (such as clinics) are editorial selections, never purchased, and a listing is not an endorsement. This separation is structural, not discretionary: the function that assigns grades is walled off from any commercial relationship, because independence is the only thing that makes an evidence grade worth reading.

The Rx–Supplement Divide

No supplement in the current registry scores above C+. This is not editorial bias — it is a structural reflection of the evidence landscape. Prescription drugs undergo large, well-funded Phase 3 RCTs with hard clinical endpoints. Supplements typically have small, short-term trials with surrogate biomarkers, funded by the supplement manufacturer. The grading framework rewards evidence quality, and the Rx evidence infrastructure systematically produces higher-quality evidence.

NMN's D+ grade will surprise consumers who assume popularity correlates with evidence. The senolytic field (dasatinib plus quercetin, known as D+Q; fisetin; and quercetin alone) is the most over-hyped relative to its evidence in the entire longevity space. These are not opinions; they are what the published data shows when read through a transparent, reproducible methodology.

Limitations

This framework grades evidence quality, not truth. A compound with insufficient evidence to grade may still have real biological effects; it simply lacks the published clinical data to score well. Grades can and will change as new research is published. Replicability and population diversity are not currently standalone dimensions, but they may be added as a Dimension 1 modifier in a future version.