Your p-value,
plain and simple.

Pvalr is a free p-value calculator that runs entirely in your browser. Pick a distribution, enter your test statistic, get the result. Nothing to install, nothing to sign up for.

A paper reports t = 2.05, df = 18, p = 0.028, and the methods section never says whether the test was one-tailed or two-tailed. Two-tailed gives p = 0.055. One-tailed in the predicted direction gives p = 0.028. One crosses 0.05 and one does not, and the choice was almost certainly made after the data came in. That is the most common p-value error in working science, and most of the others are variations on the same theme: the number is right for the calculation that was run, wrong for the one that should have been.

Pvalr is built around the five mistakes below.

Two-tailed vs. one-tailed: where most errors start

If a hypothesis predicts a direction before data collection (drug A reduces blood pressure more than placebo), a one-tailed test is defensible. If the hypothesis is non-directional (drug A produces a different blood pressure than placebo), it has to be two-tailed. The one-tailed p is exactly half the two-tailed value when the statistic sits in the predicted tail, so the temptation to switch after seeing the data is real, and the audit trail is almost invisible.

Type t = 2.05 with df = 18 into Pvalr and leave the tail selector on two-tailed: p = 0.0552. Switch to one-tailed: p = 0.0276. Same data, two conclusions at alpha = 0.05.
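To make the halving concrete, here is a minimal Python sketch (standard library only) that integrates the Student t density with Simpson's rule. It is an illustration, not Pvalr's code: Pvalr uses jStat, and the helper names `t_pdf`, `t_central_area`, `t_p_value` and the `tail` labels are hypothetical. Non-integer df (as in Welch's test) also works, because `math.lgamma` accepts any positive real.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_central_area(t: float, df: float, steps: int = 2000) -> float:
    """P(-|t| <= T <= |t|) by composite Simpson's rule on [0, |t|]; steps must be even."""
    h = abs(t) / steps
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 2 * total * h / 3  # times 2: the density is symmetric about 0

def t_p_value(t: float, df: float, tail: str = "two") -> float:
    """p-value for a t statistic; tail is "two", "greater", or "less"."""
    two_tailed = 1.0 - t_central_area(t, df)
    if tail == "two":
        return two_tailed
    if tail not in ("greater", "less"):
        raise ValueError("tail must be 'two', 'greater', or 'less'")
    in_predicted_tail = t >= 0 if tail == "greater" else t <= 0
    # Half the two-tailed area when the statistic points the predicted way;
    # otherwise the complement, which is above 0.5.
    return two_tailed / 2 if in_predicted_tail else 1.0 - two_tailed / 2

two = t_p_value(2.05, 18)
one = t_p_value(2.05, 18, tail="greater")
print(f"two-tailed p = {two:.4f}, one-tailed p = {one:.4f}")
```

The last branch matters in practice: a one-tailed test whose statistic lands in the unpredicted direction returns a p above 0.5, not half of a small number.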

Pick the tail before you compute the statistic, write it into the pre-registration or lab notebook, and keep it. The garden-of-forking-paths critique (Gelman and Loken, 2013) is about exactly this kind of post-hoc switch, multiplied across hundreds of small analytic choices.

What the p-value does not tell you

A p-value is the probability of observing a test statistic at least as extreme as the one you computed, assuming the null hypothesis is exactly true. That definition rules out almost every popular interpretation.

It is not the probability the null is true. It is not the probability your result was due to chance. It is not an effect size. A p of 0.001 from n = 50,000 can correspond to a correlation of about 0.015: real, but uninteresting. A p of 0.08 from n = 12 can correspond to a correlation of 0.52: interesting, but not conclusive. The American Statistical Association's 2016 statement lays this out across six principles. The one our users hit most often is principle five: a p-value does not measure the size of an effect or the importance of a result.
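The contrast between a tiny correlation at large n and a large correlation at small n can be checked in a few lines. This sketch uses Fisher's z-transformation of r, an approximation swapped in here for brevity; the exact test refers t = r·√(n − 2)/√(1 − r²) to a T distribution with n − 2 df, and that statistic is printed alongside. Function names are hypothetical.

```python
import math
from statistics import NormalDist

def correlation_p_fisher(r: float, n: int) -> float:
    """Approximate two-tailed p for H0: rho = 0, using atanh(r) * sqrt(n - 3) ~ N(0, 1)."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * NormalDist().cdf(-abs(z))

def correlation_t(r: float, n: int) -> float:
    """Statistic for the exact test; refer it to T with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

for r, n in [(0.015, 50_000), (0.52, 12)]:
    print(f"r = {r:<5}  n = {n:>6}  t = {correlation_t(r, n):.3f}  "
          f"approx. p = {correlation_p_fisher(r, n):.4f}  r^2 = {r * r:.4f}")
```

The first p is far smaller, yet r² shows the first correlation accounts for about 0.02% of the variance and the second for about 27%.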

The disclaimer in the result panel exists for this reason. Pvalr reports a number and a verdict at the alpha you set. It does not report effect size, confidence interval width, or power, and none of those can be recovered from the p-value alone.

Choosing between Z and T when both seem valid

The Z test assumes the population standard deviation is known. In practice it almost never is. Most teaching examples that “use Z” are quietly substituting the sample standard deviation, which is what the T distribution was invented for in 1908 (Gosset, writing as “Student,” in Biometrika 6:1).

The decision table below is the one we keep pinned next to the build, because the framing in most stats textbooks is muddier than it needs to be.

Situation | Sample size | Pop. SD known? | Use | Why
--- | --- | --- | --- | ---
Single mean | n < 30 | No | T, df = n-1 | T accounts for estimating sigma
Single mean | n >= 30 | No | T, df = n-1 (Z acceptable) | T converges to Z as df grows
Single mean | Any | Yes (rare; calibrated QC) | Z | Textbook Z; assumption satisfied
Proportion | n*p and n*(1-p) >= 10 | N/A | Z | Normal approximation to binomial
Goodness-of-fit, contingency | Expected cell >= 5 | N/A | Chi-square | Observed vs. expected counts
Variance ratio, ANOVA | N/A | N/A | F, two df values | Ratio of two scaled chi-squares

A working rule for coursework: if you computed your test statistic by dividing by s (the sample standard deviation), use T. If you divided by sigma (a population value handed to you in the problem), use Z. Mixing these up is the second-most-common error our users self-report when a pasted number does not match their answer key.
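The working rule is easiest to see side by side. In this minimal sketch the sample values and the "known" sigma = 2.0 are made up, and the helper names are hypothetical. The only difference between the two statistics is the denominator, and only the T version carries a df.

```python
import math
from statistics import NormalDist, mean, stdev

def z_statistic(data, mu0, sigma):
    """Divided by a population SD handed to you (sigma): refer it to Z."""
    return (mean(data) - mu0) / (sigma / math.sqrt(len(data)))

def t_statistic(data, mu0):
    """Divided by the sample SD s (n - 1 denominator): refer it to T with df = n - 1."""
    s = stdev(data)
    return (mean(data) - mu0) / (s / math.sqrt(len(data))), len(data) - 1

sample = [101.2, 99.8, 103.5, 100.9, 102.2, 98.7, 104.1, 101.6]  # hypothetical data
z = z_statistic(sample, mu0=100, sigma=2.0)  # sigma = 2.0 is an assumed population value
t, df = t_statistic(sample, mu0=100)
p_z = 2 * NormalDist().cdf(-abs(z))  # two-tailed Z p-value

print(f"Z = {z:.3f} (two-tailed p = {p_z:.4f})")
print(f"T = {t:.3f} with df = {df}: enter both into the T distribution")
```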

Degrees-of-freedom mistakes in T, chi-square, and F

Degrees of freedom is where p-values quietly go wrong when the calculation itself is right.

For a one-sample T test, df is n minus 1. For an independent two-sample T with pooled variance, df is n1 plus n2 minus 2. For Welch's T (unequal variances), df is the Welch-Satterthwaite approximation, almost always a non-integer (for example, df = 17.34). For a chi-square test of independence on an r-by-c table, df is (r minus 1) times (c minus 1), not the total cell count. For an F test in one-way ANOVA, df1 is the number of groups minus 1 and df2 is the total observations minus the number of groups.
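Each of those rules is one line of arithmetic. The sketch below writes them out in plain Python with hypothetical helper names; the Welch-Satterthwaite function takes the two sample standard deviations and sample sizes, which is the usual form of the approximation.

```python
def df_one_sample_t(n: int) -> int:
    return n - 1

def df_pooled_t(n1: int, n2: int) -> int:
    return n1 + n2 - 2

def df_welch(s1: float, n1: int, s2: float, n2: int) -> float:
    """Welch-Satterthwaite approximation; usually not an integer."""
    v1, v2 = s1**2 / n1, s2**2 / n2  # squared standard errors of each group mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

def df_chi2_independence(rows: int, cols: int) -> int:
    return (rows - 1) * (cols - 1)  # not rows * cols

def df_one_way_anova(groups: int, total_n: int) -> tuple[int, int]:
    return groups - 1, total_n - groups  # (df1, df2)

print(df_pooled_t(10, 10))                   # 18
print(round(df_welch(2.0, 10, 2.0, 10), 6))  # 18.0: equal SDs and sizes reproduce the pooled df
print(round(df_welch(1.0, 8, 4.0, 12), 2))   # 12.98
print(df_chi2_independence(3, 4))            # 6
print(df_one_way_anova(4, 40))               # (3, 36)
```

With unequal variances the Welch df falls between the smaller group's n − 1 and the pooled value, which is why it rarely comes out as a whole number.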

Pvalr asks for df1 (and df2 for F) without naming the formula. That is deliberate. The calculator cannot tell a pooled two-sample T from a Welch design, and quietly assuming one would be worse than asking you to supply the value. Compute the statistic and df in the same software you used for the rest of the analysis (R's t.test, scipy.stats.ttest_ind, the statsmodels library at Statsmodels.org, or SPSS output), then paste both in to confirm.

Borderline results: what p = 0.049 actually means

A p of 0.049 is not meaningfully different from a p of 0.051. The 0.05 threshold is a convention from R. A. Fisher's 1925 Statistical Methods for Research Workers, where he called it “convenient” rather than principled. Calling one result significant and the other not, because they sit on opposite sides of an arbitrary line, is the failure mode the ASA statement reacts against.

Pvalr's verdict block says "significant at alpha = 0.05" when the computed p is at or below the threshold, because the convention is too widely used to refuse. The recommendation we keep giving in our own work: report the exact p (0.049, not "p < 0.05"), include the effect size or confidence interval next to it, and let the reader judge the borderline. The 2019 special issue of The American Statistician, introduced by the editorial "Moving to a World Beyond 'p < 0.05'", collects 43 papers on alternatives. The most actionable advice is the simplest: state the p, state the effect, and stop pretending 0.05 is a wall.
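One way to see that 0.049 and 0.051 are practically the same result is to convert each p back into the statistic that produced it. This minimal sketch does that for a two-tailed Z test with Python's `statistics.NormalDist`; the helper name is hypothetical, and the same idea applies to T with its own quantile function.

```python
from statistics import NormalDist

def z_for_two_tailed_p(p: float) -> float:
    """|z| that yields the given two-tailed p-value."""
    return NormalDist().inv_cdf(1 - p / 2)

for p in (0.049, 0.050, 0.051):
    # Report the exact p, not just which side of 0.05 it landed on.
    print(f"p = {p:.3f} corresponds to |z| = {z_for_two_tailed_p(p):.3f}")

gap = z_for_two_tailed_p(0.049) - z_for_two_tailed_p(0.051)
print(f"difference in |z|: {gap:.3f}")
```

The two statistics differ by less than 0.02 standard errors, while a Z statistic itself varies by about 1 from one replication of a study to the next.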

Alpha is not fixed at 0.05 either. Particle physics uses the 5-sigma standard (p < 3e-7); parts of biomedicine argue for 0.005. Pvalr's alpha control goes from 0.001 to 0.10 so the threshold can match your field or pre-registration.
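Moving between sigma thresholds and p-values only needs the standard normal. This sketch assumes the one-sided convention behind the particle-physics 5-sigma figure, and the helper names are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()

def sigma_to_p(k: float) -> float:
    """One-sided upper-tail p-value for a k-sigma result."""
    return std_normal.cdf(-k)  # cdf(-k) avoids the cancellation in 1 - cdf(k)

def alpha_to_two_tailed_z(alpha: float) -> float:
    """Critical |z| for a two-tailed test at the given alpha."""
    return std_normal.inv_cdf(1 - alpha / 2)

print(f"5 sigma -> p = {sigma_to_p(5):.2e}")  # 2.87e-07
for alpha in (0.10, 0.05, 0.005, 0.001):
    print(f"alpha = {alpha:<5} -> |z| >= {alpha_to_two_tailed_z(alpha):.3f}")
```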

Frequently Asked Questions

How do I calculate a p-value from a test statistic?

Pick the distribution that matches your test: Z for means when the population standard deviation is known and for large-sample proportion tests, T for means when the standard deviation is estimated from the sample, Chi-square for goodness-of-fit and independence, and F for ANOVA and overall regression significance. Enter the test statistic, any required degrees of freedom, and choose one-tailed or two-tailed. Pvalr computes the cumulative probability using jStat in your browser and returns the p-value plus a plain-English verdict at your chosen alpha. No lookup tables, no manual interpolation, and no need to open R or SPSS just to resolve a single number.
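For readers who want to see the calculation itself, here is a self-contained Python sketch. It is not jStat's code: it uses the textbook identities that express the T and F tail areas through the regularized incomplete beta function and the chi-square tail through the regularized incomplete gamma function, evaluated with the series and continued-fraction methods described in Numerical Recipes. The function names and `tail` labels are hypothetical.

```python
import math
from statistics import NormalDist

_TINY, _EPS = 1e-300, 1e-15

def _beta_cf(a, b, x):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > _TINY else _TINY)
    h = d
    for m in range(1, 500):
        for num in (m * (b - m) * x / ((a + 2 * m - 1) * (a + 2 * m)),
                    -(a + m) * (a + b + m) * x / ((a + 2 * m) * (a + 2 * m + 1))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > _TINY else _TINY)
            c = 1.0 + num / c
            c = c if abs(c) > _TINY else _TINY
            h *= d * c
        if abs(d * c - 1.0) < _EPS:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def reg_inc_gamma_upper(a, x):
    """Regularized upper incomplete gamma function Q(a, x)."""
    if x <= 0.0:
        return 1.0
    front = math.exp(-x + a * math.log(x) - math.lgamma(a))
    if x < a + 1.0:  # series for P(a, x), then Q = 1 - P
        term = total = 1.0 / a
        k = a
        while abs(term) > abs(total) * _EPS:
            k += 1.0
            term *= x / k
            total += term
        return 1.0 - front * total
    b = x + 1.0 - a  # continued fraction for Q(a, x)
    c, d = 1.0 / _TINY, 1.0 / b
    h = d
    for i in range(1, 500):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        d = 1.0 / (d if abs(d) > _TINY else _TINY)
        c = b + an / c
        c = c if abs(c) > _TINY else _TINY
        h *= d * c
        if abs(d * c - 1.0) < _EPS:
            break
    return front * h

def p_value(dist, stat, df1=None, df2=None, tail="two"):
    """p-value for a Z, T, chi-square ("chi2"), or F statistic."""
    if dist == "chi2":  # upper tail only
        return reg_inc_gamma_upper(df1 / 2, stat / 2)
    if dist == "f":     # upper tail only
        return reg_inc_beta(df2 / 2, df1 / 2, df2 / (df2 + df1 * stat))
    if dist == "z":
        upper, lower = NormalDist().cdf(-stat), NormalDist().cdf(stat)
    elif dist == "t":
        half = 0.5 * reg_inc_beta(df1 / 2, 0.5, df1 / (df1 + stat * stat))  # P(T >= |stat|)
        upper, lower = (half, 1.0 - half) if stat >= 0 else (1.0 - half, half)
    else:
        raise ValueError(f"unknown distribution: {dist}")
    if tail == "two":
        return min(1.0, 2.0 * min(upper, lower))
    if tail in ("greater", "less"):
        return upper if tail == "greater" else lower
    raise ValueError("tail must be 'two', 'greater', or 'less'")

print(f"{p_value('t', 2.05, 18):.4f}  {p_value('chi2', 3.841, 1):.4f}  {p_value('f', 3.885, 2, 12):.4f}")
```

A quick self-check: an F statistic with df1 = 1 is the square of a T statistic, so `p_value("f", t * t, 1, df)` should equal the two-tailed `p_value("t", t, df)`.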

What is the difference between a one-tailed and two-tailed test?

A one-tailed test evaluates whether the test statistic falls in a single, pre-specified tail of the distribution, which suits a directional hypothesis such as "the mean is greater than 100". A two-tailed test evaluates both tails and is appropriate when you only care whether the parameter differs from the null value in either direction. For symmetric distributions (Z, T), the two-tailed p-value is exactly double the one-tailed value when the statistic falls in the predicted tail. Pvalr lets you toggle between the two for Z and T distributions; Chi-square and F tests use only the upper tail, because these statistics are non-negative and only large values count as evidence against the null.

What does a p-value actually mean?

A p-value is the probability of observing a test statistic at least as extreme as the one you computed, assuming the null hypothesis is true. A small p-value (traditionally under 0.05) means the observed data would be unlikely if the null were correct, giving evidence against it. It is not the probability that the null hypothesis is true, not the probability your result is "real", and not a measure of effect size. Pvalr shows the raw p-value plus an interpretation relative to your alpha, but interpreting significance in context — sample size, prior evidence, effect magnitude — is on you.

Which degrees of freedom should I enter?

For a one-sample T-test, df = n − 1. For a two-sample T-test with pooled variance, df = n1 + n2 − 2; Welch's test uses the non-integer Welch-Satterthwaite df reported by your software. For Chi-square goodness-of-fit, df = k − 1 where k is the number of categories; for a contingency table, df = (rows − 1) × (columns − 1). For ANOVA F-tests, there are two degrees of freedom: numerator (groups − 1) and denominator (total observations − groups). Pvalr asks for the df inputs each distribution needs (one for T and Chi-square, two for F), but it does not work them out for you, so you still need to know which test design you ran.

Is Pvalr accurate enough for published research?

Pvalr uses jStat's implementations of the Z, T, Chi-square, and F cumulative distribution functions, which are the same algorithms that back most stats teaching software. The numbers match R's pnorm, pt, pchisq, and pf to roughly 10+ significant digits for typical inputs. For research publication, most journals expect the analysis to be run in a reproducible environment (R, Python, SAS, SPSS) with code or output files attached — Pvalr is excellent for verification, spot-checking a reviewer comment, or teaching the calculation, but the official analysis should live in your stats package of record.

How do I choose an alpha level?

Alpha is the Type I error rate you are willing to accept: the probability of falsely rejecting a true null. The conventional defaults are 0.05 for most social and biological sciences, 0.01 for more conservative medical or regulatory work, and 0.001 or lower for genome-wide studies where you are running thousands of tests or more (genome-wide association studies conventionally use 5e-8). Pvalr reports whether your p-value clears the alpha you set. The choice should be made before you see the data; picking alpha after the fact to make a result significant is a well-known form of p-hacking.
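Why many tests push alpha down comes from simple arithmetic. This sketch assumes every null hypothesis is true and the tests are independent (real genome-wide tests are correlated, so treat it as an illustration only); the helper names are hypothetical.

```python
def expected_false_positives(alpha: float, m: int) -> float:
    """Average number of true nulls rejected when all m nulls are true."""
    return alpha * m

def familywise_error_rate(alpha: float, m: int) -> float:
    """Chance of at least one false positive across m independent tests, all nulls true."""
    return 1.0 - (1.0 - alpha) ** m

for alpha in (0.05, 0.001):
    for m in (1, 20, 10_000):
        print(f"alpha = {alpha:<5} m = {m:>6}: "
              f"expected false positives = {expected_false_positives(alpha, m):g}, "
              f"P(at least one) = {familywise_error_rate(alpha, m):.3f}")
```

Even at alpha = 0.001, ten thousand true nulls produce about ten false positives on average.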

Does Pvalr store my test statistics?

No. Every calculation runs client-side in your browser via jStat. Your test statistics, degrees of freedom, and alpha choices are never transmitted to a server, never written to a database, and never logged. Pvalr uses PostHog for anonymous page-view analytics only — which URL was visited and general navigation, not any of your numeric inputs. That means you can use it for sensitive research data without worrying about third-party exposure.

How do I find the p-value for a t-test given a t-statistic and degrees of freedom?

Select the T distribution in Pvalr, enter your t-statistic (for example, 2.31), enter your degrees of freedom (for a one-sample test, df = n − 1), and choose one-tailed or two-tailed based on your hypothesis. Pvalr computes the cumulative t-distribution in your browser using jStat and returns the p-value plus a plain-English verdict at your chosen alpha. This matches R's pt() function and SPSS output to roughly 10 significant digits, so you can use it to verify a result you already have or to produce one when you do not have a stats package handy.

How do I calculate a one-tailed p-value?

Select your distribution (Z or T), enter the test statistic, set any required degrees of freedom, and switch the tail direction to one-tailed. Pvalr returns the single-tail area beyond your statistic — the probability of observing a value at least that extreme in one direction under the null. Use one-tailed when your hypothesis specifies a direction (e.g., "the mean is greater than 100"); otherwise use two-tailed. Chi-square and F tests are inherently one-tailed because their statistics are non-negative, so the tail selector is hidden for those.

How do I find the p-value for a chi-square test?

Select the Chi-square distribution, enter your chi-square statistic, and enter the degrees of freedom. For a goodness-of-fit test, df = k − 1 where k is the number of categories. For a contingency table test of independence, df = (rows − 1) × (columns − 1). Pvalr returns the upper-tail probability — the chi-square test is always one-tailed because the statistic is non-negative. A small p-value means the observed frequencies deviate from the expected frequencies more than chance alone would predict.
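To connect the df rule to an actual statistic, here is a hedged sketch of the Pearson chi-square test of independence in plain Python on a made-up 2 × 3 table. For the p-value it uses the closed-form chi-square tail that exists for whole-number df, substituted here for a general incomplete-gamma routine. No continuity correction is applied, so for 2 × 2 tables compare it with R's chisq.test(..., correct = FALSE) rather than the default. Function names are hypothetical.

```python
import math

def chi2_sf(x: float, df: int) -> float:
    """Upper-tail P(X >= x) for chi-square with a positive whole-number df (closed form)."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum over i < df/2 of (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: the df = 1 tail erfc(sqrt(x/2)) plus correction terms
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total

def chi2_independence(table: list[list[float]]) -> tuple[float, int, float]:
    """Pearson chi-square test of independence on an r x c table of counts."""
    rows, cols = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(row[j] for row in table) for j in range(cols)]
    n = sum(row_totals)
    stat = 0.0
    for i in range(rows):
        for j in range(cols):
            expected = row_totals[i] * col_totals[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    df = (rows - 1) * (cols - 1)  # not the number of cells
    return stat, df, chi2_sf(stat, df)

observed = [[20, 30, 50],   # hypothetical counts, group A
            [30, 30, 40]]   # hypothetical counts, group B
stat, df, p = chi2_independence(observed)
print(f"chi-square = {stat:.3f}, df = {df}, p = {p:.3f}")  # chi-square = 3.111, df = 2, p = 0.211
```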

No accounts. Your inputs never leave the browser.

Pvalr is free and ad-supported. There are no sign-ups and no email captures. We use PostHog for anonymous page-view analytics and that is it. Full details in the Privacy Policy.

For educational use. A low p-value means the result would be unlikely under the null hypothesis. It does not measure effect size or practical importance. Interpret results in context and consult a statistician for high-stakes decisions.