Your p-value, plain and simple.
Pvalr is a free p-value calculator that runs entirely in your browser. Pick a distribution, enter your test statistic, get the result. Nothing to install, nothing to sign up for.
A paper reports t = 2.05, df = 18, p = 0.028, and the methods section never says whether the test was one-tailed or two-tailed. Two-tailed gives p = 0.055. One-tailed in the predicted direction gives p = 0.028. One crosses 0.05 and one does not, and the choice was almost certainly made after the data came in. That is the most common p-value error in working science, and most of the others are variations on the same theme: the number is right for the calculation that was run, wrong for the one that should have been.
Pvalr is built around the five mistakes below.
Two-tailed vs. one-tailed: where most errors start
If a hypothesis predicts a direction before data collection (drug A reduces blood pressure more than placebo), a one-tailed test is defensible. If the hypothesis is non-directional (drug A produces a different blood pressure than placebo), the test has to be two-tailed. For a symmetric distribution such as Z or T, the one-tailed p is exactly half the two-tailed value when the statistic sits in the predicted tail, so the temptation to switch after seeing the data is real, and the audit trail is almost invisible.
Type t = 2.05 with df = 18 into Pvalr and leave the tail selector on two-tailed: p = 0.0552. Switch to one-tailed: p = 0.0276. Same data, two conclusions at alpha = 0.05.
Pick the tail before you compute the statistic, write it into the pre-registration or lab notebook, and keep it. The garden-of-forking-paths critique (Gelman and Loken, 2013) is about exactly this kind of post-hoc switch, multiplied across hundreds of small analytic choices.
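The t = 2.05, df = 18 comparison above can be checked without a statistics package. The sketch below is not Pvalr's source code. It is a minimal standard-library illustration that integrates the Student's t density with Simpson's rule, and the helper names `t_pdf` and `t_upper_tail` are ours, chosen for this example.

```python
import math

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_upper_tail(t: float, df: float, steps: int = 2000) -> float:
    """P(T >= t), using Simpson's rule on the density over [0, |t|]."""
    if t < 0:
        return 1.0 - t_upper_tail(-t, df, steps)
    h = t / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    # Half the mass lies above zero; subtract the part between 0 and t.
    return 0.5 - total * h / 3

t_stat, df = 2.05, 18
one_tailed = t_upper_tail(t_stat, df)           # predicted direction: upper tail
two_tailed = 2 * t_upper_tail(abs(t_stat), df)  # both tails, by symmetry

print(f"two-tailed p = {two_tailed:.4f}")
print(f"one-tailed p = {one_tailed:.4f}")
```

The only difference between the two results is the factor of 2, which is why the tail has to be fixed before the statistic is computed.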
What the p-value does not tell you
A p-value is the probability of observing a test statistic at least as extreme as the one you computed, assuming the null hypothesis is exactly true. That definition rules out almost every popular interpretation.
It is not the probability the null is true. It is not the probability your result was due to chance. It is not an effect size. A p of about 0.002 from n = 50,000 can correspond to a correlation of 0.014: real, but uninteresting. A p of 0.08 from n = 12 can correspond to a correlation of 0.52: interesting, but not conclusive. The American Statistical Association's 2016 statement lays this out across six principles. The one our users hit most often is principle five, that a p-value does not measure the size of an effect or the importance of a result.
The disclaimer in the result panel exists for this reason. Pvalr reports a number and a verdict at the alpha you set. It does not report effect size, confidence interval width, or power, and none of those can be recovered from the p-value alone.
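The two correlation examples above can be reproduced approximately in a few lines. This sketch uses the Fisher z transformation, a large-sample normal approximation for testing a correlation against zero. It is not necessarily what Pvalr or your statistics package computes, and `correlation_p_value` is a name we made up for the illustration.

```python
import math
from statistics import NormalDist

def correlation_p_value(r: float, n: int) -> float:
    """Approximate two-tailed p for H0: rho = 0 via the Fisher z transform."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * NormalDist().cdf(-abs(z))

# (correlation, sample size) pairs taken from the text above
for r, n in [(0.014, 50_000), (0.52, 12)]:
    p = correlation_p_value(r, n)
    # r squared is the share of variance explained: a rough effect-size measure
    print(f"n = {n:>6}  r = {r:.3f}  r^2 = {r * r:.4f}  p = {p:.4f}")
```

Printing r squared next to p makes the mismatch visible: the smaller p belongs to the effect that explains almost none of the variance.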
Choosing between Z and T when both seem valid
The Z test assumes the population standard deviation is known. In practice it almost never is. Most teaching examples that “use Z” are quietly substituting the sample standard deviation, which is what the T distribution was invented for in 1908 (Gosset, writing as “Student,” in Biometrika 6:1).
The decision table below is the one we keep pinned next to the build, because the framing in most stats textbooks is muddier than it needs to be.
| Situation | Sample size | Pop. SD known? | Use | Why |
|---|---|---|---|---|
| Single mean | n < 30 | No | T, df = n-1 | T accounts for estimating sigma |
| Single mean | n >= 30 | No | T, df = n-1 (Z acceptable) | T converges to Z as df grows |
| Single mean | Any | Yes (rare; calibrated QC) | Z | Textbook Z; assumption satisfied |
| Proportion | n*p and n*(1-p) >= 10 | N/A | Z | Normal approximation to binomial |
| Goodness-of-fit, contingency | Expected cell >= 5 | N/A | Chi-square | Observed vs. expected counts |
| Variance ratio, ANOVA | N/A | N/A | F, two df values | Ratio of two scaled chi-squares |
A working rule for coursework: if you computed your test statistic by dividing by s (the sample standard deviation), use T. If you divided by sigma (a population value handed to you in the problem), use Z. Mixing these up is the second-most-common error our users self-report when a pasted number does not match their answer key.
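To see how much the choice matters, the sketch below evaluates the same statistic under Z and under T at several df values. It is a standard-library illustration, not Pvalr's implementation: the T tail is obtained by Simpson's-rule integration of the density, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def t_two_tailed(stat: float, df: float, steps: int = 2000) -> float:
    """Two-tailed p under Student's t, via Simpson's rule on [0, |stat|]."""
    x = abs(stat)
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1.0 - 2.0 * total * h / 3

def z_two_tailed(stat: float) -> float:
    """Two-tailed p under the standard normal distribution."""
    return 2 * NormalDist().cdf(-abs(stat))

stat = 2.05
print(f"Z            p = {z_two_tailed(stat):.4f}")
for df in (5, 18, 30, 1000):
    print(f"T, df = {df:<4}  p = {t_two_tailed(stat, df):.4f}")
```

At small df the T result is noticeably larger than the Z result for the same statistic, and the gap closes as df grows, which is the convergence the table refers to.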
Degrees-of-freedom mistakes in T, chi-square, and F
Degrees of freedom is where p-values quietly go wrong when the calculation itself is right.
For a one-sample T test, df is n minus 1. For an independent two-sample T with pooled variance, df is n1 plus n2 minus 2. For Welch's T (unequal variances), df is the Welch-Satterthwaite approximation, almost always a non-integer (for example, df = 17.34). For a chi-square test of independence on an r-by-c table, df is (r minus 1) times (c minus 1), not the total cell count. For an F test in one-way ANOVA, df1 is the number of groups minus 1 and df2 is the total observations minus the number of groups.
Pvalr asks for df1 (and df2 for F) without naming the formula. That is deliberate. The calculator cannot tell a pooled two-sample T from a Welch design, and quietly assuming one would be worse than asking you to supply the value. Compute the statistic and df in the same software you used for the rest of the analysis (R's t.test, scipy.stats.ttest_ind, the statsmodels library at Statsmodels.org, or SPSS output), then paste both in to confirm.
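As a cross-check on whatever your software reports, the df rules above can be written out directly. This is a plain-Python sketch with function names of our own choosing, and the sample sizes and standard deviations in the usage lines are made-up illustration values, not data from any study.

```python
def df_one_sample(n: int) -> int:
    """One-sample T: n - 1."""
    return n - 1

def df_pooled(n1: int, n2: int) -> int:
    """Independent two-sample T with pooled variance: n1 + n2 - 2."""
    return n1 + n2 - 2

def df_welch(s1: float, n1: int, s2: float, n2: int) -> float:
    """Welch-Satterthwaite approximation from sample SDs s1, s2."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

def df_chi_square_independence(rows: int, cols: int) -> int:
    """Chi-square test of independence on an r-by-c table: (r - 1)(c - 1)."""
    return (rows - 1) * (cols - 1)

def df_one_way_anova(groups: int, total_obs: int) -> tuple[int, int]:
    """One-way ANOVA F test: (groups - 1, total observations - groups)."""
    return groups - 1, total_obs - groups

# Hypothetical inputs, for illustration only
print(df_one_sample(19))
print(df_pooled(12, 9))
print(round(df_welch(4.1, 12, 7.3, 9), 2))  # usually a non-integer
print(df_chi_square_independence(3, 4))
print(df_one_way_anova(3, 30))
```

The Welch value always falls between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2, which is a quick sanity check on any df you are about to paste in.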
Borderline results: what p = 0.049 actually means
A p of 0.049 is not meaningfully different from a p of 0.051. The 0.05 threshold is a convention from R. A. Fisher's 1925 Statistical Methods for Research Workers, where he called it “convenient” rather than principled. Calling one result significant and the other not, because they sit on opposite sides of an arbitrary line, is the failure mode the ASA statement reacts against.
Pvalr's verdict block says “significant at alpha = 0.05” when the computed p is at or below the threshold, because the convention is too widely used to refuse. The recommendation we keep giving in our own work: report the exact p to three decimal places (0.049, not “p < 0.05”), include the effect size or confidence interval next to it, and let the reader judge the borderline. The 2019 special issue of The American Statistician, “Moving to a World Beyond p < 0.05,” collects 43 papers on alternatives. The most actionable is the simplest: state the p, state the effect, and stop pretending 0.05 is a wall.
Alpha is not fixed at 0.05 either. Particle physics uses the 5-sigma standard (p < 3e-7); parts of biomedicine argue for 0.005. Pvalr's alpha control goes from 0.001 to 0.10 so the threshold can match your field or pre-registration.
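The link between a sigma threshold and a p-value is a normal tail area, which the standard library can compute with `math.erfc`. The sketch below is an illustration only. `sigma_to_p` and `is_significant` are hypothetical helpers, and the comparison rule (p at or below alpha) mirrors the verdict wording described above rather than Pvalr's actual code.

```python
import math

def sigma_to_p(k: float, two_sided: bool = False) -> float:
    """Standard normal tail probability beyond k standard deviations."""
    one_sided = 0.5 * math.erfc(k / math.sqrt(2))
    return 2 * one_sided if two_sided else one_sided

def is_significant(p: float, alpha: float) -> bool:
    """True when p is at or below the chosen threshold."""
    return p <= alpha

# One-sided tail areas for common sigma thresholds
for k in (2, 3, 5):
    print(f"{k}-sigma  one-sided p = {sigma_to_p(k):.3g}")

# The same p can pass one field's threshold and fail another's
p = 0.049
for alpha in (0.10, 0.05, 0.005, 0.001):
    print(f"alpha = {alpha:<5}  significant: {is_significant(p, alpha)}")
```

Using `erfc` instead of `1 - cdf` avoids losing precision in the far tail, where the probabilities are on the order of 1e-7.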
Frequently Asked Questions
How do I calculate a p-value from a test statistic?
What is the difference between a one-tailed and two-tailed test?
What does a p-value actually mean?
Which degrees of freedom should I enter?
Is Pvalr accurate enough for published research?
How do I choose an alpha level?
Does Pvalr store my test statistics?
How do I find the p-value for a t-test given a t-statistic and degrees of freedom?
How do I calculate a one-tailed p-value?
How do I find the p-value for a chi-square test?
No accounts. No data collection.
Pvalr is free and ad-supported. There are no sign-ups and no email captures. We use PostHog for anonymous page-view analytics and that is it. Full details in the Privacy Policy.
For educational use. A low p-value means the result would be unlikely under the null hypothesis. It does not measure effect size or practical importance. Interpret results in context and consult a statistician for high-stakes decisions.