AppCrib
Design Tools

K-Means vs Median-Cut vs Octree: How Color Quantization Algorithms Pick Different Palettes From the Same Image

Domain knowledge·Published by AppCrib··
:rt
HexrootDrop an image, paste a :root.

Pick any image — a sunset photo, a brand mockup, a UI screenshot. Run it through three free palette extractors, ask each for a six-color palette, and lay the results next to each other. The three palettes won't agree. One leans toward the dominant background tones. One surfaces unexpected mid-tones from a corner of the image. One produces something that reads as an average of the other two.

This isn't a bug. The tools are running different algorithms, usually one of median-cut, octree quantization, or k-means clustering, and "what counts as a dominant color" turns out to be a question with three reasonable answers.

The differences are most visible in exactly the cases where palette extraction is hardest: photos with a strong dominant background, brand assets with subtle accent colors, illustrations with hand-picked color choices. The algorithms have known failure modes. Picking the right one for the image type is the difference between "this palette looks weird" and "this palette is usable."

The same image, three different palettes

Color quantization is the technical name for the problem of reducing an image with millions of unique RGB values to a palette of N representative colors. There is no closed-form solution. There are effectively an infinite number of valid N-color subsets the algorithm could pick, and the job is to pick one that minimizes the visible difference between the original and a version recoloured to use only the chosen palette.

Three algorithms dominate practical use:

  • Median-cut, designed by Paul Heckbert at MIT in 1980 and published at SIGGRAPH 1982 in the paper "Color Image Quantization for Frame Buffer Display." Heckbert was solving the problem of displaying full-color images on early 8-bit displays that could only show 256 colors at a time. Median-cut is what most modern libraries reach for when speed matters and the input is photographic.
  • Octree quantization, published by Michael Gervautz and Werner Purgathofer in 1988 in "A Simple Method for Color Quantization: Octree Quantization." Octree was designed to be streamable. The palette can be built as the image is decoded, without holding the entire pixel set in memory.
  • K-means clustering predates both as a general algorithm. Stuart Lloyd's original formulation dates to 1957 at Bell Labs; James MacQueen's 1967 paper coined the "k-means" name. Applied to color, k-means treats every pixel as a point in 3D RGB (or CIELAB) space and finds N cluster centroids.

Three algorithms, three different definitions of "the right palette." The differences come down to what each one optimizes for and what each one ignores.

What median-cut is doing under the hood

Median-cut treats the image's pixels as a cloud of points in RGB space. The algorithm starts by putting all the pixels into a single "box" — the smallest 3D box that contains every color in the image. Then it cuts the box in half along its longest axis at the median value, splitting the pixels into two new boxes of equal pixel count. It picks the longer of the two, cuts that one again, and so on until there are N boxes. The final palette is the average color of the pixels inside each box.

The clever bit is the *median* part. Cutting at the median (not the midpoint) gives every box the same number of pixels regardless of how the image's colors are distributed. A photo with a huge expanse of blue sky and a small splash of red won't end up with a palette that's almost all blue. The median split forces the algorithm to spend some of its color budget on the underrepresented values.

That's also median-cut's failure mode. If a color is genuinely rare (appears in three pixels out of two million), median-cut won't surface it. Those three pixels get absorbed into a larger box and lose their identity. For most photos this is the right call. For brand assets with a tiny but important accent color, median-cut can miss the accent entirely.

Heckbert's 1982 paper described two variants. Strict median-cut splits at the median pixel. Modified median-cut weights the split toward areas of high variance, improving results on images with strong color clusters at a small speed cost. Most production implementations, including the color-thief JavaScript library and Leptonica's quantization routines, use the modified variant, usually without making the distinction explicit.

Where octree quantization fits

Octree quantization is the streaming option. It builds a tree where each node represents a region of RGB space. The root represents the entire space. Each node has up to eight children, one for each octant of the parent's region. As pixels stream in, each one walks down the tree, creating new nodes as needed.

The palette emerges by pruning. Once the tree has more than N leaf nodes, the algorithm starts merging the leaves with the lowest pixel counts back into their parents. The final N leaves become the palette.

Octree's strength is memory. The tree never has to hold the full pixel set, only the structure of the color distribution. The weakness is that the regions are fixed by the octree subdivision, which doesn't always line up with where the actual color clusters are. A dominant teal that straddles the boundary between two octree cells can end up split across two palette entries that are perceptually almost identical, wasting palette budget.

Octree also has an asymmetry that surprises people. The output depends on pixel order. Two implementations of "the same" octree algorithm can produce different palettes from the same image if one reads pixels left-to-right and the other reads them top-to-bottom. The merging-which-cell-first decision when leaf counts tie is order-dependent.

Why k-means feels different (and slower)

K-means is iterative. The algorithm starts by picking N initial colors, often at random, sometimes by k-means++ initialization, which spreads the initial centers more intelligently across the distribution. Then it loops. Assign every pixel to its nearest center. Move each center to the mean of its assigned pixels. Repeat until the centers stop moving, or until a max iteration count hits.

The result is a palette where each color is the *centroid* of a cluster of pixels: the true average of the cluster, not a representative chosen by splitting boxes or pruning trees. This tends to produce palettes that look more visually balanced for images with clean, well-separated color regions. Brand assets and vector illustrations are where k-means tends to win.

The cost is speed. Median-cut and octree finish in a single pass over the pixels. K-means takes anywhere from five to a few hundred iterations to converge, and each iteration is a full scan of every pixel. On a 4-megapixel image, that's the difference between roughly 30 milliseconds and 800 milliseconds. Client-side palette extractors in the browser almost universally pick median-cut for this reason. K-means would stall behind every slider drag.

K-means also has a less-discussed failure mode. The random initialization means two runs of the same algorithm on the same image can produce slightly different palettes. With a good initialization (k-means++) the variance shrinks but doesn't vanish entirely. For a tool that needs to be reproducible, like a CI pipeline generating tokens from a brand asset, this is occasionally a real problem.

A side-by-side at six colors

Here are the six-color palettes three different algorithms produce from the same input: a 1200×800 photo of a sunset, dominant warm tones with a few cool highlights.

AlgorithmOutput palette (HEX, lightest to darkest)Notable
Modified median-cut#f7c873 #e89a48 #c5663d #8a3c2f #4d2b29 #1f1814Even tonal spread; missed a small lavender cloud entirely
Octree#fef2d0 #f5c66e #d9824a #a05036 #6a3128 #2c1c1cCaught a near-white that's only ~80 pixels in the original
K-means (k-means++, 50 iters)#f4b863 #db8b48 #b3563b #ad7560 #75332b #3a2426More gradient-like; the warm mid-tone is a true centroid, not present in the median-cut output

None of the three is "wrong." Median-cut produced the cleanest equal-step tonal ramp, which is probably what most users want for a default. Octree caught a highlight the others missed, which is great if you want to preserve the brightest pixel and a problem if that pixel is a JPEG compression artifact. K-means produced the most perceptually balanced palette, at roughly 25× the runtime.

The differences are subtle in absolute terms. Every palette is recognizably "the sunset's colors." But visible enough that the choice of algorithm changes what gets pasted into a stylesheet. For a designer iterating on a brand palette, those small shifts matter.

When each algorithm wins

Trade-offs, condensed:

  • Median-cut is the right default. Fast, deterministic, produces visually balanced palettes for photographic input. Used by color-thief (the most popular browser-side library) and by ImageMagick's default -colors flag. Pick this unless there's a reason not to.
  • Octree wins when memory is the constraint: embedded systems, very large images that don't fit in RAM, streaming decoders. Modern web tools rarely need this, but ImageMagick still exposes octree-based quantization for users who want it.
  • K-means wins when palette quality matters more than runtime and the image has clean, distinct color regions. Brand asset extraction in a build pipeline is the canonical case. Photoshop's Indexed Color "Perceptual" mode and Leptonica's general-purpose quantizer with the right flags use k-means or k-means-derived approaches.

A footnote on a fourth class. *Neural quantization* uses a small trained network to produce a palette. NeuQuant (Anthony Dekker, 1994) was the first widely used neural approach and is still in some GIF encoders. Quality is good but rarely better than well-tuned k-means, and runtime is unpredictable. Most production work has moved back to the classical algorithms.

What "dominant colors" actually means

Every algorithm above is approximating an undefined question. There is no agreed-on definition of "the dominant colors of an image." It can mean the colors that cover the most area. Or the colors a human would describe as visually striking. Or the colors that, when used as a palette, produce the least visible error when the image is recoloured. These are different objectives, and a single algorithm can only optimize for one.

That's also why every tool you compare disagrees. They aren't all wrong. They're answering different questions. The right palette for "tell me the colors in this image" is different from the right palette for "give me a set of colors to use in a design that should feel related to this image," and different again from "give me the colors that will reproduce this image with the least visible degradation if I downsample to N colors." Most tools don't tell you which question they're answering. The ambiguity is in the field, not in any one implementation.

If you want to see what modified median-cut produces from your own images without setting up a build environment, Hexroot runs it in the browser and exposes a slider that re-extracts at sizes from three to twelve. Useful for getting a feel for how palette size interacts with the choice of algorithm before committing to a workflow.

:rt
Hexroot
Drop an image, paste a :root.
Try Hexroot