Why SQL Formatters Still Disagree on the Same Query

Take a five-line query with a join and a CTE. Paste it into four different SQL formatters. You get four visibly different results.

WITH active_users as (select id, email from users where status='active')
select au.email, count(o.id) as order_count from active_users au left join orders o on o.user_id = au.id group by au.email order by order_count desc;

Run it through pgFormatter, the sql-formatter npm package, the Aiven web formatter, and the Redgate online formatter. Same input, four outputs. pgFormatter right-aligns FROM, LEFT JOIN, GROUP BY, and ORDER BY to a visual line down the middle of the page. The sql-formatter npm package puts every keyword flush left and indents two spaces underneath. Aiven's tool does something similar but with four-space indents. Redgate's tool lowercases the keywords and offers a leading-comma option in the SELECT list.

None is wrong. They are all picking from a menu of decisions the SQL standard never made.

What SQL-92 actually settled, and what it left alone

ISO/IEC 9075, the formal SQL standard, has gone through nine editions since 1986. The 1992 revision (commonly called SQL-92) is the one that formalized the explicit INNER JOIN and LEFT JOIN keyword syntax, replacing the comma-and-WHERE join pattern of SQL-86. SQL:1999 added recursive CTEs. SQL:2003 added window functions. SQL:2016 added row pattern matching. SQL:2023 added JSON functions and property graphs.

None of those editions said anything about how to lay the query out on a page.

The standard specifies that keywords are case-insensitive. SELECT, select, Select, and SeLeCt all parse identically. It says that identifiers can be quoted with double quotes, that whitespace between tokens is mostly irrelevant, and that line breaks are not part of the grammar. It says nothing about where the line breaks should go. Nothing about whether SELECT should be uppercase. Nothing about whether indentation should be two spaces, four spaces, or a tab. Layout is an informal convention that grew up around the language, decades after the language itself was standardized.

The earliest widely-cited style guide for SQL came in 2005, when Joe Celko's *SQL Programming Style* was published by Morgan Kaufmann. It codified a set of opinions: uppercase keywords, comma at the end of a line, joins indented under FROM. Those opinions reflect what was already common in the Oracle and DB2 shops Celko worked with, but they were never authoritative. They were one team's house style, written down. Twenty years later, the situation is the same, except now every shop has its own house style and every formatter ships with its own defaults.

Three schools of JOIN indentation

If you sort formatters by how they lay out joins, almost every tool ends up in one of three camps.

River style right-aligns the major clause keywords to a visual line running down the page:

  SELECT au.email, count(o.id)
    FROM active_users au
    LEFT JOIN orders o ON o.user_id = au.id
GROUP BY au.email

The visual river is the column where FROM, LEFT JOIN, and GROUP BY all line up. pgFormatter implements this as a configurable mode, and several internal-team SQL style guides at large companies use it. The argument for river style is that the eye lands immediately on the clause boundaries. You don't need to read the keyword to know which clause you're in.

Lead-edge style puts every clause flush left and indents the body underneath:

SELECT
  au.email,
  count(o.id)
FROM active_users au
LEFT JOIN orders o
  ON o.user_id = au.id
GROUP BY au.email

This is the default in the sql-formatter npm package, in the Aiven web tool, and in most of the formatter UIs built on top of either of them. The argument is that the left edge of the page is the easiest place for the eye to scan, and that consistency with how most other programming code is laid out matters more than visual cleverness.

Block style is the minority position: each clause keyword on its own line, body flush left underneath. You see it in some Snowflake and BigQuery shops where queries are generated by an ORM and the formatter is mostly trying not to fight with the upstream tool. Almost no automated formatter ships block style as a default.

The comma argument that won't die

The most-argued formatting question in SQL has nothing to do with joins. It is where the comma goes.

Trailing commas read like every other programming language:

SELECT
  email,
  created_at,
  status
FROM users

Leading commas put the punctuation at the start of each line after the first:

SELECT
  email
, created_at
, status
FROM users

The argument for leading commas, made repeatedly in the PostgreSQL and BigQuery communities since the early 2000s, is that they make it easier to comment out a line (no trailing-comma cleanup on the previous line) and easier to spot a missing or extra comma. They also work better in code review diffs: adding a new column adds one new line, not a one-character edit to the previous line. The argument against is that they look strange to most readers and they break the "comma terminates" convention that holds in nearly every other language.

The default in sql-formatter v15 is trailing, with a commaPosition: 'before' option for teams that want the other behavior. Some dbt-shop style guides pick leading. Others pick trailing. There has been no industry consensus for three decades and there is no working group converging on one. Both styles have working production codebases at large companies, and both have advocates who will argue at length about why the other camp is wrong.

UPPERCASE is a holdover that won't die

The case convention is the other long-running argument. SQL keywords are case-insensitive, so the choice is purely visual. The convention of writing keywords in uppercase comes from the punched-card era, when terminals could not produce lowercase letters and IBM database documentation was printed in all caps. It carried forward through COBOL into the first SQL implementations on mainframes and survived into the modern era through pure inertia.

The case axis is one of four where formatters reliably disagree. Here is how those four axes line up across some of the more visible reference points:

Decision	Conservative default	Modern alternative	Where the disagreement lives
Keyword case	UPPER (mainframe convention)	lower (ergonomic-readability argument)	Old codebases vs newer style guides
Comma placement	Trailing (most languages)	Leading (PostgreSQL / dbt circles)	Authoring ergonomics vs reading ergonomics
Indentation width	4 spaces (older docs and books)	2 spaces (modern web tools)	Print tradition vs screen real estate
JOIN layout	Lead-edge (most formatters)	River (pgFormatter, some internal guides)	Left-edge scanning vs clause-keyword scanning

The lowercase argument is mostly ergonomic. Lowercase is easier to read in large blocks of text, modern syntax highlighting can emphasize keywords without case, and most code lives in lowercase anyway. The uppercase argument is mostly cultural. It is what looks like SQL to anyone who learned the language before the mid-2010s, and the visual contrast between keyword and identifier still matters when there is no syntax highlighting (in a Slack message, a pull request comment, a log line, a printed report).

Most automated formatters split the difference. They default to UPPER and ship a keywordCase: 'lower' option. Teams pick whichever side matches their existing codebase. The decision is sticky. Once a codebase has tens of thousands of queries in one style, switching is expensive enough that it rarely happens.

Where the dialects pull in different directions

Even if every team agreed on keyword case and JOIN indentation, the fourteen SQL dialects that show up in commercial databases pull formatters apart through their own syntactic quirks.

PostgreSQL identifiers are quoted with double quotes ("user_id"). MySQL and MariaDB use backticks. SQL Server uses square brackets. All three are syntactically incompatible. A formatter that wants to be dialect-aware has to track the active dialect through the entire query, change the quote character correctly, and not break when string literals happen to contain the wrong kind of quote.

PostgreSQL added dollar-quoted strings in version 7.4, released November 2003, for embedding multi-line text without escaping. A naive formatter that does not know about $tag$ ... $tag$ will mis-tokenize the contents as SQL and produce garbage output. The sql-formatter npm package handles this correctly. Some older Perl-based and PHP-based formatters don't.

BigQuery has its own hazards. Backticks for table names. || as string concatenation in some modes and the same || as logical OR in MySQL's default mode, which means a cross-dialect formatter cannot tokenize || without knowing the dialect first. EXCEPT as a SELECT modifier rather than a set operator. Snowflake adds $$ ... $$ blocks for stored procedure bodies that should be left alone entirely. Oracle PL/SQL has its own BEGIN ... END; block syntax. T-SQL has variables prefixed with @. Each dialect quirk is another opinion the formatter has to encode.

What sql-formatter v15 chose as the middle ground

The sql-formatter npm package, at v15, around 2.8k GitHub stars, MIT-licensed, with roughly 18.5k dependents on npm, is the closest thing the ecosystem has to a default. Its choices are visible in the published configuration: uppercase keywords, two-space indentation, trailing commas, lead-edge JOINs, fourteen supported dialects with their parsing rules wired in. Those choices are not neutral. They are an opinion, shaped by what the maintainers saw as the most defensible position when each option was added.

The library exposes most of the disagreements as configuration. A team can override keywordCase, tabWidth, indentStyle, commaPosition, linesBetweenQueries, and a dozen others. But the defaults are what most users get without reading the docs, and the defaults shape what most SQL on the modern web in 2026 ends up looking like. When a CodeBeautify clone or an Aiven-style web tool ships, it almost always inherits sql-formatter's defaults wholesale and exposes maybe three of those options in the UI.

That is the actual reason two SQL formatters disagree on the same query. They are inheriting different opinions from different decades, encoded into different defaults, applied by different parsers to a standard that never specified the layout. The output is consistent within any one formatter and inconsistent across all of them, which is the historical accident the standard left behind.

To see how a particular query reformats under a specific set of those choices, with keyword case, indentation, and dialect exposed as live controls and a before/after diff showing which lines moved, Tidysql runs sql-formatter v15 in the browser. Useful when you are trying to figure out which of the four disagreements above is the one your team actually cares about.

tidysql

Format SQL in seconds, right in the browser.

Try tidysql

ToklenDeveloper Tools

Why SQL Formatters Still Disagree on the Same Query

What SQL-92 actually settled, and what it left alone

Three schools of JOIN indentation

The comma argument that won't die

UPPERCASE is a holdover that won't die

Where the dialects pull in different directions

What sql-formatter v15 chose as the middle ground

Related posts

Why GPT, Claude, and Gemini Disagree on the Same Token Count

Unix Time Isn't UTC: How POSIX Skipped Leap Seconds and the 2038 Cliff Ahead

RBAC0 Through RBAC3: What the NIST Standard Actually Defines