JSONL vs NDJSON vs JSON Sequences: When the Newline-Delimited Formats Diverge

Domain knowledge · Published by AppCrib
Rowson: CSV to JSON in one paste.

A common stumble: you have a file of one-JSON-object-per-line records, your pipeline tool says it accepts NDJSON, you point it at the file, and the first record fails to parse with an "Unexpected token" error on a control character. The file was produced as a JSON Text Sequence, which prefixes every record with U+1E (record separator). Your NDJSON parser splits on newlines, hits the U+1E byte at the start of the first chunk, and chokes. The file is valid. The parser is correct. The two formats are not interchangeable, even though they encode the same logical data and use the same record shape.

The category of "newline-delimited JSON" hides three formats that look identical until they don't: JSONL, NDJSON, and JSON Text Sequences (RFC 7464). They share a common goal (streamable, append-only records of JSON values) and diverge on the bytes that separate the records. The differences matter the moment a producer and consumer disagree about which format is in play.

What each format actually requires

JSONL is the most common of the three. The spec lives at jsonlines.org and predates RFC 7464 by roughly two years. It defines a file as a sequence of valid JSON values, each on its own line, terminated by \n (U+0A). Each record must fit on a single line; pretty-printed JSON is not allowed. UTF-8 is mandatory. The file extension is .jsonl. There is no leading byte and no header, and a newline after the last record is permitted but not required. Just JSON values with newlines between them.

NDJSON (Newline-Delimited JSON) has its own spec at ndjson.org, written around the same time as JSONL. The rules are functionally identical: one JSON value per line, UTF-8, no leading byte, no internal newlines within records. The only meaningful spec-level differences are in the wording around blank lines and line endings: the NDJSON spec says a parser may silently ignore empty lines as long as that behavior is documented, and it requires every record to end in \n while allowing that \n to be preceded by \r. In practice, nearly every JSONL parser will read NDJSON and vice versa.

JSON Text Sequences are different. RFC 7464, published by the IETF in February 2015, defines a format where each JSON value is prefixed by U+1E (record separator) and followed by U+0A (line feed). The U+1E prefix is mandatory. Encoders are also required to emit the trailing line feed, although parsers are expected to cope when it is missing. The registered MIME type is application/json-seq rather than application/x-ndjson. Because U+1E unambiguously marks the start of each record, the JSON value itself is allowed to contain internal newlines. Pretty-printed JSON is legal in a sequence file in a way it is not in JSONL.
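To make the byte-level difference concrete, here is a minimal Python sketch that writes the same two records in both styles. The function names encode_jsonl and encode_json_seq and the sample records are illustrative assumptions, not part of any spec or library.

```python
import json

RS = b"\x1e"  # U+1E, the record separator that RFC 7464 puts before each value
LF = b"\n"    # U+0A, line feed


def encode_jsonl(records):
    """Hypothetical helper: one compact JSON value per line, UTF-8, no prefix."""
    out = bytearray()
    for rec in records:
        # Without indent=, json.dumps emits no literal newlines, and newlines
        # inside string values are escaped as \n, so each record is one line.
        out += json.dumps(rec, ensure_ascii=False).encode("utf-8") + LF
    return bytes(out)


def encode_json_seq(records, indent=None):
    """Hypothetical helper: RS before every value, LF after it."""
    out = bytearray()
    for rec in records:
        # Pretty-printing is fine here because RS, not LF, marks the boundary.
        text = json.dumps(rec, ensure_ascii=False, indent=indent)
        out += RS + text.encode("utf-8") + LF
    return bytes(out)


records = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
jsonl_bytes = encode_jsonl(records)
seq_bytes = encode_json_seq(records, indent=2)
```

The two outputs carry the same data, but jsonl_bytes begins with 0x7B and seq_bytes begins with 0x1E, which is the whole incompatibility in miniature.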

Where the three formats came from

JSONL was popularized around 2013 as a convention for log files and streaming exports. The site at jsonlines.org is essentially a single-page spec maintained by the community. It became the dominant format in machine learning workflows because most training tooling reads "one example per line" with little or no configuration. OpenAI's fine-tuning API, Anthropic's batch API, and the Hugging Face Datasets library all read or write JSONL.

NDJSON appeared around the same time in the JavaScript ecosystem. The original spec at ndjson.org saw heavy use in D3-adjacent tools (TSV/CSV/JSON streaming) and CouchDB's _changes feed. The format is functionally JSONL, but the name spread independently in the Node.js community. Most tools that say "NDJSON" today mean exactly the same bytes as JSONL.

JSON Text Sequences came from the IETF in 2015. The motivation was a recognized gap: existing newline-delimited JSON formats could not handle pretty-printed records, and there was no MIME type registered for JSON streaming. RFC 7464 solved both. The U+1E prefix is the same control character ASCII has designated as a record separator since the 1960s, so the design is conservative rather than novel.

Format | Spec | Year | Record separator | Pretty-print allowed | MIME type
------ | ---- | ---- | ---------------- | -------------------- | ---------
JSONL | jsonlines.org (informal) | ~2013 | \n | No | application/jsonl (unofficial)
NDJSON | ndjson.org (informal) | ~2013 | \n | No | application/x-ndjson (unofficial)
JSON Sequences | RFC 7464 | 2015 | U+1E before each record, \n after | Yes | application/json-seq (registered)

The unprefixed-vs-prefixed parsing problem

The practical wedge between JSONL/NDJSON and JSON Sequences is how a parser locates record boundaries.

A JSONL parser reads bytes until it sees \n, then parses what came before as a JSON value. This works only if no record contains an unescaped newline. JSONL enforces that constraint at the producer side: each record must serialize to a single line. The parser is simple, fast, and brittle. A pretty-printed JSON value with internal newlines will fragment into garbage.
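A line-splitting reader of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: parse_jsonl and its skip_blank flag are hypothetical names, and the whole input is held in memory for brevity.

```python
import json

JSON_WS = " \t\r\n"  # the only whitespace characters JSON itself recognizes


def parse_jsonl(data: bytes, skip_blank: bool = False) -> list:
    """Hypothetical JSONL reader: split on \\n, parse each line as one value."""
    # str.split("\n") is deliberate: str.splitlines() would also break on
    # U+1E and other separators, which would hide format mismatches.
    lines = data.decode("utf-8").split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # a trailing newline does not start another record
    records = []
    for lineno, line in enumerate(lines, start=1):
        if skip_blank and line.strip(JSON_WS) == "":
            continue  # NDJSON-style leniency, off by default
        try:
            records.append(json.loads(line))
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: {exc.msg}") from exc
    return records
```

Feeding it b'{"a": 1}\n{"a": 2}\n' returns both records, while a pretty-printed record such as b'{\n  "a": 1\n}\n' raises on line 1, because the first line on its own is only an opening brace.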

A JSON Sequences parser reads U+1E, then parses the next JSON value (which may contain newlines, indentation, anything legal in JSON), then reads U+1E again or hits EOF. The parser is slightly more complex but tolerates pretty-printed records, multi-line objects, and any whitespace JSON itself allows.
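The separator-driven approach can be sketched the same way. This is a simplified in-memory illustration, assuming the whole file fits in a single bytes object; parse_json_seq is a hypothetical name, and the truncation handling that RFC 7464 describes for incomplete records is left out.

```python
import json

RS = "\x1e"          # U+1E record separator
JSON_WS = " \t\r\n"  # the only whitespace characters JSON itself recognizes


def parse_json_seq(data: bytes) -> list:
    """Hypothetical JSON Text Sequence reader: split on RS, parse each chunk."""
    text = data.decode("utf-8")
    if text == "":
        return []
    if not text.startswith(RS):
        raise ValueError("first character is not U+1E; not a JSON text sequence")
    records = []
    # Valid JSON never contains a raw U+1E (control characters must be escaped
    # inside strings), so splitting on it cannot cut a record in half.
    for number, chunk in enumerate(text.split(RS)[1:], start=1):
        if chunk.strip(JSON_WS) == "":
            continue  # consecutive separators produce empty chunks; skip them
        try:
            # json.loads accepts surrounding whitespace, including the
            # trailing line feed and any newlines from pretty-printing.
            records.append(json.loads(chunk))
        except json.JSONDecodeError as exc:
            raise ValueError(f"record {number}: {exc.msg}") from exc
    return records
```

Unlike a line splitter, this reader accepts b'\x1e{\n  "a": 1\n}\n' without complaint, and it rejects plain JSONL input at the first character.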

A file that has been written as a sequence but read as JSONL will fail on the leading U+1E byte. A file written as JSONL but read as a sequence will fail on the first record because no U+1E ever appears. There is no Postel-style "be liberal in what you accept" fallback. The two formats are byte-incompatible despite both being "newline-delimited JSON."
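Since neither parser will accept the other format, the usual remedy is an explicit conversion step between producer and consumer. The sketch below shows one way that could look in Python; json_seq_to_jsonl and jsonl_to_json_seq are hypothetical helper names, and both work on in-memory bytes for brevity.

```python
import json

RS = "\x1e"          # U+1E record separator
JSON_WS = " \t\r\n"  # the only whitespace characters JSON itself recognizes


def json_seq_to_jsonl(data: bytes) -> bytes:
    """Hypothetical converter: drop the RS prefixes and re-serialize each
    value compactly so that every record fits on one line."""
    lines = []
    for chunk in data.decode("utf-8").split(RS):
        if chunk.strip(JSON_WS) == "":
            continue  # text before the first RS, or an empty element
        lines.append(json.dumps(json.loads(chunk), ensure_ascii=False))
    return "".join(line + "\n" for line in lines).encode("utf-8")


def jsonl_to_json_seq(data: bytes) -> bytes:
    """Hypothetical converter: validate each line, then add RS and LF."""
    out = []
    for line in data.decode("utf-8").split("\n"):
        line = line.strip(JSON_WS)
        if line == "":
            continue
        json.loads(line)  # raises json.JSONDecodeError on an invalid record
        out.append(RS + line + "\n")
    return "".join(out).encode("utf-8")
```

Going from a sequence to JSONL has to re-serialize, because a pretty-printed record must be collapsed onto one line. Going the other way can reuse each line's text as-is.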

Where consumers actually disagree

If a pipeline says it accepts JSONL, the safe assumption is: no leading byte, \n between records, no internal newlines, UTF-8 encoded. OpenAI's fine-tuning uploader is strict about this. A file rejected with a Unicode error has almost always been produced with the wrong format setting somewhere upstream.

If a pipeline says it accepts NDJSON, the same assumption usually holds, but some tools that emit application/x-ndjson over HTTP also tolerate blank lines between records (the NDJSON spec permits parsers to ignore them). Streaming consumers built around Server-Sent Events (text/event-stream) go further and treat a blank line as the end of an event, a meaning no JSONL parser gives it.

If a tool says JSON Sequences or sends application/json-seq, the U+1E delimiter is mandatory and pretty-printed JSON inside each record is legal. Kubernetes' streaming watch endpoints, several IETF protocols, and some scientific data formats use this. A consumer expecting JSONL will fail on the very first line, because the leading U+1E is not valid JSON, and will fragment any pretty-printed record that follows.

The fastest way to identify which format a file is in: open it in a hex viewer and look at the first byte. If it's 1E, the file is a JSON Text Sequence. If it's the first byte of a JSON value, which in practice almost always means 7B ({) or 5B ([), the file is JSONL or NDJSON. Anything else, such as a byte order mark, points to an encoding problem rather than a third format.
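The same first-byte check is easy to automate. The following is a rough Python sketch under stated assumptions: sniff_format and its return labels are hypothetical, and the function only classifies the opening byte rather than validating the file.

```python
def sniff_format(data: bytes) -> str:
    """Hypothetical first-byte check; classifies the start of a file and
    does not validate the records that follow."""
    if data.startswith(b"\xef\xbb\xbf"):
        data = data[3:]  # skip a UTF-8 byte order mark if one is present
    data = data.lstrip(b" \t\r\n")  # tolerate leading JSON whitespace
    if not data:
        return "empty"
    first = data[:1]
    if first == b"\x1e":
        return "json-seq"
    # Any JSON value may start a JSONL record: an object, array, string,
    # number, or one of the literals true, false, null.
    if first in b'{["-0123456789tfn':
        return "jsonl-or-ndjson"
    return "unknown"
```

The check cannot tell JSONL from NDJSON, since the two start with the same bytes. It only separates the prefixed format from the unprefixed ones.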

Why three names persist for nearly the same idea

JSONL and NDJSON would have merged into one name a decade ago if the two specs had emerged in the same community. They did not. JSONL came out of the Python and ML communities. NDJSON came out of JavaScript and Node. Search results for either term return the same Stack Overflow answers, the same parser libraries, and frequently the same authors using both names interchangeably depending on the language they're writing in.

JSON Sequences exists because the IETF needed a registrable MIME type and a format that handled pretty-printing. The IETF process does not retroactively standardize community conventions; it produces a parallel spec. So now there are three. The naming will not converge until one consumer ecosystem dies, which has not happened in the eleven years since RFC 7464 shipped.

How CSV-to-JSON tools pick a variant

A CSV-to-JSON converter outputting "JSONL" should produce: bytes starting with { or [, one JSON value per line, \n separator, no leading U+1E, no internal newlines within records. The .jsonl extension is conventional. If the tool offers a "JSON Sequences" mode separately, it should prefix records with U+1E and use .json-seq or the explicit application/json-seq content type.
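As an illustration of those requirements, here is a minimal Python sketch of a CSV-to-JSONL conversion using only the standard library. It is not how any particular converter is implemented; csv_to_jsonl is a hypothetical name, and the first CSV row is assumed to be a header.

```python
import csv
import io
import json


def csv_to_jsonl(csv_text: str) -> bytes:
    """Hypothetical converter: one JSON object per CSV row, one row per line."""
    # newline="" lets the csv module handle line endings inside quoted fields.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    out = bytearray()
    for row in reader:
        # csv yields every value as a string; no type inference is attempted.
        # json.dumps escapes newlines inside values as \n, so a multi-line
        # cell still serializes to a single physical line.
        out += json.dumps(row, ensure_ascii=False).encode("utf-8") + b"\n"
    return bytes(out)


sample = 'id,note\n1,"line one\nline two"\n2,plain\n'
jsonl_output = csv_to_jsonl(sample)
```

The sample includes a quoted cell containing a line break, which is the case most likely to break a hand-rolled converter. Here it ends up as an escaped \n inside the JSON string, so the output still has exactly one newline per record.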

In practice, most converters and editors only emit JSONL. ConvertCSV labels its output as JSONL and produces JSONL bytes. ConversionTools.io labels output as NDJSON but produces functionally identical bytes. Neither emits RFC 7464 sequences because the consumer demand is heavily skewed toward LLM training pipelines and log ingestion tools, both of which expect JSONL bytes. A tool that produces "newline-delimited JSON" without naming the variant is almost always producing JSONL bytes, regardless of which label sits in the dropdown.

If you spend any time moving rows between spreadsheets and streaming pipelines, the difference between these three formats stops being academic the first time a consumer rejects a file that looked correct. Rowson outputs JSONL bytes when you pick the JSONL format, which is the variant every LLM fine-tuning API and log ingestor accepts without configuration. It will not produce JSON Sequences, because almost no one is asking for them.
