Reference

CSV dialect and JSON output reference

Rowson above converts CSV, TSV, and pipe-delimited input into four JSON shapes in the browser. The reference below covers the underlying specs: which CSV dialects exist, what their quoting and escape rules are, what each JSON output format is for, and which parser libraries implement which behavior.

CSV dialect overview

| Dialect | Field separator | Record separator | Quote char | Escape rule | Defined by |
|---|---|---|---|---|---|
| RFC 4180 | , | \r\n | " | double the quote: "" | IETF RFC 4180 (2005) |
| Excel CSV | , (locale-dependent) | \r\n | " | double the quote | de facto, Microsoft Office |
| TSV (IANA) | \t | \r\n or \n | none | none; tabs are not allowed in fields (some producers add backslash escapes) | IANA text/tab-separated-values |
| TSV (Unix) | \t | \n | none | tabs and newlines forbidden in fields | de facto, Unix tools |
| MySQL CSV (OUTFILE) | , | \n | " | backslash: \" | MySQL SELECT INTO OUTFILE |
| PostgreSQL COPY CSV | , | \n | " | double the quote | PostgreSQL COPY ... CSV |
| European semicolon | ; | \r\n | " | double the quote | Excel, locales using , as decimal separator |
| Pipe-delimited | \| | \r\n or \n | usually none | varies; often forbid pipes in fields | de facto, mainframe and ETL exports |
| ASCII record separator | \x1F (US, unit separator) | \x1E (RS, record separator) | none | non-printable separators avoid collision | ASCII control characters (1963) |

The most common silent failure happens at the boundary between RFC 4180 (quote-doubled) and MySQL (backslash-escaped). A field containing " round-trips cleanly through the dialect that wrote it, then breaks when read by the other dialect’s parser.

Field separator characters at a glance

| Character | Hex | Decimal | Used by | Detection priority in Rowson |
|---|---|---|---|---|
| Comma | 0x2C | 44 | RFC 4180, Excel (en-US), MySQL, PostgreSQL | 1 |
| Tab | 0x09 | 9 | IANA TSV, Unix TSV | 2 |
| Semicolon | 0x3B | 59 | Excel (most European locales) | 3 |
| Pipe | 0x7C | 124 | mainframe and ETL exports | 4 |
| US (unit sep) | 0x1F | 31 | ASCII record format, rare in modern use | not detected |
| RS (record sep) | 0x1E | 30 | ASCII record format, rare in modern use | not detected |
| Caret | 0x5E | 94 | some legacy financial exports | not detected |
| Colon | 0x3A | 58 | /etc/passwd, rare for data exports | not detected |

Rowson detects the top four. Inputs using one of the bottom four need a pre-processing step that swaps the separator for a tab or comma before pasting.
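
One possible pre-processing step for US/RS-separated input is sketched below in Python. The function name ascii_delimited_to_tsv is hypothetical, and the sketch assumes every record ends with RS and contains no stray line breaks between records. Writing through csv.writer means a field that happens to contain a tab or a newline is quoted rather than silently splitting a row.

```python
import csv
import io

US = "\x1f"  # unit separator: between fields
RS = "\x1e"  # record separator: between records

def ascii_delimited_to_tsv(text):
    """Convert US/RS-separated text to tab-delimited text (hypothetical helper)."""
    # Drop the empty string left after the final RS.
    records = [record for record in text.split(RS) if record]
    rows = [record.split(US) for record in records]
    buf = io.StringIO(newline="")
    # QUOTE_MINIMAL (the default) quotes only fields that contain a tab,
    # a quote character, or a line break.
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()

sample = f"id{US}name{RS}1{US}Smith, John{RS}"
tsv = ascii_delimited_to_tsv(sample)
```

The comma inside "Smith, John" needs no quoting here because the output separator is a tab.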

Quoting and escape rules by dialect

RFC 4180 / Excel / PostgreSQL COPY CSV

Quote a field when it contains a delimiter, a quote, or a CR/LF. Inside a quoted field, a literal quote is written as two adjacent quote characters.

id,name,note
1,"Smith, John","She said ""hello""."
2,Bob,"Line 1
Line 2"

A field with no special characters can be left unquoted. Producers that quote every field unconditionally also conform; consumers must accept both forms.
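
As a rough check of these rules, the sketch below parses the sample with Python's stdlib csv module, whose default dialect uses comma, double quote, and quote doubling. It then writes the rows back out twice, once quoting only where needed and once quoting every field, and confirms that both forms parse to the same rows. The helper names are illustrative only.

```python
import csv
import io

SAMPLE = (
    'id,name,note\r\n'
    '1,"Smith, John","She said ""hello""."\r\n'
    '2,Bob,"Line 1\nLine 2"\r\n'
)

def parse_rfc4180(text):
    # newline="" stops StringIO from translating line endings, so a line
    # break inside a quoted field reaches the parser unchanged.
    return list(csv.reader(io.StringIO(text, newline="")))

def write_rows(rows, quoting):
    buf = io.StringIO(newline="")
    csv.writer(buf, quoting=quoting).writerows(rows)  # default terminator is \r\n
    return buf.getvalue()

rows = parse_rfc4180(SAMPLE)
minimal = write_rows(rows, csv.QUOTE_MINIMAL)   # quote only when required
quoted_all = write_rows(rows, csv.QUOTE_ALL)    # quote every field

# A conforming consumer must read both forms identically.
same = parse_rfc4180(minimal) == parse_rfc4180(quoted_all) == rows
```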

MySQL SELECT INTO OUTFILE

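The default escape character is backslash; the FIELDS ESCAPED BY clause overrides it. The comma separator and quote character shown here are not MySQL defaults and must be requested with FIELDS TERMINATED BY ',' and ENCLOSED BY '"'.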

id,name,note
1,"Smith, John","She said \"hello\"."
2,Bob,"Line 1\nLine 2"

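Backslash-escaped CSV is not RFC 4180. Tools that round-trip via MySQL OUTFILE and then load with PapaParse default settings, which expect a doubled quote as the escape, will mis-parse any field containing an escaped quote (\") and will leave sequences such as \n and \\ undecoded.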
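
The mismatch can be reproduced with Python's stdlib csv module, used here as a stand-in for any parser with configurable escaping. The sketch reads one backslash-escaped line twice: once with a backslash escape configured, and once with the quote-doubling defaults.

```python
import csv
import io

# One line as MySQL would write it with ENCLOSED BY '"' and the default escape.
MYSQL_LINE = '1,"Smith, John","She said \\"hello\\"."\n'

def parse_first_row(text, **fmt):
    return next(csv.reader(io.StringIO(text, newline=""), **fmt))

# Matching dialect: backslash is the escape, quotes are not doubled.
as_mysql = parse_first_row(MYSQL_LINE, doublequote=False, escapechar="\\")

# Mismatched dialect: RFC 4180-style defaults treat the backslash as data
# and the quote after it as the end of the quoted field.
as_rfc4180 = parse_first_row(MYSQL_LINE)

mismatch = as_mysql != as_rfc4180
```

Python's escapechar only removes the special meaning of the next character, so a sequence such as \n in the second sample row would come out as the letter n rather than a line break. Decoding those sequences needs a separate step.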

TSV (IANA vs Unix)

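The IANA text/tab-separated-values registration defines no quoting or escaping and simply disallows tabs inside fields. Some producers add backslash escapes for tab, newline, and backslash (\t, \n, \\) so that those characters can appear in a field. The plain Unix convention forbids those characters entirely and uses no quoting at all, which is what awk -F'\t', cut, and sort assume.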
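
Under the no-quoting convention, parsing reduces to splitting on newline and tab. The sketch below shows that, plus an optional decoding step for input that uses backslash escapes. Both function names are hypothetical, and the escape table (\t, \n, \r, \\) is an assumption about the producer rather than something taken from a spec.

```python
import re

# Assumed escape table; adjust to whatever the producer actually emits.
ESCAPES = {"t": "\t", "n": "\n", "r": "\r", "\\": "\\"}

def parse_unix_tsv(text):
    """Split on \\n and \\t only; no quoting, no escaping."""
    lines = text.split("\n")
    if lines and lines[-1] == "":
        lines.pop()  # ignore the empty piece after the final newline
    return [line.split("\t") for line in lines]

def unescape_field(field):
    """Decode backslash escapes left to right; unknown escapes are kept as-is."""
    return re.sub(r"\\(.)", lambda m: ESCAPES.get(m.group(1), m.group(0)), field)

rows = parse_unix_tsv("id\tnote\n1\tline one\\nline two\n")
decoded = unescape_field(rows[1][1])
```

Splitting happens before unescaping, so a decoded tab or newline can never be mistaken for a separator.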

Semicolon-delimited European CSV

Same quoting and escape rules as RFC 4180; only the field separator changes. Often paired with a decimal comma (1,50 instead of 1.50) in numeric fields, which stays as a string after parsing.
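
A minimal sketch of reading this dialect and converting decimal-comma values after parsing follows. It assumes that "." is only ever a thousands separator and "," the decimal mark, which has to be known from the source locale and cannot be inferred reliably from the data. The helper name parse_decimal_comma is hypothetical.

```python
import csv
import io
from decimal import Decimal

SAMPLE = "sku;price\r\nA-1;1,50\r\nB-2;1.234,75\r\n"

def parse_decimal_comma(value):
    # Assumption: "." is a thousands separator and "," the decimal mark.
    return Decimal(value.replace(".", "").replace(",", "."))

reader = csv.DictReader(io.StringIO(SAMPLE, newline=""), delimiter=";")
rows = list(reader)

# The parser leaves "1,50" as a string; conversion is an explicit second step.
prices = [parse_decimal_comma(row["price"]) for row in rows]
```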

JSON output formats and where each is used

| Format | Specification | MIME type | File extension | Typical consumer |
|---|---|---|---|---|
| Array of objects | RFC 8259 / ECMA-404 | application/json | .json | JavaScript apps, REST API request bodies |
| JSONL (JSON Lines) | jsonlines.org | application/jsonl or application/x-ndjson | .jsonl, .ndjson | BigQuery bq load, DuckDB read_json_auto, OpenAI fine-tuning, streaming pipelines |
| Array of arrays | RFC 8259 | application/json | .json | Chart libraries (Plotly, Highcharts), tabular React components |
| Keyed object | RFC 8259 | application/json | .json | Lookup-table consumers, ID-indexed cache loads |

The four formats are not interchangeable. Each one trades a property the others keep.

| Property | Array of objects | JSONL | Array of arrays | Keyed object |
|---|---|---|---|---|
| Preserves all rows when keys collide | yes | yes | yes | no |
| Preserves header names | yes | yes | first row only | yes |
| Parseable line by line | no | yes | no | no |
| Parseable as a single JSON value | yes | no | yes | yes |
| Round-trips back to CSV cleanly | yes | yes | yes | no (collision lossy) |
| Preserves row order in strict JSON consumers | yes | yes | yes | not for numeric-string keys |
| Supports streaming insert | no | yes | no | no |

JSONL is the only line-orientable format; the other three require the consumer to buffer the entire document before parsing unless it uses an incremental JSON parser. For files larger than a few hundred megabytes destined for a data warehouse, JSONL is usually the practical choice.
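
The four shapes can be sketched from the same parsed rows with Python's csv and json modules. This is an illustration, not Rowson's implementation: it assumes the keyed object is keyed on the first column and that a later row overwrites an earlier one when keys collide.

```python
import csv
import io
import json

SAMPLE = "id,name\r\n7,Ada\r\n3,Bob\r\n7,Cy\r\n"  # id 7 appears twice

table = list(csv.reader(io.StringIO(SAMPLE, newline="")))
header, body = table[0], table[1:]

# Array of objects: one object per data row, keyed by header name.
array_of_objects = [dict(zip(header, row)) for row in body]

# Array of arrays: header names survive only as the first row.
array_of_arrays = table

# JSONL: one complete JSON value per line, so it can be read line by line.
jsonl = "".join(json.dumps(obj) + "\n" for obj in array_of_objects)

# Keyed object (assumed: keyed on first column, last duplicate wins).
keyed = {row[0]: dict(zip(header, row)) for row in body}

rows_in, rows_kept = len(body), len(keyed)  # 3 rows in, 2 keys kept
```

The row count dropping from three to two is the collision loss listed in the table above.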

Header row conventions

| Convention | What the first row contains | Common in |
|---|---|---|
| Headers present | column names | Excel exports, web app downloads, REST API mock fixtures |
| No headers | data starts at row 1 | sensor logs, raw exports from older databases, server access logs |
| Multi-row headers | 2+ rows of header text before data | scientific data, financial reports |
| Headers commented | #-prefixed rows above the data | Unix tools, R read.csv with comment.char="#" |

Rowson assumes headers are present. For headerless input, either prepend a synthetic header row or pick array-of-arrays output mode and let the downstream consumer assign keys. Multi-row headers are not supported by most CSV parsers (pandas.read_csv with a list-valued header argument is one exception) and usually must be flattened by hand.
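
Prepending a synthetic header can be done in a few lines. The sketch below counts the fields in the first record and generates names col1, col2, and so on. The function name and the col prefix are hypothetical choices, and the sketch assumes every row has the same width as the first.

```python
import csv
import io

def prepend_synthetic_header(text, delimiter=",", prefix="col"):
    """Return text with a generated header line in front (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    first = next(reader, [])  # empty input yields no columns
    names = [f"{prefix}{i}" for i in range(1, len(first) + 1)]
    return delimiter.join(names) + "\r\n" + text

headerless = "20.5,ok\r\n21.0,ok\r\n"
with_header = prepend_synthetic_header(headerless)

# The result now parses as "headers present" input.
records = list(csv.DictReader(io.StringIO(with_header, newline="")))
```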

Character encoding and BOM behavior

| Encoding | BOM bytes | What modern parsers do | Excel behavior |
|---|---|---|---|
| UTF-8 (no BOM) | none | parse as-is | guess locale; often mis-decodes accented characters |
| UTF-8 with BOM | 0xEF 0xBB 0xBF | PapaParse strips the BOM; Python csv relies on the file being opened with encoding="utf-8-sig"; otherwise U+FEFF stays attached to the first header name | open as UTF-8 reliably |
| UTF-16 LE | 0xFF 0xFE | most CSV parsers fail; pre-decode to UTF-8 | open as UTF-16 if BOM is present |
| UTF-16 BE | 0xFE 0xFF | most CSV parsers fail; pre-decode to UTF-8 | open as UTF-16 if BOM is present |
| Windows-1252 (CP1252) | none | decoded as Latin-1 by parsers that don’t sniff | Excel’s default save on western Windows builds |
| Shift_JIS | none | requires explicit decode step before parsing | Excel’s default save on Japanese Windows |

When Excel saves a CSV and the receiving tool reads it back as mojibake, the issue is almost always Windows-1252 vs UTF-8. The fix is to save from Excel as “CSV UTF-8 (Comma delimited)” rather than “CSV (Comma delimited)”.
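
When the source file cannot be re-saved, a decoding step on the consumer side is one alternative. The sketch below checks for a BOM first, then tries UTF-8, then falls back to Windows-1252. The fallback order is an assumption that suits western Excel exports; it would misread Shift_JIS or other legacy encodings, which need an explicit codec.

```python
import codecs

def decode_csv_bytes(data):
    """Best-effort decode of raw CSV bytes to str (hypothetical helper)."""
    if data.startswith(codecs.BOM_UTF8):
        return data.decode("utf-8-sig")  # strips the BOM
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return data.decode("utf-16")     # BOM selects the byte order
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback for western Windows exports; bytes that
        # Windows-1252 leaves undefined become U+FFFD.
        return data.decode("cp1252", errors="replace")

from_bom = decode_csv_bytes(b"\xef\xbb\xbfid,name")
from_cp1252 = decode_csv_bytes("id,café".encode("cp1252"))
```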

CSV and JSON parser libraries by language

| Language | CSV library | JSON output mode support | Notes |
|---|---|---|---|
| JavaScript / TypeScript | PapaParse (Rowson uses this) | array of objects (header: true) or array of arrays (header: false) | Streaming via step callback; auto-delimiter via delimiter: "" |
| JavaScript (alternative) | csv-parse (node-csv) | both (objects with columns: true, arrays otherwise) | More configurable; better for backend pipelines |
| Python | csv (stdlib) | manual conversion to dict via DictReader | No JSON output; pair with json stdlib |
| Python | pandas.read_csv | df.to_json(orient="records") for array of objects (AoO); add lines=True for JSONL | The de facto data-science tool |
| Go | encoding/csv (stdlib) | manual; pair with encoding/json | Reader exposes FieldsPerRecord for strict mode |
| Java | OpenCSV, Apache Commons CSV | manual; pair with Jackson | OpenCSV supports CsvToBean for object mapping |
| Ruby | CSV (stdlib) | CSV.read(..., headers: true).map(&:to_h) for AoO | JSON.generate for the second step |
| Rust | csv (BurntSushi) | manual; pair with serde_json | Strict by default; loose mode via flexible(true) |
| PHP | fgetcsv (stdlib) | manual; pair with json_encode | Returns arrays only; key by hand for AoO |
| C# / .NET | CsvHelper | JsonConvert.SerializeObject (Newtonsoft.Json) for AoO | Attributes drive header mapping |
| R | read.csv (stdlib) | jsonlite::toJSON for AoO, stream_out for JSONL | read_csv from readr is the modern alternative |
| Shell | csvkit (csvjson) | both AoO and JSONL via flags | csvjson --stream for JSONL output |

Common conversion failure patterns

| Symptom | Cause | Fix |
|---|---|---|
| Numeric columns become quoted strings in output | CSV has no numeric type; parser preserves strings | Cast after parsing: Number(row.amount) or parseFloat |
| Leading zeros stripped on IDs in Excel-edited CSV | Excel auto-coerces 00123 to 123 on cell edit | Import as text in Excel (Data → From Text/CSV → Column type: Text) |
| Boolean column reads as "true" / "false" strings | Same as numeric: no boolean type in CSV | Cast: row.active === "true" or normalize at write time |
| Date column reads as "2026-05-13" string | No date type in CSV | Parse with new Date(row.created_at) or Date.parse; UTC vs local is the next trap |
| Field with both a literal newline and a delimiter parses wrong | Field is not quoted | Quote the field at source, or pre-process to escape |
| Output JSON has fewer rows than the CSV had | Keyed-object mode collapsed duplicate first-column values | Switch to array of objects or JSONL |
| Special characters render as Ã© instead of é | UTF-8 file decoded as Latin-1 | Re-save source as UTF-8 with BOM |
| First field name appears as \uFEFFid (often displayed as ï»¿id) | UTF-8 BOM was not stripped | Strip BOM in preprocessing: input.replace(/^\uFEFF/, "") |
| null values come out as the string "null" | CSV has no null literal; parser preserves the string | Normalize at parse: treat empty, "null", "NULL", "\\N" (MySQL) as JSON null |
| JSONL output rejected by BigQuery | Trailing blank line or non-UTF-8 byte | Strip the trailing blank line; verify with iconv -f utf-8 -t utf-8 |
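
Several of the type-related fixes above can be combined into one post-parse normalization step. The sketch below is written in Python to match the stdlib examples; the cast_value name and the exact null spellings are assumptions. Strings with leading zeros are deliberately left as strings so that IDs such as 00123 survive.

```python
import re

# Assumed null spellings: empty string, "null", "NULL", and MySQL's \N.
NULLS = {"", "null", "NULL", "\\N"}
INT_RE = re.compile(r"-?(0|[1-9]\d*)")          # no leading zeros
FLOAT_RE = re.compile(r"-?(0|[1-9]\d*)\.\d+")   # plain decimals only

def cast_value(raw):
    """Map one CSV string to None, bool, int, float, or the original string."""
    if raw in NULLS:
        return None
    if raw in ("true", "false"):
        return raw == "true"
    if INT_RE.fullmatch(raw):
        return int(raw)
    if FLOAT_RE.fullmatch(raw):
        return float(raw)
    return raw  # dates, IDs with leading zeros, and free text stay strings

row = {"id": "00123", "amount": "1.50", "active": "true", "note": "\\N"}
typed = {key: cast_value(value) for key, value in row.items()}
```

Dates are left untouched here because the right parse depends on the time zone the producer intended.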

Related concepts

  • RFC 4180: the closest thing to a CSV specification. Only a few pages long. Describes the comma-quote-doubled-CRLF dialect that most CSV libraries default to. Allows an optional header line but does not cover encoding detection or escape variants.
  • JSON Lines (jsonlines.org): the informal spec for newline-delimited JSON. Predates RFC 7464 (application/json-seq) and is more widely used.
  • ND-JSON (application/x-ndjson): identical content to JSONL, different MIME type and trailing-newline rules. Often interchangeable.
  • text/csv MIME type: registered in RFC 4180, which defines an optional charset parameter and a header=present|absent parameter that almost no producer sets. RFC 7111 later added URI fragment identifiers for text/csv.
  • CSV-LD / CSVW: W3C effort to add schema and semantics to CSV via a sidecar JSON manifest. Adopted in government open-data publishing; rare elsewhere.