JSON Validation: Beyond Pretty-Printing
Common JSON syntax mistakes, the difference between syntactic and semantic validation, and a practical introduction to JSON Schema for catching bad data before it reaches your database.
A JSON validator that only checks syntax catches about a quarter of the bugs that will hit your system in production. The rest are shape bugs: a number arrived where a string was expected, a required field was missing, an enum received a value your code never considered. Real validation means checking both syntax and shape, and doing it at the boundary of your system where the cost of a bad message is still recoverable.
Syntax: what JSON actually requires
JSON is stricter than JavaScript object literals, and that difference is the source of most syntax errors. The specification is short (RFC 8259, readable in an hour), but the rules people forget are:
- All strings are double-quoted. Single quotes are not legal. This is the number-one "but it works in Python / JavaScript" confusion.
- Object keys are strings, double-quoted. A bare identifier like `{name: "ada"}` is not JSON.
- No trailing commas. `[1, 2, 3,]` is not JSON. The restriction is intentional: it avoids ambiguity with sparse arrays in some languages.
- No comments. Not `// ...`, not `/* ... */`. If you need comments, your format is JSONC (config files often allow it) or YAML, not JSON.
- No `undefined`. Use `null` or omit the field entirely.
- Numbers are finite. `NaN` and `Infinity` are not valid JSON numbers. Careful with JavaScript's `JSON.stringify`: it emits `null` for both, and it serializes `-0` as plain `0`.
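These rules are easy to verify empirically. A minimal sketch in TypeScript (no dependencies; `JSON.parse` and `JSON.stringify` behave this way in any modern runtime):

```ts
// Every input below violates one of the rules above, and JSON.parse rejects it.
const badInputs = [
  `{'name': 'ada'}`, // single-quoted strings
  `{name: "ada"}`,   // bare object key
  `[1, 2, 3,]`,      // trailing comma
  `[1] // comment`,  // comment
  `{"x": NaN}`,      // non-finite number
];

for (const input of badInputs) {
  try {
    JSON.parse(input);
  } catch {
    console.log(`rejected: ${input}`);
  }
}

// The stringify footgun: non-finite numbers become null silently,
// and the sign of -0 is lost.
console.log(JSON.stringify({ score: NaN, ratio: Infinity, zero: -0 }));
// -> {"score":null,"ratio":null,"zero":0}
```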
When to pretty-print, when to minify
Pretty-print during development, minify for the wire. The JSON Formatter on ToolPad does both. Formatted JSON is easier to diff visually and easier to read in logs; minified JSON is smaller and faster to parse. Most APIs send minified responses and accept either.
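In JavaScript both forms come from the same call; the third argument of `JSON.stringify` controls indentation:

```ts
const payload = { id: 1, tags: ["a", "b"] };

const wire = JSON.stringify(payload);              // {"id":1,"tags":["a","b"]}
const readable = JSON.stringify(payload, null, 2); // indented, one field per line
```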
For APIs that return large arrays, you may see a third option: newline-delimited JSON (NDJSON). One JSON value per line, no enclosing array. Streamable and easy to tail, but not parseable by a plain JSON reader — you need to split on newlines first.
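A plain reader chokes on NDJSON because the document as a whole is not one JSON value. A minimal sketch of the split-then-parse step (the sample records are made up):

```ts
const ndjson = `{"id": 1, "name": "ada"}
{"id": 2, "name": "grace"}
{"id": 3, "name": "edsger"}`;

const records = ndjson
  .split("\n")
  .filter((line) => line.trim() !== "") // tolerate blank or trailing lines
  .map((line) => JSON.parse(line));

console.log(records.length); // 3
```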
The shape-bug category
Once the JSON parses, you still have to believe what it says. Typical failure modes:
- A field that was supposed to be a number arrives as a string (`"42"`), because the sender rendered it with a locale-aware formatter or because a CSV-to-JSON pipeline did not coerce types; a short demonstration follows this list.
- An enum field arrives with a new value. Version 2 of the producer added a new `status` state, but your consumer was only prepared for the old three.
- A field you thought was required is simply absent in some responses. There was no schema, so nobody enforced it; now your code calls `.toLowerCase()` on `undefined` in production.
- A field has the right type but a pathological value: an empty string where non-empty was assumed, a date far in the future, a deeply nested object whose recursion blows your stack.
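Here is the first failure mode in miniature. The payload parses without complaint; the damage happens later, when the value is used, which is exactly why a type check at the boundary pays off:

```ts
const payload = JSON.parse(`{"age": "42"}`); // parses fine; `payload` is untyped

console.log(payload.age + 1); // "421": string concatenation, not addition

// A boundary check turns the silent corruption into a loud error:
if (typeof payload.age !== "number") {
  throw new TypeError(`expected age to be a number, got ${typeof payload.age}`);
}
```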
JSON Schema: a practical introduction
JSON Schema is a vocabulary, itself written in JSON, for describing the expected shape of a JSON document. A minimal schema for a user record:
```json
{
  "type": "object",
  "properties": {
    "id": { "type": "string", "format": "uuid" },
    "email": { "type": "string", "format": "email" },
    "age": { "type": "integer", "minimum": 0, "maximum": 150 },
    "status": { "enum": ["active", "invited", "disabled"] }
  },
  "required": ["id", "email", "status"],
  "additionalProperties": false
}
```

A validator such as ajv (JavaScript), jsonschema (Python), or opis/json-schema (PHP) will compile this schema and tell you, for any input JSON, whether it conforms, and if not, which field failed which rule.
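A sketch of wiring that schema into ajv, assuming ajv v8 with the ajv-formats plugin (the `format` keywords such as "uuid" and "email" live in that separate package):

```ts
import Ajv from "ajv";
import addFormats from "ajv-formats";

// The user-record schema from above, as a plain JS value.
const userSchema = {
  type: "object",
  properties: {
    id: { type: "string", format: "uuid" },
    email: { type: "string", format: "email" },
    age: { type: "integer", minimum: 0, maximum: 150 },
    status: { enum: ["active", "invited", "disabled"] },
  },
  required: ["id", "email", "status"],
  additionalProperties: false,
};

const ajv = new Ajv({ allErrors: true }); // report every violation, not just the first
addFormats(ajv); // enables the "uuid" and "email" format checks

const validate = ajv.compile(userSchema);

const input: unknown = JSON.parse('{"id": "not-a-uuid", "email": 42, "status": "active"}');
if (!validate(input)) {
  // Each error names the failing JSON path and the violated keyword.
  console.error(validate.errors);
}
```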
The most consequential keyword here is `additionalProperties: false`. With it, any unexpected field is an error. Without it, a producer can bolt on new fields and your consumer silently ignores them, which is either a feature (forward compatibility) or a bug (silent data loss), depending on your situation. Decide deliberately.
Validate at the edge, trust inside
The most effective architectural pattern is: validate once, at the boundary where JSON enters your process (HTTP handler, message-queue consumer, file import). Convert the validated JSON into a typed internal representation (a Go struct, a Rust struct, a Pydantic model, a TypeScript class). From then on, the rest of your code assumes the data is well-formed — because it was validated at the edge.
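A hand-rolled sketch of that boundary step, mirroring the example schema above (the `User` shape and the `parseUser` name are illustrative; a schema validator like ajv can replace the manual checks):

```ts
interface User {
  id: string;
  email: string;
  status: "active" | "invited" | "disabled";
  age?: number;
}

function isStatus(v: unknown): v is User["status"] {
  return v === "active" || v === "invited" || v === "disabled";
}

// Runs exactly once, at the edge. Everything downstream receives a User,
// never raw JSON.
function parseUser(raw: unknown): User {
  if (typeof raw !== "object" || raw === null) {
    throw new TypeError("expected a JSON object");
  }
  const { id, email, status, age } = raw as Record<string, unknown>;

  if (typeof id !== "string") throw new TypeError("id must be a string");
  if (typeof email !== "string") throw new TypeError("email must be a string");
  if (!isStatus(status)) throw new TypeError("unrecognized status value");
  if (age !== undefined && typeof age !== "number") {
    throw new TypeError("age, when present, must be a number");
  }

  return { id, email, status, age };
}
```

An HTTP handler would call `parseUser(JSON.parse(body))` once and pass the resulting `User` inward, never the raw parse result.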
The anti-pattern is re-validating the same data at every layer, or — worse — reaching into untyped JSON with string keys deep inside business logic. The further from the boundary you are, the harder it is to reason about what "valid" means.
Comparing two JSON objects
When debugging, you often have two JSON blobs — a baseline and a new response — and you want to know what changed. Diffing them as strings works, but whitespace and key ordering produce misleading noise. The JSON Diff tool on ToolPad parses both sides and compares them structurally, so a reordered object or a reformatted number is not reported as a difference. This is usually what you want when reviewing a config change or an API-version bump.
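The core of such a comparison is small. A minimal structural-equality sketch for already-parsed values; because it compares by key rather than by position in the text, reordering and reformatting cannot register as differences:

```ts
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

function jsonEqual(a: Json, b: Json): boolean {
  if (a === b) return true; // primitives and identical references

  if (Array.isArray(a) && Array.isArray(b)) {
    return a.length === b.length && a.every((item, i) => jsonEqual(item, b[i]));
  }

  if (
    typeof a === "object" && a !== null && !Array.isArray(a) &&
    typeof b === "object" && b !== null && !Array.isArray(b)
  ) {
    const aKeys = Object.keys(a);
    return (
      aKeys.length === Object.keys(b).length &&
      aKeys.every((k) => k in b && jsonEqual(a[k], b[k]))
    );
  }

  return false;
}

// Key order and whitespace are invisible to the comparison:
const left = JSON.parse('{"a": 1, "b": [2, 3]}');
const right = JSON.parse('{ "b": [2, 3], "a": 1 }');
console.log(jsonEqual(left, right)); // true
```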
Try the tools
The JSON Formatter handles syntax validation and pretty-printing in the browser. For structural diffs, use the JSON Diff. For converting an array of records to a spreadsheet-friendly format, the JSON to CSV converter handles the usual type-coercion edge cases locally. For full schema-based validation on production data, reach for a JSON Schema library in the language of the service that owns the boundary.