{"id":"019d2f43-3746-7adb-a2c1-c13788ae5923","title":"Two Articles on JSON Query Optimization: a tool and an approach","slug":"2026/03/two-articles-on-json-query-optimization-a-tool-and-an-approach","renderedHtml":"<p>Most developers treat JSON querying as free. It isn't, and two recent articles make the case from opposite directions. One restricts path queries to a regular language and compiles them into an automaton that walks the document in a single pass; the other used AI to rewrite a JavaScript JSON query and transformation engine in Go to eliminate RPC calls.</p>\n<h2>jsongrep: Compile the Query, Not the Interpretation</h2>\n<p>Micah Kepe's <a href=\"https://micahkepe.com/blog/jsongrep/\" title=\"a Rust command-line tool for searching JSON documents using a regular-language query syntax compiled into a DFA, enabling single-pass O(n) traversal with no backtracking.\">jsongrep</a> attacks the problem at the algorithm level. The core insight: a JSON path query describes a <strong>regular language</strong> of paths through the document tree, defined by a grammar with no ambiguity, no recursive lookahead, and no edge cases that blow up on unusual input.</p>\n<p>That matters because any regular language can be compiled into a deterministic finite automaton (DFA). A DFA does O(1) work per input symbol: no backtracking, no interpretation at runtime, and, crucially, no surprises. Tools like <a href=\"https://jqlang.org/\" title=\"a lightweight command-line processor for JSON supporting filtering, transformation, and computation via a functional expression language. The de facto standard tool for JSON manipulation in shell pipelines.\">jq</a> and <a href=\"https://jmespath.org/\" title=\"a query language for JSON that supports path expressions, wildcards, filters, and projections. 
Widely used in the AWS CLI and SDKs for extracting values from API responses.\">JMESPath</a> interpret path expressions as they traverse the document; <code>jsongrep</code> compiles the query first, then walks the document exactly once, pruning entire subtrees in O(1) when the DFA says they can't match.</p>\n<p>The formal constraint is also a feature: by keeping the query language strictly regular, <code>jsongrep</code> guarantees predictable traversal cost, a single linear pass over the document, however complex the query. You give up expressiveness - no filters, no arithmetic - and get a tool you can reason about.</p>\n<p>On a 190MB dataset, <code>jsongrep</code> wins the end-to-end benchmark, and it isn't close.</p>\n<p>Kepe is upfront about the tradeoffs: <code>jsongrep</code> is a search tool, not a transformation tool. It does path matching, and does it <em>really</em> fast. The article also walks through <a href=\"https://en.wikipedia.org/wiki/Glushkov%27s_construction\" title=\"an algorithm for building an epsilon-free NFA directly from a regular expression. Unlike Thompson's construction, every transition consumes a symbol, which simplifies downstream determinization into a DFA.\">Glushkov</a>'s construction and subset construction in practical detail; that walkthrough is worth reading on its own, especially as a contrast with <a href=\"https://en.wikipedia.org/wiki/Thompson%27s_construction\" title=\"an algorithm for building an NFA from a regular expression using epsilon transitions to compose sub-automata. The most widely taught NFA construction, but requires epsilon-closure computation at every step.\">Ken Thompson's NFA construction</a>.</p>\n<h2>gnata: Kill the Language Boundary</h2>\n<p>Nir Barak's <a href=\"https://www.reco.ai/blog/we-rewrote-jsonata-with-ai\">rewrite of JSONata in Go</a> for <a href=\"https://www.reco.ai/\">Reco</a>'s pipeline solves a different problem, at the architecture level.</p>\n<p>Reco's policy engine evaluated <a href=\"/factoids/JSONata\">JSONata</a> expressions against billions of events. 
The reference implementation is written in JavaScript; the pipeline is written in Go. Every evaluation crossed a language boundary via RPC, costing roughly 150 microseconds of overhead before any actual work happened.</p>\n<blockquote>\n<p>Rear Admiral Grace Hopper gave a famous presentation about <a href=\"https://kottke.org/19/05/grace-hopper-explains-a-nanosecond\">how time adds up</a>. 150 microseconds isn't much, but a <em>lot</em> of them add up: a billion 150-microsecond round trips come to 150,000 seconds, nearly 42 hours of pure overhead.</p>\n</blockquote>\n<p>Their solution was <a href=\"https://github.com/RecoLabs/gnata\">gnata</a>: a pure-Go JSONata 2.x implementation with a two-tier evaluator. Simple expressions take a fast path that operates on raw bytes without parsing the document. Complex expressions go through the full parser. Either way, evaluation stays in-process, and the RPC fleet is gone.</p>\n<p>Their claim: the build took one day and $400 in tokens, and could save them <em>half a million dollars a year</em>.</p>\n<p>The article is also honest about what came after: building <code>gnata</code> was day one; shadow-mode validation against production traffic was the rest of the week. Along the way, they also caught bugs in the reference JSONata implementation, so the effort paid off on more than one front.</p>\n<h2>The Common Thread</h2>\n<p>JSON querying at scale rewards investment that most teams defer until it's already expensive. That pattern goes beyond JSON, too.</p>\n<p><code>jsongrep</code> is interesting as a computer science artifact: the DFA approach is elegant, and the benchmarks back it up. 
<code>gnata</code> is interesting as a production story: its methodology (port the test suite, implement until it passes, validate in shadow mode) is directly reusable for any language-boundary problem in your own pipeline.</p>\n<p>Both are worth reading.</p>","excerpt":"Two recent articles tackle JSON querying performance from opposite ends: one applies automata theory to eliminate interpretation overhead at query time, the other removes a language-boundary tax estimated at $500K/year by rewriting a JavaScript reference implementation in Go. The methodologies are different; the lesson is the same.","authorId":"019c5c8a-609d-7cd4-975b-50bbcc412a33","authorDisplayName":"dreamreal","status":"APPROVED","publishedAt":"2026-03-27T12:27:10.942Z","sortOrder":0,"createdAt":"2026-03-27T12:27:07.462192Z","updatedAt":"2026-03-27T12:29:16.184766Z","commentCount":0,"tags":["go","json","optimization","query","rpc","rust"],"categories":[],"markdownSource":null}