Building a SQL Tokenizer and Formatter From Scratch — Supporting 6 Dialects

Source: DEV Community
Try it: devprix.dev/tools/sql-formatter

This is part of DevPrix: 56 free developer tools that run entirely in your browser. No sign-up, no tracking, no server calls.

SQL formatting seems simple until you try to build it. Keyword capitalization? Easy. Proper indentation of subqueries, CASE expressions, and JOINs across PostgreSQL, MySQL, SQL Server, Oracle, SQLite, and BigQuery? That's a compiler problem.

Architecture: Tokenizer + State Machine

I chose a two-stage approach: tokenize the SQL into a stream of typed tokens, then format by iterating through the tokens with a state machine. No AST (Abstract Syntax Tree) is needed: SQL formatting doesn't require understanding query semantics, just structure.

Stage 1: The Tokenizer

The tokenizer is a single-pass, character-by-character scanner. It produces an array of typed tokens:

    type TokenType =
      | "keyword"
      | "identifier"
      | "string"
      | "number"
      | "operator"
      | "punctuation"
      | "comma"
      | "open_paren"
      | "close_paren"
      | "comment_single"
      | "comment_mul
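The token-type list above is cut off in this copy, so what follows is only a rough sketch of how a single-pass, character-by-character scanner of this kind might look, not the author's actual code. The names `SketchTokenType`, `SketchToken`, `SKETCH_KEYWORDS`, and `tokenizeSketch` are hypothetical. The keyword set is a tiny illustrative sample rather than a dialect definition, and only token types that are fully legible above are used.

```typescript
// Hypothetical subset of the token types listed in the article.
type SketchTokenType =
  | "keyword" | "identifier" | "string" | "number" | "operator"
  | "punctuation" | "comma" | "open_paren" | "close_paren" | "comment_single";

interface SketchToken {
  type: SketchTokenType;
  value: string;
}

// Assumption: a tiny sample keyword set, not a real dialect's list.
const SKETCH_KEYWORDS = new Set(["SELECT", "FROM", "WHERE", "AND", "OR", "AS", "JOIN", "ON"]);
const TWO_CHAR_OPERATORS = ["<=", ">=", "<>", "!=", "||"];

function tokenizeSketch(sql: string): SketchToken[] {
  const tokens: SketchToken[] = [];
  let i = 0;
  while (i < sql.length) {
    const ch = sql[i];
    let j = i; // j will end up one past the last character of the token

    if (/\s/.test(ch)) { i++; continue; } // whitespace is skipped, not emitted

    let type: SketchTokenType;
    if (ch === "-" && sql[i + 1] === "-") {
      // Single-line comment: runs to the end of the line.
      while (j < sql.length && sql[j] !== "\n") j++;
      type = "comment_single";
    } else if (ch === "'") {
      // String literal; a doubled quote ('') is an escaped quote.
      j = i + 1;
      while (j < sql.length) {
        if (sql[j] === "'") {
          if (sql[j + 1] === "'") { j += 2; continue; }
          break;
        }
        j++;
      }
      j = Math.min(j + 1, sql.length); // include closing quote if present
      type = "string";
    } else if (/[0-9]/.test(ch)) {
      // Simplification: digits and dots only, no exponents or validation.
      while (j < sql.length && /[0-9.]/.test(sql[j])) j++;
      type = "number";
    } else if (/[A-Za-z_]/.test(ch)) {
      while (j < sql.length && /[A-Za-z0-9_]/.test(sql[j])) j++;
      const word = sql.slice(i, j).toUpperCase();
      type = SKETCH_KEYWORDS.has(word) ? "keyword" : "identifier";
    } else if (ch === ",") {
      j = i + 1; type = "comma";
    } else if (ch === "(") {
      j = i + 1; type = "open_paren";
    } else if (ch === ")") {
      j = i + 1; type = "close_paren";
    } else if (ch === ";" || ch === ".") {
      j = i + 1; type = "punctuation";
    } else {
      // Anything else is treated as an operator, preferring two-char forms.
      j = TWO_CHAR_OPERATORS.includes(sql.slice(i, i + 2)) ? i + 2 : i + 1;
      type = "operator";
    }

    tokens.push({ type, value: sql.slice(i, j) });
    i = j;
  }
  return tokens;
}
```

For example, `tokenizeSketch("select a, b from t")` yields six tokens, with `select` and `from` tagged as keywords regardless of case. Because every token keeps its original text in `value`, a later formatting pass can decide on capitalization and line breaks without re-reading the input.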