CSV Query Engine

Level: Advanced 60–90 min

Concepts: AlgorithmsEdge CasesBoundariesRefactoringDesign Patterns

Solutions: C# | TypeScript | Python


Build a query engine that operates on CSV data.

Requirements

  1. Parse CSV — accept a CSV string with a header row and data rows
  2. Select columnsselect("name", "age") returns only specified columns
  3. Where clausewhere("age", ">", 30) filters rows by condition
    • Support operators: =, !=, >, <, >=, <=
    • Numeric comparisons for numeric values, string comparisons otherwise
  4. Order byorderBy("age", "asc") sorts results ascending or descending
  5. Limitlimit(5) returns only the first N results
  6. Countcount() returns the number of matching rows
  7. Operations are chainable and applied in order

Sample Data

name,age,city,salary
Alice,35,London,75000
Bob,28,Paris,55000
Charlie,42,London,90000
Diana,31,Berlin,65000
Eve,28,Paris,60000

Test Cases

QueryResult
select(“name”)Alice, Bob, Charlie, Diana, Eve
where(“city”, ”=”, “London”)Alice and Charlie rows
where(“age”, ”>”, 30).select(“name”)Alice, Charlie, Diana
orderBy(“age”, “asc”).select(“name”, “age”)Bob(28), Eve(28), Diana(31), Alice(35), Charlie(42)
where(“city”, ”=”, “Paris”).count()2
limit(2).select(“name”)Alice, Bob
where(“age”, ”>=”, 35).orderBy(“salary”, “desc”)Charlie(90000), Alice(75000)
where(“city”, ”=”, “Tokyo”).count()0
select(“invalid_column”)Error or empty results

Edge Cases

  • Empty CSV (header only, no data rows)
  • CSV with a single row
  • Numeric fields used in string comparisons
  • Column names with spaces
  • Data containing commas within quoted fields: "Smith, Jr.",45,London

Bonus

  • Add sum("salary") and avg("salary") aggregation functions
  • Add groupBy("city").count() — group rows and aggregate per group
  • Add join(otherCsv, "id") — join two CSV datasets on a common column
  • Support like operator for pattern matching: where("name", "like", "A%")

Reference Walkthrough

Full C#, TypeScript, and Python implementations live at tddbuddy-reference-katas/csv-query. Twenty-five scenarios across parsing, select, where (six comparison operators with automatic numeric/string detection), orderBy, limit, count, and chaining — shared across all three languages — with a CsvParser, CsvTable, Query (chainable pipeline), Row value type, UnknownColumnException domain exception, and two fluent test builders (RowBuilder, QueryBuilder).

This kata ships in Agent Full-Bake mode: one commit per language with the full domain design landing together. The walkthroughs read as design rationalewhy Row is a type rather than a bare dictionary, why numeric detection happens at comparison time rather than parse time, why UnknownColumnException beats silently returning empty. See the repo’s Gears section for when middle gear is the right call.