URL Parts
Level: Intermediate 30–60 minConcepts: ParsingStrings
Solutions: C# | TypeScript | Python
Decomposes a given URL into its parts.
For example when the URL http://www.tddbuddy.com is decomposed into its parts.
| Part | Value |
|---|---|
| Protocol | http |
| Subdomain | www |
| Domain | tddbuddy.com |
| Port | 80 (Default for HTTP) |
| Path | ” (Empty in our case) |
Please be sure to handle the following:
- Only top level domains like .com or .net.
- Do not worry second level domains like .co.uk or co.za
- Only the protocols specified in the default ports section below.
- Be sure to deal with local network hostname only cases. E.g. http://localhost
Do not use built-in classes like Uri to solve this.
Default Ports
http: 80, https: 443, ftp: 21, sftp: 22
Examples
| Examples |
|---|
|
URL: http://foo.bar.com/foobar.html Protocol: http Subdomain: foo Domain name: bar.com Port: 80 Path: foobar.html |
|
URL: https://www.foobar.com:8080/download/install.exe Protocol: https Subdomain: www Domain name: foobar.com Port: 8080 Path: download/installer.exe |
|
URL: ftp://foo.com:9000/files Protocol: ftp Subdomain: '' (empty string) Domain name: foo.com Port: 9000 Path: files |
|
URL: https://localhost/index.html#footer Protocol: https Subdomain: '' (empty string) Domain name: localhost Port: 443 Path: index.html |
Hints
Exclude the leading / when handling path. E.g. /download becomes download.
URL Grammar
Below is a EBNF like grammar for a URL as per this kata.
url = protocol ”://” [subdomain] host [top-level-domain] [”:” port] [path] [”?” parameters] [”#” anchor]
protocol = “http” | “https” | “ftp” | “sftp”
subdomain = alphanumeric string starting with alpha
host = alphanumeric string
top-level-domain = “.com” | “.net” | “.org” | “.int” | “.edu” | “.gov” | “.mil”
port = numeric
path = alphanumeric string
parameters = alphanumeric string
anchor = alphanumeric string
Reference Walkthrough
Reference implementations in C#, TypeScript, and Python live at tddbuddy-reference-katas/url-parts. Ten shared scenarios cover the four protocols (http, https, ftp, sftp) with their default ports, explicit ports, paths with and without query strings, fragments, and the bare localhost host — satisfied identically across all three languages.
- C# (.NET 8, xUnit, FluentAssertions 6.12.0) — walkthrough
- TypeScript (Node 20, Vitest 1.6, TS 5 strict) — walkthrough
- Python (3.11, pytest) — walkthrough
This kata ships in Agent Full-Bake mode at high gear (F1 tier): the algorithm is small enough to land as one commit per language. There are no builders — the parse returns a typed UrlParts record (C# record, TS interface, Python @dataclass(frozen=True)) which is a data shape, not a fluent construction API. No regex and no built-in URL classes (System.Uri, new URL, urllib.parse) per the prompt: each language hand-rolls a left-to-right peel — anchor, then parameters, then path, then port, then host — so the order of operations stays visible in the code. See the repo’s Gears section for when high gear is the right call.