URL Parts

Level: Intermediate 30–60 min

Concepts: ParsingStrings

Solutions: C# | TypeScript | Python


Decomposes a given URL into its parts.

For example when the URL http://www.tddbuddy.com is decomposed into its parts.

PartValue
Protocolhttp
Subdomainwww
Domaintddbuddy.com
Port80 (Default for HTTP)
Path” (Empty in our case)

Please be sure to handle the following:

  • Only top level domains like .com or .net.
    • Do not worry second level domains like .co.uk or co.za
  • Only the protocols specified in the default ports section below.
  • Be sure to deal with local network hostname only cases. E.g. http://localhost

Do not use built-in classes like Uri to solve this.

Default Ports

http: 80, https: 443, ftp: 21, sftp: 22

Examples

Examples
URL: http://foo.bar.com/foobar.html
Protocol: http
Subdomain: foo
Domain name: bar.com
Port: 80
Path: foobar.html
URL: https://www.foobar.com:8080/download/install.exe
Protocol: https
Subdomain: www
Domain name: foobar.com
Port: 8080
Path: download/installer.exe
URL: ftp://foo.com:9000/files
Protocol: ftp
Subdomain: '' (empty string)
Domain name: foo.com
Port: 9000
Path: files
URL: https://localhost/index.html#footer
Protocol: https
Subdomain: '' (empty string)
Domain name: localhost
Port: 443
Path: index.html

Hints

Exclude the leading / when handling path. E.g. /download becomes download.

URL Grammar

Below is a EBNF like grammar for a URL as per this kata.

url = protocol ”://” [subdomain] host [top-level-domain] [”:” port] [path] [”?” parameters] [”#” anchor]

protocol = “http” | “https” | “ftp” | “sftp”

subdomain = alphanumeric string starting with alpha

host = alphanumeric string

top-level-domain = “.com” | “.net” | “.org” | “.int” | “.edu” | “.gov” | “.mil”

port = numeric

path = alphanumeric string

parameters = alphanumeric string

anchor = alphanumeric string

Reference Walkthrough

Reference implementations in C#, TypeScript, and Python live at tddbuddy-reference-katas/url-parts. Ten shared scenarios cover the four protocols (http, https, ftp, sftp) with their default ports, explicit ports, paths with and without query strings, fragments, and the bare localhost host — satisfied identically across all three languages.

This kata ships in Agent Full-Bake mode at high gear (F1 tier): the algorithm is small enough to land as one commit per language. There are no builders — the parse returns a typed UrlParts record (C# record, TS interface, Python @dataclass(frozen=True)) which is a data shape, not a fluent construction API. No regex and no built-in URL classes (System.Uri, new URL, urllib.parse) per the prompt: each language hand-rolls a left-to-right peel — anchor, then parameters, then path, then port, then host — so the order of operations stays visible in the code. See the repo’s Gears section for when high gear is the right call.