URL Parts

Level: Intermediate, 30–60 min

Concepts: Parsing, Strings


Decomposes a given URL into its parts.

For example, the URL http://www.tddbuddy.com decomposes into the following parts:

Part       Value
Protocol   http
Subdomain  www
Domain     tddbuddy.com
Port       80 (default for HTTP)
Path       '' (empty in our case)

Please be sure to handle the following:

  • Only top-level domains like .com or .net.
    • Do not worry about second-level domains like .co.uk or .co.za.
  • Only the protocols specified in the default ports section below.
  • Be sure to handle hostname-only cases on a local network, e.g. http://localhost

Do not use built-in classes like Uri to solve this.
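
One possible way to cover both the subdomain and the hostname-only cases is to split the host on dots and treat the last two labels as the domain. The Python sketch below is only an illustration: the name split_host is hypothetical, and it assumes single-label top-level domains, as this kata states.

```python
def split_host(host: str) -> tuple[str, str]:
    """Split a host into (subdomain, domain) under the kata's assumptions."""
    labels = host.split(".")
    # Assumption: the domain is at most the last two labels ("bar.com").
    # A single label such as "localhost" is kept whole as the domain.
    subdomain = ".".join(labels[:-2])
    domain = ".".join(labels[-2:])
    return subdomain, domain


print(split_host("www.tddbuddy.com"))
print(split_host("localhost"))
```

Because the slices simply come back empty when there are fewer than three labels, localhost and foo.com need no special-case branch.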

Default Ports

http: 80, https: 443, ftp: 21, sftp: 22
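
The default ports above map naturally onto a lookup table. The following is a minimal Python sketch, not a prescribed solution; DEFAULT_PORTS and resolve_port are hypothetical names, and rejecting unknown protocols with ValueError is an assumption about how "only the protocols specified" might be enforced.

```python
# Default ports as listed in this kata.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "sftp": 22}


def resolve_port(protocol: str, explicit_port: str = "") -> int:
    """Return the explicit port if one was given, else the protocol default."""
    if protocol not in DEFAULT_PORTS:
        # Assumption: unsupported protocols are treated as an error.
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return int(explicit_port) if explicit_port else DEFAULT_PORTS[protocol]


print(resolve_port("http"))
print(resolve_port("https", "8080"))
```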

Examples

URL: http://foo.bar.com/foobar.html
Protocol: http
Subdomain: foo
Domain name: bar.com
Port: 80
Path: foobar.html

URL: https://www.foobar.com:8080/download/install.exe
Protocol: https
Subdomain: www
Domain name: foobar.com
Port: 8080
Path: download/install.exe

URL: ftp://foo.com:9000/files
Protocol: ftp
Subdomain: '' (empty string)
Domain name: foo.com
Port: 9000
Path: files

URL: https://localhost/index.html#footer
Protocol: https
Subdomain: '' (empty string)
Domain name: localhost
Port: 443
Path: index.html

Hints

Exclude the leading / when handling the path, e.g. /download becomes download.
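
As a rough illustration of this hint, the Python sketch below takes whatever follows the host and port and returns the path without its leading slash. The name extract_path is hypothetical. Cutting at # follows the localhost example above, where index.html#footer yields index.html; cutting at ? is an assumption based on the grammar listing parameters separately from the path.

```python
def extract_path(rest: str) -> str:
    """rest is everything after host[:port], e.g. '/index.html#footer'."""
    # Drop parameters and anchor: keep only the text before '?' and '#'.
    for separator in ("?", "#"):
        rest = rest.split(separator, 1)[0]
    # Exclude the leading '/' as the hint asks.
    return rest[1:] if rest.startswith("/") else rest


print(extract_path("/download"))
print(extract_path("/index.html#footer"))
```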

URL Grammar

Below is an EBNF-like grammar for a URL as used in this kata.

url = protocol "://" [subdomain] host [top-level-domain] [":" port] [path] ["?" parameters] ["#" anchor]

protocol = "http" | "https" | "ftp" | "sftp"

subdomain = alphanumeric string starting with alpha

host = alphanumeric string

top-level-domain = ".com" | ".net" | ".org" | ".int" | ".edu" | ".gov" | ".mil"

port = numeric

path = alphanumeric string

parameters = alphanumeric string

anchor = alphanumeric string
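
One way to experiment with the grammar is to translate it loosely into a regular expression. The Python sketch below is an approximation under stated assumptions, not the kata's reference solution: URL_PATTERN and match_url are hypothetical names, the host is captured as one dotted string (subdomain, host and top-level domain are not separated or validated here), and the path is allowed to contain / and . because the examples above do, even though the grammar says "alphanumeric string".

```python
import re

# Loose translation of the grammar; each optional part is an optional group.
URL_PATTERN = re.compile(
    r"^(?P<protocol>https|http|sftp|ftp)://"
    r"(?P<host>[A-Za-z0-9.]+)"           # subdomain, host and TLD together
    r"(?::(?P<port>[0-9]+))?"            # optional ":" port
    r"(?:/(?P<path>[^?#]*))?"            # optional path, leading "/" excluded
    r"(?:\?(?P<parameters>[^#]*))?"      # optional "?" parameters
    r"(?:#(?P<anchor>.*))?$"             # optional "#" anchor
)


def match_url(url: str) -> dict:
    """Return the raw grammar parts of url, with '' for missing parts."""
    match = URL_PATTERN.match(url)
    if match is None:
        raise ValueError(f"not a URL this kata supports: {url!r}")
    return {name: value or "" for name, value in match.groupdict().items()}


print(match_url("https://www.foobar.com:8080/download/install.exe"))
```

A missing port comes back as an empty string, so the default for the protocol still has to be filled in separately, and the host still has to be split into subdomain and domain.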