Skip to content

SQL - PostgreSQL - RegEx to Parse URL String

Source: sql-snippets/regex-parse-url.md at main · count/sql-snippets (github.com)

View an interactive version of this snippet here.

Description

URL strings are a very valuable source of information, but trying to get your RegEx exactly right is a pain.

This re-usable snippet can be applied to any URL to extract the exact part you’re after.

Snippet ✂️

SELECT
  SUBSTRING(my_url, '^([a-zA-Z]+)://') AS protocol,
  SUBSTRING(my_url, '(?:[a-zA-Z]+:\/\/)?([a-zA-Z0-9.]+)\/?') AS host,
  SUBSTRING(my_url, '(?:[a-zA-Z]+://)?(?:[a-zA-Z0-9.]+)/{1}([a-zA-Z0-9./]+)') AS path,
  SUBSTRING(my_url, '\?(.*)') AS query,
  SUBSTRING(my_url, '#(.*)') AS ref
FROM <schema.tablename>

where:

  • <my_url> is your URL string (e.g. "https://www.yoursite.com/pricing/details?myparam1=123&myparam2=abc#Ref1"
  • protocol is how the data is transferred between the host and server (e.g. https)
  • host is the domain name (e.g. yoursite.com)
  • path is the page path on the website (e.g. pricing/details)
  • query is the string of query parameters in the url (e.g. myparam1=123)
  • ref is the reference or where the user came from before visiting the url (e.g. #newsfeed)

Usage

To use the snippet, just copy out the RegEx for whatever part of the URL you need - or capture them all as in the example below:

WITH data as 
    (SELECT 'https://www.yoursite.com/pricing/details?myparam1=123&myparam2=abc#newsfeed' AS my_url)

SELECT
  SUBSTRING(my_url, '^([a-zA-Z]+)://') AS protocol,
  SUBSTRING(my_url, '(?:[a-zA-Z]+:\/\/)?([a-zA-Z0-9.]+)\/?') AS host,
  SUBSTRING(my_url, '(?:[a-zA-Z]+://)?(?:[a-zA-Z0-9.]+)/{1}([a-zA-Z0-9./]+)') AS path,
  SUBSTRING(my_url, '\?(.*)') AS query,
  SUBSTRING(my_url, '#(.*)') AS ref
FROM data

Output:

protocol host path query ref
https www.yoursite.com pricing/details myparam1=123&myparam2=abc#newsfeed newsfeed

Backlinks:

list from [[SQL - PostgreSQL - RegEx to Parse URL String]] AND -"Changelog"