TRIM, SUBSTRING and REPLACE: Cleaning Up Strings in SQL

Dirty strings are an everyday reality: stray whitespace from imports, phone numbers full of dashes, codes in mixed case. Three functions — TRIM, SUBSTRING and REPLACE — cover almost all of that routine cleanup. We'll work them over a schema with users(id, email, name, country, created_at), orders(id, user_id, amount, status, created_at) and employees(id, name, manager_id, dept, salary).

TRIM: cut whitespace and junk

By default TRIM removes spaces from both ends. It's the first thing to apply to any text field coming from an external source.

SELECT TRIM(name) AS clean_name
FROM users;

You can trim only one side, and even supply your own character set:

-- Leading vs trailing
SELECT TRIM(LEADING FROM name)  AS no_left,
       TRIM(TRAILING FROM name) AS no_right
FROM users;

-- Strip a specific character, not just spaces
SELECT TRIM(BOTH '0' FROM '00description') AS no_zeros;  -- 'description'

Handy relatives: LTRIM and RTRIM for one-sided trimming, and BTRIM in PostgreSQL as a short synonym for TRIM(BOTH ...).

TRIM only strips edge characters, never interior ones: 'a b' keeps its double space in the middle.
Invisible characters — tabs (\t), newlines, the non-breaking space — are left untouched by plain TRIM. For those, combine with REPLACE or REGEXP_REPLACE.

SUBSTRING: slice by position

SUBSTRING pulls out a piece of a string. The canonical standard syntax uses the keywords FROM (start, 1-indexed) and FOR (length):

-- First 3 chars of the country code
SELECT SUBSTRING(country FROM 1 FOR 3) AS region_prefix
FROM users;

The comma-positional form gives the same result; it's shorter and more familiar to many:

SELECT SUBSTRING(country, 1, 3) AS region_prefix
FROM users;

A common trick is grabbing the domain from an email by finding the @ position with POSITION:

SELECT SUBSTRING(email FROM POSITION('@' IN email) + 1) AS domain
FROM users;

REPLACE: swap substrings

REPLACE(source, from, to) replaces EVERY occurrence of a literal. It's perfect for stripping formatting — for example turning a phone number into clean digits:

-- Strip dashes and spaces from a phone-like field
SELECT REPLACE(REPLACE(name, '-', ''), ' ', '') AS compact
FROM users;

REPLACE works on literal strings only, no patterns. If you need regular expressions, reach for REGEXP_REPLACE in PostgreSQL:

-- Keep digits only
SELECT REGEXP_REPLACE(name, '[^0-9]', '', 'g') AS digits_only
FROM users;

Building a normalized key

The real power is in combining these. A typical task is building a stable key for deduplication or joins: lower case, no spaces, no separators.

-- A normalized join key from country + status
SELECT o.id,
       LOWER(TRIM(u.country)) || '_' ||
       REPLACE(LOWER(o.status), ' ', '_') AS norm_key
FROM orders o
JOIN users u ON u.id = o.user_id;

For a department report you can assemble a tidy code from name and dept:

SELECT id,
       UPPER(SUBSTRING(TRIM(dept) FROM 1 FOR 3)) || '-' ||
       UPPER(SUBSTRING(TRIM(name) FROM 1 FOR 2)) AS emp_code
FROM employees;

Differences across engines

PostgreSQL: supports both SUBSTRING(x FROM a FOR b) and SUBSTRING(x, a, b); SUBSTR exists as a synonym. 1-indexed.
MySQL: SUBSTRING and SUBSTR are equivalent, and the FROM ... FOR syntax works too. A negative start is supported — counting back from the end of the string.
ClickHouse: the function is substring(s, offset, length), positional form only; replaceAll instead of REPLACE, and trimBoth/trimLeft/trimRight instead of TRIM.

The big gotcha: SQL string indexing starts at 1, not 0. SUBSTRING(x FROM 0 FOR 3) behaves unintuitively — position zero eats one character of the length. Always count from one, and FROM 1 FOR n will give you exactly the first n characters.