Dirty strings are an everyday reality: stray whitespace from imports, phone numbers full of dashes, codes in mixed case. Three functions — TRIM, SUBSTRING and REPLACE — cover almost all of that routine cleanup. We'll work them over a schema with users(id, email, name, country, created_at), orders(id, user_id, amount, status, created_at) and employees(id, name, manager_id, dept, salary).
TRIM: cut whitespace and junk
By default TRIM removes spaces from both ends. It's the first thing to apply to any text field coming from an external source.
SELECT TRIM(name) AS clean_name
FROM users;
You can trim only one side, and even supply your own character set:
SELECT TRIM(LEADING FROM name) AS no_left,
TRIM(TRAILING FROM name) AS no_right
FROM users;
SELECT TRIM(BOTH '0' FROM '00description') AS no_zeros;
Handy relatives: LTRIM and RTRIM for one-sided trimming, and BTRIM in PostgreSQL as a short synonym for TRIM(BOTH ...).
TRIM only strips edge characters, never interior ones: 'a b' keeps its double space in the middle.
- Invisible characters — tabs (
\t), newlines, the non-breaking space — are left untouched by plain TRIM. For those, combine with REPLACE or REGEXP_REPLACE.
SUBSTRING: slice by position
SUBSTRING pulls out a piece of a string. The canonical standard syntax uses the keywords FROM (start, 1-indexed) and FOR (length):
SELECT SUBSTRING(country FROM 1 FOR 3) AS region_prefix
FROM users;
The comma-positional form gives the same result; it's shorter and more familiar to many:
SELECT SUBSTRING(country, 1, 3) AS region_prefix
FROM users;
A common trick is grabbing the domain from an email by finding the @ position with POSITION:
SELECT SUBSTRING(email FROM POSITION('@' IN email) + 1) AS domain
FROM users;
REPLACE: swap substrings
REPLACE(source, from, to) replaces EVERY occurrence of a literal. It's perfect for stripping formatting — for example turning a phone number into clean digits:
SELECT REPLACE(REPLACE(name, '-', ''), ' ', '') AS compact
FROM users;
REPLACE works on literal strings only, no patterns. If you need regular expressions, reach for REGEXP_REPLACE in PostgreSQL:
SELECT REGEXP_REPLACE(name, '[^0-9]', '', 'g') AS digits_only
FROM users;
Building a normalized key
The real power is in combining these. A typical task is building a stable key for deduplication or joins: lower case, no spaces, no separators.
SELECT o.id,
LOWER(TRIM(u.country)) || '_' ||
REPLACE(LOWER(o.status), ' ', '_') AS norm_key
FROM orders o
JOIN users u ON u.id = o.user_id;
For a department report you can assemble a tidy code from name and dept:
SELECT id,
UPPER(SUBSTRING(TRIM(dept) FROM 1 FOR 3)) || '-' ||
UPPER(SUBSTRING(TRIM(name) FROM 1 FOR 2)) AS emp_code
FROM employees;
Differences across engines
- PostgreSQL: supports both
SUBSTRING(x FROM a FOR b) and SUBSTRING(x, a, b); SUBSTR exists as a synonym. 1-indexed.
- MySQL:
SUBSTRING and SUBSTR are equivalent, and the FROM ... FOR syntax works too. A negative start is supported — counting back from the end of the string.
- ClickHouse: the function is
substring(s, offset, length), positional form only; replaceAll instead of REPLACE, and trimBoth/trimLeft/trimRight instead of TRIM.
The big gotcha: SQL string indexing starts at 1, not 0. SUBSTRING(x FROM 0 FOR 3) behaves unintuitively — position zero eats one character of the length. Always count from one, and FROM 1 FOR n will give you exactly the first n characters.
Dirty strings are an everyday reality: stray whitespace from imports, phone numbers full of dashes, codes in mixed case. Three functions —
TRIM,SUBSTRINGandREPLACE— cover almost all of that routine cleanup. We'll work them over a schema withusers(id, email, name, country, created_at),orders(id, user_id, amount, status, created_at)andemployees(id, name, manager_id, dept, salary).TRIM: cut whitespace and junk
By default
TRIMremoves spaces from both ends. It's the first thing to apply to any text field coming from an external source.SELECT TRIM(name) AS clean_name FROM users;You can trim only one side, and even supply your own character set:
-- Leading vs trailing SELECT TRIM(LEADING FROM name) AS no_left, TRIM(TRAILING FROM name) AS no_right FROM users; -- Strip a specific character, not just spaces SELECT TRIM(BOTH '0' FROM '00description') AS no_zeros; -- 'description'Handy relatives:
LTRIMandRTRIMfor one-sided trimming, andBTRIMin PostgreSQL as a short synonym forTRIM(BOTH ...).TRIMonly strips edge characters, never interior ones:'a b'keeps its double space in the middle.\t), newlines, the non-breaking space — are left untouched by plainTRIM. For those, combine withREPLACEorREGEXP_REPLACE.SUBSTRING: slice by position
SUBSTRINGpulls out a piece of a string. The canonical standard syntax uses the keywordsFROM(start, 1-indexed) andFOR(length):-- First 3 chars of the country code SELECT SUBSTRING(country FROM 1 FOR 3) AS region_prefix FROM users;The comma-positional form gives the same result; it's shorter and more familiar to many:
SELECT SUBSTRING(country, 1, 3) AS region_prefix FROM users;A common trick is grabbing the domain from an email by finding the
@position withPOSITION:SELECT SUBSTRING(email FROM POSITION('@' IN email) + 1) AS domain FROM users;REPLACE: swap substrings
REPLACE(source, from, to)replaces EVERY occurrence of a literal. It's perfect for stripping formatting — for example turning a phone number into clean digits:-- Strip dashes and spaces from a phone-like field SELECT REPLACE(REPLACE(name, '-', ''), ' ', '') AS compact FROM users;REPLACE works on literal strings only, no patterns. If you need regular expressions, reach for
REGEXP_REPLACEin PostgreSQL:-- Keep digits only SELECT REGEXP_REPLACE(name, '[^0-9]', '', 'g') AS digits_only FROM users;Building a normalized key
The real power is in combining these. A typical task is building a stable key for deduplication or joins: lower case, no spaces, no separators.
-- A normalized join key from country + status SELECT o.id, LOWER(TRIM(u.country)) || '_' || REPLACE(LOWER(o.status), ' ', '_') AS norm_key FROM orders o JOIN users u ON u.id = o.user_id;For a department report you can assemble a tidy code from name and dept:
SELECT id, UPPER(SUBSTRING(TRIM(dept) FROM 1 FOR 3)) || '-' || UPPER(SUBSTRING(TRIM(name) FROM 1 FOR 2)) AS emp_code FROM employees;Differences across engines
SUBSTRING(x FROM a FOR b)andSUBSTRING(x, a, b);SUBSTRexists as a synonym. 1-indexed.SUBSTRINGandSUBSTRare equivalent, and theFROM ... FORsyntax works too. A negative start is supported — counting back from the end of the string.substring(s, offset, length), positional form only;replaceAllinstead ofREPLACE, andtrimBoth/trimLeft/trimRightinstead ofTRIM.The big gotcha: SQL string indexing starts at 1, not 0.
SUBSTRING(x FROM 0 FOR 3)behaves unintuitively — position zero eats one character of the length. Always count from one, andFROM 1 FOR nwill give you exactly the firstncharacters.