Skip to content
UoL CS Notes

PHP Regular Expressions

COMP284 Lectures

PHP uses PCRE (Perl-compatible regular expressions). You can try out your expressions in a live environment at regex101 (ensure that PCRE2 is checked).

PHP RE Syntax

We can use the following syntax to define regular expressions in PHP:

'/a*bc/'

The expression is enclosed in two /.

Quantifiers

We can request a certain number of characters:

Quantifier Description
* Match 0 or more times.
+ Match 1 or more times.
? Match 1 or 0 times.
{n} Match exactly $n$ times.
{n,} Match at least $n$ times.
{n,m} Match between $n$ and $m$ times.

These quantifiers are greedy:

  • Match the longest leftmost sequence of characters possible.

We can match lazily by following a quantifier with a ? character:

'=/\*.*?\*/='

Any non-whitespace, non-backslash character can be a regex delimiter. = is used here for convenience.

this would match:

x = /* one */ y*z; /* three */

as opposed to the following using a greedy expression:

x = /* one */ y*z; /* three */

Capture Groups & Back-references

We can capture groups and use them with backreferences like so:

"=<(?<c1>\w+)>/*?</\g{c1}>="

This captures a group of w+ called c1 and uses it later using \g{c1}.

This would match the whole string:

"<li><b>item</b></li>"

Available syntax includes:

Expression Description
(regex) Creates a capturing sub-pattern and automatically names starting from 1.
(?<name>regex) Creates a named capturing sub-pattern.
(?:regex) Creates a non-capturing sub-pattern.
\N, \gN, \g{N} Back-reference to a capturing pattern called $N$, where $N$ is a natural number.
\g{name} Back-reference to a named capture group.

(?:regex) is used to enclose alternations without making another capture group:

"/(?:regex1|regex2)/"

Modifiers

Usually we might enclose our regular expression in / /. We can use the following modifiers to match in different ways like so:

"/hello/i"

This performs a case-insensitive match.

Modifier Description
i Perform case-insensitive match.
s Treat multiline string as a single line.
m Treat string as a set of multiple lines (multi-line mode).

You can combine these modifiers one after the other.

PHP Regex Functions

preg_match()

preg_match(rx, str [,&$matches [,flags [,offset]]])

Attempts to match a regular expression rx against the string str starting at offset:

  • Returns 1 is there is a match, 0 if not and False if there is an error.
  • $matches is an array containing all the matching groups where group/key 0 is the full matching string.
  • flags modify the behaviour of the function.

It can be useful to run this in the expression of an if statement so that your output will only run if the pattern matches.

preg_match_all()

preg_match_all(rx, str, [,&$matches [,flags [,offs]]])

Retrieves all matches of the regular expression rx against teh string str starting at offs:

  • Returns the number of matches and False in the case of an error.
  • $matches is a multi-dimensional array containing all matches indexed from 0; with each match in the same format at preg_match().
  • flags modify the behaviour of the function.

preg_replace()

preg_replace(rx, rpl, str [,lmt [, &$num]])

Returns the result of replacing matches of rx in str by rpl:

  • lmt specifies the maximum number of replacements.
  • On completion, $num contains the number of replacements performed.
  • You can use groups captured from str in rpl.

preg_replace_callback()

You can also use preg_replace_callback() with the result of a function:

$old = "105 degrees Fahrenheit is quite warm";

$new = preg_replace_callback(
	'/(\d+) degrees Fahrenheit/',
	function ($match) {
		return round(($match[1] - 32) * 5 / 9) . " degrees Celsius";
	},
	$old
);

echo $new;
41 degrees Celsius is quite warm

preg_split()

preg_split(rx, str [,lmt [,flags]])

Splits str by the regular expression rx and returns the result as an array:

  • lmt specifies the maximum number of split components.
  • flagsmodify the behaviour of the function.