enabling precise pattern matching and text manipulation
Continuing on with our built-in functions, we’ll look at preg_match. To review the previous functions, read P101.
https://blog.devgenius.io/php-p101-trim-htmlspecialchars-and-call-built-in-functions-7b4f25728db9
- trim(), ltrim(), rtrim(), htmlspecialchars(), __call(), preg_match(), filter_var(), addslashes(), str_replace(), strlen(), strtolower(), strtoupper(), ucfirst(), strpos(), stripos(), strrpos(), strripos()
- Array Functions like: array_chunk(), array_diff(), array_key_exists(), array_key_first(), array_key_last(), array_map(), array_merge(), array_push(), array_sum(), asort(), arsort(), count(), in_array(), ksort(), krsort(), sort(), rsort(), shuffle(), sizeof(), is_array(), explode(), implode()
- Magic Methods like: __invoke(), __toString()
Regex Intro
I have been debating whether to even cover this one since it requires knowledge of regular expressions. We want to be as thorough as possible, though, so we’ll cover some regular expression basics. A regular expression is just a sequence of characters used to search for specific patterns in a string.
A simple pattern might be /Dino/, where we search for Dino in some string like Hey there Dino. How's it going?
What is the forward slash? It’s what’s called a delimiter. Each pattern must begin and end with a delimiter. You can pick almost any delimiter you like, but it cannot be:
- an alphanumeric character, e.g. A or 3
- a backslash, \
- whitespace
Frequently used delimiters are:
- forward slashes, /
- hash symbols, #
- tildes, ~
Some examples of delimiters:
/Dino/
#Dino#
%Dino%
{Dino}
~Dino~
Why not just choose one and stick to it? Simple: readability. If you use the delimiter inside of your pattern, you have to escape it. Let’s say that we want to look for Dino/Jeff. If you were to start with a forward slash, PHP would think that the forward slash in your pattern is the closing delimiter. You’d need to escape it: /Dino\/Jeff/. Or you could just use another delimiter, #Dino/Jeff#, so that you don’t have to escape the forward slash.
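To see that the delimiter only wraps the pattern and doesn’t change what it matches, here is a small sketch that runs each delimiter from the examples above, plus the escaped and unescaped Dino/Jeff variants, against one string. The subject text is made up for illustration.

```php
<?php
// Every pattern below should report a match (1) against the same subject.
$subject = "Hey there Dino. Dino/Jeff are both here.";

$patterns = [
    '/Dino/',
    '#Dino#',
    '%Dino%',
    '{Dino}',       // bracket-style delimiters use a matching open/close pair
    '~Dino~',
    '/Dino\/Jeff/', // the slash is escaped because / is also the delimiter
    '#Dino/Jeff#',  // no escaping needed once the delimiter is #
];

foreach ($patterns as $pattern) {
    printf("%-14s => %d\n", $pattern, preg_match($pattern, $subject));
}
```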
preg_match()
Let’s introduce the preg_match function. We’re going to look at the first two parameters, which are the pattern and the search string, respectively. Knowing what we know so far, let’s look at an example.
<?php
$pattern = "/Dino/";
$search_string = "Hey. My name is Dino. How are you?";
var_dump( preg_match($pattern, $search_string) );
The result will be 1 if the pattern matches inside of the string, 0 if there are no matches, and false if there was an error.
/app/98 Functions/PregMatch.php:6:int 1
We get 1 as expected. Is the pattern case sensitive? What would we get if we searched for /dino/ instead of /Dino/?
<?php
$pattern = "/dino/";
$search_string = "Hey. My name is Dino. How are you?";
var_dump( preg_match($pattern, $search_string) );
The result will actually be 0. The pattern is, in fact, case sensitive.
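Because 0 (no match) and false (error) are both falsy in PHP, a plain if check can’t tell them apart. Here is a minimal sketch, using a hypothetical describeMatch() helper, that compares the return value strictly. The broken pattern /[Dino/ is made up to trigger the error case.

```php
<?php
// Hypothetical helper: turns preg_match()'s 1 / 0 / false into a label.
function describeMatch(string $pattern, string $subject): string
{
    // The @ suppresses the warning PHP emits for an invalid pattern.
    $result = @preg_match($pattern, $subject);

    // 0 and false are both falsy, so compare strictly.
    if ($result === false) {
        return "error";
    }
    return $result === 1 ? "match" : "no match";
}

$subject = "Hey. My name is Dino. How are you?";
echo describeMatch("/Dino/", $subject), PHP_EOL;  // match
echo describeMatch("/dino/", $subject), PHP_EOL;  // no match (case sensitive)
echo describeMatch("/[Dino/", $subject), PHP_EOL; // error (unclosed bracket)
```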
The third parameter is where the matched results are stored. If we pass a variable called $matches there, preg_match fills it with what it found. We don’t have to initialize the $matches variable first.
<?php
$pattern = "/Dino/";
$search_string = "Hey. My name is Dino. How are you?";
var_dump( preg_match($pattern, $search_string, $matches) );
var_dump( $matches );
The function preg_match still returns 1, but it also stores the results inside of the $matches array.
/app/98 Functions/PregMatch.php:6:int 1
/app/98 Functions/PregMatch.php:7:
array (size=1)
0 => string 'Dino' (length=4)
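Since $matches is only useful when something matched, a common habit is to read it inside the check. A short sketch reusing the same string; it also shows that a failed match leaves $matches as an empty array rather than undefined.

```php
<?php
$subject = "Hey. My name is Dino. How are you?";

// Read $matches only after preg_match() reports a match.
if (preg_match('/Dino/', $subject, $matches) === 1) {
    echo "Found: ", $matches[0], PHP_EOL; // Found: Dino
}

// On no match, $matches is still set, but to an empty array.
preg_match('/dino/', $subject, $matches);
var_dump($matches === []); // bool(true)
```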
Common Regex Patterns
Here is a quick cheat sheet for some common regular expression patterns. We’ll look at how each pattern behaves as we go through them. If you need a full list of what each character does, check out this awesome RegEx cheatsheet.
https://quickref.me/regex
[abc] A single character: a, b or c
Square brackets define a character class: a set of characters, any one of which can match. In the first example, we’re going to match any single character that’s either a, b, or c.
$pattern = "/[abc]/";
$search_string = "Hey. My name is Dino. How are you?";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:14:
array (size=1)
0 => string 'a' (length=1)
We get a response that the character a matched. What if we did something like [xyz]?
$pattern = "/[xyz]/";
$search_string = "Hey. My name is Dino. How are you?";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
There is no x, but there is a y, and it’s matched.
/app/98 Functions/PregMatch.php:14:
array (size=1)
0 => string 'y' (length=1)
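A character class stands in for exactly one character, which makes it handy inside a longer pattern. In this sketch (the word list is made up), /gr[ae]y/ accepts both spellings of gray, but not graey, because the class can only consume a single character.

```php
<?php
$results = [];
foreach (["gray", "grey", "groy", "graey"] as $word) {
    // [ae] matches exactly one character: either a or e.
    $results[$word] = preg_match('/gr[ae]y/', $word);
    printf("%-6s => %d\n", $word, $results[$word]);
}
```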
[^abc] Any single character except a, b, or c
Next, to match any character other than the ones specified, we can add the ^ symbol right after the opening bracket. This effectively means “not.”
$pattern = "/[^xyz]/";
$search_string = "Hello there.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:21:
array (size=1)
0 => string 'H' (length=1)
The H is the first character that matches the pattern.
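One detail worth knowing: ^ only means “not” when it’s the very first character inside the brackets. Anywhere else in the class, it’s a literal caret. A small sketch with made-up strings:

```php
<?php
// [^xyz] needs at least one character that is not x, y, or z.
$none = preg_match('/[^xyz]/', 'xyzzy');      // 0: every character is x, y, or z
$found = preg_match('/[^xyz]/', 'xyz!', $m1); // 1: $m1[0] is "!"

// When ^ is not first, it is just a literal caret in the set.
$caret = preg_match('/[x^]/', '2^3', $m2);    // 1: $m2[0] is "^"

var_dump($none, $found, $m1[0], $caret, $m2[0]);
```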
[a-z] Any single character in the range a-z
We can specify ranges with a -. This pattern will match any lowercase letter between a and z.
$pattern = "/[a-z]/";
$search_string = "Hello there.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:28:
array (size=1)
0 => string 'e' (length=1)
It skips the first letter, H, since it’s not in the range that we were looking for. Using the full a to z range is common, but it doesn’t have to be. What if we only wanted to search for a range from s to y? We can do that.
$pattern = "/[s-y]/";
$search_string = "Hello there.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:34:
array (size=1)
0 => string 't' (length=1)
The t exists in there, so we get a positive match. How about b through e and s through y?
$pattern = "/[b-es-y]/";
$search_string = "Hello there.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
This time, the e in Hello matches.
/app/98 Functions/PregMatch.php:34:
array (size=1)
0 => string 'e' (length=1)
[a-zA-Z] Any single character in the range a-z or A-Z
What about a single character that can be either uppercase or lowercase? Not too different.
$pattern = "/[a-zA-Z]/";
$search_string = "Hello there.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:47:
array (size=1)
0 => string 'H' (length=1)
Again, we don’t have to use the full range of all uppercase and lowercase letters.
$pattern = "/[b-dD-G]/";
$search_string = "My name is Dino.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:47:
array (size=1)
0 => string 'D' (length=1)
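Ranges follow the same case-sensitivity rule we saw with /dino/: [a-z] won’t match uppercase letters. A quick sketch against strings chosen for illustration:

```php
<?php
$subject = "DINO";

$lower = preg_match('/[a-z]/', $subject);        // 0: no lowercase letters
$upper = preg_match('/[A-Z]/', $subject, $m1);   // 1: first match is "D"
$either = preg_match('/[a-zA-Z]/', "dINO", $m2); // 1: first match is "d"

var_dump($lower, $upper, $m1[0], $either, $m2[0]);
```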
^ Start of line
Didn’t we just say that ^ means not? We did, but only inside of brackets. Outside of brackets, it indicates the start of the string. For example, let’s say that we wanted to check whether the string starts with the word Hello.
$pattern = "/Hello/";
$search_string = "Hello there.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
In this scenario, we did not use the ^ symbol. Do we get a match? Yes.
/app/98 Functions/PregMatch.php:60:
array (size=1)
0 => string 'Hello' (length=5)
What if we had the following string: My name is Dino. Hello there. Will it match? Yes, again. But what if we insisted that the string start with Hello? Normally you would say hi and then introduce yourself. For that, you would need the ^ symbol.
$pattern = "/^Hello/";
$search_string = "My name is Dino. Hello there.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
In this instance, you get no matches.
/app/98 Functions/PregMatch.php:60:
array (size=0)
empty
If our string started with Hello, you would get a match.
$pattern = "/^Hello/";
$search_string = "Hello there. My name is Dino.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:66:
array (size=1)
0 => string 'Hello' (length=5)
$ End of line
The opposite of ^ is $. It signifies a match that occurs at the end of the string. For example, let’s see if the string ends with a number. We’ll start without the $ sign.
$pattern = "/[0-9]/";
$search_string = "Hello there. I'm 32. My name is Dino.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
Do we get a match? Of course.
/app/98 Functions/PregMatch.php:73:
array (size=1)
0 => string '3' (length=1)
But we need that number to appear at the end of the string.
$pattern = "/[0-9]$/";
$search_string = "Hello there. I'm 32. My name is Dino.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
This time, we get no match. If we move the number to the end, we will get a proper response.
$pattern = "/[0-9]$/";
$search_string = "Hello there. My name is Dino. I'm 32";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:73:
array (size=1)
0 => string '2' (length=1)
You can see that 2 is matched this time, since $ requires the digit to be the last character in the string.
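Since ^ and $ each anchor one end, they can be checked together. Below is a sketch of a hypothetical helper, greetsAndEndsWithDigit(), that uses two preg_match() calls: one for a string starting with Hello and one for a string ending with a digit. The function name and test strings are just for illustration.

```php
<?php
// Hypothetical helper combining both anchors.
function greetsAndEndsWithDigit(string $s): bool
{
    $startsWithHello = preg_match('/^Hello/', $s) === 1; // ^ anchors to the start
    $endsWithDigit = preg_match('/[0-9]$/', $s) === 1;   // $ anchors to the end
    return $startsWithHello && $endsWithDigit;
}

var_dump(greetsAndEndsWithDigit("Hello there. My name is Dino. I'm 32"));  // bool(true)
var_dump(greetsAndEndsWithDigit("Hello there. I'm 32. My name is Dino.")); // bool(false)
var_dump(greetsAndEndsWithDigit("My name is Dino. Hello there. I'm 32"));  // bool(false)
```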
You might have caught that I removed the period (.) at the end of the string, and that’s a great observation. We’ll cover that next.
. Any single character
That’s right. The period matches any single character. Let’s revisit the same example and see what happens if we put the period in there.
$pattern = "/[0-9].$/";
$search_string = "Hello there. My name is Dino. I'm 32.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
It seems to be doing what we intended: matching [0-9] followed by a period at the end of the string. We should get 2., and we do.
/app/98 Functions/PregMatch.php:86:
array (size=1)
0 => string '2.' (length=2)
But that’s not the right way to match a literal period at the end of a string. What if the period were missing from our string? What would the match be?
$pattern = "/[0-9].$/";
$search_string = "Hello there. My name is Dino. I'm 32";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:92:
array (size=1)
0 => string '32' (length=2)
It’s 32? How is that 32? Remember, the . matches any character. So, we match 3 followed by any other character (here, 2) at the end of the string. That means that if we made a mistake and added the letter a at the end of the string, we would still get a match.
$pattern = "/[0-9].$/";
$search_string = "Hello there. My name is Dino. I'm 32a";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
In this instance, it will match 2a.
/app/98 Functions/PregMatch.php:98:
array (size=1)
0 => string '2a' (length=2)
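But we need to match the period itself: the string needs to end with a number followed by an actual period. We just need to escape that character with a backslash.
$pattern = "/[0-9]\.$/";
$search_string = "Hello there. My name is Dino. I'm 32.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );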
/app/98 Functions/PregMatch.php:110:
array (size=1)
0 => string '2.' (length=2)
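Comparing the unescaped . with the escaped \. across the three strings from this section makes the difference easy to see. A sketch:

```php
<?php
$results = [];
foreach (["I'm 32.", "I'm 32", "I'm 32a"] as $s) {
    // Unescaped . accepts any final character; \. accepts only a period.
    $any = preg_match('/[0-9].$/', $s, $m1) === 1 ? $m1[0] : '(none)';
    $literal = preg_match('/[0-9]\.$/', $s, $m2) === 1 ? $m2[0] : '(none)';
    $results[$s] = [$any, $literal];
    printf("%-8s  any: %-6s  literal: %s\n", $s, $any, $literal);
}
```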
\d Any digit
We don’t have to use the pattern [0-9] to match all of the digits. We can simplify that with \d. To recreate our example above, where we search for any digit followed by a period at the end of the string, we can use the following pattern: /\d\.$/
// \d Any digit
$pattern = "/\d\.$/";
$search_string = "Hello there. My name is Dino. I'm 32.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
For the sake of readability, let’s change our delimiter to #.
// \d Any digit
$pattern = "#\d\.$#";
$search_string = "Hello there. My name is Dino. I'm 32.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:117:
array (size=1)
0 => string '2.' (length=2)
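For plain ASCII strings like these, \d and [0-9] behave the same. A small sketch comparing the two on a few made-up strings:

```php
<?php
$results = [];
foreach (["I'm 32.", "I'm 32", "Version 2.0."] as $s) {
    $short = preg_match('#\d\.$#', $s);   // \d shorthand
    $long = preg_match('#[0-9]\.$#', $s); // explicit [0-9] range
    $results[$s] = [$short, $long];
    printf("%-13s \\d: %d  [0-9]: %d\n", $s, $short, $long);
}
```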
\D Any non-digit
We can also match any non-digit character. Let’s say that we want to accept a phone number, but it must not contain anything other than numbers. For example, 5554445555 is valid but 555-444-5555 is invalid.
$pattern = "#\D#";
$search_string = "5554445555";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
This will return an empty $matches array, which is what we want.
/app/98 Functions/PregMatch.php:124:
array (size=0)
empty
If we add some non-digit characters inside of our string, our pattern will match.
$pattern = "#\D#";
$search_string = "555-444-5555";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:130:
array (size=1)
0 => string '-' (length=1)
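Turning this into a check is a matter of testing for zero matches. Here is a sketch with a hypothetical isDigitsOnly() helper. The empty-string guard is an added assumption, since an empty string contains no non-digits either and would otherwise pass. The loop also shows that the bracket form [^0-9] gives the same answer as \D here.

```php
<?php
// Hypothetical helper: true only for a non-empty string of digits.
function isDigitsOnly(string $phone): bool
{
    // \D finds any non-digit; 0 matches means digits only.
    return $phone !== '' && preg_match('#\D#', $phone) === 0;
}

foreach (["5554445555", "555-444-5555", ""] as $phone) {
    // [^0-9] is the bracket form of \D, so it finds the same characters.
    printf("%-14s valid: %s  [^0-9] found: %d\n",
        var_export($phone, true),
        var_export(isDigitsOnly($phone), true),
        preg_match('#[^0-9]#', $phone));
}
```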
\s Any whitespace character
Can we match whitespace characters? Yes we can.
$pattern = "#\s#";
$search_string = "555 444-5555";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:137:
array (size=1)
0 => string ' ' (length=1)
We can build on this. For example, a digit followed by a space:
$pattern = "#\d\s#";
$search_string = "555 444-5555";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:143:
array (size=1)
0 => string '5 ' (length=2)
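\s isn’t limited to the space character: tabs and newlines count as whitespace too. A sketch with variants of the phone string (made up for illustration), using json_encode() so the invisible characters show up as \t and \n:

```php
<?php
$found = [];
foreach (["555 444-5555", "555\t444-5555", "555\n444-5555", "555-444-5555"] as $s) {
    // Store the first whitespace character found, or null if there is none.
    $found[$s] = preg_match('#\s#', $s, $m) === 1 ? $m[0] : null;
    echo json_encode($s), ' => ', json_encode($found[$s]), PHP_EOL;
}
```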
\S Any non-whitespace character
This will match any character that’s not whitespace.
$pattern = "#\d\S#";
$search_string = "555 444-5555";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
It will not match 5 followed by the space, but it will match 55, since the second 5 is a non-whitespace character.
/app/98 Functions/PregMatch.php:150:
array (size=1)
0 => string '55' (length=2)
Remember that the pattern would also match 44 and 4- later in the string, but preg_match only returns the first match.
\w Any word character (letter, number, underscore)
This is the equivalent of the following pattern: [a-zA-Z0-9_]. It’s just a shortcut, since it’s frequently used.
$pattern = "#\w#";
$search_string = "Hello there. My name is Dino. I'm 32.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:157:
array (size=1)
0 => string 'H' (length=1)
\W Any non-word character
You guessed it. It’s the opposite of \w. It’s the equivalent of: [^a-zA-Z0-9_]
$pattern = "#\W#";
$search_string = "Hello there. My name is Dino. I'm 32.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:164:
array (size=1)
0 => string ' ' (length=1)
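The equivalences above can be spot-checked. This sketch compares \w and \W with their bracket forms on a handful of characters picked for illustration; for plain ASCII characters like these, the two forms should agree, so the mismatch list should end up empty.

```php
<?php
$chars = ['H', 'z', '7', '_', ' ', '.', "'", '-'];
$mismatches = [];

foreach ($chars as $c) {
    $word = preg_match('/\w/', $c);
    $wordClass = preg_match('/[a-zA-Z0-9_]/', $c);
    $nonWord = preg_match('/\W/', $c);
    $nonWordClass = preg_match('/[^a-zA-Z0-9_]/', $c);

    // Record any character where the shorthand and the bracket form disagree.
    if ($word !== $wordClass || $nonWord !== $nonWordClass) {
        $mismatches[] = $c;
    }
    printf("%-4s \\w=%d [a-zA-Z0-9_]=%d  \\W=%d [^a-zA-Z0-9_]=%d\n",
        json_encode($c), $word, $wordClass, $nonWord, $nonWordClass);
}

var_dump($mismatches); // expected: an empty array
```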
(…) Capture everything enclosed
This is where it gets a little interesting. We’ve looked at patterns inside square brackets, but what happens when we wrap part of a pattern in parentheses? Let’s see what ends up in our $matches array.
$pattern = "#(th)#";
$search_string = "Hello there. I'm one of the people here.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:170:
array (size=2)
0 => string 'th' (length=2)
1 => string 'th' (length=2)
We get two entries. The first is the full pattern match and the second is what the sub-pattern (the part inside the parentheses) captured. It’s easier to explain that in the next section.
(a|b) a or b
This is one of my favorite uses of this pattern. The | signifies or, so (a|b) will match a or b. If we wanted to match both Dino and dino, we could use (D|d)ino. The full pattern matches Dino and the first sub-pattern captures D. The full pattern match is stored in $matches[0] and the first sub-pattern match is stored in $matches[1].
$pattern = "/(D|d)ino/";
$search_string = "Hello there. My name is Dino. I'm 32.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:183:
array (size=2)
0 => string 'Dino' (length=4)
1 => string 'D' (length=1)
The full pattern match is Dino, so it’s stored in $matches[0], and the first sub-pattern match is D, so it’s stored in $matches[1]. Remember that ( signifies the start of a sub-pattern and ) signifies the end of a sub-pattern.
What if the full pattern is not matched? For example, say I misspelled my name and entered Din in the string.
$pattern = "/(D|d)ino/";
$search_string = "Hello there. My name is Din. I'm 32.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:189:
array (size=0)
empty
The result is an empty array. It didn’t find the full pattern, so it could not capture a sub-pattern within the full pattern either. The letter D does exist in the string, but it isn’t followed by ino, so the full pattern never matches.
I keep saying “first sub-pattern,” but more precisely it’s the sub-pattern within the first full-pattern match. Let’s take a look. I’ll use a string that has both the name Dino and dino.
$pattern = "/(D|d)ino/";
$search_string = "Hello there. My name is Dino or dino. I'm 32.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:195:
array (size=2)
0 => string 'Dino' (length=4)
1 => string 'D' (length=1)
The result is still the same. Even though the string contains both Dino and dino, preg_match stops at the first full-pattern match, so we only see that match’s sub-pattern. We can also have multiple sub-patterns inside of the full pattern.
$pattern = "/(D|d)(i)no/";
$search_string = "Hello there. My name is Dino or dino. I'm 32.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
Our full pattern is /(D|d)(i)no/. It means: look for Dino or dino. The first sub-pattern will be a D or d. The second sub-pattern will be an i.
/app/98 Functions/PregMatch.php:201:
array (size=3)
0 => string 'Dino' (length=4)
1 => string 'D' (length=1)
2 => string 'i' (length=1)
You can now see that $matches[0] contains the full pattern match, $matches[1] contains the first sub-pattern matching D or d, and $matches[2] contains the second sub-pattern matching i.
Just in case you were wondering, the order is important. The pattern /(i)(D|d)no/ is not the same as /(D|d)(i)no/.
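Running both orderings against the same string shows why. With i first, the pattern would need the text iDno or idno, which doesn’t appear anywhere. A sketch:

```php
<?php
$subject = "Hello there. My name is Dino or dino. I'm 32.";

// D or d, then i, then no: matches "Dino".
$dFirst = preg_match('/(D|d)(i)no/', $subject, $m);
// i, then D or d, then no: would need "iDno" or "idno", which isn't there.
$iFirst = preg_match('/(i)(D|d)no/', $subject);

var_dump($dFirst, $m, $iFirst);
```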
Quantifiers
You might have noticed that [a-z] matches a single character exactly one time. If we wanted to match a digit that repeats multiple times, we would need to keep repeating the pattern.
The typical phone number in the United States follows this pattern: 3 digits, followed by -, followed by another 3 digits, followed by another -, and lastly 4 digits. There’s a lot more to it, but that’s what we’re going to try to match with what we know so far.
555-555-5555
Our pattern will look something like this:
#\d\d\d-\d\d\d-\d\d\d\d#
$pattern = "#\d\d\d-\d\d\d-\d\d\d\d#";
$search_string = "Hello there. My name is Dino. My number is 555-444-5555.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:214:
array (size=1)
0 => string '555-444-5555' (length=12)
That’s kind of long. Before we look at simplifying it, let’s take a look at the quantifiers.
a? Zero or one of a
a* Zero or more of a
a+ One or more of a
a{3} Exactly 3 of a
a{3,} 3 or more of a
a{3,6} Between 3 and 6 of a
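To see the quantifiers side by side, here is a small sketch that tests each one against a few strings of a’s. It anchors each pattern with ^ and $ so the quantifier has to account for the entire string; without the anchors, a pattern like a? would report a match for every string, since it can match zero characters. The test strings are made up for illustration.

```php
<?php
$quantifiers = ['a?', 'a*', 'a+', 'a{3}', 'a{3,}', 'a{3,6}'];
$subjects = ['', 'a', 'aaa', 'aaaaaaa'];
$table = [];

foreach ($quantifiers as $q) {
    foreach ($subjects as $s) {
        // ^ and $ force the quantifier to account for the whole string.
        $table[$q][$s] = preg_match('/^' . $q . '$/', $s);
    }
    $row = array_map(fn($s) => sprintf("%-9s=%d", "'$s'", $table[$q][$s]), $subjects);
    echo str_pad($q, 8), implode(' ', $row), PHP_EOL;
}
```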
It’s time to simplify our example.
$pattern = "#\d{3}-\d{3}-\d{4}#";
$search_string = "Hello there. My name is Dino. My number is 555-444-5555.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
We get the same result.
/app/98 Functions/PregMatch.php:220:
array (size=1)
0 => string '555-444-5555' (length=12)
#\d{3}-\d{3}-\d{4}# means exactly 3 digits, followed by -, followed by exactly 3 digits, followed by -, followed by exactly 4 digits.
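Quantifiers combine naturally with the sub-patterns from earlier. In this sketch, each digit group is wrapped in parentheses so $matches holds the pieces separately; the labels area code, prefix, and line are just descriptive names chosen here.

```php
<?php
$pattern = '#(\d{3})-(\d{3})-(\d{4})#';
$subject = "Hello there. My name is Dino. My number is 555-444-5555.";

if (preg_match($pattern, $subject, $matches) === 1) {
    // $matches[0] is the full match; 1 to 3 are the sub-patterns in order.
    [$full, $area, $prefix, $line] = $matches;
    echo "Full: $full", PHP_EOL;      // Full: 555-444-5555
    echo "Area code: $area", PHP_EOL; // Area code: 555
    echo "Prefix: $prefix", PHP_EOL;  // Prefix: 444
    echo "Line: $line", PHP_EOL;      // Line: 5555
}
```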
We’ll do one more example that combines a bunch of these to wrap up this article.
/[G-I].{4}\s\w*\.\W?\D{1,}\d*-\w*\s.*/
I know. Kind of crazy. But we can walk through it and it should be simple.
What it says is:
- [G-I] means start with an uppercase character between G and I.
- .{4} means that we’re going to look for any 4 characters next.
- \s is trying to find a whitespace character, like a space, next.
- \w* is looking for any word character zero or more times.
- \. means that we’re trying to find an actual period.
- \W? is then continuing with any non-word character, like a space, zero or one time.
- \D{1,} is searching for a non-digit that occurs at least 1 time.
- \d* means that we’re next searching for a digit that occurs zero or more times.
- The - is just a simple dash.
- \w* is looking for any word character zero or more times again.
- \s searches for a space.
- .* is looking for any character zero or more times.
That pattern will match the following string:
Hello there. My name is Dino. I’m 32-years old. Pretty cool.
- [G-I] matches H
- .{4} matches ello
- \s matches the space after Hello
- \w* matches there
- \. matches the period after there
- \W? matches the space after there.
- \D{1,} matches My name is Dino. I’m (including the space after I’m)
- \d* matches 32
- The - matches the dash
- \w* matches years
- \s matches the space after years
- .* matches old. Pretty cool.
$pattern = "#[G-I].{4}\s\w*\.\W?\D{1,}\d*-\w*\s.*#";
$search_string = "Hello there. My name is Dino. I'm 32-years old. Pretty cool.";
preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:226:
array (size=1)
0 => string 'Hello there. My name is Dino. I'm 32-years old. Pretty cool.' (length=60)
And I think we’ve covered the important basics. You can now look up any regular expression pattern you’d like and understand it, or, even better, create your own.
https://github.com/dinocajic/php-youtube-tutorials
PHP – P102: built in functions preg match