Built In Functions

enabling precise pattern matching and text manipulation

Continuing on with our built-in functions with preg_match. To review the previous functions, read P101.

https://blog.devgenius.io/php-p101-trim-htmlspecialchars-and-call-built-in-functions-7b4f25728db9

  • trim(), ltrim(), rtrim()
  • htmlspecialchars()
  • __call()
  • preg_match()
  • filter_var()
  • addslashes()
  • str_replace()
  • strlen()
  • strtolower()
  • strtoupper()
  • ucfirst()
  • strpos(), stripos(), strrpos(), strripos()
  • Array Functions like: array_chunk(), array_diff(), array_key_exists(), array_key_first(), array_key_last(), array_map(), array_merge(), array_push(), array_sum(), asort(), arsort(), count(), in_array(), ksort(), krsort(), sort(), rsort(), shuffle(), sizeof(), is_array(), explode(), implode()
  • Magic Methods like: __invoke(), __toString()

Regex Intro

I have been debating whether to even cover this one since it requires knowledge of regular expressions. We want to be as thorough as possible, so we’ll need to include some basic knowledge of regular expressions. A regular expression is just a sequence of characters that is used to search for specific patterns in a string.

A simple pattern might be /Dino/ where we search for Dino in some string like, Hey there Dino. How's it going?

What is the forward slash? It’s what’s called a delimiter. Each pattern must begin and end with a delimiter. You can pick almost any character as your delimiter, but it cannot be:

  • alphanumeric characters, i.e. A or 3
  • backslash, \
  • whitespace

Frequently used delimiters are:

  • forward slashes, /
  • hash symbols, #
  • tildes, ~

Some examples of delimiters:

  • /Dino/
  • #Dino#
  • %Dino%
  • {Dino}
  • ~Dino~

Why not just choose one and stick to it? Simple. Readability. If you’re using the delimiter inside of your pattern, you will have to escape it. Let’s say that we want to look for Dino/Jeff. If you were to start with a forward slash, PHP would think that the forward slash in your pattern is the closing delimiter. You’ll need to escape it: /Dino\/Jeff/. Or you could just use another delimiter: #Dino/Jeff#. That way you don’t have to escape the forward slash.
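To make the trade-off concrete, here is a minimal sketch that runs the same search three ways. It uses preg_match, which is introduced in the next section, and the sample string is made up for illustration. The third variant assumes you would rather let preg_quote() do the escaping than write the backslash by hand.

```php
<?php

// Sample string invented for this illustration.
$search_string = 'Team: Dino/Jeff';

// Forward-slash delimiter: the slash inside the pattern must be escaped.
$escaped = preg_match('/Dino\/Jeff/', $search_string);

// Hash delimiter: the slash needs no escaping.
$alternate = preg_match('#Dino/Jeff#', $search_string);

// preg_quote() escapes the delimiter (and other special characters) for us.
$quoted = preg_match('/' . preg_quote('Dino/Jeff', '/') . '/', $search_string);

var_dump($escaped, $alternate, $quoted); // each one is int(1)
```

The patterns are written in single quotes here so that PHP leaves the backslashes alone.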

preg_match()

Let’s introduce the preg_match function. We’re going to look at the first two parameters, which are the pattern and the search string, respectively. Knowing what we know so far, let’s look at an example.

<?php

$pattern = "/Dino/";
$search_string = "Hey. My name is Dino. How are you?";

var_dump( preg_match($pattern, $search_string) );

The result will be 1 if the pattern matches inside of the string, 0 if there are no matches, and false if there was an error.

/app/98 Functions/PregMatch.php:6:int 1

We get 1 as expected. Is the pattern case sensitive? What would we get if we searched for /dino/ instead of /Dino/?

<?php

$pattern = "/dino/";
$search_string = "Hey. My name is Dino. How are you?";

var_dump( preg_match($pattern, $search_string) );

The result will actually be a 0. It is in fact case sensitive.
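If you do want to ignore case, PCRE patterns accept modifiers after the closing delimiter, and the i modifier makes the match case insensitive. The sketch below reuses the string from above. The strict === comparison is a suggestion rather than something the original example does: it keeps 0 (no match) and false (error) from being confused with each other.

```php
<?php

$search_string = 'Hey. My name is Dino. How are you?';

// Without a modifier the match is case sensitive.
$sensitive = preg_match('/dino/', $search_string);

// The "i" modifier after the closing delimiter ignores case.
$insensitive = preg_match('/dino/i', $search_string);

var_dump($sensitive);   // int(0)
var_dump($insensitive); // int(1)

// Compare strictly, since 0 (no match) and false (error) are both falsy.
if ($insensitive === 1) {
    echo "Found a match.\n";
}
```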

The third parameter is where you would store your matched results. If we included a variable called $matches there, it would store all of our matches there. We don’t have to initialize the $matches variable.

<?php

$pattern = "/Dino/";
$search_string = "Hey. My name is Dino. How are you?";

var_dump( preg_match($pattern, $search_string, $matches) );
var_dump( $matches );

The function preg_match still returns 1, but it also stores the results inside of the $matches array.

/app/98 Functions/PregMatch.php:6:int 1
/app/98 Functions/PregMatch.php:7:
array (size=1)
  0 => string 'Dino' (length=4)

Common Regex Patterns

Here is a quick cheat sheet for some common regular expression patterns. We’re going to look at how these patterns behave as we’re going through them. If you need a full list of what characters do, check out this awesome RegEx cheatsheet.

https://quickref.me/regex

[abc] A single character: a, b or c

Square brackets define a character class: a set of characters, any one of which may match. In the first example, we’re going to match any single character that’s either a, b, or c.

$pattern = "/[abc]/";
$search_string = "Hey. My name is Dino. How are you?";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:14:
array (size=1)
  0 => string 'a' (length=1)

The first character that matches is the a in name. What if we did something like [xyz]?

$pattern = "/[xyz]/";
$search_string = "Hey. My name is Dino. How are you?";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );

There is no x, but there is a y, and it’s matched.

/app/98 Functions/PregMatch.php:14:
array (size=1)
  0 => string 'y' (length=1)

[^abc] Any single character except a, b, or c

Next, to search for any other character other than the ones specified, we can add the ^ symbol. This effectively means “not.”

$pattern = "/[^xyz]/";
$search_string = "Hello there.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:21:
array (size=1)
  0 => string 'H' (length=1)

The H is the first letter that matches the pattern.

[a-z] Any single character in the range a-z

We can specify ranges with a -. This pattern will match any lowercase letter between a and z.

$pattern = "/[a-z]/";
$search_string = "Hello there.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:28:
array (size=1)
  0 => string 'e' (length=1)

It skips the first letter, H, since it’s not in the range that we were looking for. The full a-z range is the one you’ll see most often, but a range can be narrower. What if we only wanted to search for a range from s to y? We can do that.

$pattern = "/[s-y]/";
$search_string = "Hello there.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:34:
array (size=1)
  0 => string 't' (length=1)

The t exists in there, so we get a positive match. How about b through e and s through y?

$pattern = "/[b-es-y]/";
$search_string = "Hello there.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );

This time the e matches in Hello.

/app/98 Functions/PregMatch.php:34:
array (size=1)
  0 => string 'e' (length=1)

[a-zA-Z] Any single character in the range a-z or A-Z

What about a sequence of characters both upper and lowercase? Not too different.

$pattern = "/[a-zA-Z]/";
$search_string = "Hello there.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:47:
array (size=1)
  0 => string 'H' (length=1)

Again, we don’t have to use the full range of all uppercase and lowercase letters.

$pattern = "/[b-dD-G]/";
$search_string = "My name is Dino.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:47:
array (size=1)
  0 => string 'D' (length=1)

^ Start of line

Didn’t we just say that ^ means not? We did, but that’s when it’s inside of brackets. When it’s outside of the brackets, it indicates the start of the line. For example, let’s say that we wanted to search if the string starts with the word Hello.

$pattern = "/Hello/";
$search_string = "Hello there.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );

In this scenario, we did not use the ^ symbol. Do we get a match? Yes.

/app/98 Functions/PregMatch.php:60:
array (size=1)
  0 => string 'Hello' (length=5)

What if we had the following string: My name is Dino. Hello there. Will it match? Yes, again. But what if we insisted that this string starts with Hello? Normally you would say hi and then introduce yourself. You would need to use the ^ symbol.

$pattern = "/^Hello/";
$search_string = "My name is Dino. Hello there.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );

In this instance, you get no matches.

/app/98 Functions/PregMatch.php:60:
array (size=0)
  empty

If our string started with Hello, you would get a match.

$pattern = "/^Hello/";
$search_string = "Hello there. My name is Dino.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:66:
array (size=1)
  0 => string 'Hello' (length=5)

$ End of line

The opposite of ^ is $. This signifies a match that occurs at the end of a string. For example, let’s see if the end of the string is a number. We’ll start without the $ sign.

$pattern = "/[0-9]/";
$search_string = "Hello there. I'm 32. My name is Dino.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );

Do we get a match? Of course.

/app/98 Functions/PregMatch.php:73:
array (size=1)
  0 => string '3' (length=1)

But we need that number to appear at the end of the string.

$pattern = "/[0-9]$/";
$search_string = "Hello there. I'm 32. My name is Dino.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );

This time, we get no match. If we move the number to the end, we will get a proper response.

$pattern = "/[0-9]$/";
$search_string = "Hello there. My name is Dino. I'm 32";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:73:
array (size=1)
  0 => string '2' (length=1)

You can see that 2 is matched this time since it’s looking at the last digit in the string.

You might have caught that I removed the period (.) and that’s a great observation. We’ll cover that next.

. Any single character

That’s right. The period matches any single character. Let’s revisit the same example and see what happens if we put the period in there.

$pattern = "/[0-9].$/";
$search_string = "Hello there. My name is Dino. I'm 32.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );

It seems like it’s doing what is intended. It’s matching [0-9] followed by a period at the end of the line. We should get 2. and we do.

/app/98 Functions/PregMatch.php:86:
array (size=1)
  0 => string '2.' (length=2)

But that’s not the way to match the period at the end of a string. What if we were missing the period from our string? What would be the match?

$pattern = "/[0-9].$/";
$search_string = "Hello there. My name is Dino. I'm 32";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:92:
array (size=1)
  0 => string '32' (length=2)

It’s 32? How is that 32? Remember, the . matches any character. So, we match 3 followed by any other character at the end of the string. That means that if we made a mistake and added the letter a at the end of the string, the pattern would still match.

$pattern = "/[0-9].$/";
$search_string = "Hello there. My name is Dino. I'm 32a";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );

In this instance, it will match 2a.

/app/98 Functions/PregMatch.php:98:
array (size=1)
  0 => string '2a' (length=2)

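But we need to match the period. The string needs to end with a number followed by an actual period. We just need to escape that character with a backslash: /[0-9]\.$/. Running that pattern against the string that ends in 32. gives us the following.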

/app/98 Functions/PregMatch.php:110:
array (size=1)
  0 => string '2.' (length=2)
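As a minimal sketch of the escaped version, the pattern below is run against the two strings used earlier in this section. The variable names $with_period and $without_period are only for illustration.

```php
<?php

// The backslash makes the period literal instead of "any character".
$pattern = '/[0-9]\.$/';

preg_match($pattern, "Hello there. My name is Dino. I'm 32.", $with_period);
preg_match($pattern, "Hello there. My name is Dino. I'm 32a", $without_period);

var_dump($with_period);    // one element: '2.'
var_dump($without_period); // empty array, since the string does not end in a period
```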

\d Any digit

We don’t have to use the pattern [0-9] to match all of the digits. We can simplify that with \d. To recreate our example above where we search for any digit followed by a period at the end of the string, we can use the following pattern: /\d\.$/

// \d Any digit
$pattern = "/\d\.$/";
$search_string = "Hello there. My name is Dino. I'm 32.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );

For the sake of readability, let’s change our delimiter to #.

// \d Any digit
$pattern = "#\d\.$#";
$search_string = "Hello there. My name is Dino. I'm 32.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:117:
array (size=1)
  0 => string '2.' (length=2)

\D Any non-digit

We can also match any non-digit character. Let’s say that we want to accept a phone number, but it must not contain anything other than digits, i.e. 5554445555 is valid but 555-444-5555 is invalid. If \D finds a match, the input is invalid.

$pattern = "#\D#";
$search_string = "5554445555";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );

This will return an empty $matches array, which is what we want.

/app/98 Functions/PregMatch.php:124:
array (size=0)
  empty

If we add some non-digit characters inside of our string, our pattern will match.

$pattern = "#\D#";
$search_string = "555-444-5555";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:130:
array (size=1)
  0 => string '-' (length=1)
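Because a match means the input is invalid here, the check reads a little backwards. One way to hide that is a small helper. The function name has_only_digits is hypothetical, and rejecting the empty string is an assumption added for this sketch.

```php
<?php

// Hypothetical helper: true when the string contains no non-digit characters.
function has_only_digits(string $input): bool
{
    // preg_match returns 0 when \D finds nothing, meaning every character is a digit.
    // The empty-string check is an added assumption: "" has no digits at all.
    return $input !== '' && preg_match('#\D#', $input) === 0;
}

var_dump(has_only_digits('5554445555'));   // bool(true)
var_dump(has_only_digits('555-444-5555')); // bool(false)
```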

\s Any whitespace character

Can we match whitespace characters? Yes we can.

$pattern = "#\s#";
$search_string = "555 444-5555";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:137:
array (size=1)
  0 => string ' ' (length=1)

We can build on this. For example, a digit followed by a space.

$pattern = "#\d\s#";
$search_string = "555 444-5555";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:143:
array (size=1)
  0 => string '5 ' (length=2)

\S Any non-whitespace character

This will match any character that’s not a whitespace character.

$pattern = "#\d\S#";
$search_string = "555 444-5555";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );

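The pattern asks for a digit followed by a non-whitespace character. The first two characters, 55, satisfy that, so 55 is the match. The third 5 followed by the space would not qualify, since a space is whitespace.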

/app/98 Functions/PregMatch.php:150:
array (size=1)
  0 => string '55' (length=2)

Remember that 44 and 4- would also satisfy the pattern, but preg_match only returns the first match.

\w Any word character (letter, number, underscore)

This is the equivalent of the following pattern: [a-zA-Z0-9_]. It’s just a shortcut since it’s frequently used.

$pattern = "#\w#";
$search_string = "Hello there. My name is Dino. I'm 32.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:157:
array (size=1)
  0 => string 'H' (length=1)

\W Any non-word character

You guessed it. It’s the opposite of \w. It’s the equivalent of: [^a-zA-Z0-9_]

$pattern = "#\W#";
$search_string = "Hello there. My name is Dino. I'm 32.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:164:
array (size=1)
  0 => string ' ' (length=1)

(…) Capture everything enclosed

This is where it gets a little interesting. We’ve looked at patterns inside square brackets, but parentheses do something different: they group part of the pattern into a sub-pattern and capture whatever that part matched. Let’s see what happens when we look at our $matches array.

$pattern = "#(th)#";
$search_string = "Hello there. I'm one of the people here.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:170:
array (size=2)
  0 => string 'th' (length=2)
  1 => string 'th' (length=2)

We get two entries. The first is the match for the full pattern and the second is the match for the sub-pattern. It’s easier to explain that in the next section.

(a|b) a or b

This is one of my favorite uses of this pattern. The | signifies or. This one will match a or b. If we wanted to match both Dino and dino, we could use (D|d)ino. The full pattern is Dino and the first occurrence of the sub-pattern is D. The full pattern is stored in $matches[0] and the first matching sub-pattern will be stored in $matches[1].

$pattern = "/(D|d)ino/";
$search_string = "Hello there. My name is Dino. I'm 32.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:183:
array (size=2)
  0 => string 'Dino' (length=4)
  1 => string 'D' (length=1)

The full pattern is Dino so it’s stored in $matches[0] and the first occurring sub-pattern is D so it’s stored in $matches[1]. Remember that ( signifies the start of a sub-pattern and ) signifies the end of a sub-pattern.

What if the full pattern is not matched? For example, I misspelled my name and entered Din in the string.

$pattern = "/(D|d)ino/";
$search_string = "Hello there. My name is Din. I'm 32.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:189:
array (size=0)
  empty

The result is an empty array. It didn’t find the full pattern so it could not find a sub-pattern within the full pattern either. The letter D does exist in the string, but the full pattern, Dino or dino, does not.

I keep saying first sub-pattern, but it’s the sub-pattern in the first full-pattern. Let’s take a look. I’ll use a string that has both the name Dino and dino.

$pattern = "/(D|d)ino/";
$search_string = "Hello there. My name is Dino or dino. I'm 32.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:195:
array (size=2)
  0 => string 'Dino' (length=4)
  1 => string 'D' (length=1)

The result is still the same. Even though the string contains both Dino and dino, preg_match stops at the first full match, so only that match and its sub-pattern are reported. We can illustrate how we can have multiple sub-patterns inside of the full pattern.

$pattern = "/(D|d)(i)no/";
$search_string = "Hello there. My name is Dino or dino. I'm 32.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );

Our full pattern is /(D|d)(i)no/. It means, look for Dino or dino. The first sub-pattern will be a D or d. The second sub-pattern will be an i.

/app/98 Functions/PregMatch.php:201:
array (size=3)
  0 => string 'Dino' (length=4)
  1 => string 'D' (length=1)
  2 => string 'i' (length=1)

You can now see that $matches[0] contains the full pattern match, $matches[1] contains the first sub-pattern matching for D or d, and $matches[2] contains the second sub-pattern matching i.

Just in case you were wondering, the order is important. The following pattern /(i)(D|d)no/ is not the same as /(D|d)(i)no/.
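The examples in this section only ever report the first Dino because preg_match stops at the first full match. If you need every occurrence, PHP also provides preg_match_all. It is not covered in this article, so treat the following as a small sketch of how it behaves with its default flags.

```php
<?php

$pattern = '/(D|d)ino/';
$search_string = "Hello there. My name is Dino or dino. I'm 32.";

// preg_match_all returns the number of full matches it found.
$count = preg_match_all($pattern, $search_string, $matches);

var_dump($count);      // int(2)
var_dump($matches[0]); // every full match: 'Dino', 'dino'
var_dump($matches[1]); // the first sub-pattern of each match: 'D', 'd'
```

Note that $matches is now an array of arrays: index 0 holds all of the full matches and index 1 holds what the first sub-pattern captured in each of them.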

Quantifiers

You might have noticed that when we use [a-z], it matches a single character exactly one time. If we wanted to match a digit that repeats multiple times, we would need to keep repeating the pattern.

The typical phone number in the United States follows the following pattern: 3 digits, followed by -, followed by another 3 digits, followed by another -, and lastly 4 digits. There’s a lot more to it, but that’s what we’re going to try and match with what we know so far.

555-555-5555

Our pattern will look something like this:

#\d\d\d-\d\d\d-\d\d\d\d#

$pattern = "#\d\d\d-\d\d\d-\d\d\d\d#";
$search_string = "Hello there. My name is Dino. My number is 555-444-5555.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:214:
array (size=1)
  0 => string '555-444-5555' (length=12)

That’s kind of long. Before we look at simplifying it, let’s take a look at the quantifiers.

  • a? Zero or one of a
  • a* Zero or more of a
  • a+ One or more of a
  • a{3} Exactly 3 of a
  • a{3,} 3 or more of a
  • a{3,6} Between 3 and 6 of a
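Here is a quick sketch of the quantifiers listed above in action. The test strings and the array keys are made up for illustration. Each pattern is wrapped in ^ and $ so that the quantifier has to account for the whole string.

```php
<?php

// Keys are just labels for this illustration; values are preg_match results.
$results = [
    'optional_zero' => preg_match('/^colou?r$/', 'color'),   // ? allows zero "u"
    'optional_one'  => preg_match('/^colou?r$/', 'colour'),  // ? allows one "u"
    'star_empty'    => preg_match('/^a*$/', ''),             // * accepts zero "a"
    'plus_empty'    => preg_match('/^a+$/', ''),             // + needs at least one "a"
    'exactly_three' => preg_match('/^a{3}$/', 'aaa'),        // exactly 3
    'three_or_more' => preg_match('/^a{3,}$/', 'aa'),        // 2 is too few
    'three_to_six'  => preg_match('/^a{3,6}$/', 'aaaaaaa'),  // 7 is too many
];

var_dump($results);
```

The first, second, third, and fifth entries come back as 1. The rest come back as 0.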

It’s time to simplify our example.

$pattern = "#\d{3}-\d{3}-\d{4}#";
$search_string = "Hello there. My name is Dino. My number is 555-444-5555.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );

We get the same result.

/app/98 Functions/PregMatch.php:220:
array (size=1)
  0 => string '555-444-5555' (length=12)

#\d{3}-\d{3}-\d{4}# means exactly 3 digits, followed by -, followed by exactly 3 digits, followed by -, followed by another 4 digits.
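The pattern above finds a phone number anywhere inside a longer string. If the goal is instead to check that an input is a phone number and nothing else, one option is to combine the quantifiers with the ^ and $ anchors from earlier. The helper name is_us_phone is hypothetical, and this is only a sketch of the format described here, not a complete phone number validator.

```php
<?php

// Hypothetical helper: the number must start at the beginning of the
// string and finish at the end of it.
function is_us_phone(string $input): bool
{
    return preg_match('#^\d{3}-\d{3}-\d{4}$#', $input) === 1;
}

var_dump(is_us_phone('555-444-5555'));               // bool(true)
var_dump(is_us_phone('My number is 555-444-5555.')); // bool(false)
var_dump(is_us_phone('5554445555'));                 // bool(false)
```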

We’ll do one more example that encompasses a bunch of these to wrap up this article.

/[G-I].{4}\s\w*\.\W?\D{1,}\d*-\w*\s.*/

I know. Kind of crazy. But we can walk through it and it should be simple.

What it says is:

  • [G-I] Start with an uppercase character between G and I
  • .{4} means that we’re going to look for any 4 characters next.
  • \s is trying to find a space next.
  • \w* is looking for any word character zero or more times
  • \. means that we’re trying to find an actual period.
  • \W? is then continuing with any non-word character, like a space, zero or one time.
  • \D{1,} is searching for a non-digit that occurs at least 1 time.
  • \d* means that we’re next searching for a digit that occurs zero or more times.
  • - is just a simple dash.
  • \w* is looking for any word character zero or more times again.
  • \s searches for a space.
  • .* is looking for any character zero or more times.

That pattern will match the following string:

Hello there. My name is Dino. I’m 32-years old. Pretty cool.

  • [G-I] matches H
  • .{4} matches ello
  • \s matches (space after Hello)
  • \w* matches there
  • \. matches .
  • \W? matches (space after there.)
  • \D{1,} matches My name is Dino. I’m (including the space after I’m)
  • \d* matches 32
  • - matches -
  • \w* matches years
  • \s matches (space after years)
  • .* matches old. Pretty cool.

$pattern = "#[G-I].{4}\s\w*\.\W?\D{1,}\d*-\w*\s.*#";
$search_string = "Hello there. My name is Dino. I'm 32-years old. Pretty cool.";

preg_match($pattern, $search_string, $matches);
var_dump( $matches );
/app/98 Functions/PregMatch.php:226:
array (size=1)
  0 => string 'Hello there. My name is Dino. I'm 32-years old. Pretty cool.' (length=60)
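If you want to check the walkthrough yourself, one approach is to wrap each piece of the pattern in parentheses so that every piece gets its own slot in $matches. This is only a debugging sketch. The parentheses are added for illustration and are not part of the original pattern.

```php
<?php

// Same pattern as above, with every piece wrapped in its own capture group.
$pattern = '#([G-I])(.{4})(\s)(\w*)(\.)(\W?)(\D{1,})(\d*)(-)(\w*)(\s)(.*)#';
$search_string = "Hello there. My name is Dino. I'm 32-years old. Pretty cool.";

preg_match($pattern, $search_string, $matches);

// $matches[0] is the full match; 1 through 12 are the individual pieces.
foreach ($matches as $index => $piece) {
    echo $index . ' => "' . $piece . '"' . PHP_EOL;
}
```

Index 7 is the interesting one: \D{1,} is greedy, so it swallows everything from My up to the space before 32.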

And I think we’ve covered all of the important things. You can now look up any regular expression patterns that you’d like and be able to understand them, or even better, create your own.

https://github.com/dinocajic/php-youtube-tutorials

 
