Sunday, 13 March 2011
Regular expressions basics
Regular expression types
PHP supports two types of regular expressions:
- POSIX Extended
- Perl Compatible
The ereg, eregi, ... functions are the POSIX versions, and preg_match, preg_replace, ... are the Perl-compatible versions. With Perl-compatible regular expressions, the pattern must be enclosed in delimiters, for example forward slashes (/). The Perl-compatible version is also more powerful and generally faster than the POSIX one. Note that the POSIX functions were deprecated in PHP 5.3 and removed in PHP 7.
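As a minimal sketch of the delimiter rule, the snippet below runs the same pattern with two different delimiters. The variable names are illustrative only and do not come from the original post.

```php
<?php
// The same pattern written with two different delimiters.
$subject = "Hello world";

// Forward slashes as delimiters (the most common choice).
$withSlashes = preg_match('/world/', $subject);

// Hash signs as delimiters; handy when the pattern itself contains slashes.
$withHashes = preg_match('#world#', $subject);

// preg_match returns 1 on a match and 0 on no match.
echo $withSlashes, " ", $withHashes, "\n";
```

Both calls should report a match, since the delimiter only marks where the pattern starts and ends.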
The regular expressions basic syntax
To use regular expressions, you first need to learn the syntax of the patterns. The characters inside a pattern can be grouped like this:
- Normal characters which match themselves like hello
- Start and end indicators: ^ and $
- Count indicators such as +, *, ?
- The logical OR operator: |
- Bracket constructs: () for grouping, [] for character classes, {} for repeat counts
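The categories above can be sketched with one small pattern each. The patterns and the $checks array are made up for illustration; they are not taken from the original post.

```php
<?php
// One small pattern per category from the list above.
$checks = [
    'normal characters' => preg_match('/hello/', 'say hello'),    // literal text
    'start and end'     => preg_match('/^hello$/', 'hello'),      // whole subject
    'count indicator'   => preg_match('/^hel+o$/', 'helllo'),     // one or more "l"
    'logical OR'        => preg_match('/^(hello|bye)$/', 'bye'),  // either word
    'character class'   => preg_match('/^[hH]ello$/', 'Hello'),   // "h" or "H"
    'repeat count'      => preg_match('/^hel{2}o$/', 'hello'),    // exactly two "l"
];

foreach ($checks as $label => $result) {
    echo $label, ': ', $result ? 'match' : 'no match', "\n";
}
```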
An example pattern to check valid emails looks like this:
^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$
The code to check the email using a Perl-compatible regular expression (code1) and a POSIX regular expression (code2) looks like this:
//code1
$pattern = "/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$/";
$email = "jaison@demo.com";
if (preg_match($pattern,$email)) echo "Match";
else echo "Not match";
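//code2 (POSIX version; ereg was removed in PHP 7, so this runs only on older PHP)
$pattern = "^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$";
$email = "jaison@demo.com";
if (ereg($pattern,$email)) echo "Match";
else echo "Not match";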
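To try the pattern against several addresses, the check can be wrapped in a small function. This is only a sketch: looksLikeEmail is a hypothetical helper name, not a PHP built-in, and the test addresses other than jaison@demo.com are invented.

```php
<?php
// Hypothetical helper (not part of PHP): wraps the email pattern from the post.
function looksLikeEmail(string $email): bool
{
    // Single quotes keep PHP from interpreting "$" or "\" inside the pattern.
    $pattern = '/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$/';
    return preg_match($pattern, $email) === 1;
}

foreach (['jaison@demo.com', 'jaison@demo', 'jaison demo.com'] as $candidate) {
    echo $candidate, ': ', looksLikeEmail($candidate) ? 'Match' : 'Not match', "\n";
}
```

The pattern is deliberately simple. For instance, it rejects any address whose part after the first dot of the domain is longer than five characters, so it should be treated as a rough filter rather than a full validator.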
Regular expression (pattern) | Match (subject) | Not match (subject) | Comment |
world | Hello world | Hello Jim | Match if the pattern is present anywhere in the subject |
^world | world class | Hello world | Match if the pattern is present at the beginning of the subject |
world$ | Hello world | world class | Match if the pattern is present at the end of the subject |
/world/i | This WoRLd | Hello Jim | Case-insensitive search (the i modifier follows the closing delimiter) |
^world$ | world | Hello world | The subject is exactly "world" |
world* | worl, world, worlddd | wor | There is 0 or more "d" after "worl" |
world+ | world, worlddd | worl | There is at least 1 "d" after "worl" |
world? | worl, world, worly | wor, wory | There is 0 or 1 "d" after "worl" |
world{1} | world | worly | There is 1 "d" after "worl" |
world{1,} | world, worlddd | worly | There is 1 or more "d" after "worl" |
world{2,3} | worldd, worlddd | world | There are 2 or 3 "d" after "worl" |
wo(rld)* | wo, world, worldrld | wa | There is 0 or more "rld" after "wo" |
earth|world | earth, world | sun | The subject contains "earth" or "world" |
w.rld | world, wwrld | wrld | Any character in place of the dot. |
^.{5}$ | world, earth | sun | A string with exactly 5 characters |
[abc] | abc, bbaccc | sun | There is an "a", "b" or "c" in the subject |
[a-z] | world | WORLD | There is at least one lowercase letter in the subject |
[a-zA-Z] | world, WORLD, Worl12 | 123 | There is at least one lowercase or uppercase letter in the subject |
[^wW] | earth | w, W | There is at least one character that is neither "w" nor "W" |
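A few rows of the table can be checked with preg_match. The sketch below adds forward-slash delimiters to each pattern; the $rows array and variable names are illustrative assumptions, while the subjects are taken from the table.

```php
<?php
// Each entry: pattern, a subject expected to match, a subject expected not to match.
$rows = [
    ['/^world/',      'world class', 'Hello world'],
    ['/world$/',      'Hello world', 'world class'],
    ['/world/i',      'This WoRLd',  'Hello Jim'],
    ['/world{2,3}/',  'worldd',      'world'],
    ['/earth|world/', 'earth',       'sun'],
    ['/^.{5}$/',      'earth',       'sun'],
    ['/[^wW]/',       'earth',       'W'],
];

$allAsExpected = true;
foreach ($rows as [$pattern, $hit, $miss]) {
    // A row holds if the first subject matches and the second does not.
    $ok = preg_match($pattern, $hit) === 1 && preg_match($pattern, $miss) === 0;
    $allAsExpected = $allAsExpected && $ok;
    echo $pattern, ': ', $ok ? 'as expected' : 'unexpected', "\n";
}
```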