Sunday, 13 March 2011

Regular expressions basics


Regular expression types 
PHP supports 2 types of regular expressions:
  • POSIX Extended
  • Perl Compatible
The ereg, eregi, ... functions are the POSIX versions, while preg_match, preg_replace, ... are the Perl-compatible versions. With Perl-compatible regular expressions the pattern must be enclosed in delimiters, for example forward slashes (/). The Perl-compatible functions are also more powerful and faster than the POSIX ones. Note that the POSIX functions are deprecated as of PHP 5.3.0 and were removed in PHP 7.0.0.
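As a small illustration of the delimiter rule, the sketch below matches the same pattern with two different delimiters. The sample strings are made up for this example. Any non-alphanumeric, non-backslash, non-whitespace character can serve as the delimiter, so # is handy when the pattern itself contains slashes.

```php
<?php
// Same pattern, two different delimiters.
$subject = "Hello world";

// Forward slashes as delimiters.
$withSlash = preg_match('/world/', $subject);   // 1 on match, 0 otherwise

// Hash signs as delimiters; useful when the pattern contains "/".
$withHash = preg_match('#world#', $subject);

// A pattern containing slashes needs no escaping with "#" delimiters.
$isHttpUrl = preg_match('#^http://#', 'http://example.com/');

echo $withSlash, $withHash, $isHttpUrl, "\n";
```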
The regular expressions basic syntax
To use regular expressions you first need to learn the syntax of the patterns. The characters inside a pattern fall into these groups:
  • Normal characters which match themselves like hello
  • Start and end indicators such as ^ and $
  • Count indicators like +, *, ? and {min,max}
  • The logical OR operator |
  • Grouping with () and character classes with []
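To make the groups above concrete, here is a minimal sketch with one small pattern per group. The subjects and the labels in $checks are invented for this illustration.

```php
<?php
// One small pattern per category from the list above.
$checks = array(
    'normal characters' => preg_match('/hello/', 'say hello'),
    'start and end'     => preg_match('/^hello$/', 'hello'),
    'count indicator'   => preg_match('/^hel+o$/', 'helllo'),
    'logical OR'        => preg_match('/^(hello|goodbye)$/', 'goodbye'),
    'character class'   => preg_match('/^[a-z]{5}$/', 'hello'),
);

foreach ($checks as $label => $matched) {
    echo $label, ': ', ($matched ? 'Match' : 'Not match'), "\n";
}
```

Every check here is written so that it matches; changing a subject (for example 'hello' to 'Hello' in the last line) is a quick way to see a pattern fail.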
An example pattern to check valid emails looks like this:
Code:
^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$
The code to check the email looks like this, first with the Perl compatible functions (code1) and then with the POSIX functions (code2):

// code1: Perl compatible version, the pattern is wrapped in "/" delimiters
$pattern = '/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$/';
$email = "jaison@demo.com";
if (preg_match($pattern, $email)) {
    echo "Match";
} else {
    echo "Not match";
}

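// code2: POSIX version, no delimiters; eregi() ignores case
// (deprecated since PHP 5.3.0, removed in PHP 7.0.0)
$pattern = '^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$';
$email = "jaison@demo.com";
if (eregi($pattern, $email)) {
    echo "Match";
} else {
    echo "Not match";
}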
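The check from code1 can also be wrapped in a small function so that several addresses can be tried at once. This is only a sketch: is_valid_email() is a hypothetical helper name, and every sample address except the first is made up for this example.

```php
<?php
// Hypothetical helper wrapping the pattern from code1.
function is_valid_email($email)
{
    $pattern = '/^[a-zA-Z0-9._-]+@[a-zA-Z0-9-]+\.[a-zA-Z.]{2,5}$/';
    // preg_match() returns 1 on a match, 0 on no match.
    return preg_match($pattern, $email) === 1;
}

$samples = array(
    'jaison@demo.com',
    'a@b.co.uk',
    'no-at-sign.com',
    'user@example.museum',
);

foreach ($samples as $email) {
    echo $email, ': ', (is_valid_email($email) ? 'Match' : 'Not match'), "\n";
}
```

The last sample shows a limit of this simple pattern: the part after the final allowed dot may be at most 5 characters long, so longer top-level domains are rejected. PHP's built-in filter_var() with FILTER_VALIDATE_EMAIL is an alternative when stricter validation is needed.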

Pattern      Match (subject)         Not match (subject)  Comment
world        Hello world             Hello Jim            Matches if the pattern is present anywhere in the subject
^world       world class             Hello world          Matches if the pattern is present at the beginning of the subject
world$       Hello world             world class          Matches if the pattern is present at the end of the subject
world/i      This WoRLd              Hello Jim            Searches in case-insensitive mode
^world$      world                   Hello world          The string contains only "world"
world*       worl, world, worlddd    wor                  There are 0 or more "d" after "worl"
world+       world, worlddd          worl                 There is at least 1 "d" after "worl"
world?       worl, world, worly      wor, wory            There is 0 or 1 "d" after "worl"
world{1}     world                   worly                There is 1 "d" after "worl"
world{1,}    world, worlddd          worly                There are 1 or more "d" after "worl"
world{2,3}   worldd, worlddd         world                There are 2 or 3 "d" after "worl"
wo(rld)*     wo, world, worldold     wa                   There are 0 or more "rld" after "wo"
earth|world  earth, world            sun                  The string contains "earth" or "world"
w.rld        world, wwrld            wrld                 Any single character in place of the dot
^.{5}$       world, earth            sun                  A string with exactly 5 characters
[abc]        abc, bbaccc             sun                  There is an "a", "b" or "c" in the string
[a-z]        world                   WORLD                There is at least one lowercase letter in the string
[a-zA-Z]     world, WORLD, Worl12    123                  There is at least one lowercase or uppercase letter in the string
[^wW]        earth                   w, W                 Matches any character other than "w" or "W"
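A few rows of the table can be checked directly with preg_match. The sketch below is a minimal test loop, assuming "/" as the delimiter; the $rows array and the $failures counter are names invented for this example.

```php
<?php
// Each entry: pattern, subject, expected result (taken from the table above).
$rows = array(
    array('/^world/',     'world class', true),
    array('/^world/',     'Hello world', false),
    array('/world$/',     'Hello world', true),
    array('/world/i',     'This WoRLd',  true),
    array('/world+/',     'worl',        false),
    array('/world{2,3}/', 'world',       false),
    array('/wo(rld)*/',   'wo',          true),
    array('/w.rld/',      'wrld',        false),
    array('/^.{5}$/',     'earth',       true),
    array('/[^wW]/',      'W',           false),
);

$failures = 0;
foreach ($rows as $row) {
    list($pattern, $subject, $expected) = $row;
    $actual = preg_match($pattern, $subject) === 1;
    if ($actual !== $expected) {
        $failures++;   // the result disagrees with the table
    }
    printf("%-14s %-12s %s\n", $pattern, $subject, $actual ? 'Match' : 'Not match');
}
echo "Mismatches against the table: $failures\n";
```

Note that the case-insensitive row is written as /world/i here: the i modifier goes after the closing delimiter.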


