In computing, regular expressions provide a concise and flexible means for identifying text of interest, such as particular characters, words, or patterns of characters. Regular expressions are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.
In short, a regular expression is a way to describe a specific pattern of characters, useful for searching inside a long text or for validating user input.
The "simplest" (are we sure?) example is validating an email address that a user has typed into, e.g., a registration form.
An email address is composed as follows:
- alphanumeric characters, possibly mixed with ., - and _, which cannot appear at the start or at the end
- the @ character
- alphanumeric characters, possibly mixed with ., - and _, which cannot appear at the start or at the end
- a . followed by 2 to 4 letters
A valid regex for this kind of address looks like this:
I'm not blind: this is surely not pretty, but in a few lines I'll explain what this mess means.
First of all, the ^ and $ characters stand for the start and the end of the string. They are not required, but without them the pattern can match anywhere in the text: the regex ^Hello finds all strings that begin with Hello, while end$ finds those that end with end; middle, with no anchors, matches any string containing the word middle at least once; and finally ^just this$ matches only the exact string just this.
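A quick way to try these anchors is Python's standard `re` module. The sketch below uses `re.search`, which looks for a match anywhere in the string, so only the anchors restrict where the match may sit; the helper name `matches` and the sample strings are made up for illustration.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Hypothetical helper: True if the pattern is found anywhere in text."""
    # re.search scans the whole string; ^ and $ pin the match to its start/end.
    return re.search(pattern, text) is not None

print(matches(r"^Hello", "Hello world"))         # True: starts with Hello
print(matches(r"^Hello", "Say Hello"))           # False: Hello is not at the start
print(matches(r"end$", "the end"))               # True: ends with end
print(matches(r"middle", "in the middle of"))    # True: no anchors, found anywhere
print(matches(r"^just this$", "just this"))      # True: the whole string
print(matches(r"^just this$", "just this one"))  # False: extra text after
```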
Square brackets [ ] enclose a set of characters, any one of which can match at that position. For example, the regex [12345] matches every string that contains at least one digit between 1 and 5, so 1hello and abcde51z2 match, but 6a does not.
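Here is a small sketch of a character set in Python, assuming the set is written as `[12345]` (any bracketed set of the digits 1 to 5 behaves the same). Because there are no anchors, `search` accepts any string that contains at least one of those digits.

```python
import re

# [12345] matches exactly one character from the set {1, 2, 3, 4, 5}.
# Without ^ and $, search accepts any string containing such a digit.
digit_1_to_5 = re.compile(r"[12345]")

for sample in ["1hello", "abcde51z2", "6a"]:
    print(sample, bool(digit_1_to_5.search(sample)))
# 1hello True
# abcde51z2 True
# 6a False
```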
Of course, a backslash \ lets you escape the special characters (such as . + * ? ^ $ ( ) [ ] and \ itself) so that they are matched literally.
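The difference between an escaped and an unescaped character can be seen in a few lines of Python. This sketch uses `re.fullmatch`, which requires the whole string to match, and `re.escape`, a standard-library function that adds the backslashes automatically; the sample strings are illustrative.

```python
import re

# An unescaped . matches any character; \. matches only a literal dot.
print(bool(re.fullmatch(r"a.b", "axb")))   # True
print(bool(re.fullmatch(r"a\.b", "axb")))  # False
print(bool(re.fullmatch(r"a\.b", "a.b")))  # True

# re.escape adds the backslashes for you, handy when the text comes from elsewhere.
literal = "1+1=2?"
escaped = re.escape(literal)
print(bool(re.fullmatch(escaped, literal)))  # True
print(bool(re.fullmatch(literal, literal)))  # False: + and ? act as quantifiers
```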
For digits and letters, a - inside the brackets defines a range of characters: [a-z] stands for all lowercase letters from a to z, and [0-9] for all digits.
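A minimal sketch of ranges in Python; the variable name `lower_word` and the test strings are hypothetical. Note that ranges can be combined inside a single pair of brackets.

```python
import re

# [a-z] is shorthand for the 26 lowercase ASCII letters, [0-9] for the ten digits.
lower_word = re.compile(r"^[a-z]+$")
print(bool(lower_word.match("hello")))   # True
print(bool(lower_word.match("Hello")))   # False: H is uppercase
print(bool(lower_word.match("hello2")))  # False: 2 is not in a-z

# Ranges can be combined inside one set.
print(bool(re.match(r"^[a-z0-9]+$", "hello2")))  # True
```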
A + after an item (a character, a set or a group) means that the item must appear at least once, while * means that it can appear 0 or more times; finally, ? makes the preceding item optional, so it can appear 0 or 1 times.
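The three quantifiers are easiest to compare side by side. This Python sketch applies each one to the letter b and checks the same three sample strings; `check` is a hypothetical helper, and `re.fullmatch` plays the role of the ^ and $ anchors.

```python
import re

def check(pattern: str) -> dict:
    """Map each sample string to whether the whole string matches the pattern."""
    return {s: re.fullmatch(pattern, s) is not None for s in ["ac", "abc", "abbc"]}

print(check(r"ab+c"))  # {'ac': False, 'abc': True, 'abbc': True}   b one or more times
print(check(r"ab*c"))  # {'ac': True, 'abc': True, 'abbc': True}    b zero or more times
print(check(r"ab?c"))  # {'ac': True, 'abc': True, 'abbc': False}   b zero or one time
```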
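In the regex above I wrote a backslashed dot, \., because . is a special character that matches any single character except a line break. Line endings differ between operating systems (\n on Unix-like systems, \r\n on Windows), and which characters count as a line break depends on the regex engine; Python's dot, for instance, excludes only \n by default. Indeed, the regex ^.+$ matches any non-empty string without line breaks, but not an empty string or a text that spans several lines.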
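The behaviour of ^.+$ can be checked in Python, whose `re.DOTALL` flag makes the dot match \n as well. The name `any_line` and the sample strings are illustrative only.

```python
import re

# . matches any single character except "\n" (unless re.DOTALL is set).
any_line = re.compile(r"^.+$")
print(bool(any_line.match("anything at all 123!")))  # True
print(bool(any_line.match("")))                      # False: + needs at least one character
print(bool(any_line.match("two\nlines")))            # False: . stops at the newline

# With re.DOTALL the dot also matches "\n".
print(bool(re.match(r"^.+$", "two\nlines", re.DOTALL)))  # True
```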
Finally, round brackets ( ) group several items together, so that a quantifier such as + or * applies to the whole group.
And now a brief explanation of the complex email regex:
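A short Python sketch of what grouping changes: without parentheses a quantifier applies only to the character right before it, with parentheses it applies to the whole group. The sample strings are made up.

```python
import re

# ab+   means: a, then one or more b          (ab, abb, abbb, ...)
# (ab)+ means: the pair ab, one or more times (ab, abab, ababab, ...)
print(bool(re.fullmatch(r"ab+", "abab")))    # False
print(bool(re.fullmatch(r"(ab)+", "abab")))  # True
print(bool(re.fullmatch(r"ab+", "abbb")))    # True
print(bool(re.fullmatch(r"(ab)+", "abbb")))  # False
```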
- beginning of the sequence
- the first part begins with one or more alphanumeric characters
([\._-]*[a-z0-9]+)*
- the first part can contain dots, underscores and dashes, but each run of them must be followed by alphanumeric characters, so this part can never end with one of them; moreover, the whole group may be absent (*), so the previous item alone already recognizes a simple address without non-alphanumeric characters (such as email@example.com)
- simply the @ character
- the domain name, built with the same rules as the part before the @
- a dot followed by 2 to 4 lowercase letters
- end of the sequence
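Putting the pieces from the list above together gives a pattern along these lines. It is a reconstruction from the breakdown, so treat the exact form (in particular `{2,4}` for the final letters) as an assumption; `EMAIL_PATTERN` and `is_valid_email` are hypothetical names.

```python
import re

# Hypothetical reconstruction of the email pattern from the breakdown above;
# the original regex may differ in small details.
EMAIL_PATTERN = re.compile(
    r"^[a-z0-9]+"           # first part starts with one or more alphanumerics
    r"([\._-]*[a-z0-9]+)*"  # optional . _ - runs, each followed by alphanumerics
    r"@"                    # the @ character
    r"[a-z0-9]+"            # domain: same rules as the first part
    r"([\._-]*[a-z0-9]+)*"
    r"\.[a-z]{2,4}$"        # a literal dot and 2 to 4 lowercase letters
)

def is_valid_email(address: str) -> bool:
    """True if the whole address matches the simplified pattern."""
    # fullmatch also rejects a trailing "\n", which Python's $ would otherwise allow.
    return EMAIL_PATTERN.fullmatch(address) is not None

for candidate in ["email@example.com", "john.doe_99@mail-server.org",
                  ".john@example.com", "john.@example.com", "john@example.c"]:
    print(candidate, is_valid_email(candidate))
# email@example.com True
# john.doe_99@mail-server.org True
# .john@example.com False
# john.@example.com False
# john@example.c False
```

Since the pattern only allows lowercase letters, an address like John@Example.com is rejected unless the input is lowercased first or `re.IGNORECASE` is passed, and top-level domains longer than 4 letters are rejected too. Also, because `[\._-]*` may match nothing, a backtracking engine such as Python's can split a run of letters in very many ways and become slow on long invalid inputs; writing `[\._-]+` inside the group accepts exactly the same strings without that ambiguity.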