Code Butchering: regular expressions


Wednesday, September 29, 2010

Javascript - strip off illegal characters from string

Recently I had to come up with a piece of JavaScript to strip a set of illegal characters from strings before passing them down to the persistence layer.

Took me a while to come up with a regex for the replace, not because it's particularly difficult, but because I suck at regexes (and I am no js expert either).

I thought it could be handy to have this functionality as a string prototype:

// strips off illegal chars &%$
String.prototype.stripOffIllegalChars = function() {
 return this.replace(/[&%$]/g, "");
}

The g flag at the end of the regex makes the replace global: every occurrence of those characters is replaced, not just the first one.
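
To see what the flag changes, here is a small sketch comparing the same replace with and without g (the variable names are illustrative only):

```javascript
// Same character class as above, with and without the global flag.
const dirty = "blah$blah%blah&";

// Without g, only the first illegal character ($) is removed.
const firstOnly = dirty.replace(/[&%$]/, "");

// With g, every illegal character is removed.
const allRemoved = dirty.replace(/[&%$]/g, "");

console.log(firstOnly);  // blahblah%blah&
console.log(allRemoved); // blahblahblah
```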

It can be used like this on any string:

var dirtyString = "blah$blah%blah&";
var cleanString = dirtyString.stripOffIllegalChars();

Hopefully it'll save the next person in line some time.

Friday, April 3, 2009

Definitive Javascript RegEx Validation for Butchers

This comes back to bother me now and then - so I decided to put together a few snippets to use as a base for client-side validation with RegExes.

First of all, the core snippet, which takes a regex and the text to validate and returns a boolean:


function regexMatch(regEx, stringToValidate)
{
   var oREGEXP = new RegExp(regEx);
   return oREGEXP.test(stringToValidate);
};

Often you will be validating a text field - so here are two more functions that build on the previous one:


function textFieldVisualRegexValidation(textElement, regEx)
{  
   var returnValue = false;
  
   if (regexMatch(regEx, textElement.value))
   {
       textElement.style.backgroundColor = "green";
       returnValue = true;
   }
   else
   {
       textElement.style.backgroundColor = "red";
   }
  
   return returnValue; 
};

function textFieldRegexMatch(ctrlName, regEx)
{  
   var elem = document.getElementById(ctrlName);
  
   return textFieldVisualRegexValidation(elem, regEx);
};

Some event will call textFieldRegexMatch, triggering the validation. You can customize textFieldVisualRegexValidation to perform some action when validation succeeds or fails (I am setting the input field background to green or red, but one could swap images or whatever).

You can hook up the above from (for example) the onBlur event of one of your textBoxes or any input field:


onblur="textFieldRegexMatch('yourInputFieldID', regExPattern)"

You obviously have to declare your regExPattern somewhere:


const regExPattern = "^[A-Za-z0-9]{1,12}$";
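
A detail to keep in mind when tailoring the pattern: inside square brackets there are no separators, so a comma placed between ranges is simply one more character the class accepts. A small sketch of the difference (variable names are illustrative):

```javascript
// Commas inside a character class are literal characters, not separators.
const withCommas = new RegExp("^[A-Z,a-z,0-9]{1,12}$");
const withoutCommas = new RegExp("^[A-Za-z0-9]{1,12}$");

console.log(withCommas.test("abc,123"));    // true: the comma is accepted
console.log(withoutCommas.test("abc,123")); // false
console.log(withoutCommas.test("abc123"));  // true
```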

I am sure there are better ways of doing the above but this is just meant as a reference to brutally Copy-Paste and tailor to your needs.
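
As one possible way of tailoring it, the success and failure actions can be passed in as callbacks instead of being hard-coded. This is only a sketch: validateWithCallbacks and fakeInput are hypothetical names, and the fake element stands in for a real input so the snippet also runs outside a browser.

```javascript
// Hypothetical variation: the caller decides what happens on success or failure.
function validateWithCallbacks(textElement, regEx, onValid, onInvalid) {
  const isValid = new RegExp(regEx).test(textElement.value);
  if (isValid) {
    onValid(textElement);
  } else {
    onInvalid(textElement);
  }
  return isValid;
}

// Stand-in for a real input element (assumption: only value and style are used).
const fakeInput = { value: "abc123", style: {} };

const result = validateWithCallbacks(
  fakeInput,
  "^[A-Za-z0-9]{1,12}$",
  (el) => { el.style.border = "1px solid green"; },
  (el) => { el.style.border = "1px solid red"; }
);

console.log(result, fakeInput.style.border);
```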

Sunday, July 20, 2008

[.NET] How to Reformat strings with Regular Expressions

.NET has great RegEx support; the bad thing is that in order to use these features you have to know about regular expressions (or you can just google validation patterns like crazy, as I do).

In a recent post, How to Validate Strings with RegEx, we covered how to validate input against a regex pattern; now we'll see how to reformat strings using regular expressions.

Instead of using the System.Text.RegularExpressions.Regex.IsMatch (jeez) method that just checks for a match giving you back a boolean you can use the System.Text.RegularExpressions.Regex.Match (jeez) method which gives you back a match object stuffed with useful crap.

Let's see an example:


//include this statement
using System.Text.RegularExpressions;

//...

string funkyString = "Scuffia likes moby dick";
string reformattedString = "";

//we use round brackets in the reg ex to create groups
Match funkyMatch = Regex.Match(funkyString, @"^(Scuffia) (likes) (moby) (dick)$");

if (funkyMatch.Success)//success property tells us if we have any match
{
  reformattedString = String.Format("{0} {1} {2}", funkyMatch.Groups[1], funkyMatch.Groups[2], funkyMatch.Groups[4]);
  Console.WriteLine(reformattedString);
}

The output of this example app should read something (exactly) like "Scuffia likes dick".

As discussed above, the Match object is filled with useful stuff. You can access the Success property to check whether we have a match; if so, you can reformat the input string by accessing Groups, which are defined by the round brackets in the regular expression. Notice that the Groups collection is zero-based, but its first element (index 0) holds the whole matched expression, so the actual sub-strings start from the second element (index 1).
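
For comparison, here is a rough sketch of the same example in JavaScript (the language of the other posts here, not .NET). String.prototype.match returns an array laid out the same way, with the whole match at index 0 and the groups from index 1 onwards:

```javascript
const funkyString = "Scuffia likes moby dick";
const funkyMatch = funkyString.match(/^(Scuffia) (likes) (moby) (dick)$/);

let wholeMatch = "";
let reformattedString = "";
if (funkyMatch !== null) {
  // Index 0 holds the whole match; the groups start at index 1.
  wholeMatch = funkyMatch[0];
  reformattedString = `${funkyMatch[1]} ${funkyMatch[2]} ${funkyMatch[4]}`;
}

console.log(wholeMatch);        // Scuffia likes moby dick
console.log(reformattedString); // Scuffia likes dick
```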

I must admit, I really suck at regular expressions (I hate them) so my example is really lame, I'll give you that.

Anyway - a couple of days ago Scuffia turned 26 - happy birthday Scuffia!

Sunday, July 13, 2008

[.NET] How to Validate strings with Regular Expressions

Hi fellas,

a tiny code-snippet to show how to use regular expressions to validate strings in .NET:


//include this
using System.Text.RegularExpressions;
//...
string myRegEx = @"^ScuffiaLikesMobyDick$";
string myStringToValidate = "";
//...
//fill string to validate from input or whatever
//...
//note the argument order: input first, pattern second
if (Regex.IsMatch(myStringToValidate, myRegEx))
    Console.WriteLine("String is valid!");
else
    Console.WriteLine("String is rubbish");
//...

It's worth underlining that regular expressions are case-sensitive by default, even in VB. I hate writing regular expressions, but Scuffia is pretty good at it, so when I can't find stuff on Google I always bug him when I need to validate some data. For example, a couple of weeks ago I needed to validate an email address and asked him for help.
He sent this regular expression straightaway:


^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$
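
Before trusting a pattern like this blindly, it helps to run it against a few inputs. Below is a quick sketch of that in JavaScript rather than .NET (an assumption made here for illustration; the sample addresses are made up). One thing it shows is that the {2,4} near the end rejects top-level domains longer than four letters.

```javascript
// The pattern above, written as a single JavaScript regex literal.
const emailRegEx = /^([\w-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$/;

// Made-up sample inputs, each paired with the result of the test.
const samples = [
  "someone@example.com",    // plain address
  "user@[192.168.1.1]",     // IP address in square brackets
  "not-an-email",           // no @ at all
  "someone@example.museum"  // top-level domain longer than 4 letters
];
const results = samples.map((sample) => emailRegEx.test(sample));

samples.forEach((sample, i) => console.log(sample, results[i]));
```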

Writing regular expressions is a mess but reading one if someone else wrote it is definitely something you don't wanna even think about: gotta have blind faith sometimes in this job - just remember to throw some bones to them SQA people (they're paid to break what us developers put together after all).

Thanx Scuffia, the email validation reg ex is working pretty well.

Monday, March 10, 2008

Regular Expression Validator

This is a simple application of the Javascript Regular Expression matcher.

Try some regular expressions:

Email Address
Random sequence of a,b characters
IP address

Here is the code. If you need an explanation, comment below!


<script type="text/javascript">
function regexMatch()
{
  var t1 = document.getElementById("regexField");
  var t2 = document.getElementById("string");
  // wrap the pattern in a non-capturing group so that ^ and $ apply to
  // the whole expression, even when it contains a top-level |
  var strPattern = "^(?:" + t1.value + ")$";
  var oTest = t2.value;

  var oREGEXP = new RegExp(strPattern);
  // className works in every browser, so no IE sniffing is needed
  if (oREGEXP.test(oTest))
  {
    t2.className = "right";
  }
  else
  {
    t2.className = "wrong";
  }
}
</script>

<style>
.right{background-color:#33FF33;}
.wrong{background-color:#FF5555;}
</style>


<form name="formRegEx" onSubmit="regexMatch(); return false;">
   <input type="text" id="regexField"/>
   <input type="text" id="string" />
   <input type="button" onClick="regexMatch();" value="check"/>
</form>
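
Because the pattern is typed in by the user, it may not be a valid regex at all (a lone ( for instance), in which case new RegExp throws a SyntaxError. A minimal sketch of guarding against that, using a hypothetical helper called safeFullMatch that is not part of the page above:

```javascript
// Hypothetical helper: returns true/false for a full match, or null if the
// pattern itself is not a valid regular expression.
function safeFullMatch(pattern, text) {
  let compiled;
  try {
    // The non-capturing group keeps ^ and $ attached to the whole pattern,
    // even when it contains a top-level alternation such as a|b.
    compiled = new RegExp("^(?:" + pattern + ")$");
  } catch (e) {
    if (e instanceof SyntaxError) return null;
    throw e;
  }
  return compiled.test(text);
}

console.log(safeFullMatch("[ab]+", "abba"));  // true
console.log(safeFullMatch("a|b", "ab"));      // false
console.log(safeFullMatch("(", "anything"));  // null
```

A null result could then be shown to the user as "invalid pattern" instead of colouring the field red.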

Tuesday, February 26, 2008

Regular Expressions: learning with an email regex

In computing, regular expressions provide a concise and flexible means for identifying text of interest, such as particular characters, words, or patterns of characters. Regular expressions are written in a formal language that can be interpreted by a regular expression processor, a program that either serves as a parser generator or examines text and identifies parts that match the provided specification.

Wikipedia

So a regular expression is a way to identify a specific sequence of characters, useful for searching inside a long text or for validating user input.
The "simplest" (are we sure??) example is processing an email address that a user typed into, e.g., a registration form.
Email addresses are composed in this way:

alphanumeric characters mixed with (. and/or - and/or _), which cannot be at the start or end

@

alphanumeric characters mixed with (. and/or - and/or _), which cannot be at the start or end

. followed by 2 to 4 letters

A valid regex for this kind of address looks like this:

^[a-z0-9]+([\._-]*[a-z0-9]+)*@[a-z0-9]+([\._-]*[a-z0-9]+)*\.[a-z]{2,4}$

I'm not blind, this is surely not cool, but in a few lines I'll explain what this mess means.

First of all, the ^ and $ characters stand for the start and the end of the searched sequence. They are mandatory when you want to validate a whole string: the regex ^Hello finds all strings that begin with Hello, while end$ finds those that end with end; middle matches any string containing one or more occurrences of the word middle; and finally ^just this$ matches only the exact string just this.
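
The four cases above can be checked quickly in JavaScript. This is just a sketch, and the constant names are made up for the example:

```javascript
const startsWithHello = /^Hello/;
const endsWithEnd = /end$/;
const containsMiddle = /middle/;
const exactlyJustThis = /^just this$/;

console.log(startsWithHello.test("Hello world"));        // true
console.log(startsWithHello.test("Say Hello"));          // false
console.log(endsWithEnd.test("the end"));                // true
console.log(containsMiddle.test("in the middle of it")); // true
console.log(exactlyJustThis.test("just this"));          // true
console.log(exactlyJustThis.test("just this once"));     // false
```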

Square brackets [ ], used as a pair, stand for a set of characters. For example, the regex [12345] matches every sequence that contains at least one digit between 1 and 5, so 1hello and abcde51z2 are matched, but 6a is not.
Of course, with a backslash \ you can escape the special characters and use them literally in your sequences.
For numbers and letters, you can use - to get a range of characters ([a-z] stands for the whole lowercase alphabet).
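
A short sketch of sets, ranges and escaping in JavaScript (the constant names are only for illustration):

```javascript
// A set: any one of the listed digits, anywhere in the string.
const oneToFive = /[12345]/;
console.log(oneToFive.test("1hello"));    // true
console.log(oneToFive.test("abcde51z2")); // true
console.log(oneToFive.test("6a"));        // false

// A range: lowercase letters only, for the whole string.
const lowercaseOnly = /^[a-z]+$/;
console.log(lowercaseOnly.test("hello")); // true
console.log(lowercaseOnly.test("Hello")); // false

// Escaping: backslashes turn the brackets into ordinary characters.
const bracketedX = /^\[x\]$/;
console.log(bracketedX.test("[x]")); // true
console.log(bracketedX.test("x"));   // false
```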

The + after an element means that the element must appear at least once, while * states that it can appear 0 or more times; moreover, ? means that the preceding element is optional, so it can appear 0 or 1 times.
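
The three quantifiers side by side, as a small JavaScript sketch (each pattern is an a followed by some number of b characters; the names are illustrative):

```javascript
const oneOrMore = /^ab+$/;  // at least one b
const zeroOrMore = /^ab*$/; // any number of b, including none
const optional = /^ab?$/;   // zero or one b

console.log(oneOrMore.test("a"));     // false
console.log(oneOrMore.test("abbb"));  // true
console.log(zeroOrMore.test("a"));    // true
console.log(zeroOrMore.test("abbb")); // true
console.log(optional.test("ab"));     // true
console.log(optional.test("abb"));    // false
```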

In the above regex I have written a backslashed dot, because . is a special character, meaning any character except a line break (\n, \r\n or \r, depending on your operating system). In fact, in JavaScript the regex ^.+$ matches every non-empty string that contains no line breaks.
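
A small JavaScript sketch of the difference between a plain dot and an escaped one, and of how ^.+$ behaves (constant names are made up for the example):

```javascript
const anyChar = /^a.c$/;     // the dot matches any single character except a line break
const literalDot = /^a\.c$/; // the escaped dot matches only a real dot

console.log(anyChar.test("abc"));    // true
console.log(anyChar.test("a.c"));    // true
console.log(anyChar.test("a\nc"));   // false
console.log(literalDot.test("abc")); // false
console.log(literalDot.test("a.c")); // true

const nonEmptyLine = /^.+$/;
console.log(nonEmptyLine.test("hello world"));        // true
console.log(nonEmptyLine.test(""));                   // false
console.log(nonEmptyLine.test("line one\nline two")); // false
```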

Finally, we can group several sequences together with round brackets ( ).
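
Grouping matters most when a quantifier follows: with brackets it applies to the whole group, without them only to the last character. A minimal JavaScript sketch (names are illustrative):

```javascript
const repeatedPair = /^(ab)+$/; // + applies to the whole group ab
const withoutGroup = /^ab+$/;   // + applies only to b

console.log(repeatedPair.test("ababab")); // true
console.log(repeatedPair.test("abba"));   // false
console.log(withoutGroup.test("ababab")); // false
console.log(withoutGroup.test("abbb"));   // true
```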

And now a brief explanation of the complex email address regex:

^: beginning of the sequence
[a-z0-9]+: the first part begins with an alphanumeric character (one or more)
([\._-]*[a-z0-9]+)*: the first part can contain dots, underscores and dashes, but they must be followed by alphanumeric characters (so it cannot end with a non-alphanumeric character); moreover this group may be absent altogether (*), so the previous part alone can recognize a simple email address without non-alphanumeric characters (such as pippo82@x.us)
@: simply the @ character
[a-z0-9]+([\._-]*[a-z0-9]+)*: the same as above
\.[a-z]{2,4}: a dot followed by a sequence of 2 to 4 lowercase letters
$: end of the sequence
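
To check the breakdown above, the regex can be run against a few addresses in JavaScript. This is only a sketch: apart from pippo82@x.us the sample addresses are made up, and the i flag on the second regex is an addition for illustration, since the pattern as written accepts lowercase letters only.

```javascript
const emailRegEx = /^[a-z0-9]+([\._-]*[a-z0-9]+)*@[a-z0-9]+([\._-]*[a-z0-9]+)*\.[a-z]{2,4}$/;

console.log(emailRegEx.test("pippo82@x.us"));              // true
console.log(emailRegEx.test("john.doe@mail.example.com")); // true
console.log(emailRegEx.test(".john@x.us"));                // false: starts with a dot
console.log(emailRegEx.test("john.@x.us"));                // false: dot right before @
console.log(emailRegEx.test("John@x.us"));                 // false: uppercase letter

// Same pattern with the i flag, to ignore case.
const emailRegExIgnoreCase = new RegExp(emailRegEx.source, "i");
console.log(emailRegExIgnoreCase.test("John@X.US"));       // true
```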
