October 12, 2009

Regex Pattern for Validating Email Addresses

An interesting post here talks about how complex RFC 822, the standard that defines a valid email address, really is. Thankfully, the author found a simpler pattern on MSDN and modified it to make the regex more restrictive; here it goes:

"^([\w]+)(([-\.][\w]+)?)*@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"

However, this pattern rejects email addresses like "mygroup+subscribe@mygroups.com", a form commonly used by listserv mailing lists. There are a few more cases where email bots send you mail with reply-to addresses like "mygroup+confirm-23EWEJEW233@mygroups.com": you reply, and the session id "23EWEJEW233" is used to activate your account or something similar. Writely (now Google Docs) uses a similar scheme that lets you upload documents by email; its addresses look like "xyz+pqr-12345678901234567890-SDWEPO23@prod.writely.com".

To verify whether a value passes this Perl-compatible regex filter, I used the following command:
echo [candidate_value]|grep -P [pattern]
and then checked "echo $?": if it is 0, the pattern matched; otherwise it failed. In fact, if the pattern matches, you will also see the [candidate_value] printed.
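
The check described above can be wrapped in a small shell function. This is only a minimal sketch: it assumes GNU grep built with PCRE support (the -P flag), the function name matches_pattern is made up for illustration, and the pattern in the usage lines is a deliberately simplified one, not the regex from this post. It adds -q so that grep prints nothing and only the exit status is used.

```shell
#!/bin/sh
# Hypothetical helper: returns exit status 0 if the candidate matches the
# pattern, non-zero otherwise.
# Assumes GNU grep with -P (Perl-compatible regex) support.
matches_pattern() {
    candidate=$1
    pattern=$2
    # printf avoids echo's special handling of backslashes and leading dashes.
    # -q suppresses the echoed match; "--" stops option parsing before the pattern.
    printf '%s\n' "$candidate" | grep -qP -- "$pattern"
}

# Usage with a simplified illustrative pattern (not the one from the post).
if matches_pattern 'abc@example.com' '^[\w]+@[\w]+\.[a-z]+$'; then
    echo "match"
else
    echo "no match"
fi
```

Single-quoting the pattern keeps the shell from interpreting the backslashes or the trailing "$" before grep sees them.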

It turns out that adding support for '+' as an additional separator was the only change required. The modified regex looks exactly the same, except for one extra "+" in the character class of allowed separators.

"^([\w]+)(([-\.\+][\w]+)?)*@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"
For example:
  1. echo "abc.pp@fjdsf.sdsd.dsdsds.com"|grep -P "^([\w]+)(([-\.\+][\w]+)?)*@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"
    passes the regex
  2. echo "abc.@fjdsf.sdsd.dsdsds.com"|grep -P "^([\w]+)(([-\.\+][\w]+)?)*@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"
    fails the regex
  3. echo "xyz+pqr-12345678901234567890-SDWEPO23@prod.writely.com"|grep -P "^([\w]+)(([-\.\+][\w]+)?)*@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$"
    passes the regex
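
To see the effect of the extra "+" side by side, the same addresses can be run against both patterns in one go. This is a rough sketch, again assuming GNU grep with -P support; the helper name check and the variable names original and modified are mine, chosen for illustration. The two patterns are copied from this post.

```shell
#!/bin/sh
# Original pattern (no '+') and the modified pattern, both from the post.
# Single quotes keep the shell from touching the backslashes or '$'.
original='^([\w]+)(([-\.][\w]+)?)*@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$'
modified='^([\w]+)(([-\.\+][\w]+)?)*@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([\w-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$'

# Hypothetical helper: print "pass" or "fail" for one address and one pattern.
check() {
    if printf '%s\n' "$1" | grep -qP -- "$2"; then
        echo "pass"
    else
        echo "fail"
    fi
}

# Addresses taken from the examples in the post.
for addr in \
    'abc.pp@fjdsf.sdsd.dsdsds.com' \
    'abc.@fjdsf.sdsd.dsdsds.com' \
    'mygroup+subscribe@mygroups.com' \
    'xyz+pqr-12345678901234567890-SDWEPO23@prod.writely.com'
do
    echo "$addr original=$(check "$addr" "$original") modified=$(check "$addr" "$modified")"
done
```

The two addresses containing "+" should fail the original pattern and pass the modified one, while "abc.@..." should fail both because a separator must be followed by at least one word character.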
