Website: http://www.dominicsayers.com
Original page: http://www.dominicsayers.com/isemail/results.php
"test\ blah"@example.com
"test blah"@example.com
In addition, some corrections to IPv6 address literal compression have been made.
You may need to retest the Simon Slick code.
The intention of my head-to-head page is to compare free, open-source email validators. I will replace the Simon Slick code with the standard PHP validation, the PEAR library validation and the Drupal validation.
Pretty impressive for a single RegEx.
There is usage info and contact info for VEAF on the Simon Slick site. Just go to the VEAF tool and enter an address to see the info.
http://SimonSlick.com/VEAF/
I carefully read the usage information on the Simon Slick website before I made my decision to remove it from the comparison. The code is clearly marked All Rights Reserved and its use in commercial or redistributable applications is explicitly forbidden. I therefore have no license to use it here.
But hey, then you might have to eat those words you uttered bashing the use of RegEx to validate email address format. Could not have that now, could we.
"null \"@char.com
is valid.
Escaping the closing double quote makes that quote an element of the address string, and therefore the double-quoted string no longer has a closing double quote.
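The effect of the escaped quote can be checked mechanically. What follows is a minimal sketch, not the validator's actual code: the function name `is_closed_quoted_string` is hypothetical, and the scan only tracks backslash escapes and double quotes (it does not check which characters are allowed inside the quotes).

```python
def is_closed_quoted_string(local_part: str) -> bool:
    """Return True if local_part is a single double-quoted string whose
    closing quote is not consumed by a backslash escape."""
    if len(local_part) < 2 or not local_part.startswith('"'):
        return False
    i = 1
    while i < len(local_part):
        ch = local_part[i]
        if ch == "\\":
            # A backslash escapes the next character, so skip both.
            i += 2
            continue
        if ch == '"':
            # An unescaped quote closes the string; it must be the last character.
            return i == len(local_part) - 1
        i += 1
    # Ran off the end without finding an unescaped closing quote.
    return False


print(is_closed_quoted_string('"null \\"'))   # local part is "null \"
print(is_closed_quoted_string('"null \\""'))  # local part is "null \""
```

Under these assumptions the first local part is rejected, because the backslash consumes the only remaining quote, and the second is accepted.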
This would be okay though
"null \""@char.com
+@b.c
is valid.
TLDs are, at least currently, a minimum of 2 characters.
So my position on this is slightly inconsistent. On the one hand I am counting numeric TLDs as invalid based on flimsy evidence from RFC 1123. On the other hand I am allowing single-character TLDs based on strict RFC interpretation.
I'm treading a very fine line between usefulness and RFC compliance. Strict compliance would make the validator less useful in my opinion (for instance allowing a TLD to have an MX).
Disagree
Angels on the head of a pin.
echo unitTest("\"\"@example.com", false, "Local part is effectively empty");
According to the quoted-string specification the portion between the double quotes can be 0 or more characters.
In other words, every component of a double quoted string is optional, with the exception of the double quotes.
quoted-string   =   [CFWS]
                    DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                    [CFWS]
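To make the "0 or more characters" reading concrete, here is a rough sketch of the quoted-string rule as a regular expression. It is a simplification under stated assumptions: FWS is treated as a plain run of spaces and tabs (no line folding), the optional surrounding CFWS and the obsolete forms are left out, and the names `QTEXT`, `QUOTED_PAIR`, `QUOTED_STRING` and `is_quoted_string` are made up for this example.

```python
import re

# qtext: printable US-ASCII except backslash and double quote.
QTEXT = r"[\x21\x23-\x5b\x5d-\x7e]"
# quoted-pair: backslash followed by a visible character, space, or tab.
QUOTED_PAIR = r"\\[\x21-\x7e \t]"
# DQUOTE *([FWS] qcontent) [FWS] DQUOTE, with FWS reduced to spaces/tabs.
QUOTED_STRING = re.compile(
    rf'"(?:[ \t]*(?:{QTEXT}|{QUOTED_PAIR}))*[ \t]*"'
)


def is_quoted_string(text: str) -> bool:
    """Return True if text matches the simplified quoted-string rule."""
    return QUOTED_STRING.fullmatch(text) is not None


print(is_quoted_string('""'))            # empty content between the quotes
print(is_quoted_string('"test blah"'))
print(is_quoted_string('"test\\ blah"'))
```

Because the repeated group may match zero times, the empty pair of quotes is accepted by this pattern, which is the point being argued above.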
So much for that self proclaimed 100% rating.
On this specific issue, I wrote a blog post about it some time ago: http://blog.dominicsayers.com/2009/02/23/confus...
Both RFC 5321 and RFC 5322 allow an empty quoted string as the local part of an email address - you are absolutely right about that. This is another case where we need to trade off correctness against usefulness. When I wrote my validator I thought that it would not be useful to allow local parts that are effectively empty, hence my decision to call this address invalid.
With hindsight and given the interest my validator has aroused, I am leaning more towards absolute RFC-compliance in my validator. It would then be open to people implementing real-world websites to add additional business rules that weed out non-useful addresses.
What do you think?
Oh but empty local part is useful.
echo unitTest("first.\"\".last@example.com", false, "Contains a zero-length element");
echo unitTest("first.last@com", false, "Mail host must be second- or lower level");
echo unitTest("test@example", false, "Dave Child says so");
Say good-bye to your self proclaimed 100%.
In the light of your work and some other questions I've received, I think there is a case for making the validator absolutely compliant with the RFCs even beyond the point of real-world common sense. We could then add back some well-chosen (optional) business rules that make the validator more useful in real applications. Two examples of this being empty local parts and TLD domain parts.
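One way to picture that split is a strict check followed by optional, caller-chosen rules. This is only a sketch of the idea, not the validator's design: every name here (`Rule`, `has_nonempty_local_part`, `domain_has_dot`, `is_useful`) is hypothetical, and the strict RFC check is passed in as a function because it is outside the scope of the example.

```python
from typing import Callable, Iterable

# A business rule receives the local part and the domain and returns
# True if the address is acceptable for the application.
Rule = Callable[[str, str], bool]


def has_nonempty_local_part(local_part: str, domain: str) -> bool:
    # Rejects local parts that are effectively empty, such as "".
    return local_part not in ("", '""')


def domain_has_dot(local_part: str, domain: str) -> bool:
    # Rejects addresses whose domain is a bare TLD, such as first.last@com.
    return "." in domain


def is_useful(
    address: str,
    strict_check: Callable[[str], bool],
    rules: Iterable[Rule] = (),
) -> bool:
    """Apply the strict RFC check first, then any optional business rules."""
    if not strict_check(address):
        return False
    local_part, _, domain = address.rpartition("@")
    return all(rule(local_part, domain) for rule in rules)


# Stand-in for a real RFC-compliant validator (assumption for the demo only).
def stub_strict_check(address: str) -> bool:
    return "@" in address


print(is_useful('""@example.com', stub_strict_check))
print(is_useful('""@example.com', stub_strict_check, [has_nonempty_local_part]))
print(is_useful("first.last@com", stub_strict_check, [domain_has_dot]))
```

With no rules the result is whatever the strict check says; each rule added can only narrow that result, so sites that do want empty local parts or TLD-only domains simply leave the rule out.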
My "self-proclaimed" 100% is a measure of the validator's ability to correctly match the unit tests. Which it does in 228 out of 228 cases. How does yours do?
Better than you might think.
Of your 228 test cases, with the 4 I pointed out being corrected, plus the single-character TLD not being permitted (that is a subjective one that gets a pass either way as far as I am concerned), 223 pass. So that is on par with yours (yours should be failing at least 4 test cases that it is passing). The non-passes are all due to obsolete domain name forms containing embedded comments.
Well it is pretty easy to get 100% when you permit test cases to pass when they should fail. i.e. create incorrect test cases that will pass for your tool but fail for other tools that perform correctly. And then not update the test cases in a timely manner when it is pointed out. That is disingenuous.
Correct your test cases and it won't be 100% anymore. Of course I really doubt you will correct the test cases until after you update your code, so as not to have less than 100% posted.
echo unitTest("a@bar", false, "");
This makes at least 6 now that your validator should be failing on. Down to 97.4% now.
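The 97.4% figure follows from treating the 6 disputed cases as failures out of the 228 total. A quick check of that arithmetic (the variable names are just for illustration):

```python
TOTAL_CASES = 228
disputed = 6  # test cases the commenter says have the wrong expected result

pass_rate = 100 * (TOTAL_CASES - disputed) / TOTAL_CASES
print(f"{pass_rate:.1f}%")  # 97.4%
```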
echo unitTest(" \r\n (\r\n x \r\n ) \r\n first\r\n ( \r\n x\r\n ) \r\n .\r\n ( \r\n x) \r\n last \r\n ( x \r\n ) \r\n @example.com", true, "");
RFC 5321 4.5.3.1.1. Local-part
The maximum total length of a user name or other local-part is 64 octets.
Much better than you might think.
With the 6 incorrect test cases corrected, plus the single-character TLD not being permitted, it passes all 228 of your test cases.
These have been reported to you before, several months ago. Do you intend to correct these, or just continue misleading people with your self-proclaimed, albeit incorrect, perfection?
***********
echo unitTest(" \r\n (\r\n x \r\n ) \r\n first\r\n ( \r\n x\r\n ) \r\n .\r\n ( \r\n x) \r\n last \r\n ( x \r\n ) \r\n @example.com", true, "");
Disagree. Local part limit is 64 octets. Expected result should be false (Invalid).
RFC 5321 4.5.3.1.1. Local-part
The maximum total length of a user name or other local-part is 64 octets.
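A length check along these lines is short to write. The sketch below is one reading only: it counts every octet before the last "@", including any comments and folding white space, which is the interpretation argued here. Whether those should count toward the limit is exactly what is in dispute. The names `MAX_LOCAL_PART_OCTETS`, `local_part_octets` and `local_part_within_limit` are hypothetical.

```python
MAX_LOCAL_PART_OCTETS = 64  # limit quoted from RFC 5321 section 4.5.3.1.1


def local_part_octets(address: str) -> int:
    """Count the octets before the last '@', with no stripping of
    comments or white space (an assumption of this sketch)."""
    local_part, _, _ = address.rpartition("@")
    return len(local_part.encode("utf-8"))


def local_part_within_limit(address: str) -> bool:
    return local_part_octets(address) <= MAX_LOCAL_PART_OCTETS


print(local_part_within_limit("a" * 64 + "@example.com"))
print(local_part_within_limit("a" * 65 + "@example.com"))
```

A validator that strips comments and folding white space before counting would need to do that step first and then apply the same comparison.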
***********
echo unitTest("\"\"@example.com", false, "Local part is effectively empty");
Disagree. Quoted local part is permitted to be empty. Expected result should be true (Valid).
According to the quoted-string specification the portion between the double quotes can be 0 or more characters.
In other words, every component of a double quoted string is optional, with the exception of the double quotes.
quoted-string   =   [CFWS]
                    DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                    [CFWS]
***********
echo unitTest("first.last@com", false, "Mail host must be second- or lower level");
echo unitTest("test@example", false, "Dave Child says so");
echo unitTest("a@bar", false, "");
Disagree. @TLD is permitted. Expected result should be true (Valid).
@TLD is valid and permitted. RFC 5321 2.3.5 Domain Names says so.
***********
echo unitTest("first.\"\".last@example.com", false, "Contains a zero-length element");
Disagree. Empty element permitted for obs-local-part. Expected result should be true (Valid).
Zero-length element is permitted in obsolete local part (obs-local-part).
***********
echo unitTest("+@b.c", true, "TLDs can be any length");
Disagree. TLDs are currently at least 2 characters. Expected result should be false (Invalid).
All TLDs are currently a minimum of 2 characters. This is not likely to change any time soon.
***********
echo unitTest("\"first.middle.last\"@example.com", true, "obs-local-part form as described in RFC 2822");
echo unitTest("\"first..last\"@example.com", true, "obs-local-part form as described in RFC 2822");
Agree. However, this is a modern double quoted string, not an obs-local-part.
***********