DISQUS

Dominic Sayers website: Dominic Sayers - RFC-compliant email address validator

  • NOYB · 7 months ago
    You have listed these as unexpected results for Simon Slick. That is not correct: Simon Slick says they are valid, which is the correct result.
    "test\ blah"@example.com
    "test blah"@example.com

    In addition, some IPv6 address literal compression corrections have been made.

    You may need to retest the Simon Slick code.
  • Dominic Sayers · 7 months ago
    The Simon Slick code is copyright All Rights Reserved. I can see no license for me to use it on my site, so I am in the process of removing it from the head-to-head.

    The intention of my head-to-head page is to compare free, open-source email validators. I will replace the Simon Slick code with the standard PHP validation, the PEAR library validation and the Drupal validation.
  • NOYB · 7 months ago
    That's too bad. When test cases for obsoletes and comments are removed, the Simon Slick Validate Email Address Format RegEx scores 100% against your test cases, the subjective single-character TLD notwithstanding.

    Pretty impressive for a single RegEx.

    There is usage info and contact info for VEAF on the Simon Slick site. Just go to the VEAF tool and enter an address to see the info.

    http://SimonSlick.com/VEAF/
  • Dominic Sayers · 7 months ago
    I'm glad you found my test cases useful. They are released under an open source license so anybody can use them for free.

    I carefully read the usage information on the Simon Slick website before I made my decision to remove it from the comparison. The code is clearly marked All Rights Reserved and its use in commercial or redistributable applications is explicitly forbidden. I therefore have no license to use it here.
  • NOYB · 3 months ago
    Nowhere do I see anything that would prohibit you from evaluating and comparing it. http://simonslick.com/VEAF/

    But hey, then you might have to eat those words you uttered bashing the use of RegEx to validate email address format. Could not have that now, could we?
  • NOYB · 7 months ago
    Disagree that
    "null \"@char.com
    is valid.

    Escaping the closing double quote of a double-quoted string makes that double quote an element of the address string, and therefore there is no longer a closing double quote for the double-quoted string.

    This would be okay though
    "null \""@char.com
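
    As a rough sketch of that reasoning (Python, not taken from any of the validators discussed here; `is_closed_quoted_string` is a hypothetical helper name), a scanner can skip the character after each backslash and then require that the first unescaped double quote is the last character:

```python
def is_closed_quoted_string(local_part: str) -> bool:
    """Return True if local_part starts with '"' and ends with an
    unescaped '"'. Sketch only: it does not check which characters
    are allowed between the quotes."""
    if len(local_part) < 2 or not local_part.startswith('"'):
        return False
    i = 1
    while i < len(local_part):
        ch = local_part[i]
        if ch == "\\":
            i += 2  # a backslash escapes the next character, so skip both
            continue
        if ch == '"':
            # the first unescaped quote must be the final character
            return i == len(local_part) - 1
        i += 1
    return False  # ran off the end without an unescaped closing quote


print(is_closed_quoted_string(r'"null \"'))   # closing quote is escaped
print(is_closed_quoted_string(r'"null \""'))  # escaped quote, then a real closing quote
```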
  • Dominic Sayers · 7 months ago
    Your analysis is correct, but the first address you quote isn't in my test suite. There is a test case with an ASCII NUL after the backslash, which is allowed by the obs-local-part ABNF in RFC 5322.
  • NOYB · 7 months ago
    Disagree that
    +@b.c
    is valid.

    TLDs are, at least currently, a minimum of 2 characters long.
  • Dominic Sayers · 7 months ago
    Currently they are, and it's unlikely that ICANN will create any single-letter TLDs. The RFCs do not prevent them from doing so if they wanted to, but neither do they prevent them from creating numeric TLDs.

    So my position on this is slightly inconsistent. On the one hand I am counting numeric TLDs as invalid based on flimsy evidence from RFC 1123. On the other hand I am allowing single-character TLDs based on strict RFC interpretation.

    I'm treading a very fine line between usefulness and RFC compliance. Strict compliance would make the validator less useful in my opinion (for instance allowing a TLD to have an MX).
  • NOYB · 7 months ago
    "can have escaped null character"

    Disagree
  • Dominic Sayers · 7 months ago
    Escaped null characters are allowed within a quoted string by the obs-qp ABNF in RFC 5322. Obscure but true.

    Angels on the head of a pin.
  • NOYB · 7 months ago
    Believe this test case is incorrect.
    echo unitTest("\"\"@example.com", false, "Local part is effectively empty");

    According to the quoted-string specification the portion between the double quotes can be 0 or more characters.
    In other words, every component of a double quoted string is optional, with the exception of the double quotes.

    quoted-string = [CFWS]
                    DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                    [CFWS]

    So much for that self proclaimed 100% rating.
  • Dominic Sayers · 7 months ago
    Thanks for your continued interest, NOYB. It's great when we can discuss these issues openly and have a collaborative discovery process. This is only possible because we make our work freely available to our peers.

    On this specific issue, I wrote a blog post about it some time ago: http://blog.dominicsayers.com/2009/02/23/confus...

    Both RFCs 5321 and 5322 allow the empty string as an email address - you are absolutely right about that. This is another case where we need to trade correctness with usefulness. When I wrote my validator I thought that it would not be useful to allow local parts that are effectively empty, hence my decision to call this address invalid.

    With hindsight and given the interest my validator has aroused, I am leaning more towards absolute RFC-compliance in my validator. It would then be open to people implementing real-world websites to add additional business rules that weed out non-useful addresses.

    What do you think?
  • NOYB · 7 months ago
    "Both RFCs 5321 and 5322 allow the empty string as an email address - you are absolutely right about that. This is another case where we need to trade correctness with usefulness. When I wrote my validator I thought that it would not be useful to allow local parts that are effectively empty, hence my decision to call this address invalid."

    Oh, but an empty local part is useful.
  • Dominic Sayers · 7 months ago
    Oh, and on your final point: 227 out of 228 is still 100%. Cal Henderson's function gets one wrong too but that's still shown as 100% - maybe I should always round down rather than to the nearest integer?
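
    To make the rounding question concrete, here is the arithmetic as a small Python sketch (the variable names are just for illustration):

```python
import math

passed, total = 227, 228
pct = passed / total * 100      # about 99.56

nearest = round(pct)            # round to the nearest integer
floored = math.floor(pct)       # always round down

print(f"{pct:.2f}% -> nearest: {nearest}%, floor: {floored}%")
# prints "99.56% -> nearest: 100%, floor: 99%"
```

    Rounding down would show 99% for any validator that misses at least one case.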
  • NOYB · 7 months ago
    Here is another one that should be true for an obsolete local part (obs-local-part).

    echo unitTest("first.\"\".last@example.com", false, "Contains a zero-length element");
  • NOYB · 7 months ago
    And here are 2 more incorrect test cases. @TLD is valid and permitted. RFC 5321 2.3.5 Domain Names says so.

    echo unitTest("first.last@com", false, "Mail host must be second- or lower level");
    echo unitTest("test@example", false, "Dave Child says so");

    Say good-bye to your self proclaimed 100%.
  • Dominic Sayers · 7 months ago
    Hi NOYB, and thanks for your continued efforts in assessing these test cases.

    In the light of your work and some other questions I've received, I think there is a case for making the validator absolutely compliant with the RFCs even beyond the point of real-world common sense. We could then add back some well-chosen (optional) business rules that make the validator more useful in real applications. Two examples of such rules are rejecting empty local parts and rejecting TLD-only domain parts.
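
    One possible shape for that split, as a hedged Python sketch: it assumes the address has already passed a strict RFC check elsewhere, and the function name and both flags are hypothetical, not part of the existing validator.

```python
def passes_business_rules(address: str,
                          allow_empty_local_part: bool = False,
                          allow_tld_only_domain: bool = False) -> bool:
    """Optional rules applied AFTER strict RFC validation (assumed done
    by the caller). Only the two rules discussed in this thread."""
    local_part, _, domain = address.rpartition("@")

    # Rule 1: reject local parts that are effectively empty, e.g. ""@example.com
    if not allow_empty_local_part and local_part in ("", '""'):
        return False

    # Rule 2: reject domains with a single label, e.g. first.last@com
    if not allow_tld_only_domain and "." not in domain:
        return False

    return True


print(passes_business_rules('""@example.com'))                               # rule 1 applies
print(passes_business_rules('first.last@com', allow_tld_only_domain=True))   # rule 2 switched off
```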

    My "self-proclaimed" 100% is a measure of the validator's ability to correctly match the unit tests. Which it does in 228 out of 228 cases. How does yours do?
  • NOYB · 7 months ago
    "How does yours do?"

    Better than you might think.

    Of your 228 test cases, with the 4 I pointed out being corrected, plus the single-character TLD not being permitted (that is a subjective one that gets a pass either way as far as I am concerned), 223 pass. So that is on par with yours (yours should be failing at least 4 test cases that it is passing). The non-passes are all due to obsolete domain name forms containing embedded comments.
  • NOYB · 7 months ago
    “My "self-proclaimed" 100% is a measure of the validator's ability to correctly match the unit tests. Which it does in 228 out of 228 cases.”

    Well it is pretty easy to get 100% when you permit test cases to pass when they should fail. i.e. create incorrect test cases that will pass for your tool but fail for other tools that perform correctly. And then not update the test cases in a timely manner when it is pointed out. That is disingenuous.

    Correct your test cases and it won't be 100% anymore. Of course I really doubt you will correct the test cases until after you update your code, so as not to have less than 100% posted.
  • NOYB · 7 months ago
    Just found another one. That is at least 5 now that your validator should be failing on. Down to 97.8% now.

    echo unitTest("a@bar", false, "");
  • NOYB · 7 months ago
    And here is yet another one...

    This makes at least 6 now that your validator should be failing on. Down to 97.4% now.

    echo unitTest(" \r\n (\r\n x \r\n ) \r\n first\r\n ( \r\n x\r\n ) \r\n .\r\n ( \r\n x) \r\n last \r\n ( x \r\n ) \r\n @example.com", true, "");

    RFC 5321 4.5.3.1.1. Local-part

    The maximum total length of a user name or other local-part is 64 octets.
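
    A minimal Python sketch of that limit follows. The helper name is hypothetical, and it assumes the raw local part is counted as written, including any comments and folding white space; a validator might reasonably strip those before counting.

```python
MAX_LOCAL_PART_OCTETS = 64  # RFC 5321 section 4.5.3.1.1


def local_part_within_limit(address: str) -> bool:
    """Split at the last '@' and count the octets before it.
    Assumption: comments and folding white space are counted too."""
    local_part, sep, _domain = address.rpartition("@")
    if not sep:
        return False  # no '@' at all
    return len(local_part.encode("utf-8")) <= MAX_LOCAL_PART_OCTETS


print(local_part_within_limit("a" * 64 + "@example.com"))  # exactly at the limit
print(local_part_within_limit("a" * 65 + "@example.com"))  # one octet over
```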
  • NOYB · 7 months ago
    "How does yours do?"

    Much better than you might think.

    With the 6 incorrect test cases being corrected, plus the single-character TLD not being permitted, it passes all 228 of your test cases.
  • NOYB · 3 months ago
    Seven (7) (3%) of the two hundred and twenty-eight (228) test cases in your "Email address validation test suite version 1.8" have an incorrect expected result.
    These have been reported to you before, several months ago. Do you intend to correct these, or just continue misleading people with your self-proclaimed, albeit incorrect, perfection?

    ***********

    echo unitTest(" \r\n (\r\n x \r\n ) \r\n first\r\n ( \r\n x\r\n ) \r\n .\r\n ( \r\n x) \r\n last \r\n ( x \r\n ) \r\n @example.com", true, "");

    Disagree. Local part limit is 64 octets. Expected result should be false (Invalid).

    RFC 5321 4.5.3.1.1. Local-part

    The maximum total length of a user name or other local-part is 64 octets.

    ***********

    echo unitTest("\"\"@example.com", false, "Local part is effectively empty");

    Disagree. Quoted local part is permitted to be empty. Expected result should be true (Valid).

    According to the quoted-string specification the portion between the double quotes can be 0 or more characters.
    In other words, every component of a double quoted string is optional, with the exception of the double quotes.

    quoted-string = [CFWS]
                    DQUOTE *([FWS] qcontent) [FWS] DQUOTE
                    [CFWS]
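
    The rule above can be approximated with a regular expression. This Python sketch is a simplification, not the full grammar: FWS is reduced to plain spaces and tabs, the surrounding CFWS is omitted, and the obsolete forms (obs-qtext, obs-qp) are left out. The names `QTEXT`, `QUOTED_PAIR` and `is_quoted_string` are illustrative only.

```python
import re

# Simplified quoted-string: DQUOTE *([FWS] qcontent) [FWS] DQUOTE
QTEXT = r"[\x21\x23-\x5B\x5D-\x7E]"    # printable ASCII except '"' and '\'
QUOTED_PAIR = r"\\[\x21-\x7E \t]"      # backslash followed by VCHAR or WSP
QUOTED_STRING = re.compile(
    rf'"(?:[ \t]*(?:{QTEXT}|{QUOTED_PAIR}))*[ \t]*"'
)


def is_quoted_string(s: str) -> bool:
    """True if s matches the simplified quoted-string pattern."""
    return QUOTED_STRING.fullmatch(s) is not None


print(is_quoted_string('""'))             # zero qcontent repetitions
print(is_quoted_string('"test blah"'))
print(is_quoted_string(r'"null \"'))      # closing quote is escaped
```

    Because the repetition is `*`, the pattern accepts `""` with nothing between the quotes.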

    ***********

    echo unitTest("first.last@com", false, "Mail host must be second- or lower level");
    echo unitTest("test@example", false, "Dave Child says so");
    echo unitTest("a@bar", false, "");

    Disagree. @TLD is permitted. Expected result should be true (Valid).

    @TLD is valid and permitted. RFC 5321 2.3.5 Domain Names says so.

    ***********

    echo unitTest("first.\"\".last@example.com", false, "Contains a zero-length element");

    Disagree. Empty element permitted for obs-local-part. Expected result should be true (Valid).

    Zero-length element is permitted in obsolete local part (obs-local-part).

    ***********

    echo unitTest("+@b.c", true, "TLDs can be any length");

    Disagree. TLDs are currently at least 2 characters. Expected result should be false (Invalid).

    All TLDs are currently a minimum of 2 characters. This is not likely to change any time soon.

    ***********

    echo unitTest("\"first.middle.last\"@example.com", true, "obs-local-part form as described in RFC 2822");
    echo unitTest("\"first..last\"@example.com", true, "obs-local-part form as described in RFC 2822");

    Agree. However, this is a modern double quoted string, not an obs-local-part.

    ***********