recognize more valid email addresses

Ticket

Things that confuse our regex before the @:

  • dots
  • numbers
  • underscores
  • dashes
  • plusses
  • ?
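
A minimal sketch of a local-part pattern that accepts the characters listed above. This is a hypothetical pattern for illustration, not Wagn's actual regex, and it covers only the listed cases (the spec allows more characters than these). The names LOCAL_PART, DOMAIN, EMAIL, and email_like? are assumptions.

```ruby
# Hypothetical pattern, not Wagn's actual regex.
# Local part: runs of letters, digits, "_", "+", "-" separated by single dots,
# so it cannot start or end with a dot or contain two dots in a row.
LOCAL_PART = /[A-Za-z0-9_+\-]+(?:\.[A-Za-z0-9_+\-]+)*/

# Deliberately simplified domain: at least two dot-separated labels.
DOMAIN = /[A-Za-z0-9\-]+(?:\.[A-Za-z0-9\-]+)+/

# Anchored so the whole string must be an address.
EMAIL = /\A#{LOCAL_PART}@#{DOMAIN}\z/

def email_like?(text)
  EMAIL.match?(text)
end

puts email_like?("sample5@domain.com")
puts email_like?("first_last+tag@my-host.example.org")
puts email_like?("a..b@example.com")
```

Requiring a literal dot at the start of each repeated group keeps the pattern unambiguous: there is only one way to split a given local part into runs.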

Other issues:

  • add "un" as a 'country' (there must be a web page somewhere with the canonical list of top-level domains, maybe this one: http://www.iana.org/domains/root/db; even better would be a blog that posts new ones as they are added)

I'll try to expand the example to test everything at some point. Here's a probably-reliable source: http://en.wikipedia.org/wiki/E-mail_address#Local_part

--John Abbe.....2013-06-11 07:16:51 +0000

Well, recognizing "free" URIs, including emails (with the mailto: prefix, as in the URI standard), can be problematic. It is better to use explicit [[]] links, but maybe you already are.

 

The issue with more expansive matching of emails and URIs without special syntax is that the regular expressions can do a lot of "backtracking", which makes parsing certain patterns really, really slow.
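
To illustrate the backtracking concern, here is a small sketch comparing two hypothetical local-part patterns (neither is taken from Wagn). In the first, an optional dot inside a repeated group lets the engine split a run of letters in many different ways, so a failed match may retry all of them. In the second, every repetition must begin with a literal dot, so there is only one way to divide the text. How large the difference is depends on the Ruby version and its regex engine, so no timings are claimed here.

```ruby
require "benchmark"

# Both patterns are hypothetical illustrations, not Wagn's actual regexes.

# Ambiguous: "aaaa" can be consumed as (aaaa), (aaa)(a), (aa)(aa), ...
# so a backtracking engine may try every split before giving up.
AMBIGUOUS = /\A(?:[a-z0-9]+\.?)+@/

# Unambiguous: each repeated group must start with ".", leaving one split.
# (Unlike AMBIGUOUS, it also refuses a dot directly before the "@".)
UNAMBIGUOUS = /\A[a-z0-9]+(?:\.[a-z0-9]+)*@/

# Time how long a pattern takes to decide about a piece of text.
def seconds_to_decide(pattern, text)
  Benchmark.realtime { pattern.match?(text) }
end

# Short worst-case input: letters only, with no "@" to complete the match.
WORST_CASE = "a" * 18

puts format("ambiguous:   %.6fs", seconds_to_decide(AMBIGUOUS, WORST_CASE))
puts format("unambiguous: %.6fs", seconds_to_decide(UNAMBIGUOUS, WORST_CASE))
```

Lengthening WORST_CASE by a few characters at a time is a simple way to see whether a candidate pattern degrades sharply on near-miss input.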

 

The main thing here is that we should conform to the spec for email addresses, and it definitely needs to be right for external links.

--Gerry Gleason.....2013-06-11 15:04:56 +0000

If they should be recognized, let's add tests for each case.
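
One way to keep a test per case is a small table of sample addresses plus a helper that reports which cases a recognizer misses. This is only a sketch: the sample addresses, CASES, unrecognized_cases, and NAIVE_MATCHER are all hypothetical, and NAIVE_MATCHER is a stand-in for the recognizer under test, not Wagn's actual regex.

```ruby
# Hypothetical table: one sample address per case raised in this ticket.
CASES = {
  "plain"      => "john@example.com",
  "dot"        => "john.abbe@example.com",
  "number"     => "sample5@example.com",
  "underscore" => "john_abbe@example.com",
  "dash"       => "john-abbe@example.com",
  "plus"       => "john+wagn@example.com"
}.freeze

# Returns the names of the cases the given matcher fails to recognize.
# The matcher is anything that responds to #call and returns true or false.
def unrecognized_cases(matcher, cases = CASES)
  cases.reject { |_name, address| matcher.call(address) }.keys
end

# Stand-in for the recognizer under test: letters only before the "@".
NAIVE_MATCHER = ->(address) { /\A[a-z]+@[a-z]+(?:\.[a-z]+)+\z/.match?(address) }

p unrecognized_cases(NAIVE_MATCHER)
```

An empty result means every case in the table is recognized, so a real test could simply assert that the returned list is empty.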

--Gerry Gleason.....2013-06-11 15:06:25 +0000

Gerry's right, though just adding numbers, underscores, and periods may be worth it.  In the meantime .

--Ethan McCutchen.....2013-06-11 18:25:35 +0000

I was not using [[]] but imho if we're going to recognize free URIs we should at least do a half-decent job of it :-)

 

I'd throw dashes (-) and plusses (common with GMail addresses) into any quick fix.

--John Abbe.....2013-06-11 20:08:29 +0000

Any thoughts on when this will work?

 

Also, a partial workaround, for GMail addresses at least: their dots are apparently optional.
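
A sketch of that workaround, assuming the dots really are optional for gmail.com addresses as described above. The helper name undotted_gmail is hypothetical; addresses on other domains are returned unchanged.

```ruby
# Hypothetical helper: remove dots from the local part of a gmail.com address
# so that it contains no dots for the recognizer to trip over.
def undotted_gmail(address)
  local, domain = address.split("@", 2)
  # Leave anything that is not a gmail.com address exactly as it was.
  return address unless domain && domain.casecmp?("gmail.com")

  "#{local.delete('.')}@#{domain}"
end

puts undotted_gmail("john.abbe@gmail.com")
```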

--John Abbe.....2013-09-13 08:41:21 +0000

dunno. not a priority for me.

--Ethan McCutchen.....2013-09-13 22:18:47 +0000

Anyone want to point me to the file? It's just a regex, right?

--John Abbe.....2013-09-14 06:37:24 +0000

It's a long way from just-a-regex. https://github.com/wagn/wagn/blob/master/pack/core/chunks/uri.rb

 

Any changes would need automated tests and code review for the performance problems Gerry described.

--Ethan McCutchen.....2013-09-14 13:47:24 +0000

An easier fix that would be nice: have the "mailto:" dropped from double-bracketed email links:

 

[[mailto:john.abbe@gmail.com]]:
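
Roughly what is being asked for, as a sketch: keep the full target in the href but show the address without its "mailto:" prefix. The function render_link is hypothetical and is not how Wagn renders links.

```ruby
require "cgi"

# Hypothetical renderer, not Wagn's: the href keeps the full target, while the
# visible text drops a leading "mailto:" if there is one.
def render_link(target)
  text = target.sub(/\Amailto:/i, "")
  %(<a href="#{CGI.escapeHTML(target)}">#{CGI.escapeHTML(text)}</a>)
end

puts render_link("mailto:john.abbe@gmail.com")
```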

--John Abbe.....2013-09-14 18:52:03 +0000

The issue for me is that I'm working on pages that I want newbies to be able to edit, so the less markup, the better.

--John Abbe.....2013-09-14 18:53:13 +0000

Still a significant annoyance. I run into email addresses where the part before the @ ends with a number all the time, and you end up with a clickable link that goes to the domain :-P. Trying to explain workarounds for that to Wagn newbies is not something I look forward to.

--John Abbe.....2014-01-29 08:55:22 +0000

Tried this other workaround - editing the HTML to make it an explicit mailto link - but Wagn sticks the web link on top of the domain part of the address: sample5@domain.com">

--John Abbe.....2014-01-29 09:27:49 +0000

OK, something even weirder is happening there than when I tried it on another site. Giving up for now.

--John Abbe.....2014-01-29 09:29:54 +0000

May help:

http://daringfireball.net/2010/07/improved_regex_for_matching_urls

--John Abbe.....2014-02-12 18:08:46 +0000