Regular Expression to match <a> tags without http://
How can I match HTML "a" tags, but only the ones without http, using a regular expression?
i.e. match:
blahblah... <a href="something"> ...blahblah
but not
blahblah... <a href="http://something"> ...blahblah
It's easier to use a DOMParser and XPath than a regex.
See my response in jsfiddle.
HTML
<body>
<div>
<a href='index.php'>1. index</a>
<a href='http://www.bar.com'>2. bar</a>
<a href='http://www.foo.com'>3. foo</a>
<a href='hello.php'>4. hello</a>
</div>
</body>
JS
$(document).ready(function() {
  var type = XPathResult.ANY_TYPE;
  var page = $("body").html();
  // DOMParser is a constructor, so it has to be called with `new`
  var doc = new DOMParser().parseFromString(page, "text/xml");
  // select every <a> whose href does not start with http://
  var xpath = "//a[not(starts-with(@href,'http://'))]";
  var result = doc.evaluate(xpath, doc, null, type, null);
  var node = result.iterateNext();
  while (node) {
    console.log(node); // logs links 1 and 4
    node = result.iterateNext();
  }
});
NOTES
- I'm using jQuery to keep the code short, but you can do it without jQuery.
- This code must be adapted to work in IE (I've tested it in Firefox).
You should use an XML parser instead of regexes.
On the same topic:
- RegEx match open tags except XHTML self-contained tags
With jQuery, you can do something very simple:
links_that_doesnt_start_with_http = $("a:not([href^='http://'])")
edit: Added the ://
I'm interpreting your question to mean any (mostly) absolute URI with a protocol, not just HTTP. To add to everyone else's incorrect solutions: you should be doing this check on the href itself:
if (href.slice(0, 2) !== "//" && !/^[\w-]+:\/\//.test(href)) {
// href is a relative URI without http://
}
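For what it's worth, the same check can be wrapped in a small function and tried on a few sample values. This is only a sketch: the name isRelativeHref and the sample hrefs are made up for illustration. Note that a scheme without "//" (such as mailto:) is still treated as relative by this test.

```javascript
// Hypothetical helper wrapping the check above: returns true when the href
// has no "scheme://" prefix and is not protocol-relative ("//host/...").
function isRelativeHref(href) {
  if (href.slice(0, 2) === "//") {
    return false; // protocol-relative, e.g. //cdn.example.com/x.js
  }
  return !/^[\w-]+:\/\//.test(href); // "scheme://" means absolute
}

// Sample values (assumed, not from the question)
const samples = [
  "index.php",
  "#anchor",
  "http://www.bar.com",
  "ftp://example.com/file",
  "//cdn.example.com/x.js"
];

const relative = samples.filter(isRelativeHref);
console.log(relative); // index.php and #anchor
```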
var html = 'Some text with a <a href="http://example.com/">link</a> and an <a href="#anchor">anchor</a>.';
var re = /<a href="(?!http:\/\/)[^"]*">/i;
var match = html.match(re);
// match contains <a href="#anchor">
Note: this won't work if the tag has additional attributes.
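If you still want to go the regex route, a somewhat more tolerant pattern can allow other attributes before and after href and accept either quote style. The following is a rough sketch under those assumptions, not a robust HTML parser: the sample markup and variable names are invented, it only excludes http:// as in the question, and it can still be fooled by unusual markup (unquoted values, a ">" inside an attribute, and so on).

```javascript
// Sample input (assumed for illustration)
const sampleHtml =
  'Text <a class="ext" href="http://example.com/">link</a>, ' +
  '<a id="a1" href="#anchor" title="t">anchor</a> and ' +
  "<a href='page.php'>page</a>.";

// <a, then any attributes, then href= with a quoted value that does not
// start with http://, then any remaining attributes up to the closing >.
// Group 1 is the quote character, group 2 is the href value.
const looseRe = /<a\b[^>]*?\shref\s*=\s*(["'])(?!http:\/\/)(.*?)\1[^>]*>/gi;

// Collect the href values of all matching tags
const hrefs = [...sampleHtml.matchAll(looseRe)].map(function (m) {
  return m[2];
});

console.log(hrefs); // #anchor and page.php
```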