
What's the easiest way to extract the links on a web page using Python without BeautifulSoup?

I'm using Cygwin and do not have BeautifulSoup installed.


Getting the value of href attributes in all <a> tags on a html file with Python

python, regex to find anchor link html

Regular expression to extract URL from an HTML link


If you don't care much about performance you can use regular expressions:

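import re
# Capture whatever sits between the quotes of every href attribute.
linkre = re.compile(r"""href=["']([^"']+)["']""")
# your_html is the page source as a string; findall returns the captured values.
links = linkre.findall(your_html)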
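To try this without fetching a page, the same expression can be run against a small inline document. This is only a sketch: `sample_html` is a made-up stand-in for `your_html`, not something from the question.

```python
import re

# Same expression as above: capture the quoted value of each href attribute.
linkre = re.compile(r"""href=["']([^"']+)["']""")

# sample_html is a hypothetical stand-in for your_html.
sample_html = """
<p><a href="http://example.com/">Example</a>
<a href='/about.html'>About</a></p>
"""

links = linkre.findall(sample_html)
print(links)  # ['http://example.com/', '/about.html']
```

Note that the expression matches any href attribute, so it also picks up tags such as <link> as well as <a>.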

If you only want absolute http:// links, change the expression so that the captured group starts with the scheme:

linkre = re.compile(r"""href=["'](http://[^"']+)["']""")

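You can also make the quotes optional (["']? instead of ["']) in case the HTML has unquoted attribute values. In that case the value has to end at whitespace or > as well, otherwise the match runs on to the next quote in the document.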
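A minimal sketch of the optional-quote variant follows. The pattern and the `sample_html` string are my own illustration, not part of the original answer. The value is assumed to end at a quote, whitespace, or `>`, so a quoted URL that contains a space would be cut short.

```python
import re

# Illustrative pattern: the opening quote is optional, and the value stops
# at the first quote, whitespace character, or '>'.
link_re = re.compile(r"""href\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)

# sample_html is a hypothetical input mixing the three quoting styles.
sample_html = """
<a href="http://example.com/a">double-quoted</a>
<a href='b.html'>single-quoted</a>
<a href=c.html>unquoted</a>
"""

links = link_re.findall(sample_html)
print(links)  # ['http://example.com/a', 'b.html', 'c.html']
```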
