What's the easiest way to extract the links on a web page using python without BeautifulSoup?
I'm using Cygwin and do not have BeautifulSoup installed.
Related questions:
- Getting the value of href attributes in all <a> tags on a html file with Python
- python, regex to find anchor link html
- Regular expression to extract URL from an HTML link
If you don't care much about performance, you can use regular expressions:
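import re
# Capture the value of every href attribute quoted with " or '
linkre = re.compile(r"""href=["']([^"']+)["']""")
links = linkre.findall(your_html)  # your_html: the page source as a string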
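To see what the pattern returns, here is a minimal, self-contained sketch. The function name `extract_links` and the sample HTML are hypothetical, used only for illustration. In practice `your_html` would be the page source you already have, for example read with `urllib.request.urlopen(url).read().decode()` in Python 3, assuming the page is UTF-8 encoded.

```python
import re

# Same pattern as above: capture the value of every href attribute
# whose value is wrapped in double or single quotes.
LINK_RE = re.compile(r"""href=["']([^"']+)["']""")


def extract_links(html):
    """Return all quoted href values found in an HTML string."""
    # With exactly one group in the pattern, findall returns the
    # captured text (the href value), not the whole match.
    return LINK_RE.findall(html)


# Hypothetical sample page, used only for illustration.
sample_html = """
<p><a href="http://example.com/">Example</a>
<a href='/about.html'>About</a>
<link rel="stylesheet" href="style.css"></p>
"""

print(extract_links(sample_html))
# ['http://example.com/', '/about.html', 'style.css']
```

The pattern matches any `href` attribute, not only those on `<a>` tags, so the `<link>` stylesheet reference is returned too. It also skips values that are not wrapped in quotes.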
If you only want absolute http:// links, change the expression to:
linkre = re.compile(r"""href=["'](http://[^"']+)["']""")
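Where `http:` sits relative to the capture group changes what `findall` returns, because with a single group `findall` returns only the captured text. The sketch below compares the two placements on hypothetical sample HTML, and also shows an `https?` variant (an addition, not part of the original answer) that accepts both `http://` and `https://` links.

```python
import re

# Hypothetical sample HTML: one http link, one https link, one relative link.
html = ('<a href="http://example.com/a">A</a> '
        '<a href="https://example.org/b">B</a> '
        '<a href="/local">C</a>')

# "http:" outside the group: it must match, but it is not returned.
outside = re.compile(r"""href=["']http:([^"']+)["']""")
# "http://" inside the group: the full URL is returned.
inside = re.compile(r"""href=["'](http://[^"']+)["']""")
# Optional "s" so that https:// links are kept as well.
either = re.compile(r"""href=["'](https?://[^"']+)["']""")

print(outside.findall(html))  # ['//example.com/a']
print(inside.findall(html))   # ['http://example.com/a']
print(either.findall(html))   # ['http://example.com/a', 'https://example.org/b']
```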
You can also make the quotes optional if the HTML might contain href values without quotes around them.
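One way to make the quotes optional is sketched here, assuming an unquoted value ends at whitespace or `>`. The pattern also allows whitespace around `=` and uses `re.IGNORECASE` so that `HREF` matches; both are additions beyond the original answer. A quoted value containing a space would be cut at the space, which is usually acceptable because spaces in URLs should be percent-encoded.

```python
import re

# Optional opening quote, then everything up to a quote, whitespace or ">".
# For quoted values the character class stops at the closing quote, so no
# trailing ["']? is needed.
optional_quotes = re.compile(r"""href\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)

# Hypothetical sample HTML mixing quoted, unquoted and upper-case attributes.
html = """<a href="/one">1</a> <a href='/two'>2</a>
<a href=/three>3</a> <A HREF = "/four">4</A>"""

links = optional_quotes.findall(html)
print(links)  # ['/one', '/two', '/three', '/four']
```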