What's the easiest way to extract the links on a web page using Python without BeautifulSoup?

2023-01-30 05:11

I'm using Cygwin and do not have BeautifulSoup installed.

Getting the value of href attributes in all <a> tags on a html file with Python

python, regex to find anchor link html

Regular expression to extract URL from an HTML link

If you don't care much about performance, you can use regular expressions:

import re

# Capture the value of every quoted href attribute.
linkre = re.compile(r"""href=["']([^"']+)["']""")
links = linkre.findall(your_html)  # your_html: the page source as a string
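
As a rough end-to-end sketch of the snippet above, assuming the page source is already in a string (the sample markup below is a made-up placeholder, not a real page):

```python
import re

# Placeholder markup standing in for the downloaded page source.
your_html = """
<a href="http://example.com/">Example</a>
<a href='/about.html'>About</a>
<link href="style.css" rel="stylesheet">
"""

# Same pattern as above: grab whatever sits between the quotes after href=.
linkre = re.compile(r"""href=["']([^"']+)["']""")
links = linkre.findall(your_html)
print(links)
```

Note that the pattern matches every href attribute, so stylesheet references in <link> tags come back along with the <a> links.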

If you only want absolute links that start with http://, change the expression to:

linkre = re.compile(r"""href=["'](http://[^"']+)["']""")
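
A small sketch of restricting the matches to absolute links. It assumes you want the scheme kept in the result and https accepted as well; both choices, and the sample string, are my own assumptions:

```python
import re

# The capture group includes the scheme, so each result is a complete URL.
# "https?" accepts both http:// and https:// links.
linkre = re.compile(r"""href=["'](https?://[^"']+)["']""")

# Made-up sample with one relative link that should be skipped.
sample_html = (
    '<a href="http://example.com/x">x</a> '
    '<a href="/local">local</a> '
    "<a href='https://example.org/'>y</a>"
)
links = linkre.findall(sample_html)
print(links)
```

Relative links such as /local are dropped because they do not start with a scheme.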

Alternatively, make the quotes optional in case your HTML has href values without quotes around them.
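
One possible way to write the optional-quote variant is sketched below. Once the quotes are optional, the value has to stop at whitespace or ">" as well, otherwise an unquoted link would swallow the rest of the tag. The case-insensitive flag and the sample string are assumptions on my part:

```python
import re

# ["']? makes the opening quote optional; the character class stops the
# value at a quote, whitespace, or the end of the tag.
linkre = re.compile(r"""href\s*=\s*["']?([^"'\s>]+)""", re.IGNORECASE)

# Made-up sample mixing unquoted, double-quoted, and single-quoted values.
sample_html = (
    "<a href=http://example.com/a>A</a> "
    '<a href="/b">B</a> '
    "<A HREF='c.html'>C</A>"
)
links = linkre.findall(sample_html)
print(links)
```

This is still a heuristic: it does not check that the href belongs to an <a> tag, and it will not cope with quoted values that contain spaces.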
