python url fetch help - regex

2022-12-13 19:20

I have a website with links like <a href="http://www.example.com?read.php=123">. Can anybody show me how to get all the numbers (123, in this case) from such links using Python? I don't know how to construct a regex. Thanks in advance.

import re
# data is the page's HTML as a string
re.findall(r"\?read\.php=(\d+)", data)
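To see what that call returns, here is a small self-contained run. The page string is made up for illustration; in practice `data` would be the HTML you downloaded.

```python
import re

# Hypothetical sample page; in practice `data` would be the downloaded HTML.
data = '''
<a href="http://www.example.com?read.php=123">first</a>
<a href="http://www.example.com?read.php=4567">second</a>
<a href="http://www.example.com?other.php=89">unrelated</a>
'''

# Capture the digits that follow "?read.php=" in every link.
numbers = re.findall(r"\?read\.php=(\d+)", data)
print(numbers)  # ['123', '4567']
```

Note that findall returns the captured groups as strings, so convert with int() if you need numbers.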

"If you have a problem, and decide to use regex, now you have two problems..."

If you are reading one particular web page and you know how it is formatted, then regex is fine: you can use S. Mark's answer. To parse a particular link, you can use Kimvai's answer. However, to get all the links from a page, you're better off using something more serious, such as a proper HTML parser. Any regex solution you come up with will have flaws.

I recommend mechanize. Its Browser class has a links method which gets you all the links in a page. It has the added benefit of being able to download the page for you =)

This will work irrespective of how your links are formatted (e.g. if some look like <a href="foo=123"/> and some look like <A TARGET="_blank" HREF='foo=123'/>).
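If installing mechanize is not an option, the same idea (let an HTML parser find the links instead of a regex) can be sketched with Python 3's standard html.parser module. This is a swapped-in technique, not mechanize itself, and the class name LinkCollector and the sample string are made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <A HREF=...>
        # and <a href=...> arrive here looking the same.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hrefs.append(value)


# Hypothetical sample mixing the two link styles mentioned above.
sample = '''<a href="foo=123"/> <A TARGET="_blank" HREF='foo=456'/>'''

collector = LinkCollector()
collector.feed(sample)
print(collector.hrefs)  # ['foo=123', 'foo=456']
```

Once you have the href values, pulling the number out of each one is a much smaller problem than matching whole tags.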

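import re
from bs4 import BeautifulSoup  # BeautifulSoup 4; the old "from BeautifulSoup import ..." is Python 2 only

# html is assumed to hold the page source as a string
soup = BeautifulSoup(html, "html.parser")
# capture the digits that follow the last "=" in the href
p = re.compile(r'=(\d+)$')
# href=True skips <a> tags that have no href attribute
for a in soup.find_all('a', href=True):
    m = p.search(a["href"])
    if m:
        print(m.group(1))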

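While the other answers are sort of correct, you should probably parse the URL with the standard library instead (the urlparse module, which urllib2 exposes in Python 2; it is urllib.parse in Python 3):

import re
try:
    from urllib.parse import urlparse, parse_qs  # Python 3
except ImportError:
    from urlparse import urlparse, parse_qs  # Python 2

# pull the href value out of each <a> tag
urlre = re.compile(r'<a[^>]+href="([^"]+)"[^>]*>', re.IGNORECASE)
links = urlre.findall('<a href="http://www.example.com?read.php=123">')
for link in links:
    # parse_qs splits the query string on "&" and returns a dict of lists
    params = parse_qs(urlparse(link).query)
    if "read.php" in params:
        print(params["read.php"][0])

It's not as simple as some of the above, but it copes with more complex URLs, such as ones that carry several query parameters.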

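/[0-9]/

That's the regex syntax you want. In Python, drop the surrounding slashes (they are pattern delimiters in other languages) and add + to match a whole run of digits: [0-9]+

For reference, see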

http://gnosis.cx/publish/programming/regular_expressions.html
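As a rough sketch of how a digit class is used from Python's re module, the snippet below runs it against the link from the question. Anchoring on the "=" and the closing quote is an assumption about the link format, added so that digits elsewhere in a URL are not picked up.

```python
import re

link = '<a href="http://www.example.com?read.php=123">'

# [0-9] matches one digit; the trailing + extends the match to a run of digits.
# The "=" before and the closing quote after pin the match to the end of the URL.
match = re.search(r'=([0-9]+)"', link)
if match:
    print(match.group(1))  # 123
```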

One without the need for regex:

>>> s='<a href="http://www.example.com?read.php=123">'
>>> for item in s.split(">"):
...     if "href" in item:
...         print(item[item.index("a href")+len("a href="): ])
...
"http://www.example.com?read.php=123"

If you want to extract the numbers, take what follows the last "=" and strip the closing quote:

item[item.index("a href")+len("a href="): ].split("=")[-1].strip('"')
