What does this Perl XML filter look like in Python?

Posted: 2023-02-05 12:46

curl -u $1:$2 --silent "https://mail.google.com/mail/feed/atom" | perl -ne 'print "\t" if /<name>/; print "$2\n" if /<(title|name)>(.*)<\/\1>/;'

I have this shell script, which fetches the Atom feed using command-line arguments for the username and password. I was wondering whether this kind of thing is possible in Python and, if so, how I would go about doing it. The Atom feed is just regular XML.

Python does not lend itself to compact one-liners quite as well as Perl does. This is primarily for three reasons:

With Perl, whitespace is insignificant in almost all cases. In Python, whitespace is very significant.
Perl has some helpful shortcuts for one-liners, such as perl -ne or perl -pe, that put an implicit loop around the line of code.
There is a large body of cargo-cult Perl one-liners that do useful things.
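
To make the second point concrete, here is a minimal Python 3 sketch of the loop that perl -ne supplies for free. The function name grep_lines and the sample data are made up for illustration; the function behaves roughly like perl -ne 'print if /pattern/'.

```python
import re

def grep_lines(lines, pattern):
    """Explicit version of the loop that perl -ne adds implicitly.

    Yields each matching line with its trailing newline removed.
    """
    regex = re.compile(pattern)
    for line in lines:          # perl: while (<>) { ... }
        if regex.search(line):  # perl: ... if /pattern/
            yield line.rstrip("\n")

# Hypothetical input; in a real filter you would pass sys.stdin instead
sample = ["alpha\n", "beta 42\n", "gamma\n"]
matches = list(grep_lines(sample, r"\d+"))
```

In a script, passing sys.stdin as the lines argument gives the same read-every-line behaviour as the Perl switch.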

That said, this Python is close to what you posted in Perl:

curl -u $1:$2 --silent "https://mail.google.com/mail/feed/atom" | python3 -c '
import sys
for s in sys.stdin:
    s = s.strip()
    # Double quotes only: a single quote here would end the shell string
    if not s: print("\t", end="")
    else: print(s)
'

It is a little difficult to do better because, as stated in my comment, the Perl you posted is incomplete. You have:

perl -ne 'print "\t" if //; print "$2\n" if /(.*)/;'

Which is equivalent to:

LINE:
while (<>) {
  print "\t" if //;         # print a tab for a blank line
  print "$2\n" if /(.*)/;   # nonsensical. Print second group but only 
                            # a single match group defined...
}

Edit

While it is trivial to rewrite that Perl in Python, here is something a bit better:

#!/usr/bin/env python3
import sys
from xml.dom.minidom import parse, parseString

def get_XML_doc_stdin(f):
    # Parse an XML document from an open file object
    return parse(f)

def get_tagged_data2(tag, index=0):
    # Text of the index-th <tag> element in the module-level dom
    xmlData = dom.getElementsByTagName(tag)[index].firstChild.data
    return xmlData

data = sys.stdin.read()
dom = parseString(data)

# The first <title> is the feed title
ele2 = get_tagged_data2('title')
print(ele2)

count = int(get_tagged_data2('fullcount'))
print(count, "New Messages:")

# Pair the i-th <name> with the i-th <email>
for i in range(0, count):
    nam = get_tagged_data2('name', i)
    email = get_tagged_data2('email', i)
    print("  {0}: {1} <{2}>".format(i + 1, nam, email))

Now save that in a text file, run chmod +x on it, then:

curl -u $1:$2 --silent "https://mail.google.com/mail/feed/atom" | 
/path/pythonfile.py

It produces this:

Gmail - Inbox for xxxxxxx@gmail.com
2 New Messages:
  1: bob smith <bob@smith.com>
  2: Google Alerts <googlealerts-noreply@google.com>
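
The script above pairs the i-th name with the i-th email across the whole document, which would misalign if any entry lacked one of them. As a hedged alternative, here is a sketch using xml.etree.ElementTree that groups the fields per entry. It assumes the standard Atom layout (entry, then author, then name and email); the helper names and the sample feed are invented for illustration and are not real Gmail output.

```python
import xml.etree.ElementTree as ET

def local(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]

def list_senders(xml_text):
    """Return one (name, email) pair per <entry> element."""
    root = ET.fromstring(xml_text)
    senders = []
    for entry in root.iter():
        if local(entry.tag) != "entry":
            continue
        # Collect the text of every element inside this entry, keyed by tag
        fields = {local(el.tag): (el.text or "") for el in entry.iter()}
        senders.append((fields.get("name", ""), fields.get("email", "")))
    return senders

# Made-up sample shaped like an Atom feed
SAMPLE = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Inbox</title>
  <entry><title>Hi</title><author><name>bob smith</name><email>bob@smith.com</email></author></entry>
  <entry><title>News</title><author><name>Google Alerts</name></author></entry>
</feed>"""

senders = list_senders(SAMPLE)
```

Comparing local tag names sidesteps whichever XML namespace the feed declares, at the cost of ignoring namespaces entirely.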

Edit 2: And if you don't like that, here is the Python one-line filter:

curl -u $1:$2 --silent "https://mail.google.com/mail/feed/atom" | python3 -c '
import sys, re
for t, m in re.findall(r"<(title|name)>(.*)<\/\1>", sys.stdin.read()):
    print("\t", m)
'
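
For a closer match to the Perl in the question, which prints the tab only on lines containing <name>, a line-by-line translation might look like the sketch below. The function name filter_feed and the sample lines are hypothetical; the regular expression is the one from the question.

```python
import re

# Same pattern as the Perl one-liner; \1 must repeat the opening tag name
TAG_RE = re.compile(r"<(title|name)>(.*)</\1>")

def filter_feed(lines):
    """Return the text the Perl filter would print for the given lines."""
    out = []
    for line in lines:
        if "<name>" in line:        # perl: print "\t" if /<name>/
            out.append("\t")
        match = TAG_RE.search(line)
        if match:                   # perl: print "$2\n" if /.../
            out.append(match.group(2) + "\n")
    return "".join(out)

# Made-up input lines for illustration
sample = [
    "<title>Hello</title>\n",
    "<author><name>bob smith</name>\n",
    "<email>bob@smith.com</email></author>\n",
]
result = filter_feed(sample)
```

Like the Perl, this works one line at a time, so an element that spans several lines would be missed.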

You can use a URL opener from the standard urllib.request module (urllib2 in Python 2) with a handler for authentication. For example:

#!/usr/bin/env python3

import getpass
import sys
import urllib.request

def main(program, username=None, password=None, url=None):

    # Get input if any argument is missing
    username = username or input('Username: ')
    password = password or getpass.getpass('Password: ')
    url = url or 'https://mail.google.com/mail/feed/atom'

    # Create password manager
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, username, password)

    # Create HTTP Authentication handler and URL opener
    authhandler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    opener = urllib.request.build_opener(authhandler)

    # Fetch URL and print content (the response body is bytes)
    response = opener.open(url)
    print(response.read().decode('utf-8'))

if __name__ == '__main__':
    # sys.argv[0] is the program name; the rest are optional arguments
    main(*sys.argv)
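
For comparison with curl -u, HTTP Basic authentication comes down to a single request header. The following Python 3 sketch builds that header by hand with urllib.request; the function name, URL, and credentials are placeholders, not part of the original script.

```python
import base64
import urllib.request

def basic_auth_request(url, username, password):
    """Build a Request that carries an HTTP Basic Authorization header."""
    raw = "{0}:{1}".format(username, password).encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": "Basic " + token})

# Placeholder URL and credentials; nothing is sent until urlopen() is called
req = basic_auth_request("https://example.com/feed", "user", "pass")
header = req.get_header("Authorization")
```

One difference in design: a request built this way sends the credentials up front, as curl -u does, whereas the basic-auth handler supplies them only after the server answers with a 401 challenge.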

If you'd like to extract information from the feed too, look into how to parse password-protected feeds with feedparser.
