What does this Perl XML filter look like in Python?
curl -u $1:$2 --silent "https://mail.google.com/mail/feed/atom" | perl -ne 'print "\t" if /<name>/; print "$2\n" if /<(title|name)>(.*)<\/\1>/;'
I have this shell script which gets the Atom feed with command-line arguments for the username and password. I was wondering if this type of thing was possible in Python, and if so, how I would go about doing it. The atom feed is just regular XML.
Python does not lend itself to compact one-liners quite as well as Perl does. There are three main reasons:
- In Perl, whitespace is insignificant in almost all cases. In Python, whitespace is very significant.
- Perl has some helpful shortcuts for one-liners, such as perl -ne or perl -pe, that put an implicit loop around the line of code.
- There is a large body of cargo-cult Perl one-liners that do useful things.
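The implicit loop that perl -n wraps around the code can be approximated in Python with the standard fileinput module. This is only a minimal Python 3 sketch of the idea; the helper names filter_lines and run are hypothetical and do not come from Perl or from the Python library:

```python
import fileinput
import sys


def filter_lines(lines, body):
    """Call body(line) for each line and collect whatever it returns.

    This plays the role of the loop that perl -n puts around the code.
    body returns the text to print, or None to print nothing.
    """
    out = []
    for line in lines:
        result = body(line)
        if result is not None:
            out.append(result)
    return out


def run(body):
    # fileinput.input() reads the files named on the command line,
    # or standard input when none are given, much like Perl's <>.
    for text in filter_lines(fileinput.input(), body):
        sys.stdout.write(text)
```

Calling run(lambda line: line.upper()) at the end of a script would then behave roughly like perl -ne 'print uc', at the cost of several more lines.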
That all said, this Python is close to what you posted in Perl:
curl -u $1:$2 --silent "https://mail.google.com/mail/feed/atom" | python -c '
import sys
for s in sys.stdin:
    s = s.strip()
    # Double quotes here: a single quote would end the shell string
    if not s: print "\t",
    else: print s
'
It is a little difficult to do better because, as stated in my comment, the Perl you posted is incomplete. You have:
perl -ne 'print "\t" if //; print "$2\n" if /(.*)/;'
Which is equivalent to:
LINE:
while (<>) {
    print "\t" if //;        # print a tab for a blank line
    print "$2\n" if /(.*)/;  # nonsensical: prints the second group, but only
                             # a single match group is defined...
}
Edit
While it is trivial to rewrite that Perl in Python, here is something a bit better:
#!/usr/bin/python
from xml.dom.minidom import parse, parseString
import sys

def get_XML_doc_stdin(f):
    # Alternative to parseString: parse directly from a file object
    return parse(f)

def get_tagged_data2(tag, index=0):
    # Text of the index-th <tag> element in the global dom
    xmlData = dom.getElementsByTagName(tag)[index].firstChild.data
    return xmlData

data = sys.stdin.read()
dom = parseString(data)
ele2 = get_tagged_data2('title')
print ele2
count = int(get_tagged_data2('fullcount'))
print count, "New Messages:"
for i in range(0, count):
    nam = get_tagged_data2('name', i)
    email = get_tagged_data2('email', i)
    print " {0}: {1} <{2}>".format(i+1, nam, email)
Now save that in a text file, run chmod +x on it, then:
curl -u $1:$2 --silent "https://mail.google.com/mail/feed/atom" | /path/pythonfile.py
It produces this:
Gmail - Inbox for xxxxxxx@gmail.com
2 New Messages:
1: bob smith <bob@smith.com>
2: Google Alerts <googlealerts-noreply@google.com>
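For readers on Python 3, the same summary could be sketched with xml.etree.ElementTree instead of minidom. The helper names local_name and summarize_feed, the SAMPLE document, and its namespace URI are all illustrative assumptions, not part of the script above; the real feed may be laid out differently.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def summarize_feed(xml_text):
    """Return (feed_title, [(name, email), ...]) from Atom-like XML."""
    root = ET.fromstring(xml_text)
    title = None
    senders = []
    for child in root:
        kind = local_name(child.tag)
        if kind == "title":
            title = child.text
        elif kind == "entry":
            # Keep the first <name> and <email> found inside this entry.
            fields = {}
            for node in child.iter():
                key = local_name(node.tag)
                if key in ("name", "email") and key not in fields:
                    fields[key] = node.text
            senders.append((fields.get("name"), fields.get("email")))
    return title, senders


# Hypothetical sample data, shaped like the output shown above.
SAMPLE = """<feed xmlns="http://purl.org/atom/ns#">
<title>Gmail - Inbox for xxxxxxx@gmail.com</title>
<fullcount>2</fullcount>
<entry><title>Hello</title><author><name>bob smith</name><email>bob@smith.com</email></author></entry>
<entry><title>Alert</title><author><name>Google Alerts</name><email>googlealerts-noreply@google.com</email></author></entry>
</feed>"""
```

Pairing name and email per entry, rather than indexing every name tag in the document, means an entry without an author yields (None, None) instead of shifting the later senders by one.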
Edit 2
And if you don't like that, here is a Python one-line filter:
curl -u $1:$2 --silent "https://mail.google.com/mail/feed/atom" | python -c '
import sys, re
for t, m in re.findall(r"<(title|name)>(.*?)</\1>", sys.stdin.read()):
    # Indent sender names with a tab, as the Perl filter does
    print ("\t" if t == "name" else "") + m
'
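The same filter can also be written as a small Python 3 function, which is easier to test than a shell one-liner. This is a sketch, not a drop-in replacement: the names TAG_RE and filter_feed are hypothetical, and the non-greedy (.*?) is an assumption made so that several tags on one line are matched separately, where the Perl pattern uses a greedy (.*).

```python
import re

# Matches <title>...</title> or <name>...</name>; \1 forces the closing
# tag to be the same as the opening one.
TAG_RE = re.compile(r"<(title|name)>(.*?)</\1>")


def filter_feed(text):
    """Return output lines: titles as-is, names prefixed with a tab."""
    lines = []
    for tag, value in TAG_RE.findall(text):
        prefix = "\t" if tag == "name" else ""
        lines.append(prefix + value)
    return lines
```

To use it as a filter, a script could end with print("\n".join(filter_feed(sys.stdin.read()))) after importing sys, and be placed at the end of the curl pipeline.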
You can use a "URL opener" from the urllib2 standard Python module with a handler for authentication. For example:
#!/usr/bin/env python
import getpass
import sys
import urllib2

def main(program, username=None, password=None, url=None):
    # Get input if any argument is missing
    username = username or raw_input('Username: ')
    password = password or getpass.getpass('Password: ')
    url = url or 'https://mail.google.com/mail/feed/atom'

    # Create password manager
    password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
    password_mgr.add_password(None, url, username, password)

    # Create HTTP Authentication handler and URL opener
    authhandler = urllib2.HTTPBasicAuthHandler(password_mgr)
    opener = urllib2.build_opener(authhandler)

    # Fetch URL and print content
    response = opener.open(url)
    print response.read()

if __name__ == '__main__':
    main(*sys.argv)
If you'd like to extract information from the feed too, you should check how to parse Password-Protected Feeds with feedparser.
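In Python 3 the urllib2 module was split into urllib.request and urllib.error, so the same opener could be sketched roughly as follows. The function names build_auth_opener and fetch are hypothetical, and decoding the body as UTF-8 is an assumption about the response, not something the feed guarantees.

```python
import urllib.request

FEED_URL = "https://mail.google.com/mail/feed/atom"


def build_auth_opener(username, password, url=FEED_URL):
    """Return (opener, password_mgr) configured for HTTP Basic auth on url."""
    password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
    # None as the realm means: use these credentials for any realm.
    password_mgr.add_password(None, url, username, password)
    handler = urllib.request.HTTPBasicAuthHandler(password_mgr)
    return urllib.request.build_opener(handler), password_mgr


def fetch(username, password, url=FEED_URL):
    """Fetch url with the given credentials and return the body as text."""
    opener, _ = build_auth_opener(username, password, url)
    with opener.open(url) as response:
        # Assumption: the response body is UTF-8 encoded.
        return response.read().decode("utf-8")
```

A script would call print(fetch(username, password)) after collecting the credentials with input() and getpass.getpass(), mirroring the main function above.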