Python - Extracting text by Column Header from Given Row
I created a text file from multiple email messages.
Each of the three tuples below was written to the text file from a different email message and sender.
Cusip NAME Original Current Cashflow Collat Offering
362341D71 GSAA 2005-15 2A2 10,000 8,783 FCF 5/25 65.000
026932AC7 AHM 2007-1 GA1C 9,867 7,250 Spr Snr OA 56.250
Name O/F C/F Cpn FICO CAL WALB 60+ Notes Offer
CSMC 06-9 7A1 25.00 11.97 L+45 728 26 578 35.21 FLT,AS,0.0% 50-00
LXS 07-10H 2A1 68.26 34.01 L+16 744 6 125 33.98 SS,9.57% 39-00
CUSIP Name BID x Off SIZE C/E 60++ WAL ARM CFLW
86360KAA6 SAMI 06-AR3 11A1 57-00 x 59-00 73+MM 46.9% 67.0% 65 POA SSPT
86361HAQ7 SAMI 06-AR7 A12 19-08 x 21-08 32+MM 15.4% 61.1% 61 POA SRMEZ
For each 'Name', I need a way to pull out the price info (price info = the data under the headers 'Offering', 'Offer' and 'Off'). This process will be repeated over the whole text file, and the extracted data ('Name' and 'Price') will be written to an Excel file via xlwt. Notice that the format of the price data varies by tuple.
The formatting makes this a little tricky: your names can contain spaces, which makes the csv module difficult to use. One way around this is to use the first line (the header row) to find the position and width of the columns you are interested in, using a regex. This relies on the data rows being aligned under their headers. You can try something like this:
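import re

for email in emails:
    print(email)
    lines = email.split('\n')
    # Locate the Name and Offering/Offer/Off columns in the header row.
    # Each match covers the header word plus the padding after it,
    # i.e. the full width of the column.
    name = re.search(r'name\s*', lines[0], re.I)
    price = re.search(r'off(er(ing)?)?\s*', lines[0], re.I)
    for line in lines[1:]:
        # Slice every data row at the same character positions.
        n = line[name.start():name.end()].strip()
        p = line[price.start():price.end()].strip()
        print((n, p))
    print()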
This assumes that emails is a list where each entry is the text of one email. Here is the output:
Cusip NAME Original Current Cashflow Collat Offering
362341D71 GSAA 2005-15 2A2 10,000 8,783 FCF 5/25 65.000
026932AC7 AHM 2007-1 GA1C 9,867 7,250 Spr Snr OA 56.250
('GSAA 2005-15 2A2', '65.000')
('AHM 2007-1 GA1C', '56.250')
Name O/F C/F Cpn FICO CAL WALB 60+ Notes Offer
CSMC 06-9 7A1 25.00 11.97 L+45 728 26 578 35.21 FLT,AS,0.0% 50-00
LXS 07-10H 2A1 68.26 34.01 L+16 744 6 125 33.98 SS,9.57% 39-00
('CSMC 06-9 7A1', '50-00')
('LXS 07-10H 2A1', '39-00')
CUSIP Name BID x Off SIZE C/E 60++ WAL ARM CFLW
86360KAA6 SAMI 06-AR3 11A1 57-00 x 59-00 73+MM 46.9% 67.0% 65 POA SSPT
86361HAQ7 SAMI 06-AR7 A12 19-08 x 21-08 32+MM 15.4% 61.1% 61 POA SRMEZ
('SAMI 06-AR3 11A1', '59-00')
('SAMI 06-AR7 A12', '21-08')
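Since the extraction has to be repeated over the whole file, it may help to wrap the loop above in a function that returns the pairs instead of printing them. The following is only a sketch: the name extract_pairs is made up for illustration, and the sample table is rebuilt here with fixed column widths on the assumption that the original emails were aligned that way. One small change from the loop above, also an assumption about what is wanted: when the price column is the last one in the header, the slice runs to the end of the row so that a value wider than the header word is not cut off.

```python
import re

NAME_RE = re.compile(r'name\s*', re.I)
PRICE_RE = re.compile(r'off(er(ing)?)?\s*', re.I)


def extract_pairs(block):
    """Return (name, price) tuples from one aligned table (header + rows)."""
    lines = [ln for ln in block.split('\n') if ln.strip()]
    if not lines:
        return []
    header = lines[0]
    name = NAME_RE.search(header)
    price = PRICE_RE.search(header)
    if name is None or price is None:
        return []  # header has no recognizable Name / Offer column
    # If the price column is the last one, read to the end of each row.
    price_end = price.end() if price.end() < len(header) else None
    pairs = []
    for line in lines[1:]:
        n = line[name.start():name.end()].strip()
        p = line[price.start():price_end].strip()
        pairs.append((n, p))
    return pairs


# Hypothetical aligned sample, padded to fixed widths with str.format.
row = "{:<10}{:<18}{:<6}{:<2}{:<6}{}".format
sample = "\n".join([
    row("CUSIP", "Name", "BID", "x", "Off", "SIZE"),
    row("86360KAA6", "SAMI 06-AR3 11A1", "57-00", "x", "59-00", "73+MM"),
    row("86361HAQ7", "SAMI 06-AR7 A12", "19-08", "x", "21-08", "32+MM"),
])
print(extract_pairs(sample))
# [('SAMI 06-AR3 11A1', '59-00'), ('SAMI 06-AR7 A12', '21-08')]
```

The returned list can then be handed to whatever writes the spreadsheet, one row per tuple.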
Just use the csv module, and use good formatting for your numbers.
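One way to read this suggestion is to use the csv module on the output side: once the (name, price) pairs have been extracted, they can be written as a CSV file, which Excel opens directly. This is a minimal sketch under that reading, not necessarily what the answer intended; write_pairs and the "Name"/"Price" header row are illustrative choices, and the prices are kept as strings because their format differs from table to table.

```python
import csv
import io


def write_pairs(pairs, fileobj):
    """Write (name, price) tuples to an open text file as CSV."""
    writer = csv.writer(fileobj)
    writer.writerow(["Name", "Price"])  # illustrative header row
    writer.writerows(pairs)


# In-memory demo; for a real file use open(path, 'w', newline='').
pairs = [("GSAA 2005-15 2A2", "65.000"), ("CSMC 06-9 7A1", "50-00")]
buf = io.StringIO()
write_pairs(pairs, buf)
print(buf.getvalue())
```

Names containing spaces are not a problem here, because the writer separates fields with commas and quotes any field that needs it.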