Python - Extracting text by Column Header from Given Row

I created a text file from multiple email messages.

Each of the three tuples below was written to the text file from a different email message and sender.

Cusip     NAME              Original Current Cashflow Collat Offering
362341D71 GSAA 2005-15 2A2   10,000   8,783  FCF       5/25  65.000
026932AC7 AHM 2007-1 GA1C    9,867    7,250  Spr Snr   OA    56.250 

Name            O/F    C/F    Cpn  FICO CAL WALB  60+    Notes             Offer
CSMC 06-9 7A1   25.00  11.97  L+45  728  26  578  35.21  FLT,AS,0.0%       50-00
LXS 07-10H 2A1  68.26  34.01  L+16  744   6  125  33.98  SS,9.57%          39-00

CUSIP      Name               BID   x Off       SIZE   C/E    60++  WAL   ARM  CFLW
86360KAA6  SAMI 06-AR3 11A1   57-00 x 59-00     73+MM  46.9%  67.0%  65   POA  SSPT
86361HAQ7  SAMI 06-AR7 A12    19-08 x 21-08     32+MM  15.4%  61.1%  61   POA SRMEZ

For each 'Name', I need a way to pull out the price info (price info = the data under the column headers 'Offering', 'Offer' and 'Off'). This process will be repeated over the whole text file, and the extracted data ('Name' and 'Price') will be written to an Excel file via xlwt. Notice that the format of the price data varies by tuple.


The formatting makes this a little tricky, since your names can contain spaces, which makes csv difficult to use. One way around this is to use the header row: find the position and width of the columns you are interested in with a regex, then slice every data row at those same offsets. You can try something like this:

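import re

for email in emails:
    print(email)
    lines = email.split('\n')
    # Locate the name and price columns in the header row; the trailing
    # \s* extends each match across the padding up to the next column.
    name = re.search(r'name\s*', lines[0], re.I)
    price = re.search(r'off(er(ing)?)?\s*', lines[0], re.I)
    for line in lines[1:]:
        # Slice each data row at the same offsets as the header columns.
        n = line[name.start():name.end()].strip()
        p = line[price.start():price.end()].strip()
        print((n, p))
    print()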

This assumes that emails is a list where each entry is an email. Here is the output:

Cusip     NAME              Original Current Cashflow Collat Offering
362341D71 GSAA 2005-15 2A2   10,000   8,783  FCF       5/25  65.000
026932AC7 AHM 2007-1 GA1C    9,867    7,250  Spr Snr   OA    56.250 
('GSAA 2005-15 2A2', '65.000')
('AHM 2007-1 GA1C', '56.250')

Name            O/F    C/F    Cpn  FICO CAL WALB  60+    Notes             Offer
CSMC 06-9 7A1   25.00  11.97  L+45  728  26  578  35.21  FLT,AS,0.0%       50-00
LXS 07-10H 2A1  68.26  34.01  L+16  744   6  125  33.98  SS,9.57%          39-00
('CSMC 06-9 7A1', '50-00')
('LXS 07-10H 2A1', '39-00')

CUSIP      Name               BID   x Off       SIZE   C/E    60++  WAL   ARM  CFLW
86360KAA6  SAMI 06-AR3 11A1   57-00 x 59-00     73+MM  46.9%  67.0%  65   POA  SSPT
86361HAQ7  SAMI 06-AR7 A12    19-08 x 21-08     32+MM  15.4%  61.1%  61   POA SRMEZ
('SAMI 06-AR3 11A1', '59-00')
('SAMI 06-AR7 A12', '21-08')
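
The loop above takes the emails list as given and slices the price to the width of its header, so a price wider than a last-column header such as 'Offer' would be cut off. Here is a rough sketch of one way to handle both points. It assumes the blocks in the text file are separated by blank lines, as in the sample, and that the first line of each block is the header. The function names split_blocks and extract_name_price are made up for illustration.

```python
import re

def split_blocks(text):
    """Split the file contents into blocks separated by blank lines (assumed layout)."""
    # Strip only newlines so leading spaces on a header line keep their alignment.
    return [b for b in re.split(r'\n\s*\n', text.strip('\n')) if b.strip()]

def extract_name_price(block):
    """Return (name, price) pairs for one block, sliced at the header's column offsets."""
    lines = block.split('\n')
    header = lines[0]
    name = re.search(r'name\s*', header, re.I)
    price = re.search(r'off(er(ing)?)?\s*', header, re.I)
    if name is None or price is None:
        return []  # no recognizable header, skip this block
    # If the price header is the last column, read to the end of the line
    # so a value wider than the header text is not truncated.
    price_end = price.end() if price.end() < len(header) else None
    pairs = []
    for line in lines[1:]:
        n = line[name.start():name.end()].strip()
        p = line[price.start():price_end].strip()
        if n:  # ignore empty or short lines
            pairs.append((n, p))
    return pairs
```

With the file contents read into a string text, [extract_name_price(b) for b in split_blocks(text)] gives one list of pairs per block. This still relies on the data rows being aligned with their header, just like the loop above.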


Just use the csv module, and use good formatting for your numbers.
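
This answer doesn't say how csv would be used. One possible reading is to use it for the output side: once the ('Name', 'Price') pairs have been extracted, write them to a CSV file that Excel can open. A minimal sketch under that assumption follows. The helper name write_pairs is hypothetical, and the pairs are copied from the output shown earlier. Prices are kept as strings so formats like '50-00' survive unchanged.

```python
import csv
import io

def write_pairs(pairs, fileobj):
    """Write (name, price) pairs as CSV rows under a header row."""
    writer = csv.writer(fileobj, lineterminator='\n')
    writer.writerow(['Name', 'Price'])
    writer.writerows(pairs)

# Pairs taken from the sample output above.
pairs = [('GSAA 2005-15 2A2', '65.000'), ('CSMC 06-9 7A1', '50-00')]

buf = io.StringIO()  # stand-in for a real file opened with newline=''
write_pairs(pairs, buf)
print(buf.getvalue())
```

For a real file, open it with open(path, 'w', newline='') and pass the file object in place of buf.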
