Pythonic way to extract values from this text file

2023-03-14 00:45 Q&A author:

I have an output file from a legacy piece of software which is shown below. I want to extract values from it, so that, for example, I can set a variable called direct_solar_irradiance to 648.957, and target ground pressure to 1013.00.

So far, I have been extracting individual lines and processing them like below (repeated many times for the different values I want to extract):

values = lines[97].split()
self.irradiance_direct, self.irradiance_diffuse, self.irradiance_env = values

However, I have now found that extra lines are added to the middle of the output when certain parameters are selected. This means, of course, that line 97 will no longer hold the values I need.

Is there a good Pythonic way to extract these values, given that there may be extra lines added into the output under certain circumstances? I guess I need to search for known pieces of text in the file, and then extract the numbers referred to by them, but the only ways I can think of doing that are very clunky.

So:

Is there a nice Pythonic way to search for these strings and extract the values that I want?

If not, is there some other way to sensibly do this? (for example, some kind of cool text-file parsing library that I know nothing about).

******************************* 6sV version 1.0B ******************************
*                                                                             *
*                       geometrical conditions identity                       *
*                       -------------------------------                       *
*                       user defined conditions                               *
*                                                                             *
*   month: 14 day :   1                                                       *
*   solar zenith angle:   10.00 deg  solar azimuthal angle:       20.00 deg   *
*   view zenith angle:    30.00 deg  view azimuthal angle:        40.00 deg   *
*   scattering angle:    159.14 deg  azimuthal angle difference:  20.00 deg   *
*                                                                             *
*                       atmospheric model description                         *
*                       -----------------------------                         *
*           atmospheric model identity :                                      *
*               midlatitude summer  (uh2o=2.93g/cm2,uo3=.319cm-atm)           *
*           aerosols type identity :                                          *
*               Maritime aerosol model                                        *
*           optical condition identity :                                      *
*               visibility :  8.49 km  opt. thick. 550 nm :  0.5000           *
*                                                                             *
*                       spectral condition                                    *
*                       ------------------                                    *
*            monochromatic calculation at wl 0.400 micron                     *
*                                                                             *
*                       Surface polarization parameters                       *
*                       ----------------------------------                    *
*                                                                             *
*                                                                             *
* Surface Polarization Q,U,Rop,Chi    0.00000  0.00000  0.00000     0.00      *
*                                                                             *
*                                                                             *
*                       target type                                           *
*                       -----------                                           *
*           homogeneous ground                                                *
*             monochromatic reflectance  1.000                                *
*                                                                             *
*                       target elevation description                          *
*                       ----------------------------                          *
*           ground pressure  [mb] 1013.00                                     *
*           ground altitude  [km] 0.000                                       *
*                                                                             *
*                       plane simulation description                          *
*                       ----------------------------                          *
*           plane  pressure          [mb] 1013.00                             *
*           plane  altitude absolute [km]  0.000                              *
*                atmosphere under plane description:                          *
*                ozone content             0.000                              *
*                h2o   content             0.000                              *
*               aerosol opt. thick. 550nm  0.000                              *
*                                                                             *
*                        atmospheric correction activated                     *
*                        --------------------------------                     *
*                        BRDF coupling correction                             *
*           input apparent reflectance :  0.500                               *
*                                                                             *
*******************************************************************************



*******************************************************************************
*                                                                             *
*                         integrated values of  :                             *
*                         --------------------                                *
*                                                                             *
*       apparent reflectance  1.1287696  appar. rad.(w/m2/sr/mic)  588.646    *
*                   total gaseous transmittance  1.000                        *
*                                                                             *
*******************************************************************************
*                                                                             *
*                         coupling aerosol -wv  :                             *
*                         --------------------                                *
*           wv above aerosol :   1.129     wv mixed with aerosol :   1.129    *
*                       wv under aerosol :   1.129                            *
*******************************************************************************
*                                                                             *
*                         integrated values of  :                             *
*                         --------------------                                *
*                                                                             *
*       app. polarized refl.  0.0000    app. pol. rad. (w/m2/sr/mic)    0.000 *
*             direction of the plane of polarization  0.00                    *
*                   total polarization ratio     0.000                        *
*                                                                             *
*******************************************************************************
*                                                                             *
*                         int. normalized  values  of  :                      *
*                         ---------------------------                         *
*                      % of irradiance at ground level                        *
*     % of direct  irr.    % of diffuse irr.    % of enviro. irr              *
*               0.351               0.354               0.295                 *
*                       reflectance at satellite level                        *
*     atm. intrin. ref.   background  ref.  pixel  reflectance                *
*               0.000               0.000               1.129                 *
*                                                                             *
*                         int. absolute values of                             *
*                         -----------------------                             *
*                      irr. at ground level (w/m2/mic)                        *
*     direct solar irr.    atm. diffuse irr.    environment  irr              *
*             648.957             655.412             544.918                 *
*                      rad at satel. level (w/m2/sr/mic)                      *
*     atm. intrin. rad.    background  rad.    pixel  radiance                *
*               0.000               0.000             588.646                 *
*                                                                             *
*                                                                             *
*                      sol. spect (in w/m2/mic)                               *
*                                1663.594                                     *
*                                                                             *
*******************************************************************************





*******************************************************************************
*                                                                             *
*                          integrated values of  :                            *
*                          --------------------                               *
*                                                                             *
*                             downward        upward          total           *
*      global gas. trans. :     1.00000        1.00000        1.00000         *
*      water   "     "    :     1.00000        1.00000        1.00000         *
*      ozone   "     "    :     1.00000        1.00000        1.00000         *
*      co2     "     "    :     1.00000        1.00000        1.00000         *
*      oxyg    "     "    :     1.00000        1.00000        1.00000         *
*      no2     "     "    :     1.00000        1.00000        1.00000         *
*      ch4     "     "    :     1.00000        1.00000        1.00000         *
*      co      "     "    :     1.00000        1.00000        1.00000         *
*                                                                             *
*                                                                             *
*      rayl.  sca. trans. :     0.84422        1.00000        0.84422         *
*      aeros. sca.   "    :     0.94572        1.00000        0.94572         *
*      total  sca.   "    :     0.79616        1.00000        0.79616         *
*                                                                             *
*                                                                             *
*                                                                             *
*                             rayleigh       aerosols         total           *
*                                                                             *
*      spherical albedo   :     0.23410        0.12354        0.29466         *
*      optical depth total:     0.36193        0.55006        0.91199         *
*      optical depth plane:     0.00000        0.00000        0.00000         *
*      reflectance I      :     0.00000        0.00000        0.00000         *
*      reflectance Q      :     0.00000        0.00000        0.00000         *
*      reflectance U      :     0.00000        0.00000        0.00000         *
*      polarized reflect. :     0.00000        0.00000        0.00000         *
*      degree of polar.   :         nan           0.00            nan         *
*      dir. plane polar.  :      -45.00         -45.00         -45.00         *
*      phase function I   :     1.38819        0.27621        0.71751         *
*      phase function Q   :    -0.09117       -0.00856       -0.04134         *
*      phase function U   :    -1.34383        0.02142       -0.52039         *
*      primary deg. of pol:    -0.06567       -0.03099       -0.05762         *
*      sing. scat. albedo :     1.00000        0.98774        0.99261         *
*                                                                             *
*                                                                             *
*******************************************************************************
*******************************************************************************



*******************************************************************************
*                        atmospheric correction result                        *
*                        -----------------------------                        *
*       input apparent reflectance            :    0.500                      *
*       measured radiance [w/m2/sr/mic]       :  260.747                      *
*       atmospherically corrected reflectance                                 *
*       Lambertian case :      0.52995                                        *
*       BRDF       case :      0.52995                                        *
*       coefficients xa xb xc                 :  0.00241  0.00000  0.29466    *
*       y=xa*(measured radiance)-xb;  acr=y/(1.+xc*y)                         *

A more complete, perhaps more robust solution will require either a parser with a custom grammar (pyparsing) or some sort of FSM-based processor (TextFSM).

Both options look like they'll be non-trivial to use with this output. A (possibly) lighter-weight solution would be to identify each line by a known label, then extract the values accordingly (as suggested by other posters).

There are several ways to implement this. I would suggest mapping 'extractor' callables to known line labels, then iterating over the lines and calling each matched extractor. Each callable would take the line and a context object/dict as arguments and add attributes to the context as required. Something along the lines of https://gist.github.com/1035938
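As a rough illustration of the extractor idea (this is not the code from the linked gist), here is a minimal sketch. The names last_number, EXTRACTORS, parse and the context keys are invented for this example; the labels are copied from the sample output in the question.

```python
import re

# Matches integers and decimals such as 1013.00, 0.000 or -45.00.
NUMBER = re.compile(r"[-+]?\d*\.?\d+")


def last_number(name):
    """Build an extractor that stores the last number on the line under `name`."""
    def extractor(line, context):
        context[name] = float(NUMBER.findall(line)[-1])
    return extractor


# Hypothetical label -> extractor table (labels taken from the sample output).
EXTRACTORS = {
    "ground pressure": last_number("target_ground_pressure"),
    "ground altitude": last_number("target_ground_altitude"),
    "measured radiance": last_number("measured_radiance"),
}


def parse(lines):
    """Call every extractor whose label occurs in a line; return the context dict."""
    context = {}
    for line in lines:
        for label, extractor in EXTRACTORS.items():
            if label in line:
                extractor(line, context)
    return context


sample = [
    "*           ground pressure  [mb] 1013.00                                     *",
    "*           ground altitude  [km] 0.000                                       *",
    "*       measured radiance [w/m2/sr/mic]       :  260.747                      *",
]
result = parse(sample)
print(result)
```

This only covers values that sit on the same line as their label. Values printed on the line below a header (such as the direct solar irradiance) would need an extractor that can see the following line as well.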

You could roll your own mini-language, i.e. automate the extraction. I did the following to automate parsing the output of a proprietary program:

# will match in the order written here
tokens = ["num_ref_frames", "Max QP", "Min QP", "Avg QP", "I4x4",
          "I16x16", "SkipZero", "SkipMV", "16x16", "16x8", "8x16",
          "8x8", "8x4", "4x8", "4x4"]

special = ["Quarterpel MVs"]

# this dictionary (hash table) maps each search string from the tokens list
# to a list whose first element is the index of the field to extract when
# building the matrix, e.g. 0 = 1st field, 1 = 2nd field, 2 = 3rd field etc.
# (named fields rather than dict, so the built-in dict is not shadowed)
fields = {tokens[0]:  [1], tokens[1]:  [1], tokens[2]:  [1], tokens[3]:  [1],
          tokens[4]:  [2], tokens[5]:  [2], tokens[6]:  [2], tokens[7]:  [2],
          tokens[8]:  [2], tokens[9]:  [2], tokens[10]: [2], tokens[11]: [2],
          tokens[12]: [2], tokens[13]: [2], tokens[14]: [2],}

Then I simply looped over the input, checking each line against the contents of tokens; when a match was found, I split the line and used the dictionary entry to pick out the correct field.

The special list above handles a special variable that has to be read from multiple lines.
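A minimal sketch of the same token-table idea, applied to the 6S output from the question rather than to the data above. FIELD_INDEX, extract and the field positions are assumptions made for illustration; the code in the gist may well differ.

```python
# token -> index of the whitespace-separated field that holds the value.
# For "*  ground pressure  [mb] 1013.00  *" the fields are
# ['*', 'ground', 'pressure', '[mb]', '1013.00', '*'], so the value is field 4.
FIELD_INDEX = {
    "ground pressure": 4,
    "ground altitude": 4,
    "total gaseous transmittance": 4,
}


def extract(lines, field_index):
    """Return {token: value} for every line that contains a known token."""
    found = {}
    for line in lines:
        for token, index in field_index.items():
            if token in line:
                found[token] = float(line.split()[index])
    return found


sample = [
    "*           ground pressure  [mb] 1013.00                                     *",
    "*           ground altitude  [km] 0.000                                       *",
    "*                   total gaseous transmittance  1.000                        *",
]
found = extract(sample, FIELD_INDEX)
print(found)
```

The table is the only part that has to change when more values are needed, which is the main attraction of this approach. Its weakness is that a field index silently breaks if the program ever adds or removes a word on that line.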

Update

clone git://gist.github.com/1037403.git to get a copy of the code

usage:
./parser.py all_dec.txt

Hope it helps!

Well, if you want a generic parsing library there is pyparsing, but it would probably be overkill in this case.

This appears to be a fairly line-oriented text file that is not very large, so your best bet would be to loop through each line looking for text that identifies the things you are after.

So something like:

with open('file.txt') as f:
    lines = f.readlines()  # a list, so lines[n + 1] can be indexed
for n, line in enumerate(lines):
    if 'direct solar irr.    atm. diffuse irr.    environment  irr' in line:
        # the values are on the line after the header; drop the '*' border first
        values = lines[n + 1].strip('* \n').split()
        self.irradiance_direct, self.irradiance_diffuse, self.irradiance_env = values

You could then add more if statements as needed to get other data out. If you've got a lot of data, though, you'd probably want to generalise the code somewhat (probably a dictionary with the text to match as the key and a function to call when that key matches).

You may also want to use a regex to match the line, so that you can handle varying amounts of whitespace better. Otherwise, just one space too many or too few will throw it off.
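For instance, a whitespace-tolerant version of the header match could look like the sketch below. HEADER and irradiances are names invented here, and the sketch assumes the three values always sit on the line directly after the header, as in the sample output.

```python
import re

# \s+ accepts any run of whitespace between the words of the header.
HEADER = re.compile(
    r"direct\s+solar\s+irr\.\s+atm\.\s+diffuse\s+irr\.\s+environment\s+irr"
)


def irradiances(lines):
    """Return (direct, diffuse, environment) as floats, or None if not found."""
    lines = iter(lines)
    for line in lines:
        if HEADER.search(line):
            # The values are on the next line; strip the '*' border and padding.
            values = next(lines, "").strip("* \n").split()
            return tuple(float(v) for v in values)
    return None


sample = [
    "*                      irr. at ground level (w/m2/mic)                        *",
    "*     direct solar irr.    atm. diffuse irr.    environment  irr              *",
    "*             648.957             655.412             544.918                 *",
]
direct, diffuse, env = irradiances(sample)
print(direct, diffuse, env)
```

Because the search walks the lines rather than indexing a fixed position, it keeps working when extra lines appear earlier in the file.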

The best way, IMHO, would be to use an mmapped file and then use a regular expression to find what you are looking for.

import mmap, re
with open('file.txt', 'rb') as f:
    text = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    match = re.search(pattern, text)  # pattern must be a bytes regex, e.g. rb'...'

The mmap module maps a file into memory so that it behaves much like a bytes string, which means you can perform most of the operations you would perform on a string. A regex is a good way to search it: simple and efficient.

If you need to find specific lines, just handle everything as a string and run specific regular expressions to dig out your gems.
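A small sketch of that approach for one value from the question's output. PRESSURE and ground_pressure are hypothetical names, and the pattern assumes the label and unit appear exactly as in the sample ("ground pressure  [mb] 1013.00"). The temporary file only exists to make the example self-contained.

```python
import mmap
import re
import tempfile

# A bytes pattern, because a memory-mapped file is searched as bytes.
PRESSURE = re.compile(rb"ground pressure\s+\[mb\]\s+([-+]?\d+\.\d+)")


def ground_pressure(path):
    """Return the ground pressure found in the file at `path`, or None."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as text:
            match = PRESSURE.search(text)
            return float(match.group(1).decode()) if match else None


# Write one line of the sample output to a temporary file for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("*           ground pressure  [mb] 1013.00            *\n")
    path = tmp.name

pressure = ground_pressure(path)
print(pressure)
```

For a file this small, reading it with f.read() and searching the resulting string would work just as well; mmap mainly pays off for large files.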

If you need to extract more data, I believe that with a small amount of work you can craft a nice parser for your data. I would use the following functions as a start:

def extract_screens(text):
    """
    Returns a list of screens (divided by asterisks).
    Each screen is a list of strings stripped of asterisks.
    """
    ...

def process_screen(screen):
    """ 
    Returns a list of screen divisions as tuples: [(heading, body)...]
    heading is a string, body is a list of strings
    blank lines are filtered out.
    """
    ...

By now you should have an indexed list of pieces of text. You can loop through them and execute a simple and specific special parser method for each section.
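One possible way to fill in those two stubs is sketched below. It rests on two assumptions read off the sample output: a screen ends at a line made up entirely of asterisks, and a heading is any line followed by a line of dashes. Lines that appear before the first heading are collected under an empty heading.

```python
def extract_screens(text):
    """Split text into screens; each screen is a list of lines without borders."""
    screens, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line and set(line) == {"*"}:  # a full row of asterisks ends a screen
            if current:
                screens.append(current)
            current = []
        elif line.startswith("*"):
            current.append(line.strip("*").strip())
    if current:  # the last screen may have no closing row
        screens.append(current)
    return screens


def process_screen(screen):
    """Return [(heading, body), ...]; a heading is a line underlined with dashes."""
    lines = [line for line in screen if line]  # drop blank lines
    divisions = []
    for i, line in enumerate(lines):
        if set(line) == {"-"}:
            continue  # the underline itself carries no data
        if i + 1 < len(lines) and set(lines[i + 1]) == {"-"}:
            divisions.append((line, []))
        elif divisions:
            divisions[-1][1].append(line)
        else:
            divisions.append(("", [line]))  # text before any heading
    return divisions


text = """\
*********************************
*                               *
*   target type                 *
*   -----------                 *
*   homogeneous ground          *
*********************************
*   target elevation description *
*   ---------------------------- *
*   ground pressure  [mb] 1013.00 *
*   ground altitude  [km] 0.000  *
*********************************
"""
screens = extract_screens(text)
sections = [process_screen(screen) for screen in screens]
print(sections)
```

Each (heading, body) pair can then be handed to a small parser written for that heading, for example one that reads the pressure and altitude out of the "target elevation description" body.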

Tip: Use unit tests to keep yourself sane.
