
How to parse this format (Praat TextGrid)

TextGrid is the "segmentation" file format used by the Praat program. I'd like to write a parser and then use it to verify the data. My question is:

How would you write a parser for this format? Would you read it line by line, or use some other approach? Is this a known format?

File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0 
xmax = 93.0538775510204 
tiers? <exists> 
size = 3 

item []: 
    item [1]:
        class = "IntervalTier" 
        name = "diph" 
        xmin = 0 
        xmax = 93.0538775510204 
        intervals: size = 65 
        intervals [1]:
            xmin = 0 
            xmax = 1.300090702947846
            text = "" 
        intervals [2]:
            xmin = 1.300090702947846 
            xmax = 1.5300845864661654 
            text = "ey_s" 
        intervals [3]:
            xmin = 1.5300845864661654 
            xmax = 3.4648692624493815 
            text = "" 

(This pattern repeats until EOF, with intervals [4] through [n].)
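On the line-by-line question: the "long" text layout shown above is regular enough that a line-by-line state machine works, with no grammar or parser generator needed. Below is a minimal Python sketch, assuming only IntervalTiers (TextTiers use `points` instead of `intervals` and are not handled) and that every value fits on one line (Praat strings can span lines, which this also does not handle). The names `parse_textgrid` and `verify` are hypothetical, not taken from any library, and the checks in `verify` are examples of the kind of validation the question asks about.

```python
import re

# Assumptions (not from any library): "long" ooTextFile layout, IntervalTiers
# only, and every value on a single line.
SECTION = re.compile(r'^\s*(item|intervals)\s*\[(\d+)\]\s*:\s*$')      # "item [1]:"
INTERVAL_COUNT = re.compile(r'^\s*intervals:\s*size\s*=\s*(\d+)\s*$')  # "intervals: size = 65"
KEY_VALUE = re.compile(r'^\s*(\w+)\s*=\s*(.*?)\s*$')                   # "xmin = 0"


def parse_value(raw):
    """Quoted values become str (Praat escapes a quote as ""); others become float."""
    if len(raw) >= 2 and raw[0] == raw[-1] == '"':
        return raw[1:-1].replace('""', '"')
    return float(raw)


def parse_textgrid(lines):
    """Read the file line by line, writing key = value pairs into the open dict."""
    grid = {"tiers": []}
    tier = None
    target = grid  # header keys (xmin, xmax, size) go into the grid itself
    for line in lines:
        if m := SECTION.match(line):
            if m.group(1) == "item":
                tier = {"intervals": [], "declared_size": None}
                grid["tiers"].append(tier)
                target = tier
            else:
                target = {}
                tier["intervals"].append(target)
        elif m := INTERVAL_COUNT.match(line):
            tier["declared_size"] = int(m.group(1))
        elif m := KEY_VALUE.match(line):
            key, raw = m.groups()
            target[key] = parse_value(raw)
        # Anything else ("File type", "tiers? <exists>", "item []:", blanks) is skipped.
    return grid


def verify(grid, tol=1e-9):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if int(grid.get("size", -1)) != len(grid["tiers"]):
        problems.append(f"header says {grid.get('size')} tiers, found {len(grid['tiers'])}")
    for t, tier in enumerate(grid["tiers"], start=1):
        ivs = tier["intervals"]
        if tier["declared_size"] != len(ivs):
            problems.append(f"tier {t}: declared {tier['declared_size']} intervals, found {len(ivs)}")
        prev_end = tier.get("xmin")
        for i, iv in enumerate(ivs, start=1):
            missing = {"xmin", "xmax", "text"} - set(iv)
            if missing:
                problems.append(f"tier {t} interval {i}: missing {sorted(missing)}")
                continue
            if iv["xmin"] >= iv["xmax"]:
                problems.append(f"tier {t} interval {i}: xmin >= xmax")
            # Intervals in an IntervalTier should tile the tier: no gaps, no overlaps.
            if prev_end is not None and abs(iv["xmin"] - prev_end) > tol:
                problems.append(f"tier {t} interval {i}: starts at {iv['xmin']}, previous ended at {prev_end}")
            prev_end = iv["xmax"]
        if "xmax" in tier and prev_end is not None and abs(prev_end - tier["xmax"]) > tol:
            problems.append(f"tier {t}: last interval ends at {prev_end}, tier xmax is {tier['xmax']}")
    return problems
```

Typical use would be `with open(path, encoding="utf-8") as f: print(verify(parse_textgrid(f)))`. Depending on its settings, Praat may write UTF-16 instead, so the encoding is an assumption to check. Praat can also save "short" text files and binary files, which this sketch does not read.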


A TextGrid parser already exists as part of the NLTK toolkit (in the nltk_contrib package). The Python file is here:

http://nltk.googlecode.com/svn/trunk/nltk_contrib/nltk_contrib/textgrid.py

Updated link: https://github.com/nltk/nltk_contrib/blob/master/nltk_contrib/textgrid.py


Automatic Praat TextGrid File Parser (TGP) is a small application for parsing Praat TextGrid files. It saves the parsing result as a spreadsheet in an output text file, which can be imported into applications such as Excel. TGP is meant to be flexible and easy to extend or modify; it can currently analyze certain types of TextGrid files. Version 1.0 of TGP reads TextGrid files with the following item types: word, phone and, optionally, focus.

http://tgp.peremila.com/
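The same "parse, then write a spreadsheet as a text file" idea can be sketched in a few lines of Python with the standard `csv` module. This is not TGP's code or output format: the tab-separated column layout and the function name `intervals_to_tsv` are hypothetical, and the input is assumed to be a list of tiers, each a dict with a `name` and a list of `intervals` carrying `xmin`, `xmax` and `text`.

```python
import csv
import io


def intervals_to_tsv(tiers, out):
    """Write one tab-separated row per interval (hypothetical column layout)."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["tier", "xmin", "xmax", "duration", "text"])
    for tier in tiers:
        for iv in tier["intervals"]:
            # Rounding hides float noise such as 1.53 - 1.3 = 0.22999999999999998.
            duration = round(iv["xmax"] - iv["xmin"], 6)
            writer.writerow([tier["name"], iv["xmin"], iv["xmax"], duration, iv["text"]])


tiers = [{"name": "diph", "intervals": [
    {"xmin": 0.0, "xmax": 1.3, "text": ""},
    {"xmin": 1.3, "xmax": 1.53, "text": "ey_s"},
]}]
buf = io.StringIO()
intervals_to_tsv(tiers, buf)
print(buf.getvalue())
```

Tab-separated text is used here because spreadsheet programs such as Excel can import it directly; passing a file opened with `newline=""` instead of `io.StringIO` writes it to disk.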


An alternative solution is to work with JSON or YAML representations of these Praat objects; then parsing is trivial, because standard JSON or YAML parsers do the work and only checking the content for correctness remains.

I've written two Perl scripts for exactly this (one converts from Praat to JSON/YAML, the other converts from YAML/JSON back to Praat), which might be useful for these tasks.

The scripts are part of a plugin I maintain called serialise, which is distributed through CPrAN. The implementation is a bit of a hack, but it's quite stable, and the plugin supports most objects you'd want to use. All comments are welcome.
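With the JSON route described above, syntax checking comes for free from a standard parser, so only structural checks are left. A minimal Python sketch follows, assuming a hypothetical layout (a top-level object with a `tiers` list whose intervals carry `xmin`, `xmax` and `text`). This is not necessarily the layout the serialise scripts produce, and `load_textgrid_json` is an illustrative name.

```python
import json

# Hypothetical layout, invented for illustration:
# {"tiers": [{"name": str, "intervals": [{"xmin": num, "xmax": num, "text": str}]}]}
REQUIRED_INTERVAL_FIELDS = {"xmin": (int, float), "xmax": (int, float), "text": str}


def load_textgrid_json(text):
    """Parse JSON text and check its structure; raise ValueError on any problem."""
    data = json.loads(text)  # malformed JSON raises json.JSONDecodeError (a ValueError)
    tiers = data.get("tiers") if isinstance(data, dict) else None
    if not isinstance(tiers, list):
        raise ValueError("top level must be an object with a 'tiers' list")
    for t, tier in enumerate(tiers, start=1):
        if not isinstance(tier, dict) or not isinstance(tier.get("name"), str):
            raise ValueError(f"tier {t}: missing or non-string 'name'")
        for i, iv in enumerate(tier.get("intervals", []), start=1):
            for key, types in REQUIRED_INTERVAL_FIELDS.items():
                if not isinstance(iv.get(key), types):
                    raise ValueError(f"tier {t} interval {i}: bad or missing {key!r}")
    return data
```

Timing checks (for example, that consecutive intervals share boundaries) would still be needed on top of this; the point is that none of the tokenizing has to be written by hand.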
