How to parse this format (Praat TextGrid)
TextGrid is the "segmentation" file format used by the Praat program. I'd like to write a parser that will then verify the data. My question is:
How would you write a parser for this format? Read it line by line or something else? Is this a known format?
File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0
xmax = 93.0538775510204
tiers? <exists>
size = 3
item []:
item [1]:
class = "IntervalTier"
name = "diph"
xmin = 0
xmax = 93.0538775510204
intervals: size = 65
intervals [1]:
xmin = 0
xmax = 1.300090702947846
text = ""
intervals [2]:
xmin = 1.300090702947846
xmax = 1.5300845864661654
text = "ey_s"
intervals [3]:
xmin = 1.5300845864661654
xmax = 3.4648692624493815
text = ""
(This then repeats to EOF, with intervals [4] through [n].)
A TextGrid parser already exists, and it is part of NLTK. The Python file is here:
http://nltk.googlecode.com/svn/trunk/nltk_contrib/nltk_contrib/textgrid.py
Updated link: https://github.com/nltk/nltk_contrib/blob/master/nltk_contrib/textgrid.py
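If you would rather not pull in NLTK, reading the file line by line does work for the long format shown in the question. The following is only a minimal sketch of that idea, not how the NLTK module does it: the names `parse_textgrid` and `parse_value` and the dictionary layout are made up for illustration, and it assumes every value fits on one line (no multi-line `text` values, no point tiers, no short-format files).

```python
import re

# Matches lines such as `xmin = 0` or `text = "ey_s"`.
# Header lines like `File type = ...` and `intervals: size = 65`
# do not match and are simply skipped in this sketch.
KEY_VALUE = re.compile(r'^(\w+) = (.*)$')
# Matches `item [1]:` or `intervals [2]:`, but not the bare `item []:`.
INDEXED = re.compile(r'^(item|intervals) \[(\d+)\]:$')


def parse_value(raw):
    """Return a str for quoted values, otherwise a float."""
    if raw.startswith('"') and raw.endswith('"'):
        # Praat escapes a double quote inside a string by doubling it.
        return raw[1:-1].replace('""', '"')
    return float(raw)  # raises ValueError on anything unexpected


def parse_textgrid(text):
    """Parse long-format TextGrid text into nested dicts (hypothetical layout)."""
    grid = {"tiers": []}
    target = grid  # the dict that key/value lines are currently written to
    for line in text.splitlines():
        line = line.strip()
        header = INDEXED.match(line)
        if header:
            if header.group(1) == "item":
                # A new tier starts; later keys belong to it.
                target = {"intervals": []}
                grid["tiers"].append(target)
            else:
                # A new interval starts inside the most recent tier.
                target = {}
                grid["tiers"][-1]["intervals"].append(target)
            continue
        pair = KEY_VALUE.match(line)
        if pair:
            target[pair.group(1)] = parse_value(pair.group(2))
    return grid
```

Calling `parse_textgrid(open(path).read())` on a file like the one above would give a dict whose `"tiers"` list holds one entry per `item [n]:`, each with its own `"intervals"` list, which you can then walk to verify the data.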
Automatic Praat's TextGrid File Parser (TGP) is a small application for parsing Praat's TextGrid files. The result of the parsing is a spreadsheet saved to an output text file, which can be imported by applications such as Excel. TGP is meant to be a flexible program that can easily be extended or modified; it is currently capable of analyzing certain types of TextGrid files. Version 1.0 of TGP reads TextGrid files with the following item types: word, phone and, optionally, focus.
http://tgp.peremila.com/
An alternative solution is to work with JSON or YAML representations of these Praat objects; then parsing for correctness is trivial.
I've written two Perl scripts to facilitate precisely this (to convert from Praat to JSON/YAML, and to convert from YAML/JSON to Praat), which might be useful for these tasks.
The scripts are part of a plugin I maintain called serialise, which is distributed through CPrAN. The implementation is a bit of a hack, but it's quite stable, and the plugin supports most objects that you'd want to use. All comments welcome.
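To give an idea of why verification gets easy once the data is in JSON, here is a rough Python sketch. The key names (`tiers`, `name`, `xmin`, `xmax`, `intervals`) are assumed for illustration and are not necessarily what the plugin emits, so adjust them to the real output. The checks assume that the intervals of an interval tier should be contiguous and should span the tier from its `xmin` to its `xmax`.

```python
import json


def check_tier(tier, tolerance=1e-9):
    """Return a list of problems found in one interval tier (assumed layout)."""
    problems = []
    previous_end = tier["xmin"]
    for index, interval in enumerate(tier["intervals"], start=1):
        if interval["xmin"] > interval["xmax"]:
            problems.append(f"interval {index}: xmin is greater than xmax")
        # Each interval should begin where the previous one ended.
        if abs(interval["xmin"] - previous_end) > tolerance:
            problems.append(
                f"interval {index}: does not start where the previous one ended"
            )
        previous_end = interval["xmax"]
    if abs(previous_end - tier["xmax"]) > tolerance:
        problems.append("last interval does not end at the tier's xmax")
    return problems


def check_textgrid_json(document):
    """Map each tier name to the list of problems found in that tier."""
    grid = json.loads(document)
    return {tier["name"]: check_tier(tier) for tier in grid["tiers"]}
```

An empty list for a tier means none of these particular checks failed; it says nothing about whether the labels themselves are correct.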