PHP Syllable Detection [closed]
Questions asking us to recommend or find a tool, library or favorite off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 9 years ago.
I would like to find a way to split a word into syllables with PHP. For example, the word "nevermore", run through detect_syllables(), would return "nev-er-more". Are there any good APIs or libraries out there for this?
There's a useful PhD thesis by Frank Liang that describes an exceptionally accurate algorithm for this: written over 25 years ago, it's still valid. But I'm not aware of any implementation in PHP.
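To give a rough idea of how Liang-style pattern matching works, here is a minimal sketch. The function names (parse_liang_pattern, hyphenate_word) and the four toy patterns in the usage note are made up for illustration; they are not taken from the thesis or from any real TeX pattern file, and the sketch only handles plain ASCII words. Digits inside a pattern are weights between letters: the highest weight wins at each position, and an odd result allows a break while an even one forbids it.

```php
<?php
// Split a pattern such as "ne2v" into its letters ("nev") and the
// weights that sit before, between and after those letters.
function parse_liang_pattern(string $pattern): array
{
    $letters = '';
    $weights = [0];
    foreach (str_split($pattern) as $ch) {
        if (ctype_digit($ch)) {
            // A digit sets the weight of the gap we are currently in.
            $weights[count($weights) - 1] = (int) $ch;
        } else {
            $letters .= $ch;
            $weights[] = 0; // open a new gap after this letter
        }
    }
    return [$letters, $weights];
}

// Apply the patterns to one word and return it with "-" at the
// allowed break points. $minLeft / $minRight keep very short
// fragments from being split off at either end.
function hyphenate_word(string $word, array $patterns, int $minLeft = 2, int $minRight = 3): string
{
    $table = [];
    foreach ($patterns as $p) {
        [$letters, $weights] = parse_liang_pattern($p);
        $table[$letters] = $weights;
    }

    // Dots mark the word boundaries, as in TeX pattern files.
    $padded = '.' . strtolower($word) . '.';
    $n = strlen($padded);
    $scores = array_fill(0, $n + 1, 0);

    // Try every substring; keep the highest weight seen for each gap.
    for ($start = 0; $start < $n; $start++) {
        for ($len = 1; $start + $len <= $n; $len++) {
            $chunk = substr($padded, $start, $len);
            if (!isset($table[$chunk])) {
                continue;
            }
            foreach ($table[$chunk] as $k => $w) {
                $scores[$start + $k] = max($scores[$start + $k], $w);
            }
        }
    }

    // $scores[$j + 1] is the gap just before $word[$j]; odd means "break here".
    $out = '';
    $length = strlen($word);
    for ($j = 0; $j < $length; $j++) {
        if ($j >= $minLeft && $length - $j >= $minRight && $scores[$j + 1] % 2 === 1) {
            $out .= '-';
        }
        $out .= $word[$j];
    }
    return $out;
}
```

With the toy patterns ['e1v', 'ne2v', 'v1er', 'r1m'], hyphenate_word('nevermore', ...) gives "nev-er-more": "e1v" alone would allow "ne-ver", but the higher even weight in "ne2v" overrides it. A real implementation would load thousands of patterns from a TeX hyphenation file instead.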
EDIT
A quick Google search turned up this link to a Text Statistics library in PHP, which includes algorithms for syllable counting within words (among other readability-measuring algorithms). You should be able to find the code for syllable splitting here.
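For plain syllable counting (as opposed to splitting), a common rough heuristic is to count groups of consecutive vowels and discount a silent trailing "e". The sketch below is only that heuristic; it is not the code from the Text Statistics library, the function name estimate_syllables is made up, and it will miscount plenty of English words.

```php
<?php
// Very rough English syllable estimate based on vowel groups.
function estimate_syllables(string $word): int
{
    // Keep ASCII letters only and normalise case.
    $w = strtolower(preg_replace('/[^A-Za-z]/', '', $word));
    if ($w === '') {
        return 0;
    }

    // Each run of vowels (including "y") counts as one syllable.
    $count = preg_match_all('/[aeiouy]+/', $w);

    // A final consonant + "e" is usually silent ("more"),
    // except for consonant + "le" endings ("table").
    if ($count > 1
        && preg_match('/[^aeiouy]e$/', $w)
        && !preg_match('/[^aeiouy]le$/', $w)) {
        $count--;
    }

    return max(1, $count);
}
```

For example, estimate_syllables('nevermore') finds the vowel groups e, e, o, e and then drops the silent final "e". Real readability libraries typically add long lists of exceptions on top of something like this.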
I'm actually in the finishing stages of making a PHP Hyphenator class based upon Frank Liang's algorithm and the TeX dictionaries, which pretty much seems to be the approach taken by all office suites. (Actually, I found this topic while looking for a good name for it that wasn't already taken.) With slowly improving support from browsers for the &shy; (soft hyphen) entity, it's becoming a realistic option to hyphenate content in websites.
Core functionality is working: splitting (and thus counting) and/or hyphenating text and/or HTML, parsing TeX hyphen dictionaries, and caching those parsed dictionaries. Some planned features are still missing, but nothing that stops you from using it. Also, there's no good documentation, samples, formal unit tests or vanity site yet.
I've created a GitHub site for it here and will post the current version on it ASAP, so check back in a few days.
I've only tested it with Dutch (my native language) and US English, so it may still have some issues with languages using different character sets.
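As an illustration of the website use case in this answer, here is a small sketch that inserts soft hyphens (assuming the entity meant is &shy;) into plain text. It is not the Hyphenator class described above: soften_text and $demoSplitter are hypothetical names, the splitter is a stand-in that only knows one word, and the regex only handles ASCII words in plain text (real HTML would need its tags and attributes skipped).

```php
<?php
// Replace each sufficiently long word with its parts joined by &shy;.
// $splitWord is any callable that takes a word and returns its parts.
function soften_text(string $text, callable $splitWord, int $minLength = 6): string
{
    return preg_replace_callback(
        '/[A-Za-z]+/',
        function (array $m) use ($splitWord, $minLength): string {
            if (strlen($m[0]) < $minLength) {
                return $m[0]; // too short to be worth breaking
            }
            return implode('&shy;', $splitWord($m[0]));
        },
        $text
    );
}

// Stand-in splitter for the demo: a one-word table, everything
// else is returned unsplit. A real splitter would go here.
$demoSplitter = function (string $word): array {
    $known = ['nevermore' => ['nev', 'er', 'more']];
    return $known[strtolower($word)] ?? [$word];
};

echo soften_text('Quoth the raven, nevermore.', $demoSplitter), "\n";
```

The browser only shows a hyphen at a soft hyphen when it actually breaks the line there, so the markup is invisible otherwise.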
Note that Frank Liang's paper is on hyphenation, NOT on syllable detection. In addition, the thesis itself states that its success rate is around 89% for the dictionary he used, which is not going to be good enough for everyone. It seems there really is no substitute for doing it manually for every single word. Requiring a complete one-to-one lookup table of words is not that efficient, but these days storage space is far less expensive than CPU time anyway.
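The lookup-table idea could be combined with an algorithmic fallback for words the table doesn't know. A minimal sketch, assuming a hand-checked word list stored as 'word' => 'syl-la-bles' (the function name split_with_lookup, the storage format and the fallback callable are all assumptions for illustration):

```php
<?php
// Look the word up in a hand-verified list first; only fall back
// to an algorithmic splitter when the word is unknown.
function split_with_lookup(string $word, array $lexicon, callable $fallback): array
{
    $key = strtolower($word);
    if (isset($lexicon[$key])) {
        return explode('-', $lexicon[$key]);
    }
    return $fallback($word);
}

// Tiny demo list; a real one would hold the whole dictionary.
$demoLexicon = ['nevermore' => 'nev-er-more'];

// Demo fallback that simply gives up and returns the word whole.
$demoFallback = function (string $word): array {
    return [$word];
};

echo implode('-', split_with_lookup('Nevermore', $demoLexicon, $demoFallback)), "\n";
```

Note that a hit returns the parts as spelled in the list (lower case here), not in the caller's original casing.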
Perhaps someone might consider making a CAPTCHA-like service so that many users could be asked to provide the solution to every known word, with the results checked against each other, so that one person wouldn't have to do it all themselves. I'd hope the results would be released freely once complete.
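Checking the results against each other could be as simple as a majority vote per word. A rough sketch, where consensus_split and both thresholds are arbitrary choices for illustration rather than part of any existing service:

```php
<?php
// Accept a split for a word only when enough users have answered
// and a large enough share of them agree. Returns null otherwise.
function consensus_split(array $submissions, int $minVotes = 3, float $minShare = 0.6): ?string
{
    $total = count($submissions);
    if ($total < $minVotes) {
        return null; // not enough answers yet
    }

    // Count identical answers, ignoring case.
    $tally = array_count_values(array_map('strtolower', $submissions));
    arsort($tally);
    $top = array_key_first($tally); // requires PHP 7.3+

    return ($tally[$top] / $total) >= $minShare ? $top : null;
}
```

For example, the submissions "nev-er-more", "nev-er-more" and "ne-ver-more" would settle on "nev-er-more" (two of three agree), while two submissions alone would stay undecided.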