RegexpError: Stack overflow in regexp matcher

2023-01-22 05:39 Asked by:

I have a small problem with a simple tokenizer regex:

def test_tokenizer_regex_limit
  string = '<p>a</p>' * 400
  tokens = string.scan(/(<\s*tag:.*?\/?>)|((?:[^<]|\<(?!\s*tag:.*?\/?>))+)/)
end

Basically it runs through the text and gets pairs of [ matched_tag , other_text ]. Here's an example: http://rubular.com/r/f88JBjfzFh
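To make the pair shape concrete, here is a small sketch of what `scan` returns with this pattern on a short input. The constant name `TOKEN` and the sample string are made up for illustration; the pattern is the one above, minus the redundant backslash before `<`.

```ruby
# Same pattern as in the question; TOKEN is a hypothetical name.
TOKEN = /(<\s*tag:.*?\/?>)|((?:[^<]|<(?!\s*tag:.*?\/?>))+)/

# With two capture groups, scan yields one [tag, text] pair per match;
# whichever group did not take part in the match is nil.
pairs = 'hello <tag:x /> world'.scan(TOKEN)

pairs.each do |tag, text|
  puts "tag=#{tag.inspect} text=#{text.inspect}"
end
```

Each pair has exactly one non-nil element, so the result is really a flat stream of tag tokens and text tokens.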

It works fine for smaller inputs. If you run it under Ruby 1.8.7 it blows up with the stack overflow above; 1.9.2 works fine.

Any ideas how to simplify or improve this? My regex-fu is weak.

This is a bit simpler, but not by much:

(<[^<]*:[^<]*>)|((?:[^<]|<[^:]*>)+)

~~(<.*?>|[^<>]+)~~
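Another option, if the overflow comes from the repeated `(?: ... | ... )+` group recursing once per character (which seems likely on 1.8.7, but is an assumption), is to take the repetition out of the regex and loop in Ruby with `StringScanner`. This is only a sketch: `TAG` and `tokenize` are hypothetical names, and it is meant to produce the same `[tag, text]` pairs as the original `scan` call.

```ruby
require 'strscan'

# Tag part of the original pattern, on its own (hypothetical name).
TAG = /<\s*tag:.*?\/?>/

# Returns [tag, nil] / [nil, text] pairs like String#scan did.
def tokenize(string)
  scanner = StringScanner.new(string)
  tokens = []
  text = String.new

  until scanner.eos?
    if (tag = scanner.scan(TAG))
      # Flush any text collected before this tag, then emit the tag.
      tokens << [nil, text] unless text.empty?
      text = String.new
      tokens << [tag, nil]
    else
      # Not a tag here: take one character (possibly a '<' that does
      # not start a tag) plus the run of non-'<' characters after it.
      text << scanner.scan(/.[^<]*/m)
    end
  end

  tokens << [nil, text] unless text.empty?
  tokens
end

tokens = tokenize('<p>a</p>' * 400)
puts tokens.length
```

Each regex applied here is flat (no alternation inside a quantified group), so the matcher has far less to keep on its stack for any single match, and the input length only affects the Ruby loop.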
