Regular Expression: Split English and Non-English words with Comma?

Posted 2022-12-11 14:14

Is there a regular expression pattern that changes this string

This is a mix string of üößñ and English. üößñ üößñ are Unicode words.

to this?

This is a mix string of, üößñ, and English., üößñ üößñ, are Unicode words.

In other words, I want to separate English words from non-English words with commas.

Thanks.
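If a single regular expression is not a hard requirement, you can get the same output by comparing neighboring words. The sketch below is an alternative technique, not taken from any of the answers. It assumes words are whitespace-separated (runs of whitespace collapse to one space) and treats a word as "English" when every character is 7-bit ASCII. `commaSeparateRuns` is a hypothetical name.

```javascript
// Hypothetical helper: insert ", " wherever the text switches between
// ASCII-only words and words containing non-ASCII characters.
// Assumption: "English" simply means "every character is 7-bit ASCII".
function commaSeparateRuns(text) {
  const isAscii = (word) => /^[\x00-\x7F]*$/.test(word);
  // Assumption: words are whitespace-separated; runs of whitespace collapse to one space.
  const words = text.trim().split(/\s+/);
  let out = words[0];
  for (let i = 1; i < words.length; i++) {
    // Same class as the previous word: plain space; class changes: comma + space.
    const sep = isAscii(words[i]) === isAscii(words[i - 1]) ? " " : ", ";
    out += sep + words[i];
  }
  return out;
}

console.log(commaSeparateRuns("This is a mix string of üößñ and English. üößñ üößñ are Unicode words."));
// → This is a mix string of, üößñ, and English., üößñ üößñ, are Unicode words.
```

Because it compares adjacent words instead of looking for a leading space, it also behaves sensibly when the string starts with a non-ASCII word.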

javascript

/((?:\ [^\w\s]+)+)/g

'This is a mix string of üößñ and English. üößñ üößñ are Unicode words.'.replace(/((?:\ [^\w\s]+)+)/g,',$1,')

This is a mix string of, üößñ, and English., üößñ üößñ, are Unicode words.
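The pattern above relies on `\w` meaning only `[A-Za-z0-9_]`, and its negated `\w` class also matches ASCII punctuation such as `(` after a space. Here is a minimal sketch that restricts the match to non-ASCII characters with `\P{ASCII}`. It assumes an ES2018+ engine with Unicode property escapes, and `markNonAscii` is a hypothetical name.

```javascript
// Wrap each space-preceded run of non-ASCII words in commas.
// \P{ASCII} = any code point outside U+0000..U+007F (needs the "u" flag, ES2018+).
function markNonAscii(text) {
  return text.replace(/((?: \P{ASCII}+)+)/gu, ",$1,");
}

console.log(markNonAscii("This is a mix string of üößñ and English. üößñ üößñ are Unicode words."));
// → This is a mix string of, üößñ, and English., üößñ üößñ, are Unicode words.
console.log(markNonAscii("see (note)"));
// → see (note)   (ASCII punctuation is left alone)
```

Like the original, it only fires on runs preceded by a space, so a non-ASCII word at the very start of the string is not wrapped.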

Mark

No regular expression can detect which language a string is written in, but you can match characters inside (or outside) a range of code points by using Unicode escapes, such as

/[\u0900-\u097F]+/

which matches a sequence of Devanagari characters.
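Here is a small sketch of the code-point range in action. The sample word नमस्ते and the helper names are illustrative assumptions. The second function uses the ES2018 `\p{Script=Devanagari}` property instead of a hard-coded block range. The two are not identical: a few characters in the U+0900..U+097F block, such as the danda (।), belong to the Common script.

```javascript
// Collect runs of characters from the Devanagari block (U+0900..U+097F).
function devanagariRuns(text) {
  return text.match(/[\u0900-\u097F]+/g) || [];
}

// Same idea via the Unicode Script property (ES2018+, needs the "u" flag).
function devanagariScriptRuns(text) {
  return text.match(/\p{Script=Devanagari}+/gu) || [];
}

console.log(devanagariRuns("Hello नमस्ते world"));       // → [ 'नमस्ते' ]
console.log(devanagariScriptRuns("Hello नमस्ते world")); // → [ 'नमस्ते' ]
console.log(devanagariRuns("English only"));            // → []
```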

Remember that a script (a set of characters) can be shared by many languages: the Latin script, for example, is used by English, German, Spanish and many others.
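This matters for this very question. The letters ü, ö, ß and ñ all belong to the Latin script, just like the English letters, so a script test cannot separate them. Here is a small sketch, assuming ES2018 Unicode property escapes; `isLatinScript` is a hypothetical helper.

```javascript
// True when every character of the word belongs to the Latin script.
function isLatinScript(word) {
  return /^\p{Script=Latin}+$/u.test(word);
}

console.log(isLatinScript("English")); // → true
console.log(isLatinScript("üößñ"));    // → true: same script, different language
console.log(isLatinScript("नमस्ते"));  // → false
```

This is why the approaches on this page fall back on an ASCII test rather than a script test.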

Sure, you can use \x escapes to filter by ASCII code range.

For example (in JavaScript):

var x = "This is a mix string of üößñ and English. üößñ üößñ are Unicode characters.";
// [^\x00-\x7F] matches any character outside the 7-bit ASCII range (0-127)
console.log(x.replace(/([^\x00-\x7F]+\s)+/g, function (match) { return match.slice(0, -1) + ", "; }));

Output:

This is a mix string of üößñ, and English. üößñ üößñ, are Unicode characters.

I'm sure another regex-savvy person can optimize this further, but it's the best I can come up with half-awake :)
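A quick way to check what a non-ASCII class such as `[^\x00-\x7F]` catches is to list the runs it matches. 7-bit ASCII ends at U+007F, so U+0080 already falls outside the range. `nonAsciiRuns` is a hypothetical helper for illustration.

```javascript
// List maximal runs of characters outside 7-bit ASCII (U+0000..U+007F).
function nonAsciiRuns(text) {
  return text.match(/[^\x00-\x7F]+/g) || [];
}

console.log(nonAsciiRuns("This is a mix string of üößñ and English. üößñ üößñ are Unicode characters."));
// → [ 'üößñ', 'üößñ', 'üößñ' ]
console.log(nonAsciiRuns("\x7F").length); // → 0 (DEL is still ASCII)
console.log(nonAsciiRuns("\x80").length); // → 1 (first code point past ASCII)
```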

    String s = "This is a mix string of üößñ and English. üößñ üößñ are Unicode words.";
    // [\p{L}&&[^A-Za-z]] is a class intersection: any Unicode letter except ASCII A-Z/a-z
    System.out.println(s.replaceAll("((?: ?[\\p{L}&&[^A-Za-z]]+)+)", ",$1,"));
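Without the ES2024 `v` flag, JavaScript character classes have no intersection operator like Java's `&&`. You can still express the same set (letters that are not A-Z or a-z) as the negated class `[^\P{L}A-Za-z]`, meaning anything that is neither a non-letter nor an ASCII letter. Below is a sketch of the Java answer ported to JavaScript on that basis; `markNonAsciiLetters` is a hypothetical name.

```javascript
// Port of the Java pattern ((?: ?[\p{L}&&[^A-Za-z]]+)+).
// [^\P{L}A-Za-z] = "not (a non-letter or A-Z/a-z)" = any letter except ASCII letters.
// With the ES2024 "v" flag this could instead be written [\p{L}--[A-Za-z]].
function markNonAsciiLetters(text) {
  return text.replace(/((?: ?[^\P{L}A-Za-z]+)+)/gu, ",$1,");
}

console.log(markNonAsciiLetters("This is a mix string of üößñ and English. üößñ üößñ are Unicode words."));
// → This is a mix string of, üößñ, and English., üößñ üößñ, are Unicode words.
```

Because the leading space is optional (` ?`), a non-ASCII word at the very start of the string is matched too and gets a leading comma; the Java original behaves the same way.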

Unicode defines well over 100 scripts. The pattern above does not look at scripts at all; it simply matches any letter outside the ASCII range (any \p{L} that is not A-Z or a-z).
