
How do I use regex to do this in Python? [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.

Closed 8 years ago.

def symbolsReplaceDashes(text):

I want to replace all spaces and symbols with hyphens, because I want to use the result in a URL.


import re
text = "this isn't alphanumeric"
result = re.sub(r'\W','-',text) # result will be "this-isn-t-alphanumeric"

The \W class is the inverse of the \w class, which consists of alphanumeric characters and underscores ([a-zA-Z0-9_]). Thus, replacing every character that matches \W with a dash leaves you with a string that consists only of alphanumerics, underscores, and dashes, suitable for a URL. Note that in Python 3, \w also matches non-ASCII letters and digits in str patterns unless you pass the re.ASCII flag.
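One thing the one-liner above does not handle is runs of adjacent symbols, which each become their own dash. A minimal sketch of a variant that collapses such runs and trims stray dashes from the ends follows; the helper name slugify is hypothetical and not part of the original answer, and the re.ASCII flag assumes Python 3.

```python
import re

def slugify(text):
    """Hypothetical helper: replace each run of non-word characters with one hyphen."""
    # re.ASCII restricts \w to [a-zA-Z0-9_]; without it, Python 3 also
    # treats non-ASCII letters and digits as word characters.
    slug = re.sub(r'\W+', '-', text, flags=re.ASCII)
    # Trim hyphens left over from leading or trailing symbols.
    return slug.strip('-')

print(slugify("this isn't alphanumeric"))  # this-isn-t-alphanumeric
print(slugify("  Hello, World!  "))        # Hello-World
```

Whether to collapse runs is a design choice: it gives tidier URLs, but two different inputs are more likely to map to the same result.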


If you want to escape a string for use in a URL rather than replace its characters, use urllib.quote() or urllib.quote_plus() instead of a regex. For more complex queries, you might want to build the URL with urllib.urlencode(). You can reverse the quoting with urllib.unquote() and urllib.unquote_plus(). In Python 3, all of these functions live in the urllib.parse module.
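As a rough illustration of the quoting functions mentioned above, here is a short sketch using their Python 3 locations in urllib.parse. The example.com URL and the query parameter names are made up for the example.

```python
from urllib.parse import quote, quote_plus, unquote, unquote_plus, urlencode

text = "this isn't alphanumeric"

# quote() percent-encodes unsafe characters; a space becomes %20.
escaped = quote(text)
print(escaped)       # this%20isn%27t%20alphanumeric

# quote_plus() is meant for query strings; a space becomes +.
escaped_plus = quote_plus(text)
print(escaped_plus)  # this+isn%27t+alphanumeric

# urlencode() builds a whole query string from key/value pairs.
query = urlencode({"q": text, "page": 2})
url = "https://example.com/search?" + query  # hypothetical URL
print(url)

# Both forms of quoting are reversible.
print(unquote(escaped) == text)            # True
print(unquote_plus(escaped_plus) == text)  # True
```

Unlike the regex approach, quoting keeps the original text recoverable, at the cost of a less readable URL.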


This answer doesn't use regular expressions but should also work, and it gives greater control over which kinds of characters are replaced. It uses the unicodedata module to replace symbols and spaces with dashes by checking the Unicode category of each character.

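import unicodedata

# See http://www.dpawson.co.uk/xsl/rev2/UnicodeCategories.html for character categories
# Sc, Sk, Sm, So are the symbol categories; Zs is the space separator category.
replace = ('Sc', 'Sk', 'Sm', 'So', 'Zs')

def symbolsReplaceDashes(text):
    result = []
    for char in text:
        # In Python 3 every str is already Unicode, so no unicode() call is needed.
        if unicodedata.category(char) in replace:
            result.append('-')
        else:
            result.append(char)
    return ''.join(result)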

If the output contains characters outside the basic ASCII alphanumeric range, you may need to percent-encode it with something like urllib.quote(output.encode('utf-8')), or urllib.parse.quote(output) in Python 3.
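To show how the two steps might fit together, here is a self-contained sketch assuming Python 3. The function name symbols_to_dashes and the sample string are illustrative only; the category tuple is the one from the answer above.

```python
import unicodedata
from urllib.parse import quote

# Same categories as above: currency, modifier, math and other symbols, plus spaces.
REPLACE_CATEGORIES = ('Sc', 'Sk', 'Sm', 'So', 'Zs')

def symbols_to_dashes(text):
    """Illustrative variant: swap symbols and spaces for dashes."""
    return ''.join(
        '-' if unicodedata.category(ch) in REPLACE_CATEGORIES else ch
        for ch in text
    )

# "caf\u00e9" is "café" with a precomposed é, which is a letter and is kept.
output = symbols_to_dashes("caf\u00e9 $5 menu")
print(output)   # café--5-menu

# Percent-encode the remaining non-ASCII characters as UTF-8 bytes.
encoded = quote(output.encode('utf-8'))
print(encoded)  # caf%C3%A9--5-menu
```

Punctuation such as apostrophes falls under the P* categories, so it passes through the first step unchanged and is only handled by the quoting step.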
