Regex for search queries
I have a page built in Django that has its own search engine. I need help constructing a regex that accepts only valid queries, meaning queries that consist solely of Polish alphabet letters (both upper- and lowercase) and the symbols * and ?. Can anyone help?
EDIT: I tried something like this:
query_re = re.compile(r'^\w*[\*\?]*$', re.UNICODE)
if not query_re.match(self.cleaned_data['query']):
    raise forms.ValidationError(_('Illegal character'))
but it also accepts some characters from other alphabets, and it won't accept queries such as *somest?ing*.
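A quick check of the attempted pattern shows both problems. This is a sketch assuming Python 3, where str patterns are Unicode-aware by default, so re.UNICODE is redundant there but harmless. The wildcards are only accepted after all the letters, and \w accepts letters from any script, plus digits and the underscore.

```python
import re

# The pattern from the question: a run of \w characters, then wildcards only at the end.
query_re = re.compile(r'^\w*[\*\?]*$', re.UNICODE)

# A wildcard before or between letters is rejected, because [\*\?]* may only
# follow once every \w character has been consumed.
print(bool(query_re.match('*somest?ing*')))  # False
print(bool(query_re.match('something*')))    # True

# With Unicode matching, \w accepts letters from any alphabet (here Cyrillic),
# as well as digits and "_".
print(bool(query_re.match('тест')))          # True
print(bool(query_re.match('abc_123')))       # True
```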
If your locale is correctly set, you would use
query_re = re.compile(r'^[\w\*\?]*$', re.LOCALE|re.IGNORECASE)
\w matches all locale-specific alphanumeric characters (and the underscore): http://docs.python.org/library/re.html
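This answer dates from Python 2. In Python 3.6 and later, re.LOCALE may only be combined with bytes patterns, so compiling the str pattern above raises ValueError. The sketch below, assuming Python 3, shows that error. It also shows what the same character class accepts once LOCALE is dropped: Unicode \w lets Polish letters through, but also digits, the underscore and other scripts. The names locale_error and unicode_re are only for illustration.

```python
import re

# re.LOCALE is rejected for str patterns in Python 3.6+ at compile time.
try:
    re.compile(r'^[\w\*\?]*$', re.LOCALE | re.IGNORECASE)
    locale_error = None
except ValueError as exc:
    locale_error = exc
    print('ValueError:', exc)

# Without LOCALE, \w in a str pattern follows Unicode rules.
unicode_re = re.compile(r'^[\w*?]*$')
for query in ['żółw*', 'somest?ing', 'abc_123', 'тест', 'a b']:
    print(query, bool(unicode_re.match(query)))
```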
Try something like
regex = r'(?iL)^[\s\*\?a-z]*$'
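assuming your machine's locale is Polish. The first part, (?iL), sets the locale and ignorecase flags. The ^ matches the start of the string, \s matches any whitespace, a-z matches any letter from a to z (uppercase too, thanks to the ignorecase flag), and $ matches the end of the string. Note that the a-z range covers only the unaccented letters, so Polish letters such as ą or ł are not matched by it.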
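Here is a sketch of this pattern in Python 3. It assumes the (?L) part is dropped, since Python 3 rejects the LOCALE flag for str patterns. The pattern accepts spaces and wildcards anywhere, but the a-z range does not cover Polish letters with diacritics. The name ascii_re is only illustrative.

```python
import re

# The answer's pattern with only the ignore-case flag kept.
ascii_re = re.compile(r'(?i)^[\s\*\?a-z]*$')

print(bool(ascii_re.match('Some*thing?')))  # True
print(bool(ascii_re.match('two words')))    # True: \s allows the space
print(bool(ascii_re.match('żółw')))         # False: ż, ó, ł are outside a-z
```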
Alternatively, instead of using (?L) and a-z, you could just explicitly list the allowable letters (e.g. abcdefghijklmnopqrstuvwxyz together with ąćęłńóśźż).
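The explicit-list approach can be sketched in Python 3 as follows. It does not depend on the machine's locale. POLISH_LETTERS, QUERY_RE and is_valid_query are hypothetical names. The letter set (the 32 letters of the Polish alphabet, plus q, v and x for loanwords) is an assumption you may want to adjust.

```python
import re

# Hypothetical letter set: the Polish alphabet plus q, v, x used in loanwords.
POLISH_LETTERS = 'aąbcćdeęfghijklłmnńoóprsśtuwyzźż' + 'qvx'

# Uppercase forms are listed explicitly instead of using re.IGNORECASE,
# so Unicode case folding cannot let unexpected characters through.
QUERY_RE = re.compile('[%s%s*?]+' % (POLISH_LETTERS, POLISH_LETTERS.upper()))


def is_valid_query(query):
    """Return True if query is non-empty and contains only Polish letters, '*' and '?'."""
    return QUERY_RE.fullmatch(query) is not None


for q in ['*somest?ing*', 'Żółć*', 'тест', 'abc_123', '', 'two words']:
    print(repr(q), is_valid_query(q))
```

The + quantifier rejects an empty query; use * if an empty query should pass, or add \s to the class if spaces are allowed. The sketch assumes queries arrive in composed form; applying unicodedata.normalize('NFC', query) first would handle letters typed as a base letter plus a combining accent. In the Django form, the check from the question could call is_valid_query(self.cleaned_data['query']) and raise forms.ValidationError when it returns False.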