swearing in code
I have an application that uses predefined word lists but I want to extend it to give the option of using their own custom lists.
Unfortunately lists like SOWPODS (the official Scrabble word list) are quite comprehensive and contain words I wouldn't want popping up on the screen.
I can e开发者_如何学Pythonasily get hold of a banned word list and build it into my application as a kind of swear filter. Is this likely to get my app trapped by any application filtering that may be present on Google Marketplace and, if so, is there a way around it? (encryption, compression etc.)
EDIT: Most of the answers so far are missing the point that the user will be supplying the list so I have no control over its content and need to filter it in my app either on import or as it is used. (Though they will still blame me if the app "swears" at them)
Is there a reason you couldn't just filter the words upon import against a "bad words" list that, according to a previous comment you made, it sounds like you already compiled?
You could also add the option into a preferences menu so that it doesn't filter them on import.
Edit: Google's policies don't allow "excessive profanity." If it is rejected, I assume you could just appeal with the argument that it is a filter against profanity and your app would be accepted.
Random thought: why not build a Bloom Filter for disallowed words, and store the bits in the filter in your program's executable instead of the word list? Sure, you might get the odd false positive, but in the space of possible strings your word list is going to filter a lot more bits.
Alternatively, if what you're really worried about is someone doing a string dump on your application, some simple obfuscation like base64 should do the trick.
Many *nix distros include a word list in a plain text file /usr/share/dict/words
(used for spell-check, etc.). On my OSX Leopard laptop the list appears to be stripped of the f-words. On my linux server, the f-words are there. Check your *nix distro with grep to see what you have and if it doesn't contain f-words, you could base your program on that word list.
Why not have a list of good-words instead of bad-words.. Much easier to find, and will make sure people can not trick your filter. I do however believe that users do not really like filters.
I would think the banned word list would be relatively small (what, 15-20 words?). I haven't done anything like this in Java yet, but I imagine it would be simple to, when the user imports a list, put that list into a binary search tree, and then check it against the banned word list, deleting any matching entries. Then save this filtered list and use it.
Just to add to this, I would perhaps have a popup dialog, or maybe a preference that allows the user to disable filtering. Always better to give the option. :)
I commented, but this is really more of an answer.
I think you need to learn how "spam" and "content filtering" works.
Neither of those things will prevent your app from containing or emitting any type of word. To be very clear, neither are going to search the binary of your application for those words.
That said, you can absolutely keep a list of words with your installer that you use to filter out what is displayed to the user regardless of what they upload.
BTW, "spam" filters are there to stop spam email from being received and hence block those. Content Filters work two ways. First, by letting the content providers explicitly state what audience their content is good for and second by filtering the data as it comes across. These do NOT work inside of an application; rather, they work on the data a web browser receives.
I would start with a standard word list. A separate program would filter out any bad words and create your own modified standard word list.
Your application would then not need to worry about filtering anything out. No garbage in, no garbage out.
精彩评论