URL remover from user input
I am currently creating a website in which the user can add text to a database. I am trying to write some code which will remove any URLs that have been written in the text. It must be able to find all prefixes (www., no prefix at all, http://) and all suffixes (.com, .co.uk, .de). I understand that this is a hard task, as URLs can come in a variety of forms, hence my asking for advice here. Below are some examples of ways users could hide URLs (please add any others you can think of). Thanks
www.google.com
www.google.co.uk
www.google.de
w w w . g o o g l e . c o m
w|w|w|.|g|o|o|g|l|e|.|c|o|m
You can set up regular expressions to find known variations, but an algorithm that catches every variation a user can come up with is not possible. If you want to fight this battle, it will be ongoing, as people intent on bypassing your system will find a way.
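As a rough illustration of the "known variations" approach, here is a minimal Python sketch. It is not a complete solution: the suffix list, the set of separator characters (whitespace and `|`), and the names `STRICT`, `SPACED` and `remove_urls` are all assumptions made for this example, chosen to cover only the samples in the question.

```python
import re

# Assumed, non-exhaustive suffix list taken from the question; extend as needed.
SUFFIXES = r"(?:co\.uk|com|de)"

# Plain URLs: optional scheme, optional "www.", one or more labels, a known suffix,
# and an optional path.
STRICT = re.compile(
    r"(?:https?://)?(?:www\.)?[a-z0-9-]+(?:\.[a-z0-9-]+)*\." + SUFFIXES + r"\b(?:/\S*)?",
    re.IGNORECASE,
)

# Candidate obfuscated runs: single characters separated by whitespace or "|".
# The separator set is an assumption; users can of course pick other characters.
SPACED = re.compile(r"(?<![^\s|])(?:[A-Za-z0-9.\-][\s|]+){3,}[A-Za-z0-9.\-](?![^\s|])")


def remove_urls(text: str) -> str:
    """Remove plain URLs and single-character-separated URLs from text."""

    def drop_if_url(match: re.Match) -> str:
        # Collapse the separators and test whether the result is a URL.
        collapsed = re.sub(r"[\s|]+", "", match.group(0))
        return "" if STRICT.fullmatch(collapsed) else match.group(0)

    text = SPACED.sub(drop_if_url, text)
    return STRICT.sub("", text)
```

With these assumptions, `remove_urls("w|w|w|.|g|o|o|g|l|e|.|c|o|m")` returns an empty string, while ordinary text such as `"hello world"` is left alone. Anything outside the listed suffixes or separators slips through, which is exactly the ongoing battle described above.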
This does not mean all is hopeless. You can start banning users who do this type of thing. You can also require everyone to register as a user so that bans can be enforced. Banning certain IPs is also an option. This will still not stop the persistent gnat, but going for the 100% solution is expensive.
What is the context for this requirement?