Help Implement Tags in PHP
In my recent PHP project, I need to implement searchable tags separated by commas (similar to this site or to WordPress). What is a smart way to detect and remove unnecessary characters or markup? Putting the XSS concern aside, first of all I need to clean the input and extract only the text if the user enters HTML (or other tags) instead of plain text.
For example:
If user inputs <b>sdfasdf</b>, <a href="something">sdfsdfsdf</a>, <sdfsdfsdf
It should strip out all the unnecessary characters and tags and only plain text should be saved in database.
I have tried this in WordPress, and it is smart enough to figure this out and automatically extract only the text.
My question:
Is there an open source library available for this task that I can integrate into my project? I have done some homework on this, but *htmlentities(), strip_tags(), HTML Purifier* etc. don't seem suitable for this task. Or do I need to build my own library on top of these?
Can somebody guide me on this?
Thanks!
In addition to removing "complete" tags (markup language elements) such as found in <b>sdfasdf</b>, <a href="something">sdfsdfsdf</a>, you can also remove "forbidden" characters such as "<", ">", and "&" (using preg_replace and the like), and collapse multiple spaces into a single space (also using preg_replace).
Remember, they're used only as tags (keywords), so it's acceptable here to use a somewhat restricted character set. In Stack Overflow, for instance, only letters, numbers, and hyphens are allowed in tags.
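The steps described above could be sketched roughly as follows. This is only a minimal illustration, not a complete solution: the function name clean_tags is hypothetical, and splitting on commas before stripping markup is an assumption made so that one broken tag cannot swallow its neighbours.

```php
<?php
// Hypothetical helper: turn a raw comma-separated string into clean tag names.
function clean_tags(string $input): array
{
    $tags = [];
    foreach (explode(',', $input) as $raw) {
        // Remove complete markup elements, keeping the text between them.
        $tag = strip_tags($raw);
        // Drop the "forbidden" characters that stray markup may leave behind.
        $tag = preg_replace('/[<>&]/', '', $tag);
        // Collapse runs of whitespace into a single space, then trim the ends.
        $tag = trim(preg_replace('/\s+/', ' ', $tag));
        // Skip entries that end up empty after cleaning.
        if ($tag !== '') {
            $tags[] = $tag;
        }
    }
    // Remove duplicates and reindex the result.
    return array_values(array_unique($tags));
}
```

Note that strip_tags() does not validate HTML, so an unclosed fragment such as <sdfsdfsdf may lose more text than expected. Also, a comma inside an attribute value would split a tag in two here; the leftover pieces are then subject to the same caveat.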
I would look at this the other way around. What input is legal? Which characters are allowed in tag names? Once those questions are answered, I would build a server-side whitelist of legal characters using regex, state the rules in the UI, and simply reject input that does not comply.
Massaging invalid input into valid input is rarely a good idea.
Characters allowed in tags are usually alphanumeric + dashes and underscores. Some sites also allow spaces.
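A whitelist check along these lines might look like the sketch below. The function name parse_tags, the 30-character limit, and the choice to return null on rejection are all assumptions for illustration; the character set is the alphanumeric, dash, and underscore set mentioned above.

```php
<?php
// Hypothetical validator: returns the list of tags, or null if any tag is illegal.
function parse_tags(string $input): ?array
{
    // Split on commas and trim surrounding whitespace from each entry.
    $tags = array_map('trim', explode(',', $input));
    foreach ($tags as $tag) {
        // Assumed rule: 1 to 30 letters, digits, dashes, or underscores.
        if (!preg_match('/\A[A-Za-z0-9_-]{1,30}\z/', $tag)) {
            return null; // Reject the whole input instead of repairing it.
        }
    }
    // Remove duplicates and reindex the result.
    return array_values(array_unique($tags));
}
```

With this approach, input such as "<b>php</b>, ok" is rejected outright, and the caller can show the rules to the user again rather than guessing what was meant.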