
Best way to 'filter' user input for username [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 10 years ago.

I have a site which allows users to create a 'unique URL' so they can pass along to colleagues in the form of www.site.com/customurl.

I, of course, run a check to make sure the input is actually unique, but I also want to filter out things like large company names (copyrighted names, etc.) and curse words. To do this, my thought was to build a txt file with a list of every possible name/word that came to mind. The size of the test txt file we have is not a concern, but I am curious whether this is the best way to go about it. I do not think a DB call would be as efficient as reading in the text file.

My code is:

$filename = 'badurls.txt';
$path = $_SERVER['DOCUMENT_ROOT'] . '/' . $filename;

// Read the blacklist into an array, one entry per line
$fp = fopen($path, 'r');
if ($fp) {
  $array = explode("\n", fread($fp, filesize($path)));
}

if (in_array($url, $array)) {
  echo 'You used a bad word!';
} else {
  echo 'URL would be good';
}
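(For reference, PHP's built-in file() can read the list straight into an array; a minimal sketch of the same check, assuming one entry per line in badurls.txt:)

$path = $_SERVER['DOCUMENT_ROOT'] . '/badurls.txt';

// file() returns one array element per line; the flags drop trailing
// newlines and skip blank lines
$badWords = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

if ($badWords !== false && in_array($url, $badWords)) {
  echo 'You used a bad word!';
} else {
  echo 'URL would be good';
}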

NOTE

I am talking about possibly a list of the top 100-200 companies and maybe 100 curse words. I could be wrong but do not anticipate this list ever growing beyond 500 words total, let alone 1000.


You may not think that a DB call is as efficient, but it is much more efficient. The database keeps indexes on the data, so it doesn't have to iterate through every item (as in_array does internally) to see if one exists. Your code will be O(n) while the DB lookup will be O(log n) (see B-tree indexes)... Not to mention the memory savings from not having to load the file in its entirety on every page load.

Sure, 500 elements isn't a whole lot. It wouldn't be a huge deal to just stick that in a file, would it? Actually, it would. It's not so much a performance issue (the overhead of the DB call will roughly cancel out the efficiency loss of the file, so they should be about even in terms of time) as an issue of maintainability. You say today that 500 words is the maximum. What happens when you realize that you need to provide duplicate detection, i.e. check whether a candidate URL already exists on your site? That will require a DB query anyway, so why not take care of it all in one place?

Just create a table with names, index it, and then do a simple SELECT. It will be faster. And more efficient. And more scalable... Imagine if you reach 1 GB of data. A database can handle that fine. A file read into memory cannot (you'll run out of RAM)...
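A minimal sketch of that, assuming MySQL and a hypothetical badnames table (the table and column names are just for illustration):

-- one row per blocked word or company name; the primary key doubles as the index
CREATE TABLE badnames (
  badname VARCHAR(100) NOT NULL,
  PRIMARY KEY (badname)
);

-- the lookup is then a single indexed match
SELECT badname FROM badnames WHERE badname = 'candidate-url' LIMIT 1;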

Don't try to optimize like this; premature optimization should be avoided. Instead, implement the clean, good solution first, and then optimize only if necessary once the application is finished (and you can identify the slow parts)...

One other point worth considering: the code as it stands will fail if $url = 'FooBar'; and foobar is in the file. Sure, you could simply call strtolower on the URL, but why bother? That's another advantage of the database: it can do case-insensitive matching. So you can do:

SELECT id FROM badnametable WHERE badname LIKE 'entry' LIMIT 1

And just check that there are no matching rows. There's no need to do a COUNT(*), or anything else. All you care about is the number of matching rows (0 is good, !0 is not good).
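In PHP that check might look something like this (a sketch assuming PDO, a MySQL database, and the hypothetical badnames table above; connection details are placeholders):

$pdo = new PDO('mysql:host=localhost;dbname=mysite;charset=utf8', 'dbuser', 'dbpass');

// LIKE with MySQL's default case-insensitive collation matches 'FooBar' against 'foobar'
$stmt = $pdo->prepare('SELECT 1 FROM badnames WHERE badname LIKE ? LIMIT 1');
$stmt->execute(array($url));

if ($stmt->fetch()) {
  echo 'You used a bad word!';
} else {
  echo 'URL would be good';
}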

