
Best way to 'filter' user input for username [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance. Closed 10 years ago.

I have a site which allows users to create a 'unique URL' so they can pass along to colleagues in the form of www.site.com/customurl.

I, of course, run a check to make sure the input is actually unique, but I also want to filter out things like large company names (copyrighted names, etc.) and curse words. To do this, my thought was to build a txt file with a list of every possible name/word that came to mind. The size of the test txt file we have is not a concern, but I am curious whether this is the best way to go about it. I do not think a DB call would be as efficient as reading in the text file.

My code is:

$filename = 'badurls.txt';
$path = $_SERVER['DOCUMENT_ROOT'] . '/' . $filename;

// Read the blacklist into an array, one entry per line
$fp = fopen($path, 'r');
if ($fp) {
  $array = explode("\n", fread($fp, filesize($path)));
}

if (in_array($url, $array)) {
  echo 'You used a bad word!';
} else {
  echo 'URL would be good';
}
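(For reference, PHP's built-in file() can read the list straight into an array; a minimal sketch of the same check, assuming one entry per line in badurls.txt:)

$path = $_SERVER['DOCUMENT_ROOT'] . '/badurls.txt';

// file() returns one array element per line; the flags drop trailing
// newlines and skip blank lines
$badWords = file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);

if ($badWords !== false && in_array($url, $badWords)) {
  echo 'You used a bad word!';
} else {
  echo 'URL would be good';
}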

NOTE

I am talking about possibly a list of the top 100-200 companies and maybe 100 curse words. I could be wrong but do not anticipate this list ever growing beyond 500 words total, let alone 1000.


You may not think that a DB call is as efficient, but it is much more efficient. The database keeps indexes on the data, so it doesn't have to iterate through every item (as in_array does internally) to see if one exists. Your code will be O(n) while the DB lookup will be O(log n) (see B-tree indexes)... Not to mention the memory savings from not having to load the file in its entirety on every page load.

Sure, 500 elements isn't a whole lot. It wouldn't be a huge deal to just stick that in a file, would it? Actually, it would. It's not so much a performance issue (the overhead of the DB call will roughly cancel out the efficiency loss of the file, so they should be about even in terms of time) as an issue of maintainability. You say today that 500 words is the maximum. What happens when you realize that you need to provide duplicate detection, i.e. check whether a candidate URL already exists on your site? That will require a DB query anyway, so why not take care of it all in one place?

Just create a table with names, index it, and then do a simple SELECT. It will be faster. And more efficient. And more scalable... Imagine if you reach 1 GB of data. A database can handle that fine. A file read into memory cannot (you'll run out of RAM)...
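A minimal sketch of that, assuming MySQL and a hypothetical badnames table (the table and column names are just for illustration):

-- one row per blocked word or company name; the primary key doubles as the index
CREATE TABLE badnames (
  badname VARCHAR(100) NOT NULL,
  PRIMARY KEY (badname)
);

-- the lookup is then a single indexed match
SELECT badname FROM badnames WHERE badname = 'candidate-url' LIMIT 1;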

Don't try to optimize like this; premature optimization should be avoided. Instead, implement the clean, good solution first, and then optimize only if necessary once the application is finished (and you can identify the slow parts)...

One other point worth considering: the code as it stands will fail if $url = 'FooBar'; and foobar is in the file. Sure, you could simply call strtolower on the URL, but why bother? That's another advantage of the database: it can do case-insensitive matching. So you can do:

SELECT id FROM badnametable WHERE badname LIKE 'entry' LIMIT 1

And just check that there are no matching rows. There's no need to do a COUNT(*), or anything else. All you care about is the number of matching rows (0 is good, !0 is not good).
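In PHP that check might look something like this (a sketch assuming PDO, a MySQL database, and the hypothetical badnames table above; connection details are placeholders):

$pdo = new PDO('mysql:host=localhost;dbname=mysite;charset=utf8', 'dbuser', 'dbpass');

// LIKE with MySQL's default case-insensitive collation matches 'FooBar' against 'foobar'
$stmt = $pdo->prepare('SELECT 1 FROM badnames WHERE badname LIKE ? LIMIT 1');
$stmt->execute(array($url));

if ($stmt->fetch()) {
  echo 'You used a bad word!';
} else {
  echo 'URL would be good';
}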

