Generally a Good Idea to Always Hash Unique Identifiers in URL?

2022-12-11 22:00 问答作者：

Most sites which use an auto-increment primary-key display it openly in the url.

i.e.

example.org/?id=5

This makes it very easy for anyone to spider a site and collect all the information by simply incrementing the value of id. I can understand where in some cases this is a bad thing if permissions/authentication are not setup correctly and anyone could view anything by simply guessing the id, but is it ever a good thing?

example.org/?id=e4da3b7fbbce2345d7772b0674a318d5

Is there ever a situation where hashing the id to prevent crawling is bad-practice (besides losing the time it takes to setup this functionality)? Or 开发者_如何学Gois this all a moot topic because by putting something on the web you accept the risk of it being stolen/mined?

Generally with web-sites you're trying to make them easy to crawl and get access to all the information so that you can get good search rankings and drive traffic to your site. Good web developers design their HTML with search engines in mind, and often also provide things like RSS feeds and site maps to make it easier to crawl content. So if you're trying to make crawling more difficult by not using sequential identifiers then (a) you aren't making it more difficult, because crawlers work by following links, not by guessing URLs, and (b) you're trying to make something more difficult that you also spend time trying to make easier, which makes no sense.

If you need security then use actual security. Use checks of the principal to authorize or deny access to resources. Obfuscating URLs is no security at all.

So I don't see any problem with using numeric identifiers, or any value in trying to obfuscate them.

Using a hash like MD5 or SHA on the ID is not a good idea:

there is always the possibility of collisions. That is, two different IDs hash to the same value.
How are you going to unhash it back to the actual ID?

A better approach if you're set on avoiding incrementing IDs would be to use a GUID, or just a random value when you create the ID.

That said, if your application security relies on people not guessing an ID, that shows some flaws elsewhere in the system. My advice: stick to the plain and easy auto-incrementing ID and apply some proper access control.

I think hashing for publicly accessible id's is not a bad thing, but showing sequential id's will in some cases be a bad thing. Even better, use GUID/UUIDs for all your IDs. You can even use sequential GUIDS in a lot of technologies, so it's faster (insert-stage) (though not as good in a distributed environment)

Hashing or randomizing identifiers or other URL components can be a good practice when you don't want your URLs to be traversable. This is not security, but it will discourage the use (or abuse) of your server resources by crawlers, and can help you to identify when it does happen.

In general, you don't want to expose application state, such as which IDs will be allocated in the future, since it may allow an attacker to use a prediction in ways that you didn't forsee. For example, BIND's sequential transaction IDs were a security flaw.

If you do want to encourage crawling or other traversal, a more rigorous way would be to provide links, rather than by providing an implementation detail which may change in the future.

Using sequential integers as IDs can make many things cheaper on your end, and might be a resonable tradeoff to make.

My opinion is that if something is on the web, and is served without requiring authorization, it was put with the intention that it should be publicly accessible. Actively trying to make it more difficult to access seems counter-intuitive.

Often, spidering a site is a Good Thing. If you want your information available as much as possible, you want sites like Google to gather data on your site, so that others can find it.

If you don't want people to read through your site, use authentication, and deny access to people who don't have access.

Random-looking URLs only give the impression of security, without giving the reality. If you put account information (hidden) in a URL, everyone will have access to that web spider's account.

My general rule is to use a GUID if I'm showing something that has to be displayed in a URL and also requires credentials to access or is unique to a particular user (like an order id). http://site.com/orders?id=e4da3b7fbbce2345d7772b0674a318d5

That way another user won't be able to "peek" at the next order by hacking the url. They may be denied access to someone else's order, but throwing a zillion letters and numbers at them is a pretty clear way to say "don't mess with this".

If I'm showing something that's public and not tied to a particular user, then I may use the integer key. For example, for displaying pictures, you might wish to allow your users to hack the url to see the next picture.

http://example.org/pictures?id=4, http://example.org/pictures?id=5, etc.

(I actually wouldn't do either as a simple GET parameter, I'd use mod_rewrite (or something) to make readable urls. Something like http://example.org/pictures/4 -> /pictures.php?picture_id=4, etc.)

Hashing an integer is a poor implementation of security by obscurity, so if that's the goal, a true GUID or even a "sequential" GUID (whether via NEWSEQUENTIALID() or COMB algorithm) is much better.

Either way, no one types URLs anymore, so I don't see much sense in worrying about the difference in length.

继续阅读：language-agnostic

Generally a Good Idea to Always Hash Unique Identifiers in URL?

更多精彩内容

精彩评论

最新问答

央视是哪个频道？

请问买过的朋友，舒提啦旅行箱实际使用体验如何？？

检查不孕不育需要的费用？

海信ULED电视画质有什么不同的地方?？

钉子可以挂的住画框幕布吗？

问答排行榜

河神2九牛入海钓河妖是第几集河妖什么来历可活吞牛？

性激素六项检查的最佳时间是多久？多少钱？？

Easiest way to get words of one line from istream into a vector?

《梦在燃烧 (《三国演义》动画片主题曲)》MP3歌词-汤子星？

抽烟只抽炫赫门？