Should I be obfuscating database IDs from my users?
I'm being told that I shouldn't be using database IDs directly in HTML code in web applications.
Currently I use the IDs on things like table row IDs (tableRow-454 where 454 is the ID of the row in the DB), in hidden or selects fields in forms or in URLs. (I'm not referring t开发者_运维知识库o telling people visually on a page that they are ####.)
The recommendation I was given was to use some math to obfuscate the ID from the user. I'm thinking this will only make things more complicated and add unnecessary complexity. But I can see some good reasons to make it more difficult to determine a database ID from the HTML.
Do you obfuscate the IDs from the user? Or do you care?
No, you're just making extra work for yourself.
As long as you're doing enough testing that changing an ID here or there won't give the users access to something they shouldn't then you're fine having the IDs visible.
In some situations it can be beneficial to hide them or have non-sequential numbers, or maybe not starting counting from zero. For example if someone got order number 3 they might start asking questions...
IMO: No, you shouldn't obfuscate IDs. If security is a concern for your application, you need other security measures anyway. Security by obscurity is not enough, it just gives you a false sense of security.
BTW, if you do just a little math, chances are that other IDs - obscured too - are still predictable. In many cases, the bad guy doesn't care which account he breaks into, as long as it isn't his own ;-)
Yes, you probably should.
If a user maliciously changes those IDs, does your code check to make sure the data they're then requesting is still data they're allowed to see?
It's easier to hide the ID than manually do the validation all over the place.
Edit, with clarity this time:
If you display to the screen a list with ID's 1-to-20, it's very easy to validate that the response was between 1-and-20. If you display the actual IDs to screen, you either needed to spend memory caching the IDs, or need to make another database call/a more lengthy query to make sure they aren't giving you bad data.
If you use surrogate IDs, this becomes easier, I think.
I would say that it in general is OK, but there are some gotchas:
If your numbers are sequential you open up to 'guessing games'(e.g. the user changes details?id=4511 to details?id=4512 to see someone else's details). To counteract this you firstly need proper security. In some cases this will not be enough though, because revealing that the information exists could be a problem in itself. Take Youtube for instance, they use more spread out id:s to avoid robot screen scrapings.
If you make assumptions about the nature of the id:s you could be in trouble (i.e if you manipulate the id:s directly in the GUI or sort items based on ids). This could be a nasty surprise once you need to load balance your DB and end up with separate id ranges.
ID reuse. For instance MySql:s InnoDB engine resets sequences upon restart (to latest max value). This can bite you hard if you do deletes.
But all these gotchas apply to custom id:s as well, but are probably where these people are coming from. The most flexible solution in my eyes are GUID:s, because you can generate them anywhere and don't run into any of the above mentioned problems. They are harder to read for humans though, so it's a trade-off.
You should hide the database IDs from users
Why? It is an implementation detail that they (the user) do not need to be concerned about. I am ID 123311122? Who cares! (Unless, perhaps I'm on ICQ). See how SO minimizes (but does not eliminate) exposure, for instance.
On a secondary note, using predictable IDs is also another vector of attack, security by obscurity, but it's still another valid tool. I've never worked on software where this has been a concern though.
That being said, if your framework does not provide the means to handle "hiding IDs" or otherwise providing a clean mapping, then it may simply not be worth it to switch over.
I would say that if that ID has any kind of meaning for the user, and you're dealing with a data sensitive application, you should hide the value of the ID. More importantly I would recommend that you verify if the current user does have access to the data they're requesting via URL. For example if you have a URL like mysite.com/account/view/12, does the current user have access to the that account ID (i.e. 12).
If the information is public and not sensitive I would say it's not required to hide the ID values.
I don't like to expose anything that talks about the internals of the data to users, however it's usually always going to happen especially when it comes to creating dropdowns based on a datasource. My solution to this is to have different identity specs on tables to eliminate incrementer attacks and where possible encrypt the querystring values. This isn't always possible, given very popular MVC URL patterns in .NET so I tend to use a unique column name and then normalize it for the URL. I then lookup that value in the database. There's no easy way to work around it completely however.
"I'm being told that I shouldn't be using database IDs directly in HTML code in web applications. ... Do you obfuscate the IDs from the user?"
Users expect to be facing data that they can link to the real world they are dealing with in their jobs.
"Database IDs" should be part of the user interface only to the extent that those IDs are also part of the user's real-life working situation. If it isn't, then indeed the application should shield the user completely from such fields.
Doing that shielding indeed amounts to "more work", but that is completely irrelevant. Not doing that shielding means giving the user an application that will be harder for him to handle, which will eventually result in the "more work" being done by the user himself, each and every day over again.
I recommend you to use friendly names. It's more modern and more semantic. Also, it looks way better when you create a link.
For example, consider mysite/products.aspx?id=25 vs mysite/products/ferrari . The latter is cleaner, easier to remember (for users), and so on.
If you are currently using IDs in hidden fields, try using session to store those values. If you use them only in the table row IDs you mentioned, I don't think it's worth the time to hide them.
ID:s in URI:s are a part of a good UI if you want your users to be able to walk that road. All URI:s are public anyway, it's up to the responding page to handle the request and base the rendering on some kind of authentication.
Transactions and otherwise sensitive material could easily be hidden with a salted + hashed string and stored in another column in your database if you wanted to, but if you're still not validating user input (which in web programming clearly could be "navigating through the URI") you're missing the point.
As Greg answered, an order number should probably not be low, but the responsibility of that kind of information handling should not lie within the webpage but within the organization's/your stated values (i.e. you would still be printing out the order number on a written receipt so why hide it in the first place?)
With the web being as transparent as it is today, it's hard to find another use of it besides the way it's used to shorten URI:s, i.e. Youtube's video pages and the countless URI-shortening services.
As an example of "publishing" an id to the public, look at the url for this post.
I tried https://stackoverflow.com/questions/1, but it looks like it was deleted. You've got to go to While applying opacity to a form should we use a decimal or double value? to find the "first" ordinal question ; )
... no harm done, just a reference to another post.
If you do decide to tamper-proof row ids in hidden fields, you might look at Steve Sanderson's asp.net mvc book on page 415 where he presents an HMAC (Hashing Message Authentication Code) Utility class for the purpose. It's .NET of course but you can get the idea. You can see it on Google Books, (Sanderson has written about the best programming book I've read).
When I do put row ids in html and the business model says a user should have access to a certain set of rows, such as those he/she has inserted, I might design a db table to include a user id column and include it in any db operations, (pseudo code: where @rowID = table.rowID AND @userID = table.userID). That way, if another user user hacks the rowIDs in hidden fields, the db operation won't let them see another user's data because the userids won't match up. Of course, this approach assumes you're using authentication of some sort and it hasn't been hacked too.
精彩评论