URL shortener with no database
I'd like to write a URL shortener that doesn't have to use a database. Instead, to have as few moving parts as possible, the script would just create a unique hash for my URL based on an algorithm (like md5, except an md5 would be too long). I'm not really sure how I'd go about doing this. Any advice?
If it matters, I'd prefer to write this in Ruby.
What you need is a way to compress and decompress a string, where the resulting compressed version is a string too. This is nearly impossible to do usefully, because a URL is already very short. Encoding and lossless compression always add some overhead, which for most URLs results in a string that is larger than the original.
For very long URLs, however, it may work.
So, in the end, you will almost always need a lookup-table in storage (database).
Base64 is the most logical encoding. On its own, however, Base64 returns strings longer than the original for short inputs (which URLs generally are), mostly due to padding. So we'll also try zlib to compress the string first.
require "uri"
require "base64"
require "zlib"
shortner_url = URI.parse("https://s.to")
long = "https://stackoverflow.com/questions/4818429/url-shortener-with-no-database"
url = URI.parse(long)
stripped = url.host + url.path
stripped.length #=> 66
# Let's see that Base64 on its own does not shorten the url.
encoded = Base64.encode64(stripped)
encoded.length #=> 90
# So, using zlib. To compress.
compressed = Zlib::Deflate.deflate(stripped)
encoded = Base64.encode64(compressed)
encoded.length #=> 94
# It became worse.
# Now, with a long URL (they can be much longer still), as a one-liner; to keep it simple we omit the stripping part:
long = "http://www.thelongestlistofthelongeststuffatthelongestdomainnameatlonglast.com/wearejustdoingthistobestupidnowsincethiscangoonforeverandeverandeverbutitstilllookskindaneatinthebrowsereventhoughitsabigwasteoftimeandenergyandhasnorealpointbutwehadtodoitanyways.html"
long.length #=> 263
Base64.encode64(Zlib::Deflate.deflate(long)).length #=> 228
# In order to turn this into a valid short URL, however, we need `urlsafe_encode64()`
shortner_url.path = "/" + Base64.urlsafe_encode64(Zlib::Deflate.deflate(long))
shortner_url.to_s #=> "https://s.to/eJxNjkEWwyAIRG-U7HsbElFpEPIE68vti6t2BcwbZn51v1_7PufcvCKrFDRnMtf8u81HzuA_IWkDEoGG4EtiMN9ObftE6Pgey0FSvK6gIx7GTUl0GsmJSz1Biqpk7fjBDpL-xjGcopKYWfWyiySBRBFJABw9UnB9xaWj1LDCQWUGAQYzBVLECPbyxFLBJDqA7-DxSJ5YIbkGnoM8Ex7bqjf-AiodbYM="
shortner_url.to_s.length #=> 237 WE SAVED 26 characters!
Note on stripping: you can remove 'https://'. A real implementation would then need to prepend something to the string to record whether the original was https or http: '1' + result for https, '0' + result for http. Another "hack" would be to make the URL-shortening service use http for http URLs and https for https URLs.
If you always have the same domain, you can discard the domain part too.
If you have a lot of slashes, or other repeating characters such as a dash, the compression works better.
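As a rough sketch of those stripping notes combined (assuming the shortener only ever sees plain host + path URLs, with no query strings or ports; shorten/expand are hypothetical helper names, not part of any library):

require "uri"
require "base64"
require "zlib"

# Prepend a one-character flag so the scheme can be restored later.
def shorten(url)
  uri = URI.parse(url)
  flag = uri.scheme == "https" ? "1" : "0"
  Base64.urlsafe_encode64(Zlib::Deflate.deflate(flag + uri.host + uri.path))
end

def expand(token)
  stripped = Zlib::Inflate.inflate(Base64.urlsafe_decode64(token))
  scheme = stripped[0] == "1" ? "https" : "http"
  scheme + "://" + stripped[1..-1]
end

expand(shorten("https://example.com/some/long/path")) #=> "https://example.com/some/long/path"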
You could do this with several of the string manipulation tools available, transforming a URL into something obscured; however, as you noted in your question, the URLs you get from doing this would be longer than is typical for a URL shortener.
URLs don't compress very well.
Ultimately, if you're after a short link, you simply need to generate a suitably legible unique code (try to omit similar letters/numbers such as zero and 'o', in case some poor bugger actually has to type it in) and associate that code with the original URL in some form of store.
Whilst I can understand why you don't want to use a database, in many ways it's the perfect form of storage, especially if you look at one of the dedicated key/value stores such as Cassandra, Redis, MongoDB, etc. (That said, a simple "traditional" SQL database may be an easy first step if you're in unfamiliar territory.)
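For illustration, a minimal sketch of that approach, using an in-memory Hash as a stand-in for whatever store you pick (the alphabet, code length, and helper names are just assumptions, not a fixed scheme):

# Alphabet without easily-confused characters (no 0/O, 1/l/I).
ALPHABET = ("a".."z").to_a + ("A".."Z").to_a + ("2".."9").to_a - %w[l o I O]
STORE = {}  # stand-in for Redis/MongoDB/SQL/etc.

def shorten(url)
  code = Array.new(6) { ALPHABET.sample }.join
  code = Array.new(6) { ALPHABET.sample }.join while STORE.key?(code)
  STORE[code] = url
  code
end

def resolve(code)
  STORE[code]
end

code = shorten("https://stackoverflow.com/questions/4818429/url-shortener-with-no-database")
resolve(code) #=> the original URL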
You won't be able to resolve the original URL from a hash code without looking it up in some kind of database.
About the only thing you can do without a database is compress the URL and then decompress it when you resolve the URL.
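A minimal round trip of that compress-then-decompress idea, using zlib and URL-safe Base64 as in the answer above (the variable names are only for illustration):

require "base64"
require "zlib"

url   = "https://stackoverflow.com/questions/4818429/url-shortener-with-no-database"
token = Base64.urlsafe_encode64(Zlib::Deflate.deflate(url))
back  = Zlib::Inflate.inflate(Base64.urlsafe_decode64(token))
back == url #=> true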
Strictly speaking, I guess you could just hash the URL. But of what possible value would that be if you are not able to resolve it back to the original URL?