535. Encode and Decode TinyURL
Last updated
Was this helpful?
Last updated
Was this helpful?
Note: This is a companion problem to the problem:.
TinyURL is a URL shortening service where you enter a URL such ashttps://leetcode.com/problems/design-tinyurl
and it returns a short URL such ashttp://tinyurl.com/4e9iAk
.
Design theencode
anddecode
methods for the TinyURL service. There is no restriction on how your encode/decode algorithm should work. You just need to ensure that a URL can be encoded to a tiny URL and the tiny URL can be decoded to the original URL.
Thoughts: 1. link to algorithm discussed in :
Hash Function: long_url -> md5/sha1: a. md5 convert a string into 128 binary bits, generally represented as 16 bytes hex: -> c93a360dc7f3eb093ab6e304db516653; b. sha1 convert a string into 160 binary bits, generally represented as 20 bytes hex: -> dff85871a72c73c3eae09e39ffe97aea63047094
These two algorithms could make sure hash values are randomly distributed, implementation-wise simple.
Cons: the conflicts are inevitable. Any hash algorithm could have inevitable conflicts.
Solutions: 1. use (long_url + timestamp) as the hash function key. 2. When conflicts, regenerates the hash value(it's different because timestamp changes).
Overall, when urls are over 1 billion, there would be a lot of conflicts and the efficiency could be very low.
Base62: Take short_url as a 62 base notation. 6 bits could represent 62^6=57 billion.
This implementation simply adopts the base-62 (26*2 letters + 10 digits) mapping to the URL string. Take short_url as a 62 base notation. 6 bits could represent 62^6=57 billion.
Each short_url represent a decimal digit. It could be the auto_increment_id in SQL database.
The following solution doesn't have these problems. It produces short URLs likehttp://tinyurl.com/KtLa2U
, using a random code of six digits or letters. If a long URL is already known, the existing short URL is used and no new entry is generated.
It's possible that a randomly generated code has already been generated before. In that case, another random code is generated instead. Repeat until we have a code that's not already in use. How long can this take? Well, even if we get up to using half of the code space, which is a whopping 626/2 = 28,400,117,792 entries, then each code has a 50% chance of not having appeared yet. So the expected/average number of attempts is 2, and for example only one in a billion URLs takes more than 30 attempts. And if we ever get to an even larger number of entries and this does become a problem, then we can just use length 7. We'd need to anyway, as we'd be running out of available codes.
Base 62 with Random Generator: by 's