TinyURL is a URL shortening service where you enter a URL such ashttps://leetcode.com/problems/design-tinyurland it returns a short URL such ashttp://tinyurl.com/4e9iAk.
Design theencodeanddecodemethods for the TinyURL service. There is no restriction on how your encode/decode algorithm should work. You just need to ensure that a URL can be encoded to a tiny URL and the tiny URL can be decoded to the original URL.
Hash Function: long_url -> md5/sha1: a. md5 convert a string into 128 binary bits, generally represented as 16 bytes hex:
http://site.douban.com/chuan-> c93a360dc7f3eb093ab6e304db516653; b. sha1 convert a string into 160 binary bits, generally represented as 20 bytes hex: http://site.douban.com/chuan-> dff85871a72c73c3eae09e39ffe97aea63047094
These two algorithms could make sure hash values are randomly distributed, implementation-wise simple.
Cons: the conflicts are inevitable. Any hash algorithm could have inevitable conflicts.
Solutions: 1. use (long_url + timestamp) as the hash function key. 2. When conflicts, regenerates the hash value(it's different because timestamp changes).
Overall, when urls are over 1 billion, there would be a lot of conflicts and the efficiency could be very low.
Base62: Take short_url as a 62 base notation. 6 bits could represent 62^6=57 billion.
This implementation simply adopts the base-62 (26*2 letters + 10 digits) mapping to the URL string. Take short_url as a 62 base notation. 6 bits could represent 62^6=57 billion.
Each short_url represent a decimal digit. It could be the auto_increment_id in SQL database.
// You can type code here and execute it.#include<iostream>#include<algorithm>#include<string>usingnamespacestd;stringidToShortURL(longinttest){char map[] ="abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"; string shorturl;while(test){shorturl.push_back(map[test%62]); test = test/62;}reverse(shorturl.begin(),shorturl.end());return shorturl;}longintshortURLtoID(stringshortURL){longint id =0;for(int i =0; i <shortURL.length(); i++){if('a'<=shortURL[i]&&shortURL[i]<='z') id = id *62+shortURL[i]-'a';if('A'<=shortURL[i]&&shortURL[i]<='Z') id = id *62+shortURL[i]-'A'+26;if('0'<=shortURL[i]&&shortURL[i]<='9') id = id *62+shortURL[i]-'0'+52;}return id;}// function intmain(intargc,char**argv){ unordered_map<int, string> record;for(int i =1; i< argc; i++){ cout<<"original URL: "<<(argv[i])<<endl;record[i]=(argv[i]);}for(int i =1; i< argc; i++){ cout<<"test URL: "<<record[i]<<endl; string shortURL =idToShortURL(i); cout<<"generated URL: "<<shortURL<<endl; cout<<"Mapped URL back to ID: "<<shortURLtoID(shortURL)<<endl;}// int test = 66666;// string shortURL = idToShortURL(test);// cout<<"generated URL: "<<shortURL<<endl;// cout<<"Mapped URL back to ID: "<<shortURLtoID(shortURL)<<endl;}
The following solution doesn't have these problems. It produces short URLs likehttp://tinyurl.com/KtLa2U, using a random code of six digits or letters. If a long URL is already known, the existing short URL is used and no new entry is generated.
It's possible that a randomly generated code has already been generated before. In that case, another random code is generated instead. Repeat until we have a code that's not already in use. How long can this take? Well, even if we get up to using half of the code space, which is a whopping 626/2 = 28,400,117,792 entries, then each code has a 50% chance of not having appeared yet. So the expected/average number of attempts is 2, and for example only one in a billion URLs takes more than 30 attempts. And if we ever get to an even larger number of entries and this does become a problem, then we can just use length 7. We'd need to anyway, as we'd be running out of available codes.
class Codec:
alphabet = string.ascii_letters + '0123456789'
def __init__ (self):
self.long2short = {}
self.short2long = {}
def encode(self, longUrl):
"""Encodes a URL to a shortened URL.
:type longUrl: str
:rtype: str
"""
while longUrl not in self.long2short:
# randromly generate a code
shortUrl = ''.join(random.choice(Codec.alphabet) for _ in range(6))
# check the existence
if shortUrl not in self.short2long:
self.long2short[longUrl] = shortUrl
self.short2long[shortUrl] = longUrl
return 'http://tinyurl.com/' + shortUrl
def decode(self, shortUrl):
"""Decodes a shortened URL to its original URL.
:type shortUrl: str
:rtype: str
"""
return self.short2long[shortUrl[-6:]]
# Your Codec object will be instantiated and called as such:
# codec = Codec()
# codec.decode(codec.encode(url))