534. Design TinyURL
Note: For the coding companion problem, please see: Encode and Decode TinyURL.
How would you design a URL shortening service that is similar to TinyURL?
Background:
TinyURL is a URL shortening service where you enter a URL such as https://leetcode.com/problems/design-tinyurl and it returns a short URL such as http://tinyurl.com/4e9iAk.
Requirements:
- For instance, "http://tinyurl.com/4e9iAk" is the tiny URL for the page "https://leetcode.com/problems/design-tinyurl". The identifier (the part after the domain, 4e9iAk here) can be any string of 6 alphanumeric characters drawn from 0-9, a-z, A-Z.
- Each shortened URL must be unique; that is, no two different URLs can be shortened to the same URL. 
Note about Questions: Below is just a small subset of questions to get you started. In the real world there could be many follow-ups, and the discussion is open-ended (there is no single correct way to solve a design problem). If you have more ideas or questions, please ask in Discuss and we may compile them here!
Questions:
- How many unique identifiers are possible? Will you run out of unique URLs? - Treat short_url as a base-62 number. 6 characters can represent 62^6 ≈ 57 billion identifiers. Of course this will eventually run out (assuming there is no expiration for the short_url).
 
- Should the identifier be incremental or not? Which is easier to design? Pros and cons? - See the "Thoughts" section below (hash function vs. Base62).
 
- Mapping an identifier to a URL and its reverse - does this problem ring a bell?
- How do you store the URLs? Does a simple flat file database work? - SQL? 
 
- What is the bottleneck of the system? Is it read-heavy or write-heavy? - Read-heavy: a short URL is read far more often than it is created.
- Estimate the maximum number of URLs a single machine can store. - Assume 10M new mappings (long URL to short URL) per day.
- Assume each mapping takes about 100 bytes on average.
- That is about 1 GB per day, so a 1 TB hard drive would last roughly 3 years.
- Storage is not the problem for this kind of system; a service like Netflix may have storage issues, but a URL shortener does not.
- Through an SN analysis (Scenario: long URL to short URL and the reverse; Need: assume the system is not massive if you are not sure), we can get a big picture of the system. In general, this system is not hard and could be handled by a single machine with an SSD.
 
- Estimate the maximum number of queries per second (QPS) for decoding a shortened URL on a single machine. - Daily users: 100M.
- Daily usage per person: 0.1 write (long2short) and 1 read (short2long) on average.
- Daily requests: 10M writes, 100M reads.
- QPS: a day is about 86,400 seconds (~100K), so roughly 100 writes/s and 1K reads/s on average.
- Peak QPS: about 200 writes/s and 2K reads/s.
- A few thousand QPS can be handled by a single MySQL machine with an SSD.
 
- How would you scale the service? For example, a viral link shared on social media could cause a sudden spike in QPS.
- How would you handle redundancy? I.e., if a server goes down, how do you ensure the service stays operational?
- Keep URLs forever or prune them? Pros and cons? How would pruning be done? (Contributed by @alex_svetkin)
- What API would you provide to a third-party developer? (Contributed by @alex_svetkin) One service is enough: URLService. (A minimal sketch of this interface appears right after this list.)
- Core (business logic) layer - Class: URLService. Interface: URLService.encode(String long_url) and URLService.decode(String short_url).
- Web layer - REST API: GET /{short_url} returns an HTTP 301 redirect to the long URL; POST with the long URL in the body creates and returns a new short URL (the approach used by goo.gl, Google's URL shortener).
 
- If you can enable caching, what would you cache and what would the expiry time be? (Contributed by @Humandroid) See the cache sketch after this list.
- SQL vs NoSQL:
1. Does it need to support transactions? Most NoSQL stores do not support transactions. No -> NoSQL
2. Do we need rich SQL queries? NoSQL does not support as many query patterns as SQL. No -> NoSQL
3. Do we care about development efficiency? Most web frameworks support SQL databases very well (with an ORM, Object-Relational Mapping), which means less code. It does not matter here, since the system is only a few lines of code -> NoSQL
4. Do we need an AUTO_INCREMENT ID? NoSQL typically cannot provide one; it only has a globally unique object id. Our Base62 algorithm needs an AUTO_INCREMENT ID -> SQL
5. Does the system have a high QPS requirement? NoSQL has higher performance: Memcached can reach millions of QPS, MongoDB around 10K, MySQL around 1K. Our load of ~200 writes/s and ~2K reads/s is not high -> SQL
6. How much scalability do we need? SQL requires developers to write their own sharding/replication code, while NoSQL typically ships with sharding and replicas built in. Not high -> SQL
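
To make the API question concrete, here is a minimal in-memory sketch of what the URLService interface could look like. It is an illustration under the assumptions above, not the reference solution: the counter stands in for a SQL AUTO_INCREMENT id, the tinyurl.com base domain and all names are placeholders, and a real service would persist the two mappings in the database and put the REST layer (GET /{short_url}, POST) in front of encode/decode.

#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;

// Minimal in-memory sketch of the URLService interface (names and domain are illustrative).
class URLService {
public:
    // long_url -> short_url; reuses the existing mapping if the long URL was seen before.
    string encode(const string& long_url) {
        if (long_to_short.count(long_url)) return long_to_short[long_url];
        long id = ++counter;                          // stands in for a SQL AUTO_INCREMENT id
        string short_url = base + idToShortURL(id);
        long_to_short[long_url] = short_url;
        short_to_long[short_url] = long_url;
        return short_url;
    }

    // short_url -> long_url; returns "" when the short URL is unknown.
    string decode(const string& short_url) {
        return short_to_long.count(short_url) ? short_to_long[short_url] : "";
    }

private:
    // Same base-62 conversion idea as the reference code at the end of this article.
    static string idToShortURL(long id) {
        const char map[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
        string s;
        while (id) { s.push_back(map[id % 62]); id /= 62; }
        return string(s.rbegin(), s.rend());
    }

    long counter = 0;                                 // assumed starting id
    string base = "http://tinyurl.com/";              // assumed domain
    unordered_map<string, string> long_to_short, short_to_long;
};

int main() {
    URLService service;
    string s = service.encode("https://leetcode.com/problems/design-tinyurl");
    cout << s << " -> " << service.decode(s) << endl;
}

On a GET the web layer would call decode() and answer with a 301 redirect; on a POST it would call encode() and return the new short URL.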
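
For the caching question, the natural candidate is the hot short_url -> long_url mapping, since reads outnumber writes roughly 10:1 in the estimate above. The toy cache below (C++17) only illustrates time-based expiry; the 1-hour TTL and the struct and function names are assumptions, and a production deployment would more likely use Memcached or Redis with an LRU eviction policy.

#include <chrono>
#include <iostream>
#include <optional>
#include <string>
#include <unordered_map>
using namespace std;
using Clock = chrono::steady_clock;

// Toy TTL cache for hot short_url -> long_url lookups (illustrative only).
struct TTLCache {
    chrono::seconds ttl{3600};                                  // assumed 1-hour expiry
    struct Entry { string long_url; Clock::time_point expires_at; };
    unordered_map<string, Entry> entries;

    void put(const string& short_url, const string& long_url) {
        entries[short_url] = {long_url, Clock::now() + ttl};
    }

    // Returns the cached long URL, or nullopt on a miss or an expired entry.
    optional<string> get(const string& short_url) {
        auto it = entries.find(short_url);
        if (it == entries.end()) return nullopt;
        if (Clock::now() >= it->second.expires_at) {            // expired: evict and report a miss
            entries.erase(it);
            return nullopt;
        }
        return it->second.long_url;
    }
};

int main() {
    TTLCache cache;
    cache.put("4e9iAk", "https://leetcode.com/problems/design-tinyurl");
    auto hit = cache.get("4e9iAk");
    cout << (hit ? *hit : string("miss")) << endl;
}

On a read, the service would check this cache first and fall back to the database on a miss, writing the result back into the cache with a fresh TTL.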
Thoughts:
For the coding part, the goal is to compress the long URL and make sure the original URL can be recovered after an encode-decode round trip. For the system design part, here are two proposed solutions:
- Hash function: long_url -> md5/sha1. a. MD5 converts a string into 128 bits, usually shown as 32 hex characters (16 bytes): http://site.douban.com/chuan -> c93a360dc7f3eb093ab6e304db516653. b. SHA-1 converts a string into 160 bits, usually shown as 40 hex characters (20 bytes): http://site.douban.com/chuan -> dff85871a72c73c3eae09e39ffe97aea63047094. Both algorithms distribute hash values roughly uniformly and are simple to implement.
- Cons: collisions are inevitable. Any hash function that maps to a shorter output can collide.
- Solutions: 1. Use (long_url + timestamp) as the hash key. 2. On a collision, regenerate the hash value (it will differ because the timestamp changes). A rough sketch of this retry loop appears after the Base62 code below. - Overall, once there are more than about 1 billion URLs, collisions become frequent and this approach can get very slow.
 
- Base62: treat short_url as a base-62 number. 6 characters can represent 62^6 ≈ 57 billion IDs. 1. This implementation simply maps a base-62 alphabet (26*2 letters + 10 digits) onto the URL string. 2. Each short_url corresponds to a decimal integer, which can be the AUTO_INCREMENT id in the SQL database.
// Base62 reference implementation: convert an AUTO_INCREMENT id to a short URL and back.
#include<iostream>
#include<algorithm>
#include<string>
#include<unordered_map>
using namespace std;
// Convert a numeric id into its base-62 string (the short URL identifier).
string idToShortURL(long int id){
    // 26 lowercase + 26 uppercase letters + 10 digits = 62 symbols.
    char map[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    string shorturl;
    while(id){
        shorturl.push_back(map[id%62]);
        id = id/62;
    }
    // Digits were produced least-significant first, so reverse them.
    reverse(shorturl.begin(), shorturl.end());
    return shorturl;
}
// Convert a base-62 short URL string back into its numeric id.
long int shortURLtoID(string shortURL){
    long int id = 0;
    for (int i = 0; i < shortURL.length(); i++){
        if ('a' <= shortURL[i] && shortURL[i] <= 'z')
            id = id * 62 + shortURL[i] - 'a';
        if ('A' <= shortURL[i] && shortURL[i] <= 'Z')
            id = id * 62 + shortURL[i] - 'A' + 26;
        if ('0' <= shortURL[i] && shortURL[i] <= '9')
            id = id * 62 + shortURL[i] - '0' + 52;
    }
    return id;
}

// Demo: assign each command-line URL an incrementing id, encode it, then decode it back.
int main(int argc, char** argv) {
    unordered_map<int, string> record;
    for (int i = 1; i< argc; i++){
         cout<<"original URL: "<<(argv[i])<<endl;
        record[i] = (argv[i]);
    }
    for (int i = 1; i< argc; i++){
        cout<<"test URL: "<<record[i]<<endl;
        string shortURL = idToShortURL(i);
        cout<<"generated URL: "<<shortURL<<endl;
        cout<<"Mapped URL back to ID: "<<shortURLtoID(shortURL)<<endl;
    }
    // int test = 66666;
    // string shortURL = idToShortURL(test);
    // cout<<"generated URL: "<<shortURL<<endl;
    // cout<<"Mapped URL back to ID: "<<shortURLtoID(shortURL)<<endl;
}
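
For comparison with the Base62 reference code above, here is a rough sketch of the hash-based alternative from the Thoughts section. The C++ standard library has no MD5/SHA-1, so std::hash over (long_url + timestamp) is used as a stand-in digest here (it is not cryptographic); the digest is truncated to a 6-character base-62 key and regenerated on a collision. The function names and the in-memory maps are illustrative assumptions, not part of the original solution.

#include <chrono>
#include <functional>
#include <iostream>
#include <string>
#include <unordered_map>
using namespace std;

// In-memory stores standing in for the database tables (illustrative only).
unordered_map<string, string> short_to_long, long_to_short;

// Turn a 64-bit digest into a fixed 6-character base-62 key.
string digestToKey(size_t digest) {
    const char alphabet[] = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    string key;
    for (int i = 0; i < 6; i++) {
        key.push_back(alphabet[digest % 62]);
        digest /= 62;
    }
    return key;
}

// Hash-based encode: hash(long_url + timestamp); on a collision, retry with a fresh timestamp.
string encodeByHash(const string& long_url) {
    if (long_to_short.count(long_url)) return long_to_short[long_url];
    while (true) {
        auto ts = chrono::steady_clock::now().time_since_epoch().count();
        string key = digestToKey(hash<string>{}(long_url + to_string(ts)));
        auto it = short_to_long.find(key);
        if (it == short_to_long.end()) {             // no collision: claim the key
            short_to_long[key] = long_url;
            long_to_short[long_url] = key;
            return key;
        }
        // Collision with a different URL: loop again; the new timestamp changes the hash.
    }
}

int main() {
    string key = encodeByHash("https://leetcode.com/problems/design-tinyurl");
    cout << key << " -> " << short_to_long[key] << endl;
}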
Special thanks: "A 'complete' solution for TinyURL (Leetcode System Design)" and SegmentFault for the reference!