534. Design TinyURL
Note: For the coding companion problem, please see the linked problem.
How would you design a URL shortening service that is similar to TinyURL?
Background:
TinyURL is a URL shortening service where you enter a URL such as https://leetcode.com/problems/design-tinyurl and it returns a short URL such as http://tinyurl.com/4e9iAk.
Requirements:
For instance, "http://tinyurl.com/4e9iAk" is the tiny URL for the page "https://leetcode.com/problems/design-tinyurl". The identifier (the part after the host, 4e9iAk) can be any string of 6 alphanumeric characters drawn from 0-9, a-z, A-Z.
Each shortened URL must be unique; that is, no two different URLs can be shortened to the same URL.
Note about the questions: below is just a small subset of questions to get you started. In the real world there could be many follow-ups, and the discussion is open-ended (there is no one true or correct way to solve the problem). If you have more ideas or questions, please ask in Discuss and we may compile them here!
Questions:
How many unique identifiers are possible? Will you run out of unique URLs?
Treat the short_url identifier as a base-62 number. 6 characters can represent 62^6 ≈ 57 billion values. Of course this will eventually run out (assuming short_urls never expire).
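As a quick sanity check of that arithmetic, a throwaway sketch (not part of the service design):

```java
public class Capacity {
    public static void main(String[] args) {
        long total = 1;
        for (int i = 0; i < 6; i++) total *= 62; // 62 characters per position, 6 positions
        System.out.println(total);               // 56800235584, i.e. ~57 billion
    }
}
```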
Should the identifier be incremental or not? Which is easier to design? Pros and cons?
Look at the "Thoughts" part (Hash function vs Base62)
Mapping an identifier to a URL and back again: does this problem ring a bell to you?
How do you store the URLs? Does a simple flat file database work?
SQL? See the SQL vs. NoSQL discussion below.
What is the bottleneck of the system? Is it read-heavy or write-heavy?
Read-heavy: each mapping is written once but read many times (see the QPS estimate below).
Estimate the maximum number of URLs a single machine can store.
10M new mappings (long URL to short URL) per day.
Assume each mapping takes 100 B on average.
That is 1 GB per day, so a 1 TB hard drive could last about 3 years.
Storage is not the problem for this kind of system; a service like Netflix may have storage issues.
Through SN analysis (Scenario: long URL to short URL and the reverse; Need: assume the system is not massive if you are not sure), we get a big picture of the system. In general, this system is not hard and could be handled by a single machine with an SSD.
Estimate the maximum number of queries per second (QPS) for decoding a shortened URL on a single machine.
Daily users: 100M.
Daily usage per person: (write) long2short 0.1 times, (read) short2long 1 time.
Daily requests: write 10M, read 100M.
QPS: since a day is 86400 s (approximately 100K s): write 100, read 1K.
Peak QPS: write 200, read 2K.
Thousand-level QPS can be handled by a single MySQL machine with an SSD.
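The storage and QPS back-of-envelope numbers above can be reproduced in a few lines; the constants are exactly the assumptions stated in the estimates:

```java
public class Estimates {
    public static void main(String[] args) {
        long writesPerDay = 10_000_000L;   // 10M new mappings per day
        long readsPerDay = 100_000_000L;   // 100M lookups per day
        long bytesPerMapping = 100L;       // ~100 B per mapping, as assumed above

        long bytesPerDay = writesPerDay * bytesPerMapping;       // 1e9 B = ~1 GB/day
        long daysPerTerabyte = 1_000_000_000_000L / bytesPerDay; // ~1000 days, ~3 years

        long secondsPerDay = 100_000L;     // 86,400 s, rounded up for estimation
        System.out.println("GB per day:  " + bytesPerDay / 1_000_000_000L);
        System.out.println("days per TB: " + daysPerTerabyte);
        System.out.println("write QPS:   " + writesPerDay / secondsPerDay); // 100
        System.out.println("read QPS:    " + readsPerDay / secondsPerDay);  // 1000
    }
}
```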
How would you scale the service? For example, a viral link shared on social media could cause a peak in QPS at a moment's notice.
How could you handle redundancy? I.e., if a server is down, how could you ensure the service is still operational?
Keep URLs forever or prune them? Pros and cons? How would we do the pruning? (Contributed by @alex_svetkin)
What API would you provide to a third-party developer? (Contributed by @alex_svetkin)
Only one service is needed: URLService.
Core (business logic) layer:
Class: URLService
Interface:
URLService.encode(String long_url)
URLService.decode(String short_url)
Web layer:
REST API:
GET: /{short_url}, returns an HTTP redirect response (301)
POST: takes a long URL in the request body and returns the corresponding short URL
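A minimal in-memory sketch of this interface, assuming the Base62 approach described under "Thoughts" below; the in-memory counter and maps stand in for the database's AUTO_INCREMENT id and mapping table, and tinyurl.com is just the example host from the problem statement:

```java
import java.util.HashMap;
import java.util.Map;

public class URLService {
    private static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";
    private static final String HOST = "http://tinyurl.com/";

    // In-memory stand-ins for the database table (id AUTO_INCREMENT, long_url).
    private final Map<Long, String> idToUrl = new HashMap<>();
    private final Map<String, Long> urlToId = new HashMap<>();
    private long nextId = 1;

    // long_url -> short_url: assign (or reuse) an id, then base62-encode it.
    public String encode(String longUrl) {
        Long id = urlToId.get(longUrl);
        if (id == null) {
            id = nextId++;
            urlToId.put(longUrl, id);
            idToUrl.put(id, longUrl);
        }
        StringBuilder sb = new StringBuilder();
        long n = id;
        while (n > 0) {
            sb.append(ALPHABET.charAt((int) (n % 62)));
            n /= 62;
        }
        return HOST + sb.reverse();
    }

    // short_url -> long_url: base62-decode the identifier, then look it up.
    public String decode(String shortUrl) {
        String key = shortUrl.substring(HOST.length());
        long id = 0;
        for (char c : key.toCharArray()) {
            id = id * 62 + ALPHABET.indexOf(c);
        }
        return idToUrl.get(id);
    }
}
```

The web layer would wrap decode in the GET handler and issue the 301 redirect, and wrap encode in the POST handler.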
If you enable caching, what would you cache and what would the expiry time be? (Contributed by @Humandroid)
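One plausible answer, since reads outnumber writes 10:1: cache hot short_url -> long_url mappings and let entries expire. A minimal TTL-cache sketch; the 24-hour expiry is an assumption, not something fixed by the problem:

```java
import java.util.concurrent.ConcurrentHashMap;

public class DecodeCache {
    private static final long TTL_MILLIS = 24 * 60 * 60 * 1000L; // assumed 24h expiry

    private record Entry(String longUrl, long expiresAt) {}
    private final ConcurrentHashMap<String, Entry> cache = new ConcurrentHashMap<>();

    // Return the cached long URL, or null if absent or expired.
    public String get(String shortUrl) {
        Entry e = cache.get(shortUrl);
        if (e == null || System.currentTimeMillis() > e.expiresAt()) {
            cache.remove(shortUrl);
            return null; // caller falls through to the database
        }
        return e.longUrl();
    }

    public void put(String shortUrl, String longUrl) {
        cache.put(shortUrl, new Entry(longUrl, System.currentTimeMillis() + TTL_MILLIS));
    }
}
```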
SQL vs. NoSQL:
1. Does it need to support transactions? NoSQL does not support transactions. No -> NoSQL.
2. Do we need rich SQL queries? NoSQL does not support as many query types as SQL. No -> NoSQL.
3. Do we want development efficiency? Most web frameworks support SQL databases very well (with an ORM), which means less code for the system. Doesn't matter here, since the system needs only a few lines of code -> NoSQL.
4. Do we need an AUTO_INCREMENT id? NoSQL can't do this; it only has a globally unique Object_id. Our algorithm needs an AUTO_INCREMENT id -> SQL.
5. Does the system have a high QPS requirement? NoSQL has high performance; for example, Memcached can reach million-level QPS, MongoDB tens of thousands, and MySQL only thousands. Ours is write 200, read 2K, which is not high -> SQL.
6. How high is the system's scalability requirement? SQL requires developers to write their own scaling code, while NoSQL comes with it built in (sharding, replicas). Not high -> SQL.
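Since point 4 (AUTO_INCREMENT) drives the SQL choice, here is a hypothetical sketch of the backing table and the JDBC call that retrieves the generated id; the table and column names are made up for illustration:

```java
import java.sql.*;

public class UrlDao {
    // Hypothetical MySQL schema backing the service:
    //   CREATE TABLE url_map (
    //     id       BIGINT AUTO_INCREMENT PRIMARY KEY,
    //     long_url VARCHAR(2048) NOT NULL
    //   );
    private final Connection conn;

    public UrlDao(Connection conn) { this.conn = conn; }

    // Insert a long URL and return the AUTO_INCREMENT id to base62-encode.
    public long insert(String longUrl) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO url_map (long_url) VALUES (?)",
                Statement.RETURN_GENERATED_KEYS)) {
            ps.setString(1, longUrl);
            ps.executeUpdate();
            try (ResultSet rs = ps.getGeneratedKeys()) {
                rs.next();
                return rs.getLong(1);
            }
        }
    }
}
```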
Thoughts:
For the coding part, the goal is to compress the long URL and make sure the original URL can be recovered after the encode/decode round trip. For the system-design part, here are two proposed solutions:

Hash function: long_url -> md5/sha1. a. md5 converts a string into 128 binary bits, generally represented as 16 bytes of hex, e.g. c93a360dc7f3eb093ab6e304db516653; b. sha1 converts a string into 160 binary bits, generally represented as 20 bytes of hex, e.g. dff85871a72c73c3eae09e39ffe97aea63047094.
These two algorithms ensure hash values are randomly distributed and are simple to implement.
Cons: conflicts are inevitable. Any hash algorithm has unavoidable conflicts.
Solutions: 1. Use (long_url + timestamp) as the hash key. 2. On a conflict, regenerate the hash value (it differs because the timestamp changes).
Overall, once there are over 1 billion URLs, there would be many conflicts and efficiency could become very low.

Base62: treat the short_url as base-62 notation (26 * 2 letters + 10 digits). 6 characters can represent 62^6 ≈ 57 billion values. Each short_url corresponds to a decimal integer, which can be the AUTO_INCREMENT id in a SQL database.
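A sketch of the hash approach with the timestamp-on-conflict fix, assuming MD5 via java.security.MessageDigest, the first 6 hex characters as the identifier, and a hypothetical exists() check against storage:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashShortener {
    // Hypothetical storage check: is this identifier already taken by another URL?
    private boolean exists(String key) { return false; }

    public String shorten(String longUrl) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        String input = longUrl;
        while (true) {
            byte[] digest = md5.digest(input.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            String key = hex.substring(0, 6);  // take 6 chars as the identifier
            if (!exists(key)) return key;
            // Conflict: re-key with a timestamp so the next digest differs.
            input = longUrl + System.currentTimeMillis();
        }
    }
}
```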