What happens when you type google.com in your browser and press Enter
Have you ever wondered what goes on behind the scenes when you visit a website like Google? How does your browser know where to find the web page you want to see? How does the web server handle your request and send back the right content? How does the web page interact with other services and databases to provide you with relevant information? In this blog post, we will try to answer these questions by following the journey of a web request from your browser to Google's web server and back.
DNS REQUEST
The first step in this journey is to resolve the domain name www.google.com (the host part of the URL https://www.google.com) to an IP address. An IP address is a numerical identifier assigned to a device on a network. It allows devices to communicate with each other using a common protocol suite called TCP/IP. However, humans usually prefer to use domain names, which are easier to remember and type, instead of IP addresses. Therefore, we need a way to translate domain names to IP addresses. This is where DNS (Domain Name System) comes in.
DNS is a distributed system of servers that store and provide mappings between domain names and IP addresses. When you type https://www.google.com in your browser, your browser will first check its own cache to see if it already knows the IP address for www.google.com. If not, it will ask your operating system, which checks its own cache. If the operating system also does not have the answer, it will send a DNS query to a DNS resolver, which is usually provided by your Internet Service Provider (ISP). The DNS resolver will then contact one or more DNS servers to find the answer. These servers may be root servers, top-level domain (TLD) servers, or authoritative servers, depending on which level of the name hierarchy is being looked up.
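The cache-then-resolver order described above can be sketched in a few lines. This is a minimal illustration, not how any particular browser or operating system implements it: the `TTLCache` class, the `lookup` function, and `fake_resolver` are hypothetical names, and the address comes from a range reserved for documentation.

```python
import time


class TTLCache:
    """A tiny name -> (address, expiry time) store standing in for one cache layer."""

    def __init__(self):
        self._entries = {}

    def get(self, name, now=None):
        now = time.monotonic() if now is None else now
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if now >= expires_at:
            # The record has outlived its time-to-live, so forget it.
            del self._entries[name]
            return None
        return address

    def put(self, name, address, ttl_seconds, now=None):
        now = time.monotonic() if now is None else now
        self._entries[name] = (address, now + ttl_seconds)


def fake_resolver(name):
    """Hypothetical stand-in for a DNS resolver: returns (address, ttl_seconds)."""
    return "192.0.2.10", 300


def lookup(name, browser_cache, os_cache, ask_resolver):
    """Check the browser cache, then the OS cache, then ask the resolver."""
    for label, cache in (("browser cache", browser_cache), ("os cache", os_cache)):
        address = cache.get(name)
        if address is not None:
            return address, label
    address, ttl = ask_resolver(name)
    # Remember the answer in both layers for next time.
    os_cache.put(name, address, ttl)
    browser_cache.put(name, address, ttl)
    return address, "resolver"
```

Calling `lookup` twice with the same two caches hits the resolver only the first time; the second call is answered from the browser cache until the record expires.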
For example, if we want to resolve www.google.com, we may need to contact the following servers:
- A root server, which knows the addresses of the TLD servers (such as those for .com, .org, .net, etc.)
- A .com TLD server, which knows which name servers are responsible for each domain registered under .com (such as google.com, amazon.com, facebook.com, etc.)
- An authoritative server for google.com, which knows the IP addresses of the names under google.com (such as www.google.com, mail.google.com, maps.google.com, etc.)
The DNS resolver starts at a root server and follows referrals down the hierarchy of domains until it reaches the authoritative server that has the final answer. The authoritative server will then send back the IP address of www.google.com to the DNS resolver, which will cache it and send it back to your operating system, which will cache it and send it back to your browser.
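To see the referral chain in action, here is a toy simulation that walks from a root server down to an authoritative server. Everything in it is made up for illustration: the server names, the `SERVERS` table, and the address (taken from a documentation-only range). A real resolver speaks the DNS wire protocol instead of reading a dictionary.

```python
# Toy data: each "server" either refers us to another server or knows the answer.
SERVERS = {
    "root-server": {"referrals": {"com": "com-tld-server"}},
    "com-tld-server": {"referrals": {"google.com": "google-auth-server"}},
    "google-auth-server": {"answers": {"www.google.com": "192.0.2.10"}},
}


def resolve_iteratively(name, start="root-server", max_hops=10):
    """Follow referrals from the root until some server has a final answer."""
    server = start
    path = [server]
    for _ in range(max_hops):
        zone = SERVERS[server]
        answers = zone.get("answers", {})
        if name in answers:
            return answers[name], path
        referrals = zone.get("referrals", {})
        # Keep only the zones that the name actually belongs to.
        matches = [z for z in referrals if name == z or name.endswith("." + z)]
        if not matches:
            raise LookupError(f"{server} has no referral for {name}")
        # Prefer the most specific (longest) matching zone.
        server = referrals[max(matches, key=len)]
        path.append(server)
    raise LookupError(f"gave up on {name} after {max_hops} hops")
```

`resolve_iteratively("www.google.com")` returns the address together with the list of servers visited, which mirrors the root, TLD, authoritative order described above.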
TCP/IP
Now that your browser knows the IP address of www.google.com, it can initiate a connection with the web server using TCP/IP. TCP/IP stands for Transmission Control Protocol/Internet Protocol, which are two protocols that work together to enable reliable and efficient data transmission over the Internet.
TCP is responsible for establishing a connection between two devices, dividing data into segments (small chunks of data, often loosely called packets), assigning sequence numbers and checksums (error-detection codes) to each segment, sending and receiving segments, reordering them if they arrive out of order, retransmitting them if they are lost or corrupted, and terminating the connection when it is no longer needed.
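The splitting, checksumming, and reordering can be illustrated with a small sketch. It is a simplification under stated assumptions: the `segment` and `receive` functions are hypothetical, sequence numbers are plain byte offsets starting at zero, and CRC32 stands in for TCP's real 16-bit checksum.

```python
import zlib


def segment(data: bytes, size: int):
    """Split data into (sequence_number, checksum, payload) tuples."""
    segments = []
    for offset in range(0, len(data), size):
        payload = data[offset:offset + size]
        segments.append((offset, zlib.crc32(payload), payload))
    return segments


def receive(segments, total_length: int, size: int):
    """Rebuild the data from whatever arrived intact.

    Returns (data, missing), where `missing` lists the sequence numbers
    that were lost or corrupted and would need to be retransmitted.
    """
    good = {}
    for seq, checksum, payload in segments:
        if zlib.crc32(payload) == checksum:  # drop corrupted segments
            good[seq] = payload
    missing = [seq for seq in range(0, total_length, size) if seq not in good]
    data = b"".join(good[seq] for seq in sorted(good))  # put back in order
    return data, missing
```

If the segments arrive in reverse order, `receive` still returns the original bytes with an empty `missing` list. If one segment is dropped or altered, its sequence number shows up in `missing`, which is the sender's cue to send it again.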
IP is responsible for routing packets from one device to another across multiple networks. It identifies each device by its IP address and adds a header (additional information) to each packet that contains the source and destination IP addresses and other information such as time-to-live (TTL), which limits how many hops (routers) a packet can pass through before being discarded.
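The TTL field is easy to picture with a short simulation. The `Packet` class, the `forward` function, and the router names are all hypothetical, and the addresses come from documentation-only ranges; real routers also notify the sender when they discard a packet, which is left out here.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    source: str
    destination: str
    ttl: int
    payload: bytes


def forward(packet: Packet, routers):
    """Pass the packet through each router in turn, decrementing TTL per hop."""
    for router in routers:
        packet.ttl -= 1
        if packet.ttl <= 0:
            # The packet has used up its hop budget, so this router drops it.
            return f"discarded at {router}"
    return "delivered"
```

With `ttl=64` and three routers on the path the packet is delivered with a TTL of 61 left, while a packet sent with `ttl=2` is discarded at the second router.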
To establish a connection with www.google.com's web server, your browser will send a TCP packet with a special flag called SYN (synchronize) to indicate that it wants to start a connection. The web server will respond with another TCP packet with two flags: SYN and ACK (acknowledge), which means that it agrees to establish a connection and acknowledges receiving your SYN packet. Your browser will then send another TCP packet with only the ACK flag to confirm receiving the web server's SYN+ACK packet. This process is called a three-way handshake and ensures that both devices are ready to communicate.
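Here is a small sketch of the three messages and how the sequence and acknowledgment numbers relate. The `three_way_handshake` function is a hypothetical helper that only models the bookkeeping; in practice the operating system performs the handshake for you when an application opens a connection.

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three handshake messages as (sender, flags, seq, ack) tuples.

    `client_isn` and `server_isn` are the initial sequence numbers each side
    picks. Sequence numbers are 32 bits wide, hence the modulo.
    """
    wrap = 2 ** 32
    syn = ("client", "SYN", client_isn, None)
    # The server acknowledges the client's SYN by asking for client_isn + 1 next.
    syn_ack = ("server", "SYN+ACK", server_isn, (client_isn + 1) % wrap)
    # The client acknowledges the server's SYN in the same way.
    ack = ("client", "ACK", (client_isn + 1) % wrap, (server_isn + 1) % wrap)
    return [syn, syn_ack, ack]
```

For example, with initial sequence numbers 1000 and 5000, the server's SYN+ACK carries an acknowledgment number of 1001 and the client's final ACK carries 5001.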
FIREWALL
Before your TCP packet reaches www.google.com's web server, it may have to pass through one or more firewalls. A firewall is a device or software that monitors and filters incoming and outgoing network traffic based on predefined rules. It can block, allow, or modify packets based on criteria such as IP addresses, ports, protocols, or content. Firewalls are used to protect networks and devices from unauthorized or malicious access, as well as to enforce policies and regulations.
For example, your ISP may have a firewall that blocks certain websites or services that are deemed illegal or inappropriate in your country. Or, Google may have a firewall that blocks requests from IP addresses that are known to be sources of spam or attacks. Or, you may have a personal firewall on your computer that blocks requests from applications that you do not trust or recognize.
If your TCP packet passes through all the firewalls without being blocked or modified, it will finally reach www.google.com's web server. If not, you may see an error message in your browser indicating that the connection was refused or timed out.
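The rule-based filtering described in this section can be sketched as a first-match rule list. This is an illustrative model only: the `RULES` table and the `decide` function are hypothetical, the addresses come from documentation-only ranges, and real firewalls support far richer criteria.

```python
import ipaddress

# Each rule is (action, source network, destination port or None for any port).
RULES = [
    ("deny", ipaddress.ip_network("203.0.113.0/24"), None),  # a blocked range
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 443),        # HTTPS from anyone
    ("allow", ipaddress.ip_network("0.0.0.0/0"), 80),         # HTTP from anyone
]


def decide(source_ip: str, dest_port: int, rules=RULES, default="deny"):
    """Return the action of the first matching rule, or the default action."""
    source = ipaddress.ip_address(source_ip)
    for action, network, port in rules:
        if source in network and (port is None or port == dest_port):
            return action
    return default
```

Because the first match wins, a request to port 443 from the blocked range is denied even though a later rule would have allowed it, and anything that matches no rule falls through to the default.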
HTTPS/SSL
Once your browser has established a TCP connection with www.google.com's web server, its goal is to send an HTTP request asking for the web page you want to see. HTTP stands for Hypertext Transfer Protocol, which is a protocol that defines how web browsers and web servers communicate and exchange data. However, HTTP on its own is not secure, meaning that anyone who intercepts the network traffic can see the content of the request and the response, including any sensitive information such as passwords, credit card numbers, or personal details. This is where HTTPS comes in.
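To make the plaintext concern concrete, here is roughly what a minimal HTTP/1.1 request looks like as bytes on the wire. The `build_get_request` helper is hypothetical, and real browsers send many more headers than this.

```python
def build_get_request(host: str, path: str = "/") -> bytes:
    """Return the bytes of a minimal HTTP/1.1 GET request."""
    lines = [
        f"GET {path} HTTP/1.1",  # request line: method, path, protocol version
        f"Host: {host}",         # the Host header is required in HTTP/1.1
        "Connection: close",
        "",                      # a blank line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")
```

Everything in that request is readable text, which is why sending it over an unencrypted connection exposes it to anyone on the path.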
HTTPS stands for Hypertext Transfer Protocol Secure, which is HTTP sent over a connection encrypted with TLS (Transport Layer Security), the successor to the now-deprecated SSL (Secure Sockets Layer). The older name is still widely used, which is why you will often see the two written together as SSL/TLS. These protocols use cryptographic techniques to ensure confidentiality, integrity, and authentication of the data. Confidentiality means that only the intended recipient can decrypt and read the data. Integrity means that the data cannot be modified or corrupted without being detected. Authentication means that a party can verify who it is talking to; on the web, this usually means that your browser verifies the identity of the web server.
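The integrity property can be illustrated with a keyed hash (HMAC) from the Python standard library. This is only a sketch of the idea that tampering becomes detectable when both sides share a secret key; the `tag` and `verify` helpers are hypothetical, and the constructions TLS actually uses are more involved.

```python
import hashlib
import hmac


def tag(key: bytes, message: bytes) -> bytes:
    """Compute an authentication tag for the message under a shared key."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key: bytes, message: bytes, received_tag: bytes) -> bool:
    """Check that the message still matches the tag it was sent with."""
    return hmac.compare_digest(tag(key, message), received_tag)
```

If even one byte of the message changes in transit, `verify` returns `False`, and someone without the key cannot produce a matching tag for the altered message.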
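To use HTTPS, your browser and www.google.com's web server need to perform an SSL/TLS handshake before sending any HTTP request or response. The steps below describe the classic handshake with RSA key exchange, as used in TLS 1.2 and earlier; TLS 1.3 replaces the key exchange with a Diffie-Hellman-based one and needs fewer round trips, but the overall goal is the same. The SSL/TLS handshake involves the following steps: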
- Your browser sends a ClientHello message to www.google.com's web server, which contains information such as the SSL/TLS version, the supported cipher suites (encryption algorithms), and a random number.
- www.google.com's web server sends a ServerHello message to your browser, which contains information such as the chosen cipher suite and a random number, followed by its digital certificate. A digital certificate is a document that binds the identity of www.google.com's web server to its public key. It is issued by a trusted third party called a certificate authority (CA), which verifies that the certificate's owner controls the domain name www.google.com.
- Your browser verifies the validity of the digital certificate by checking its signature, expiration date, and issuer. It also checks if the CA is trusted by looking at its own list of trusted CAs. If everything is OK, your browser extracts the public key of www.google.com's web server from the certificate.
- Your browser sends a ClientKeyExchange message to www.google.com's web server, which contains a pre-master secret encrypted with the public key of www.google.com's web server. A pre-master secret is another random number that is combined with the two random numbers exchanged earlier to generate a master secret. A master secret is a shared secret from which the keys used to encrypt and decrypt the data between your browser and www.google.com's web server are derived.
- www.google.com's web server decrypts the pre-master secret with its private key and generates the same master secret as your browser.
- Your browser and www.google.com's web server send ChangeCipherSpec messages to each other to indicate that they are ready to switch to encrypted communication.
- Your browser and www.google.com's web server send Finished messages to each other to confirm that the handshake was successful. These messages are encrypted with the master secret and contain hashes (digests) of all the previous messages exchanged during the handshake.
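The step where both sides arrive at the same master secret can be sketched with the key-derivation function defined for TLS 1.2 (RFC 5246, sections 5 and 8.1). This sketch assumes a cipher suite whose pseudorandom function is based on SHA-256 and ignores later extensions; the function names are my own, and the RSA encryption of the pre-master secret is left out.

```python
import hashlib
import hmac


def p_sha256(secret: bytes, seed: bytes, length: int) -> bytes:
    """Expand `secret` and `seed` into `length` bytes, as in the TLS 1.2 PRF."""
    output = b""
    a = seed  # A(0) is the seed
    while len(output) < length:
        a = hmac.new(secret, a, hashlib.sha256).digest()  # A(i) = HMAC(secret, A(i-1))
        output += hmac.new(secret, a + seed, hashlib.sha256).digest()
    return output[:length]


def derive_master_secret(pre_master_secret: bytes,
                         client_random: bytes,
                         server_random: bytes) -> bytes:
    """Derive the 48-byte master secret from the values both sides know."""
    seed = b"master secret" + client_random + server_random
    return p_sha256(pre_master_secret, seed, 48)
```

Because the browser and the server feed in the same pre-master secret and the same two random numbers, they compute identical master secrets without the master secret itself ever crossing the network.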
After completing the SSL/TLS handshake, your browser and www.google.com's web server can securely send HTTP requests and responses using symmetric encryption (the same key for encryption and decryption) with keys derived from the master secret.
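In everyday code you rarely perform these steps by hand. As a rough sketch, Python's standard `ssl` module runs the handshake and the certificate checks for you. The helper names `make_client_context` and `fetch_tls_info` are my own, and the second function needs network access to run.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client context that verifies certificates and host names."""
    return ssl.create_default_context()


def fetch_tls_info(host: str, port: int = 443, timeout: float = 5.0):
    """Connect to `host`, complete the TLS handshake, and report what was agreed."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=timeout) as raw_socket:
        # wrap_socket performs the handshake; it raises an error if the
        # certificate cannot be verified or does not match the host name.
        with context.wrap_socket(raw_socket, server_hostname=host) as tls_socket:
            return tls_socket.version(), tls_socket.cipher()[0]
```

Calling `fetch_tls_info("www.google.com")` on a machine with Internet access returns the negotiated protocol version and cipher suite name, both of which depend on what the client and server support.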
LOAD-BALANCER
When your browser sends an HTTP request to https://www.google.com, it may not reach a single web server, but rather a load-balancer. A load-balancer is a device or software that distributes incoming requests among multiple web servers based on various criteria such as availability, capacity, performance, or location. Load-balancing improves scalability, reliability, and efficiency of web applications by spreading the workload among multiple servers and avoiding overloading or crashing of any single server.
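One simple selection policy, often called least connections, can be sketched as follows. The `Backend` class and the `pick_backend` function are hypothetical, and nothing here reflects how Google's load-balancers actually work; real ones combine several signals.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    healthy: bool = True
    active_requests: int = 0


def pick_backend(backends):
    """Choose the healthy backend that is currently handling the fewest requests."""
    candidates = [b for b in backends if b.healthy]
    if not candidates:
        raise RuntimeError("no healthy backends available")
    return min(candidates, key=lambda b: b.active_requests)
```

A server that is marked unhealthy is never chosen, even if it is idle, and among the healthy servers the least busy one receives the next request.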
For example, Google may have thousands of web servers around the world that can handle requests for https://www.google.com. A load-balancer can decide which web server to forward your request to based on factors such as:
- The geographic proximity of the web server to your location
- The current load (number of requests) on each web server
- The health