What happens when you type taringa.net in your browser and press enter?

Apr 21, 2021

Ah, yes. The super interesting question that we all ask ourselves at some point in our lives. My school really thought outside the box and replaced what would have been “google.com” with “holbertonschool.com” in the question. That’s unbelievably clever, but I won’t be using that domain; I’ll use a more interesting one instead: taringa.net.

Alright, seriously now. I’ll try my best to explain it below.

0.1: ??

First of all, put some music on (not optional), turn on your computer and open your browser, that’s where we’ll start from. I recommend “Don’t Let it Get To Your Head” by Black Harmony.

1: HSTS or HTTP Strict Transport Security

After typing taringa.net in your browser and pressing enter, your browser first checks its “preloaded HSTS” list. HSTS stands for HTTP Strict Transport Security, and the preload list is a list of websites that have explicitly asked to be accessed via HTTPS only.

HSTS is actually a lesser known security layer that not only protects your website from third parties and SSL stripping attacks, but can also speed up page loads, because the browser skips the insecure HTTP request and its redirect to HTTPS (like N₂O in Fast & Furious).

Setting up HSTS for your website is done by first setting an HSTS header (Strict-Transport-Security) in your web server configuration, for example your Apache config file, and then adding your domain to the HSTS preload list. This list is baked into modern browsers, so a website on it is always contacted via HTTPS.
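To make the header less abstract, here is a minimal sketch of how a client could read an HSTS policy. The header value is a made-up example (not taringa.net's real configuration), and parse_hsts is a hypothetical helper name.

```python
def parse_hsts(header_value: str) -> dict:
    """Parse a Strict-Transport-Security header value into its directives."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for part in header_value.split(";"):
        directive = part.strip().lower()
        if directive.startswith("max-age="):
            # max-age: how many seconds the browser should remember the policy
            policy["max_age"] = int(directive.split("=", 1)[1].strip('"'))
        elif directive == "includesubdomains":
            policy["include_subdomains"] = True
        elif directive == "preload":
            # signals that the site wants to be on the browsers' preload list
            policy["preload"] = True
    return policy


# Illustrative header value (an assumption, not taken from a real site)
example = parse_hsts("max-age=31536000; includeSubDomains; preload")
```

A browser that has stored such a policy rewrites any http:// request for that host to https:// before anything leaves your machine.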

2: DNS Lookup

The next thing that happens is that your browser first checks if the domain, taringa.net, is in its DNS cache.

DNS stands for Domain Name System and it’s a hierarchical naming system that was created because humans can’t remember IP addresses (dumb, aren’t we?); instead we remember names like taringa.net. It’s much like the way we give people home or office addresses instead of coordinates.

If taringa.net is not found in the cache, your browser calls a library function, which varies by operating system but is typically gethostbyname (or its modern replacement, getaddrinfo). Before trying to resolve the name via DNS, this function checks whether the hostname, taringa.net, can be resolved from the local hosts file, a file stored on your local machine (whose location varies by operating system).

If the gethostbyname function doesn’t have the name cached and can’t find it in the local hosts file, it sends a request to the DNS server, which is typically the local router or your ISP’s (Internet Service Provider’s) caching DNS server.
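The lookup order described above (hosts file first, DNS second) can be sketched roughly like this. parse_hosts, resolve and SAMPLE_HOSTS are hypothetical names for illustration, and the hosts entries are made up; in real life the system resolver already consults the hosts file for you.

```python
import socket


def parse_hosts(text: str) -> dict:
    """Map each hostname found in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name.lower(), ip)  # the first entry wins
    return table


def resolve(hostname: str, hosts_text: str) -> str:
    """Check the hosts table first, then fall back to the system resolver."""
    table = parse_hosts(hosts_text)
    if hostname.lower() in table:
        return table[hostname.lower()]
    # Not in the hosts table: ask the resolver, which goes out to DNS
    # (this branch needs network access).
    return socket.gethostbyname(hostname)


# Made-up hosts file content, for illustration only
SAMPLE_HOSTS = """
127.0.0.1   localhost
# an invented intranet entry
10.0.0.5    intranet.example intranet
"""
```

Calling resolve("intranet", SAMPLE_HOSTS) never touches the network, while resolve("taringa.net", SAMPLE_HOSTS) would fall through to DNS.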

3: Address Resolution Protocol Process

To send that DNS request, the network library needs more than a hostname: it needs the IP address of the DNS server and the MAC address of the next device on the path to it, because the link layer delivers frames to MAC addresses, not to hostnames. Finding the MAC address of a computer or server on a network is called address resolution, and it is done with an ARP (Address Resolution Protocol) broadcast. ARP is a network protocol used to find out the hardware (MAC) address of a device from an IP address. For example, if you try to ping an IP address on your local network, say 192.79.215.142, your system has to turn that IP address into a MAC address, which is the physical address of the device. In order to send an ARP broadcast, you need a target IP address to look up; since we still only know the hostname taringa.net, the target here is the DNS server (or the default gateway that leads to it).

The system then first checks the ARP cache for an entry for the target IP. If it’s in the cache, the library function returns the result: Target IP = MAC.

If the entry is not in the ARP cache, then the route table is looked up. The route table is a set of rules, often viewed in table format, that is used to determine where data packets travel over an IP network. The routing table contains the following information:

Destination: The IP address of the packet’s final destination
Next hop: The IP address to which the packet is forwarded
Interface: The outgoing network interface the device should use when forwarding the packet to the next hop or final destination
Metric: Assigns a cost to each available route so that the most cost-effective path can be chosen
Routes: Includes directly attached subnets, indirect subnets that are not attached to the device but can be accessed through one or more hops, and default routes to use for certain types of traffic or when information is lacking.
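A route lookup over a table with those fields might look like the sketch below. The Route class, the lookup function and the sample entries are all assumptions for illustration; real kernels use far more efficient data structures, but the idea of picking the most specific matching route (and the lowest metric on a tie) is the same.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    destination: str          # destination network in CIDR form
    next_hop: Optional[str]   # None means the subnet is directly attached
    interface: str            # outgoing network interface
    metric: int               # cost of the route, lower is better


def lookup(routes, target_ip: str) -> Route:
    """Pick the most specific matching route; break ties by lowest metric."""
    addr = ipaddress.ip_address(target_ip)
    matches = [r for r in routes if addr in ipaddress.ip_network(r.destination)]
    if not matches:
        raise LookupError(f"no route to {target_ip}")
    return max(
        matches,
        key=lambda r: (ipaddress.ip_network(r.destination).prefixlen, -r.metric),
    )


# Invented routing table for illustration
ROUTES = [
    Route("0.0.0.0/0", "192.168.1.1", "eth0", 100),  # default route
    Route("192.168.1.0/24", None, "eth0", 0),        # directly attached subnet
    Route("10.8.0.0/16", "10.8.0.1", "tun0", 50),    # indirect subnet
]
```

With this table, an address on 192.168.1.0/24 is reached directly, while anything unknown falls back to the default route through 192.168.1.1.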

If the target IP is on a local subnet, the library uses the interface associated with that subnet; otherwise it uses the interface that has the subnet of our default gateway. Then the MAC address of the selected network interface is looked up. Afterwards, the network library sends a Layer 2 ARP Request which looks like this:

Sender MAC: interface:mac:address:here
Sender IP: interface.ip.goes.here
Target MAC: FF:FF:FF:FF:FF:FF (Broadcast)
Target IP: target.ip.goes.here

The type of hardware that exists between the computer and the router determines how the ARP Request travels, but the ARP Reply that comes back typically looks like this:

Sender MAC: target:mac:address:here
Sender IP: target.ip.goes.here
Target MAC: interface:mac:address:here
Target IP: interface.ip.goes.here

At this point, our network library now has the IP address of either our DNS server or the default gateway, and it can then resume its DNS process!
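For the curious, here is a rough sketch of what an ARP Request looks like as raw bytes on Ethernet. The addresses are invented and build_arp_request is a hypothetical helper. One detail worth hedging: in this sketch the broadcast address FF:FF:FF:FF:FF:FF goes in the Ethernet header, while the target MAC field inside the ARP payload is left as zeros because it is still unknown.

```python
import socket
import struct


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_arp_request(sender_mac: str, sender_ip: str, target_ip: str) -> bytes:
    """Build an Ethernet frame carrying an ARP 'who has target_ip?' request."""
    broadcast = b"\xff" * 6
    src = mac_to_bytes(sender_mac)
    # Ethernet header: destination MAC, source MAC, EtherType 0x0806 (ARP)
    ethernet_header = broadcast + src + struct.pack("!H", 0x0806)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # operation: 1 = request
        src,                          # sender MAC
        socket.inet_aton(sender_ip),  # sender IP
        b"\x00" * 6,                  # target MAC: unknown, that's the question
        socket.inet_aton(target_ip),  # target IP
    )
    return ethernet_header + arp_payload


# Invented addresses for illustration
frame = build_arp_request("aa:bb:cc:dd:ee:ff", "192.168.1.10", "192.168.1.1")
```

Actually sending this frame requires a raw socket and elevated privileges, so the sketch stops at building the bytes.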

3: Sockets

After our browser receives the IP address of the destination server, it combines it with the port number from the URL (HTTP defaults to port 80 and HTTPS to port 443). Then it makes a call to the system library function named socket and requests a TCP socket stream: AF_INET/AF_INET6 and SOCK_STREAM.
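In Python the same system call is exposed through the socket module. A minimal sketch (open_tcp_socket is a hypothetical wrapper name, and the connect call is commented out so nothing touches the network):

```python
import socket


def open_tcp_socket(ipv6: bool = False) -> socket.socket:
    """Request a TCP stream socket, IPv4 by default."""
    family = socket.AF_INET6 if ipv6 else socket.AF_INET
    return socket.socket(family, socket.SOCK_STREAM)


sock = open_tcp_socket()
# Connecting is what would trigger the TCP handshake described further down;
# it is left commented out so the sketch runs offline.
# sock.connect(("taringa.net", 443))
sock.close()
```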

That request passes through the firewall, and is passed to the Transport Layer where a TCP segment is crafted. The destination port, which the browser determined earlier (443 for HTTPS), is added to the header, and a source port is chosen from within the kernel’s dynamic port range (ip_local_port_range in Linux).

This segment is sent to the Network Layer, which wraps an additional IP header. The IP address of the destination server as well as that of the current machine is inserted to form a packet.

The packet next arrives at the Link Layer. A frame header is added that includes the MAC address of the machine’s NIC as well as the MAC address of the gateway (local router). As before, if the kernel does not know the MAC address of the gateway, it must broadcast an ARP query to find it.

At this point the packet is ready to be transmitted over a physical link, for example Ethernet, WiFi, or a cellular data network.

Please go and read the Wiki pages on those, I’ll wait.

For most home or small business Internet connections the packet will pass from your computer, possibly through a local network, and then through a modem (MOdulator/DEModulator) which converts digital 1’s and 0’s into an analog signal suitable for transmission over telephone, cable, or wireless telephony connections. On the other end of the connection is another modem which converts the analog signal back into digital data to be processed by the next network node where the from and to addresses would be analyzed further.

Most larger businesses and newer residential connections will have fiber or direct Ethernet connections in which case the data remains digital and is passed directly to the next network node for processing.

Eventually, the packet will reach the router managing the local subnet. From there, it will continue to travel to the autonomous system’s (AS) border routers, other ASes, and finally to the destination server. Each router along the way extracts the destination address from the IP header and routes the packet to the appropriate next hop. The time to live (TTL) field in the IP header is decremented by one for each router the packet passes through. The packet will be dropped if the TTL field reaches zero or if the current router has no space in its queue (perhaps due to network congestion).
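The TTL rule is easy to simulate. This is a toy model, assuming every entry in the list is a router that forwards the packet; forward and the router names are made up for illustration.

```python
def forward(ttl: int, routers: list) -> tuple:
    """Walk a packet through routers, decrementing TTL by one at each.

    Returns (delivered, router_that_dropped_it).
    """
    for router in routers:
        ttl -= 1
        if ttl == 0:
            # TTL exhausted: this router drops the packet
            return False, router
    return True, None


# Invented path for illustration
path = ["home-router", "isp-edge", "as-border"]
delivered = forward(64, path)
dropped = forward(2, path)
```

Tools like traceroute rely on exactly this behaviour, sending packets with deliberately small TTL values to discover each router on the path.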

This send and receive happens multiple times following the TCP connection flow:

Client chooses an initial sequence number (ISN) and sends the packet to the server with the SYN bit set to indicate it is setting the ISN
Server receives SYN and if it’s in an agreeable mood:
Server chooses its own initial sequence number
Server sets SYN to indicate it is choosing its ISN
Server copies the (client ISN +1) to its ACK field and adds the ACK flag to indicate it is acknowledging receipt of the first packet
Client acknowledges the connection by sending a packet:
Increases its own sequence number
Increases the receiver acknowledgment number
Sets ACK field
Data is transferred as follows:
As one side sends N data bytes, it increases its SEQ by that number
When the other side acknowledges receipt of that packet (or a string of packets), it sends an ACK packet with the ACK value equal to the last received sequence from the other
To close the connection:
The closer sends a FIN packet
The other side ACKs the FIN packet and sends its own FIN
The closer acknowledges the other side’s FIN with an ACK
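The sequence and acknowledgment bookkeeping of the opening handshake can be sketched as follows. Segment and three_way_handshake are hypothetical names and the initial sequence numbers are arbitrary; a real TCP stack picks them unpredictably.

```python
from dataclasses import dataclass

SEQ_SPACE = 2 ** 32  # sequence numbers are 32 bits and wrap around


@dataclass
class Segment:
    flags: frozenset
    seq: int
    ack: int = 0


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the SYN, SYN-ACK and ACK segments that open a connection."""
    # 1. Client announces its initial sequence number
    syn = Segment(frozenset({"SYN"}), seq=client_isn)
    # 2. Server announces its own ISN and acknowledges client ISN + 1
    syn_ack = Segment(
        frozenset({"SYN", "ACK"}),
        seq=server_isn,
        ack=(syn.seq + 1) % SEQ_SPACE,
    )
    # 3. Client bumps its sequence number and acknowledges server ISN + 1
    ack = Segment(
        frozenset({"ACK"}),
        seq=(client_isn + 1) % SEQ_SPACE,
        ack=(syn_ack.seq + 1) % SEQ_SPACE,
    )
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=1000, server_isn=5000)
```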

4: TLS handshake (or fist bump in 2020 and on)

Remember when people bumped elbows? Damn. People were told to sneeze and cough into their elbow and then to bump each other’s elbows as a “healthier” handshake. Not very clever.

Alright, sorry; back to boring stuff:

The client computer now sends a ClientHello message to the server with its Transport Layer Security (TLS) version, list of cipher algorithms and compression methods available.

The server replies with a ServerHello message to the client with the TLS version, selected cipher, selected compression methods and the server's public certificate signed by a CA (Certificate Authority). The certificate contains a public key that will be used by the client to encrypt the rest of the handshake until a symmetric key can be agreed upon.

The client verifies the server’s digital certificate against its list of trusted CAs. If trust can be established based on the CA, the client generates a string of pseudo-random bytes and encrypts this with the server’s public key. These random bytes can be used to determine the symmetric key.

The server decrypts the random bytes using its private key and uses these bytes to generate its own copy of the symmetric master key.

The client sends a Finished message to the server, encrypting a hash of the transmission up to this point with the symmetric key.

The server generates its own hash, and then decrypts the client-sent hash to verify that it matches. If it does, it sends its own Finished message to the client, also encrypted with the symmetric key.

From now on the TLS session transmits the application (HTTP) data encrypted with the agreed symmetric key.
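You rarely implement this handshake yourself; a library does it. Below is a minimal sketch using Python's ssl module. make_client_context and fetch_tls_info are hypothetical helper names, and which key exchange actually happens depends on the TLS version and cipher suite the two sides negotiate.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client context that verifies the server certificate
    against the system's trusted CAs and checks the hostname."""
    return ssl.create_default_context()


def fetch_tls_info(host: str, port: int = 443) -> tuple:
    """Perform a TLS handshake and report the negotiated version and cipher.

    Needs network access, so it is defined here but not called.
    """
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        # wrap_socket runs the handshake: hello messages, certificate
        # verification and key agreement all happen inside this call
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.cipher()
```

Calling fetch_tls_info("taringa.net") on a connected machine would return the negotiated protocol version and cipher suite.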

5: HTTP protocol

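Browsers that support it first try to negotiate a newer protocol with the server instead of sending a plain HTTP/1.1 request. This used to be an “upgrade” from HTTP to the SPDY protocol; SPDY has since been deprecated in favor of HTTP/2, which browsers negotiate during the TLS handshake.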

If the client is using the HTTP protocol and does not support SPDY, it sends a request to the server of the form:

GET / HTTP/1.1
Host: taringa.net
Connection: close
[other headers]

where [other headers] refers to a series of colon-separated key-value pairs formatted as per the HTTP specification and separated by single new lines. (This assumes the web browser being used doesn't have any bugs violating the HTTP spec. This also assumes that the web browser is using HTTP/1.1, otherwise it may not include the Host header in the request and the version specified in the GET request will either be HTTP/1.0 or HTTP/0.9.)

HTTP/1.1 defines the “close” connection option for the sender to signal that the connection will be closed after completion of the response. For example,

Connection: close

HTTP/1.1 applications that do not support persistent connections MUST include the “close” connection option in every message.

After sending the request and headers, the web browser sends a single blank newline to the server indicating that the content of the request is done.
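Putting the request together by hand looks roughly like this. build_request is a hypothetical helper and the Accept header is just an example of an "other header". On the wire, each line ends with a carriage return plus line feed, and the blank line that ends the headers is an empty line of the same kind.

```python
def build_request(host: str, path: str = "/", extra_headers: dict = None) -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Connection: close",
    ]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Header lines are joined with CRLF; the final empty line marks the end
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# The Accept header here is an illustrative extra header
request = build_request("taringa.net", extra_headers={"Accept": "text/html"})
```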

The server responds with a status line carrying a response code that denotes the status of the request, in a response of the form:

HTTP/1.1 200 OK
[response headers]

This is followed by a single blank line and then the payload: the HTML content of www.taringa.net. The server may then either close the connection or, if headers sent by the client requested it, keep the connection open to be reused for further requests.

If the HTTP headers sent by the web browser included sufficient information for the web server to determine that the version of the file cached by the web browser has not been modified since the last retrieval (i.e. if the web browser sent the cached ETag value in an If-None-Match header), it may instead respond with a response of the form:

HTTP/1.1 304 Not Modified
[response headers]

and no payload, and the web browser instead retrieves the HTML from its cache.
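The server-side decision between a full response and “Not Modified” can be sketched like this. respond is a hypothetical function, the ETag value is invented, and real servers also handle weak validators and lists of ETags, which this sketch ignores.

```python
def respond(request_headers: dict, current_etag: str, body: bytes) -> tuple:
    """Return (status_code, headers, payload) for a possibly conditional GET."""
    if request_headers.get("If-None-Match") == current_etag:
        # The browser's cached copy is still current: no payload needed
        return 304, {"ETag": current_etag}, b""
    headers = {"ETag": current_etag, "Content-Length": str(len(body))}
    return 200, headers, body


# Invented ETag and page body for illustration
PAGE = b"<html><body>Hola</body></html>"
first_visit = respond({}, '"v1"', PAGE)
revisit = respond({"If-None-Match": '"v1"'}, '"v1"', PAGE)
```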

After parsing the HTML, the web browser (and server) repeats this process for every resource (image, CSS, favicon.ico, etc) referenced by the HTML page, except instead of GET / HTTP/1.1 the request will be GET /$(URL relative to www.taringa.net) HTTP/1.1.

If the HTML referenced a resource on a different domain than www.taringa.net, the web browser goes back to the steps involved in resolving the other domain, and follows all steps up to this point for that domain. The Host header in the request will be set to the appropriate server name instead of taringa.net.

6: HTTP Server Request Handle

The HTTPD (HTTP Daemon) server is the one handling the requests/responses on the server side. The most common HTTPD web servers are Apache or nginx for Linux and IIS for Windows.

If you have a load-balancer, it first receives the request before distributing it to a web server.
The HTTPD (HTTP Daemon) receives the request.
The server breaks down the request to the following parameters:
HTTP Request Method (either GET, HEAD, POST, PUT, DELETE, CONNECT, OPTIONS, or TRACE). In the case of a URL entered directly into the address bar, this will be GET.
Domain, in this case — taringa.net.
Requested path/page, in this case — / (as no specific path/page was requested, / is the default path).
The server verifies that there is a Virtual Host configured on the server that corresponds with taringa.net.
The server verifies that taringa.net can accept GET requests.
The server verifies that the client is allowed to use this method (by IP, authentication, etc.).
If the server has a rewrite module installed (like mod_rewrite for Apache or URL Rewrite for IIS), it tries to match the request against one of the configured rules. If a matching rule is found, the server uses that rule to rewrite the request.
The application server goes to pull the content data that corresponds with the request, in our case it will fall back to the index file, as “/” is the main file (some cases can override this, but this is the most common method).
The application server parses the file according to the handler. If Taringa is running on PHP, the server uses PHP to interpret the index file, and streams the output to the client.
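The “break down the request” step above can be sketched as a tiny parser. parse_request is a hypothetical name, and a real HTTPD is far more defensive about malformed input than this.

```python
ALLOWED_METHODS = {"GET", "HEAD", "POST", "PUT", "DELETE", "CONNECT", "OPTIONS", "TRACE"}


def parse_request(raw: bytes) -> dict:
    """Split a raw HTTP request into method, domain and requested path."""
    # Everything before the first blank line is the request line plus headers
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    if method not in ALLOWED_METHODS:
        raise ValueError(f"unsupported method: {method}")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return {
        "method": method,
        "path": path,
        "version": version,
        "domain": headers.get("host"),  # used to pick the virtual host
    }


parsed = parse_request(b"GET / HTTP/1.1\r\nHost: taringa.net\r\nConnection: close\r\n\r\n")
```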

7: Behind the scenes of the Browser

Once the application server supplies the data (HTML, CSS, JS, images, etc.) to the browser it undergoes the below process:

Parsing — HTML, CSS, JS
Rendering — Construct DOM Tree → Render Tree → Layout of Render Tree → Painting the render tree

8: Browser

The browser’s functionality is to present the web resource you choose, by requesting it from the application server, which has generated the page, and displaying it in the browser window. The resource is usually an HTML document, but may also be a PDF, image, or some other type of content. The location of the resource is specified by the user using a URI (Uniform Resource Identifier).

The way the browser interprets and displays HTML files is specified in the HTML and CSS specifications. The CSS specifications are maintained by the W3C (World Wide Web Consortium), the long-standing standards organization for the web, while the HTML standard is nowadays maintained by the WHATWG in agreement with the W3C.

Browser user interfaces have a lot in common with each other. Among the common user interface elements are:

An address bar for inserting a URI
Back and forward buttons
Bookmarking options
Refresh and stop buttons for refreshing or stopping the loading of current documents
Home button that takes you to your home page

Browser High Level Structure

The components of the browsers are:

User interface: The user interface includes the address bar, back/forward button, bookmarking menu, etc. Every part of the browser display except the window where you see the requested page.
Browser engine: The browser engine marshals actions between the UI and the rendering engine.
Rendering engine: The rendering engine is responsible for displaying requested content. For example if the requested content is HTML, the rendering engine parses HTML and CSS, and displays the parsed content on the screen.
Networking: The networking handles network calls such as HTTP requests, using different implementations for different platforms behind a platform-independent interface.
UI backend: The UI backend is used for drawing basic widgets like combo boxes and windows. This backend exposes a generic interface that is not platform specific. Underneath it uses operating system user interface methods.
JavaScript engine: The JavaScript engine is used to parse and execute JavaScript code.
Data storage: The data storage is a persistence layer. The browser may need to save all sorts of data locally, such as cookies. Browsers also support storage mechanisms such as localStorage, IndexedDB, WebSQL and FileSystem.

9: HTML parsing

The rendering engine starts getting the contents of the requested document from the networking layer. This will usually be done in 8 kb chunks.

The primary job of the HTML parser is to parse the HTML markup into a parse tree.

The output tree (the “parse tree”) is a tree of DOM element and attribute nodes. DOM is short for Document Object Model. It is the object representation of the HTML document and the interface of HTML elements to the outside world, like JavaScript. The root of the tree is the “Document” object. Prior to any manipulation via scripting, the DOM has an almost one-to-one relation to the markup.

The parsing algorithm

HTML cannot be parsed using the regular top-down or bottom-up parsers.

The reasons are:

The forgiving nature of the language.
The fact that browsers have traditionally tolerated errors in order to support well-known cases of invalid HTML.
The parsing process is re-entrant. For other languages, the source doesn’t change during parsing, but in HTML, dynamic code (such as script elements containing document.write() calls) can add extra tokens, so the parsing process actually modifies the input.

Unable to use the regular parsing techniques, the browser utilizes a custom parser for parsing HTML. The parsing algorithm is described in detail by the HTML5 specification.

The algorithm consists of two stages: tokenization and tree construction.
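To get a feel for the two stages, here is a toy version built on Python's html.parser, which does the tokenization, with a small stack-based tree builder on top. This is not the HTML5 algorithm, only a sketch of the idea; TreeBuilder, parse and the abbreviated VOID_TAGS set are assumptions for illustration.

```python
from html.parser import HTMLParser

# Elements that never have children (abbreviated list, for illustration)
VOID_TAGS = {"br", "img", "meta", "link", "input", "hr"}


class TreeBuilder(HTMLParser):
    """Tree construction: turn the tokenizer's events into nested dicts."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "children": []}
        self.stack = [self.root]  # stack of currently open elements

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self.stack[-1]["children"].append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the matching open element, implicitly closing
        # anything left open inside it; stray end tags are ignored.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1]["children"].append({"tag": "#text", "text": data.strip()})


def parse(html: str) -> dict:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


# Note the unclosed <b>: the builder recovers instead of raising an error
tree = parse("<html><body><p>Hello <b>world</p></body></html>")
```

The unclosed b element ends up as a child of the paragraph and is closed implicitly, which is the forgiving behaviour the list above describes.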

Actions when the parsing is finished

The browser begins fetching external resources linked to the page (CSS, images, JavaScript files, etc.).

At this stage the browser marks the document as interactive and starts parsing scripts that are in “deferred” mode: those that should be executed after the document is parsed. The document state is set to “complete” and a “load” event is fired.

Note there is never an “Invalid Syntax” error on an HTML page. Browsers fix any invalid content and go on.

10: CSS interpretation

Parse CSS files, <style> tag contents, and style attribute values using "CSS lexical and syntax grammar"

Each CSS file is parsed into a StyleSheet object, where each object contains CSS rules with selectors and objects corresponding to the CSS grammar.

A CSS parser can be top-down or bottom-up when a specific parser generator is used.
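As a very rough illustration of rules, selectors and declarations, here is a naive regex-based sketch. parse_stylesheet is a hypothetical name, and this approach only copes with flat rules: real CSS parsers follow the grammar properly and handle at-rules, nesting, strings and escapes, none of which this does.

```python
import re

# A flat rule: some selector text, then a { ... } block without nested braces
RULE_PATTERN = re.compile(r"([^{}]+)\{([^{}]*)\}")


def parse_stylesheet(css: str) -> list:
    """Parse flat CSS rules into a list of selectors plus declarations."""
    css = re.sub(r"/\*.*?\*/", "", css, flags=re.S)  # strip comments
    rules = []
    for selector_text, body in RULE_PATTERN.findall(css):
        selectors = [s.strip() for s in selector_text.split(",") if s.strip()]
        declarations = {}
        for declaration in body.split(";"):
            name, sep, value = declaration.partition(":")
            if sep:  # skip empty fragments such as a trailing ';'
                declarations[name.strip()] = value.strip()
        rules.append({"selectors": selectors, "declarations": declarations})
    return rules


# Invented stylesheet for illustration
sheet = parse_stylesheet("h1, h2 { color: red; margin: 0 } /* note */ p { font-size: 14px; }")
```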

11: Page Rendering

Create a ‘Frame Tree’ or ‘Render Tree’ by traversing the DOM nodes, and calculating the CSS style values for each node.
Calculate the preferred width of each node in the ‘Frame Tree’ bottom up by summing the preferred width of the child nodes and the node’s horizontal margins, borders, and padding.
Calculate the actual width of each node top-down by allocating each node’s available width to its children.
Calculate the height of each node bottom-up by applying text wrapping and summing the child node heights and the node’s margins, borders, and padding.
Calculate the coordinates of each node using the information calculated above.
More complicated steps are taken when elements are floated, positioned absolutely or relatively, or other complex features are used. See http://dev.w3.org/csswg/css2/ and http://www.w3.org/Style/CSS/current-work for more details.
Create layers to describe which parts of the page can be animated as a group without being re-rasterized. Each frame/render object is assigned to a layer.
Textures are allocated for each layer of the page.
The frame/render objects for each layer are traversed and drawing commands are executed for their respective layer. This may be rasterized by the CPU or drawn on the GPU directly using D2D/SkiaGL.
All of the above steps may reuse calculated values from the last time the webpage was rendered, so that incremental changes require less work.
The page layers are sent to the compositing process where they are combined with layers for other visible content like the browser chrome, iframes and addon panels.
Final layer positions are computed and the composite commands are issued via Direct3D/OpenGL. The GPU command buffer(s) are flushed to the GPU for asynchronous rendering and the frame is sent to the window server.
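The bottom-up preferred width calculation from the second step can be sketched with a small recursive function. Box and preferred_width are hypothetical names, and the sketch follows the simplified rule stated above (sum the children, add the node's own horizontal margins, borders and padding); real engines treat block and inline content differently.

```python
from dataclasses import dataclass, field


@dataclass
class Box:
    content_width: float = 0.0      # intrinsic width of a leaf (text, image)
    horizontal_extras: float = 0.0  # left + right margins, borders and padding
    children: list = field(default_factory=list)


def preferred_width(box: Box) -> float:
    """Compute the preferred width bottom-up, children first."""
    if box.children:
        inner = sum(preferred_width(child) for child in box.children)
    else:
        inner = box.content_width
    return inner + box.horizontal_extras


# Invented frame tree for illustration
image = Box(content_width=100, horizontal_extras=10)
label = Box(content_width=50)
container = Box(horizontal_extras=20, children=[image, label])
total = preferred_width(container)
```

Here the container's preferred width is the image (100 + 10) plus the label (50) plus its own extras (20).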

12: GPU Rendering

During the rendering process the graphical computing layers can use the general purpose CPU or the graphics processor (GPU).
When using the GPU for graphical rendering computations, the graphical software layers split the task into multiple pieces so that it can take advantage of the GPU’s massive parallelism for the floating point calculations required by the rendering process.

Post-rendering and user-induced execution

After rendering has completed, the browser executes JavaScript code as a result of some timing mechanism or user interaction (typing a query into the search box and receiving suggestions). Plugins such as Flash or Java may also execute while browsing Taringa. Scripts can cause additional network requests to be performed, as well as modify the page or its layout, causing another round of page rendering and painting.

That’s it! Was it interesting? A little bit maybe, if you’re really into that type of stuff.

Have a nice life!

Sources: