Roger Ngo's Website

Personal thoughts about life and tech written down.

Understanding Browser Rendering: A Trip Down the Browser Rendering Pipeline


This article is based on an engineering paper I originally wrote at TSheets. The original content received positive reviews from my peers, and as a result, I consider this essential knowledge for every web developer in the client-side-centric world we live in nowadays. I have decided to polish the writing further and present it here in this article.

The context and motivation behind the writing are explained early on, but you will find that I quickly move away from them. The initial sections simply build up the motivation for why understanding how the browser rendering pipeline works is important to performance tuning, and why it is crucial to web applications and client-side user experiences.

This paper is informal, but quite long. It is mostly written to be read and understood in pieces. With that in mind, I have deliberately separated this discussion into digestible sections. Feel free to stop reading at the end of any section and pick up where you left off later. Or, you can grab a coffee and get cozy in the passenger seat while I take you on a trip down the browser rendering pipeline.

Fun fact: the title of this paper was inspired by Jim Blinn's book, A Trip Down the Graphics Pipeline.


Around March 2017, we noticed at my workplace that our Android application had been behaving slowly for the past couple of weeks. For brevity, I will not go into the details of the application's architecture. The Android application basically uses an Android WebView activity that wraps the web application within itself.

The performance issues we had been seeing were basically "janky" scrolling paired with poor haptic feedback on button presses. Screens with lots of data to scroll had jerky movements, and items rendered slowly onto the screen. Button presses had a lot of lag before anything happened, which led to unresponsive behavior.

At first, our team assumed that the users experiencing these performance issues were on cheaper, underpowered Android tablets. During my investigation, I set out to test this assumption. I created my own Android application that used a simple WebView to contain the web application. In the process, I made sure to turn on any settings that would benefit performance, such as caching and hardware acceleration, along with anything else needed to get the web application running properly.

With the WebView application pointed at my development machine as the server, I took a very underpowered Android tablet and did the following:

  1. Run the application through my web browser on the tablet.
  2. Run the application through my homemade WebView application.

What I found was surprising. The results ruled out cheap Android tablets as the cause of these performance issues.

Running the application through the web browser provided a great user experience. It ran fast and the button touch feedback was great. With the wrapper application, it was much slower.

After some reading around the internet, I learned why a WebView is much slower than browsing through a web browser such as Chrome. Apparently, an activity with a WebView has to make callbacks to the Android application system for actions taken within that activity. Essentially, there is another layer that data has to flow through across the operating system.

With this realization, and with my limited mobile application knowledge, I concluded that there was really no way to tune the WebView beyond what it already did. Additionally, given the amount of time I had for this investigation, re-engineering the WebView application would have been quite an undertaking, considering how many clients were already using it.

I did figure there was one thing we could do: improve the perception of speed through UI trickery. The reasoning behind this is very old school. We don't really have to be much faster in a quantitative sense. We can just be faster perceptually.

In order to create a perceptually better user experience, I needed to think about what I had noticed while using the application.

  1. All lists had scrolling issues. They all had the same "janky" effect when scrolling up and down.
  2. Button presses had terrible user feedback.

By now, I hope I have convinced you that this is a real-life problem.

The Browser Rendering Pipeline

To understand browser performance, we need to discuss the browser rendering pipeline. It tells us, in theory, why our web application can be slow and what we can do to make it faster.

Understanding the Browser Rendering Pipeline is Important

We all need to give our dedicated SREs (site reliability engineers) a huge pat on the back. Sometimes they have to deal with our bad code that doesn't seem to improve in performance no matter what tricks they pull out of the hat on the infrastructure side.

Infrastructure and platform work can only make an application move so fast, and back-end code can only be optimized so much. We often put a lot of care into the back end of the web. This is where we optimize our database queries and the server-side code that processes and packages our data as an efficient response to a specific request. However, the client side is a different beast for most engineers. All of that speed and optimization is useless if the presentation layer cannot be rendered on the client quickly enough to give the user the perception of quick feedback. Every step takes some amount of time: both the HTTP request and the response take time, and rendering the response on the client side can take the longest.

Figure 1. A traditional 3-tiered web application architecture.

Front-end web development has become more and more important in recent years. Nowadays, to maintain a rich user experience, more processing has been offloaded to the front end, and for most conventional web applications the back end has become a simple data-retrieval tool.

This leads to my point: we forget that there could be a bottleneck on "the other side". It just so happens that this other side, the client side, can actually be the slowest part of your application to deliver. The data has to be transferred over the wire, and then the browser has to interpret that data, construct it and present it in the manner specified by the developer.

Modern browsers, especially evergreen ones like Chrome or Firefox, put a lot of effort into making the most of the hardware to render pages faster. It is up to the developer to understand the potential this brings. Upon page load, a specific cycle of operations happens before the page is displayed on the screen to the user. For simplicity, we'll call this the browser rendering pipeline.

Achieving a rich and responsive user experience requires knowledge of how the browser rendering pipeline works. It allows us to make better use of the available hardware to deliver the best experience possible. As users, we sometimes feel like a website is absolutely choking our device when we are simply browsing through web content. That is a sign that the page's rendering process could use a lot of optimization.

User Experience and the Perception of Smoothness

Without getting too much into specifics, it's important to note that the screens on most of our devices typically refresh at 60 Hz, which means up to 60 frames per second.

The higher the frame rate, the smoother the animation looks to the human eye. If we do the math, 60 frames per second gives us an interval of about 16 ms (1000 ms / 60, or roughly 16.7 ms) to deliver each new frame to the screen. This is the window of computation we must work within to meet a rate of 60 fps. Suppose the work for a frame takes longer than 16 ms. If we miss the allotted time too often, the browser will skip frames, and the resulting animation looks jerky on screen. Not the user experience we want to achieve.
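The arithmetic above can be made concrete with a small sketch. The JavaScript below is a simplified model, not how any browser actually schedules frames: it assumes each frame's work starts on a refresh boundary and that a frame which overruns its budget simply waits for the next boundary. The function name countDroppedFrames is hypothetical.

```javascript
// Simplified model: a 60 Hz display refreshes every 1000 / 60 ms.
const REFRESH_RATE_HZ = 60;
const FRAME_BUDGET_MS = 1000 / REFRESH_RATE_HZ; // ~16.67 ms per frame

// Hypothetical helper: each frame's work occupies ceil(duration / budget)
// refresh intervals; every interval beyond the first is a refresh where the
// user sees no new frame (a "dropped" frame).
function countDroppedFrames(frameDurationsMs) {
  let dropped = 0;
  for (const duration of frameDurationsMs) {
    const intervalsUsed = Math.max(1, Math.ceil(duration / FRAME_BUDGET_MS));
    dropped += intervalsUsed - 1;
  }
  return dropped;
}

console.log(FRAME_BUDGET_MS.toFixed(2)); // "16.67"
console.log(countDroppedFrames([10, 15, 20, 40])); // 3
```

With work taking 10, 15, 20 and 40 ms, only the last two frames overrun the roughly 16.67 ms budget, and together they cost three missed refreshes.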

At a very high level, a web browser performs the following steps when it receives an HTTP response from the server that is to be rendered as a web page:

Figure 2. The browser rendering pipeline.

  1. The stream of bytes is sent back from the server to the client through a socket connection.
  2. The stream of bytes is then decoded into characters.
  3. The characters are then parsed into the DOM and CSSOM trees.
  4. Scripts execute and may modify the document and its styling rules once the DOM and CSSOM are constructed.
  5. The information in both trees is then combined to construct the render tree.
  6. The render tree is then used to perform style and layout calculations and to translate the data into pixels. We call this the paint and composite step, though it can also simply be called rasterization.

While we live in a multi-threaded world, and browser network connections can indeed happen in parallel up to a finite limit, the same is largely not true for rendering. The web browser essentially does its rendering work on a single main thread.

Every step of the browser rendering pipeline can block another step. Receiving the data, constructing the appropriate trees, executing scripts and finally painting the data to the viewport for the user to see all happen on one thread. The most obvious way to render a page as fast as possible is to reduce the overall execution time of the pipeline. We can do this by reducing the execution time of every step, or we can pick and choose the steps we think are worth optimizing.

Any time the browser needs to introduce new visual information to the user, it will go through this rendering pipeline. This means that at any time we manipulate the style and layout of a DOM element, the paint and composite process needs to run again to rasterize the pixels on the screen. That even means just scrolling up and down will cause a repaint onto the screen. If scrolling happens to be attached to an event that modifies the DOM tree, we can trigger what would be a "reflow" of the page. This reflow results in style and layout calculations followed by the repaint. The point is, it is important to be very mindful that any of these steps can result in slow rendering.
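To make the difference between a repaint and a reflow more concrete, here is a toy JavaScript lookup of which pipeline stages rerun when a single CSS property changes. The property-to-stage mapping is an assumption based on commonly reported behavior in Chromium-based browsers: layout-affecting properties such as width rerun everything, color-only changes skip layout, and transform or opacity changes on an already-promoted layer can often skip straight to compositing. Real engines differ, and this table covers only a handful of properties.

```javascript
// Stages after style recalculation, in pipeline order.
const STAGES = ["layout", "paint", "composite"];

// Assumed mapping (simplified, Chromium-like): the earliest stage a change
// to each property invalidates.
const FIRST_STAGE_FOR_PROPERTY = {
  width: "layout",
  height: "layout",
  top: "layout",
  "font-size": "layout",
  color: "paint",
  "background-color": "paint",
  "box-shadow": "paint",
  transform: "composite",
  opacity: "composite",
};

// Returns every stage the browser must run after `property` changes.
// Unknown properties are treated pessimistically as layout-affecting.
function stagesToRerun(property) {
  const first = FIRST_STAGE_FOR_PROPERTY[property] || "layout";
  // Style recalculation always runs; everything from `first` onward runs too.
  return ["style", ...STAGES.slice(STAGES.indexOf(first))];
}

console.log(stagesToRerun("width")); // [ 'style', 'layout', 'paint', 'composite' ]
console.log(stagesToRerun("opacity")); // [ 'style', 'composite' ]
```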

Situations in which we need to present and recalculate a lot of what is on the screen occur quite often. For example, scrolling through a list with a lot of items that have events attached to them is a common use case. Unfortunately, having a lot of elements increases the number of areas that need to be repainted.

The browser does indeed try its best to be as smart as possible and only re-render what has changed. Sometimes, the developer can actually control this and manipulate the priority of elements, or separate them completely from other seemingly dependent elements to tell the browser that the particular component is the only one that needs a re-render.

DOM Construction

When the browser receives the HTML as character data, it begins parsing the data immediately. Parsing involves two steps: lexical analysis and syntax analysis.

Lexical analysis means breaking the incoming data down into valid tokens, that is, the words that are valid in the vocabulary of the language in question.

After tokens have been recognized, a parser will then move into syntax analysis, where it will essentially take the tokens created from lexing, and then use a set of rules to construct a statement which is valid within these rules. These rules are the grammar of the language.

The grammar is what makes parsing trivial or non-trivial. Informally, if a language's rules let every statement be parsed the same way no matter what surrounds it, so that the statement has a single, clear-cut meaning, we consider it to have a context-free grammar. It means what it says: no context is involved in interpreting the statement we have just parsed.

Unfortunately, HTML does not have a context-free grammar. This is by design. HTML is meant to be a flexible document markup language that gives a lot of leeway to documents that do not strictly conform to its syntax. This allows even less technical users to create documents for presentation. The browser tries its best to parse and present HTML in the best interpretation possible. Because of this nature of HTML, the browser rendering engine must resolve any malformed HTML on a case-by-case basis.

Figure 3. Malformed HTML being displayed. The strong tag was not properly closed and thus the HTML parser continued to parse the footer text with this tag as if it was still open.

Because HTML parsing is handled case by case, the parsing process is built around the concept of a state machine. Each step in the parsing process reads characters that signal some logic that should be executed. For example, an opening bracket < signals that a tag has opened, and the characters following it will be the tag name, attributes and attribute values. The character >, which usually follows the attribute data, signals that the tag is complete.
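Here is a toy illustration of that state-machine idea in JavaScript. It is nowhere near the real HTML tokenizer defined in the HTML Living Standard, which has dozens of states and detailed error handling. This sketch assumes double-quoted attribute values only and ignores comments, doctypes, character references and self-closing tags. The token shapes ({ type, name, attrs } and { type, value }) are invented for the example.

```javascript
// Toy HTML tokenizer driven by an explicit state machine (assumptions above).
function tokenize(html) {
  const tokens = [];
  let state = "data";
  let text = "";
  let tag = null; // the tag token currently being built
  let attrName = "";
  let attrValue = "";

  const flushText = () => {
    if (text) tokens.push({ type: "text", value: text });
    text = "";
  };

  for (const ch of html) {
    switch (state) {
      case "data": // plain character data until a "<" opens a tag
        if (ch === "<") { flushText(); state = "tagOpen"; }
        else text += ch;
        break;
      case "tagOpen": // a "/" right after "<" makes this an end tag
        tag = { type: ch === "/" ? "endTag" : "startTag", name: ch === "/" ? "" : ch.toLowerCase(), attrs: {} };
        state = "tagName";
        break;
      case "tagName":
        if (ch === ">") { tokens.push(tag); state = "data"; }
        else if (ch === " ") state = "beforeAttr";
        else tag.name += ch.toLowerCase();
        break;
      case "beforeAttr":
        if (ch === ">") { tokens.push(tag); state = "data"; }
        else if (ch !== " ") { attrName = ch; state = "attrName"; }
        break;
      case "attrName":
        if (ch === "=") state = "attrValueStart";
        else if (ch === " ") { tag.attrs[attrName] = ""; state = "beforeAttr"; }
        else if (ch === ">") { tag.attrs[attrName] = ""; tokens.push(tag); state = "data"; }
        else attrName += ch;
        break;
      case "attrValueStart": // expect the opening double quote
        if (ch === '"') { attrValue = ""; state = "attrValue"; }
        break;
      case "attrValue": // the closing quote ends the value
        if (ch === '"') { tag.attrs[attrName] = attrValue; state = "beforeAttr"; }
        else attrValue += ch;
        break;
    }
  }
  flushText();
  return tokens;
}

console.log(tokenize('<p class="intro">Hi <strong>there</strong></p>'));
```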

As the browser parses the HTML, it creates an object for each valid HTML element it encounters. This is the process of constructing the document object model, more commonly referred to as the DOM. The DOM is represented as a tree in which each element is a node containing links to its children. The tree as a whole reflects the document's structure.
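A minimal sketch of the tree-building side, assuming tokens have already been produced in an invented { type, name, value } shape. It keeps a stack of open elements: a start tag creates a child of the element on top of the stack, and an end tag pops back to its matching element, implicitly closing anything left open inside it. Real browsers follow the much richer tree construction algorithm in the HTML specification, which is part of why an unclosed strong tag like the one in Figure 3 can carry over into later content. Treat this only as an illustration of the stack-based approach.

```javascript
// Toy tree builder using a stack of open elements (token shape is assumed).
function buildTree(tokens) {
  const root = { name: "#document", children: [] };
  const open = [root]; // stack of currently open elements

  for (const token of tokens) {
    const parent = open[open.length - 1];
    if (token.type === "text") {
      parent.children.push({ name: "#text", value: token.value, children: [] });
    } else if (token.type === "startTag") {
      const node = { name: token.name, children: [] };
      parent.children.push(node);
      open.push(node);
    } else if (token.type === "endTag") {
      // Pop back to the nearest matching open element, implicitly closing
      // anything still open inside it. Stray end tags are ignored.
      const index = open.map((n) => n.name).lastIndexOf(token.name);
      if (index > 0) open.length = index;
    }
  }
  return root;
}
```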

Figure 4. A typical HTML document.

Figure 5. The HTML document has been parsed into a DOM tree.

Figure 6. Rendered HTML document.

Walking a tree takes time proportional to the number of nodes visited, and time is still… time. A big tree with many unnecessary traversals can really slow down the rendering pipeline. So, writing good HTML is key. Easily parsed HTML leads to faster construction of the DOM tree. That makes the next steps easier, since they rely on traversing the DOM efficiently to quickly "move on". So the rule is: the bigger the HTML document, the longer DOM tree construction will potentially take.

CSSOM Construction

Although the DOM tree gives us the structure of the document, it does not tell us about the visual qualities of each HTML element. That is why we use CSS. The CSS received by the browser goes through a parsing step similar to the one that turns HTML into the DOM: the browser parses it into the CSS Object Model, or CSSOM. This phase is part of the style calculation step in our pipeline.

As the browser parses the HTML and encounters link tags that reference a stylesheet, it begins making network requests to retrieve the CSS. Once the CSS arrives, the browser constructs the CSSOM.

CSS, on the other hand, has a context-free grammar. Its rules are concrete, so it can be parsed like any other programming language. As CSS is parsed, the selectors are read and the browser creates a tree representing the styling rules, similar to DOM construction.

Figure 7. Adding CSS to the web page introduces the construction of the CSSOM.

Figure 8. The CSS styled web page.

Figure 9. The CSSOM

Just like the DOM, where we had to be mindful of the structure of our document, we must also be mindful of how we write the selectors of our CSS rules. An intelligent selector scheme is critical for efficient tree traversal when style calculations are performed for the render tree nodes.

When the web browser begins applying styles to the HTML elements, it makes a tree traversal to find any matching rules and as it encounters any nodes that are related, the styles become applicable. These rules effectively trickle down and can be thought of as being "stacked".

To illustrate an example of poor CSS rule construction, consider a silly rule such as div div div div, where the styles get applied to every div nested inside at least 3 other div elements.

The only way for the browser to know whether to apply the style to such a div is to traverse its chain of nested div elements every time a div is encountered. Imagine if we had many nodes with only 2 or 3 nested divs instead of the 4 the rule needs. The browser would waste a traversal through 3 div elements only to find out that the rule does not apply. Then the traversal must go back up the tree and continue looking for all other elements to which the CSS rule applies.

Figure 10. Wasted CSSOM traversal on the styling rule: div div div div
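In practice, browser engines such as Blink and WebKit evaluate selectors from right to left: they first check the rightmost (key) part against the element being styled, then walk up through its ancestors looking for the remaining parts. The wasted work described above still happens; it just shows up as a walk up the ancestor chain. The JavaScript sketch below models that for descendant selectors made only of tag names. The node shape ({ tag, parent }), the helper names and the checks counter are assumptions for illustration, and real engines add optimizations (such as ancestor Bloom filters) to skip much of this work.

```javascript
// Toy right-to-left matcher for tag-only descendant selectors such as
// "div div div div". `stats.checks` counts ancestor comparisons.
function matchesDescendantSelector(node, selector, stats = { checks: 0 }) {
  const parts = selector.trim().split(/\s+/);
  // The rightmost (key) selector must match the element itself.
  if (node.tag !== parts[parts.length - 1]) return false;
  let i = parts.length - 2; // next part to find among the ancestors
  for (let a = node.parent; a && i >= 0; a = a.parent) {
    stats.checks++;
    if (a.tag === parts[i]) i--; // greedy match is fine for descendant-only chains
  }
  return i < 0; // every part was matched by some ancestor, in order
}

// Hypothetical helper: builds a chain of nested elements from outermost to
// innermost and returns the innermost node.
function nest(tags) {
  let parent = null;
  for (const tag of tags) parent = { tag, parent };
  return parent;
}

const stats = { checks: 0 };
// Only 3 nested divs: the walk visits every ancestor and still fails.
console.log(matchesDescendantSelector(nest(["body", "div", "div", "div"]), "div div div div", stats)); // false
console.log(stats.checks); // 3
```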

Therefore, how we specify the selector for a particular CSS rule is very important. The easy optimization and recommendation here is to be as explicit as possible when styling, for example by targeting an element with a single class rather than a long chain of descendant selectors.

Since the page cannot be rendered until the CSSOM has been constructed alongside the DOM, CSS is essentially a blocking operation in the pipeline. That is why most developers put CSS in the head element, so that it is available for downloading and parsing as soon as possible. For basic performance optimization, this means that CSS must be:

  1. As lightweight as possible.
  2. Efficient and intelligent use of styling rules for faster tree operations.
  3. If it is a remote resource, the styling rules must come from a fast network connection.

With the ever-growing number of diverse devices now accessing the Internet and browsing web pages, more screen sizes have to be taken into consideration.

This has caused developers to write more and more CSS to accommodate the various sizes, which bloats the CSS and slows rendering. However, with diligent use of media queries, we can tell the browser to block rendering on a stylesheet only under certain conditions.

By being explicit about which media each stylesheet included in our head applies to, we can effectively tell the browser that it does not need to wait for a stylesheet to be downloaded. A good example is a device in landscape orientation, while the CSS queued for download is meant for a portrait device.

Script Execution

Okay, with all this, I still have not addressed the elephant in the room. What about JavaScript? As we know, JavaScript can manipulate the DOM by inserting, modifying and removing elements, and by adding extra styling information to the elements in our document.

Since rendering works in a single-threaded model where things happen in order, script execution is also render blocking, just like CSSOM construction. It is also worth mentioning that JavaScript itself is single-threaded and event-driven, so asynchronous work such as event callbacks and timers is placed in a queue for future processing rather than executed immediately.

Because JavaScript allows us to manipulate the DOM and CSSOM, every time a script modifies the page, the browser has to redo the style calculations in order to present the updated information. Depending on how the script is written, this can make the process unnecessarily expensive.

It is common to find inline JavaScript embedded within our documents. The browser parses character data as it comes through the wire. Even though it is aware of whether the data is HTML, CSS or JavaScript, it cannot simply set a script aside for later processing, because the script may change the very document being parsed. Instead, the browser halts DOM construction and parses the data within the script tag to execute it first.

Figure 11. Script download and execution flow.

JavaScript is unpredictable in that the browser does not know what the script will do to the DOM. The browser plays it safe rather than sorry: it yields DOM construction to the script and lets it do its processing. This is why inline JavaScript can potentially slow down the initial rendering of the page.

The basic workflow is as follows:

  1. The browser parses HTML and constructs the DOM.
  2. If a script tag is encountered, the DOM construction process is paused and the browser hands execution off to the JavaScript engine.
  3. The JavaScript engine completes script processing and hands execution back to the browser's DOM parser to resume DOM construction.

There is another caveat to all this. The browser actually executes the script only after the CSSOM has been created. So if we have a huge CSS file that needs to be downloaded and parsed, the browser will block script execution and, with it, all DOM construction. Also, as mentioned previously, the browser will be blocked in constructing the DOM whenever there is a script to be executed... Oh my!

Figure 12. Blocking cycle.

For any external script loaded through a script tag with a src attribute, the rendering process can be even slower: even though the network request itself runs asynchronously, the parser still has to wait for the script to arrive and execute. The script being fetched, just like CSS, can reside in a remote network location. Depending on the speed of the network connection, the fetch can be many, many times slower. The fact of the matter is that remote scripts have to be retrieved from elsewhere, and this adds even more latency to the rendering of the page. Despite all the bandwidth available in today's network connections, latency is still the killer. A high-latency connection to a remote server can hurt the perception of speed.

This is why having a lot of scripts being executed in succession can really slow down the page load.

Being smart with your JavaScript placement is critical in optimizing the rendering of a page. A very common trick to make sure that the JavaScript executes after most of the DOM has been constructed is to put the script tags at the very end of the page. This defers the exchange of execution to the JavaScript engine as long as possible.

Another solution is to load an external script with the async or defer attribute. With either attribute, the browser keeps constructing the DOM while the script downloads instead of stopping at the script tag. The difference is when the script runs: an async script executes as soon as it has finished downloading, which may still be before the DOM is complete, while a defer script waits until the document has been fully parsed.

JavaScript runs on the same thread as the rest of the pipeline. That means bad JavaScript can choke everything. When writing JavaScript, be practical and figure out which things you can do asynchronously so as not to block the main thread.

If the JavaScript does a lot of computation-heavy work that does not directly manipulate the DOM, refactoring the operation to run asynchronously or deferring its execution can help. Web workers can also offload this computation to another thread. Just keep in mind that web workers do not have access to the DOM.
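One way to do the asynchronous refactoring is to split a long computation into chunks and yield back to the event loop between them, so input handling and rendering can run in the gaps. The sketch below sums an array this way. It is an illustration under assumptions: sumInChunks is a hypothetical name, and the schedule parameter defaults to setTimeout(fn, 0), which is only one possible way to yield. Moving the work into a web worker is the other option and is not shown here.

```javascript
// Sum `numbers` in chunks of `chunkSize`, yielding between chunks via
// `schedule`, then report the total through `onDone`.
function sumInChunks(numbers, chunkSize, onDone, schedule = (fn) => setTimeout(fn, 0)) {
  let index = 0;
  let total = 0;

  function runChunk() {
    const end = Math.min(index + chunkSize, numbers.length);
    for (; index < end; index++) total += numbers[index];
    if (index < numbers.length) schedule(runChunk); // yield, then continue later
    else onDone(total);
  }

  schedule(runChunk);
}

// Usage: a million numbers, 10,000 per chunk; other queued work can run
// between chunks instead of waiting for the whole loop.
sumInChunks(Array.from({ length: 1e6 }, (_, i) => i), 10000, (total) => console.log(total)); // 499999500000
```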

The Render Tree

Once both the DOM and CSSOM are created, the browser begins to construct the render tree, which is effectively a combination of the DOM and CSSOM. The render tree is constructed for the next step of the rendering pipeline: the style and layout calculations. As we will see shortly, not every element is placed into the render tree.

What separates the render tree from the DOM and CSSOM is that the render tree only contains the elements that are visible according to the styling rules. An element's display property is usually set to a value such as inline, block, inline-block or none.

Although I won't discuss the specifics of each of these values, it is important to know that any HTML element with the CSS styling rule display: none; is not considered visible in the viewport. Therefore, when constructing the render tree, the browser does not add the element to it. This is also why elements within the head element, which browsers hide with display: none by default, do not get rendered onto the viewport.
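The filtering can be pictured as a recursive walk that drops any node whose display is none, along with its entire subtree. This JavaScript sketch assumes each node already carries its computed display value (a real engine derives it from the CSSOM), and the node shape is invented for the example.

```javascript
// Toy render-tree construction: keep only nodes whose computed display is
// not "none". A hidden node's children are never visited, whatever their
// own display value is.
function buildRenderTree(node) {
  if (node.display === "none") return null; // skip node and whole subtree
  const children = [];
  for (const child of node.children || []) {
    const rendered = buildRenderTree(child);
    if (rendered) children.push(rendered);
  }
  return { tag: node.tag, display: node.display, children };
}
```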

This puts a lot of optimization potential into our hands. Suppose we have a large number of div elements that are not immediately seen by the user. The browser rendering engine will, in code, go through something analogous to a switch statement over the possible values of the CSS display property. If display is set to none, the browser simply does not add the element to the render tree, so there is nothing to render on screen for it when the painting process comes around. The optimization rule is then as follows:

If we are going to operate heavily on a DOM element and we do not care about the intermediate states the element goes through, we can set the element's CSS styling rule to display: none, operate on the element, and then set it back to its original display value once we are satisfied with the result. This way, we do not force a layout and repaint every time a little bit of the DOM element changes.

The performance issue described above is what is normally called layout thrashing. Briefly, layout thrashing means doing a lot of layout calculations in a short period of time, which blocks the rendering process. It is often caused by scripts that alternate between changing styles and reading layout values, forcing the browser to recalculate layout over and over. Some examples of layout thrashing are resizing a bunch of elements at the same time, animating a group of related elements, or an event handler changing the colors of many elements.
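The read/write interleaving behind most layout thrashing can be shown with a toy model. In the JavaScript below, writes only mark layout as dirty, while a read (standing in for something like offsetHeight or getBoundingClientRect() in a browser) forces any pending layout to run first. ToyLayoutEngine and its single global dirty flag are simplifying assumptions, not a description of any browser's internals.

```javascript
// Toy model of forced synchronous layout.
class ToyLayoutEngine {
  constructor() {
    this.dirty = false;
    this.layoutRuns = 0; // how many expensive layout passes were forced
    this.heights = new Map();
  }
  setHeight(id, px) { // a "write": cheap, only invalidates layout
    this.heights.set(id, px);
    this.dirty = true;
  }
  getHeight(id) { // a "read": must see up-to-date layout
    if (this.dirty) {
      this.layoutRuns++; // stand-in for a full layout pass
      this.dirty = false;
    }
    return this.heights.has(id) ? this.heights.get(id) : 0;
  }
}

// Read, write, read, write...: every read after a write forces layout.
function growInterleaved(engine, ids) {
  for (const id of ids) engine.setHeight(id, engine.getHeight(id) + 10);
}

// All reads first, then all writes: at most one forced layout.
function growBatched(engine, ids) {
  const current = ids.map((id) => engine.getHeight(id));
  ids.forEach((id, i) => engine.setHeight(id, current[i] + 10));
}
```

Starting from a dirty state with five elements, the interleaved version forces five layout passes while the batched version forces one, even though both end with the same heights.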

As a side note elaborating on the above use case, this is where modern component-based UI libraries like React come in handy. They make use of a virtual DOM: when state changes, modifications are first made to a data structure representing an abstraction of the DOM until all operations are complete. The resulting attributes of the virtual DOM elements are then transferred onto the corresponding real nodes in the DOM tree. Of course, this is oversimplifying, but it is the general approach.
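The core idea can be sketched for the attributes of a single element: compute the differences between two plain-object descriptions first, then touch the real node only for what actually changed. This is not React's reconciliation algorithm; diffAttributes, applyPatches and the object standing in for a real DOM node are hypothetical, and the mutations counter just makes the number of real updates visible.

```javascript
// Compare two plain-object attribute descriptions and list the changes.
function diffAttributes(oldAttrs, newAttrs) {
  const patches = [];
  for (const [name, value] of Object.entries(newAttrs)) {
    if (oldAttrs[name] !== value) patches.push({ op: "set", name, value });
  }
  for (const name of Object.keys(oldAttrs)) {
    if (!Object.prototype.hasOwnProperty.call(newAttrs, name)) {
      patches.push({ op: "remove", name });
    }
  }
  return patches;
}

// Apply only the computed changes to the stand-in "real" node.
function applyPatches(realNode, patches) {
  for (const p of patches) {
    if (p.op === "set") realNode.attrs[p.name] = p.value;
    else delete realNode.attrs[p.name];
    realNode.mutations++; // each real mutation may invalidate style/layout
  }
}

const node = { attrs: { class: "row", title: "a" }, mutations: 0 };
applyPatches(node, diffAttributes(node.attrs, { class: "row selected", "data-id": "7" }));
console.log(node.mutations); // 3
```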

Once we have the render tree, we begin to compute the style and layout of the page, as now we know which elements will be displayed on the screen. This step is also known as the "reflow".

The viewport is essentially the entire area viewable by the user in the browser window. The browser begins the layout calculation at the top of the render tree and recursively traverses through each node, calculating the appropriate layout values. This generally happens asynchronously. However, for changes the browser knows are guaranteed to affect the entire document, such as a window resize, the calculations happen synchronously. This partially explains why we experience a bit of lag when resizing a window showing a very large document. In places in your application where smooth animation is critical, absolutely avoid any unnecessary layout calculations. They become more expensive the more elements you have on screen.

Despite all this, the browser does try to stay smart. It will not recalculate the entire tree for every small event that happens in the window. Instead, the browser marks, or flags, the affected nodes in the tree as needing recalculation.
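The flagging idea can be sketched as follows. Marking a node also flags each ancestor with childNeedsLayout, so a later layout pass can skip any subtree that is entirely clean. The field names, the makeNode helper and the simplification that a node's own layout can be redone without touching its children are all assumptions for this JavaScript illustration.

```javascript
// Hypothetical node constructor that wires up parent links.
function makeNode(children = []) {
  const node = { children, parent: null, needsLayout: false, childNeedsLayout: false };
  children.forEach((c) => { c.parent = node; });
  return node;
}

// Flag a node, and flag ancestors so the layout pass knows where to look.
// Stops early once an ancestor is already flagged (its ancestors are too).
function markDirty(node) {
  node.needsLayout = true;
  for (let a = node.parent; a && !a.childNeedsLayout; a = a.parent) {
    a.childNeedsLayout = true;
  }
}

// Visits only flagged paths; returns how many nodes were recomputed.
function layoutPass(node) {
  let recomputed = 0;
  if (node.needsLayout) {
    recomputed++; // stand-in for recomputing this node's size and position
    node.needsLayout = false;
  }
  if (node.childNeedsLayout) {
    for (const child of node.children) recomputed += layoutPass(child);
    node.childNeedsLayout = false;
  }
  return recomputed;
}
```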

Traversal of the render tree is only necessary to figure out the exact size and position for each element on the page. Once we have figured out the layout attributes for all elements, the browser will begin the paint and composite process. This is where we finally transform all our data into pixels on screen for the user to finally see.

Paint and Composite - Rasterizing Pixels to the Screen

The painting process can be quite complex, necessary, and very expensive. When a layout or style calculation happens, it is necessary for the browser to transform that data into pixels. The paint process is also where we will see the effects of jank, lag and any other UI defects.

Painting is done in different layers rather than in "one shot". This makes it possible for browsers to repaint only the things that are moving, without affecting other elements. The advantage this brings is that the browser only repaints what is needed and only operates on regions of the screen that have changed.

During the paint process, there is a specific order in which the browser renders elements. Elements are rendered in a stacked, or layered, order, and this stack is called a stacking context. The typical order of a render is:

  1. background color
  2. background image
  3. container borders
  4. all the contents of the children
  5. the outline of the parent

Figure 13. Stacking Context
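As a toy illustration of that order, the JavaScript below walks a tree of boxes and emits a flat "display list" of paint operations: background color, background image, border, then children, then outline. The box shape is an assumption, and real engines also deal with stacking contexts, positioned elements, floats and inline content, which are ignored here.

```javascript
// Toy paint pass: emit paint operations for each box in the listed order.
function paint(box, displayList = []) {
  if (box.backgroundColor) displayList.push(`${box.id}: background-color ${box.backgroundColor}`);
  if (box.backgroundImage) displayList.push(`${box.id}: background-image ${box.backgroundImage}`);
  if (box.border) displayList.push(`${box.id}: border`);
  for (const child of box.children || []) paint(child, displayList); // children
  if (box.outline) displayList.push(`${box.id}: outline`); // parent outline last
  return displayList;
}

console.log(paint({
  id: "card", backgroundColor: "white", border: true, outline: true,
  children: [{ id: "title", backgroundColor: "blue" }],
}));
```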

After painting, the browser rendering engine goes through the compositing process, where these multiple layers are stitched together in the correct order to display the visual information accurately. If the browser rendering engine interprets something incorrectly or miscalculates, we see our HTML elements laid out incorrectly on top of one another.

This enables the browser to re-render only a specific layer within the element's stacking context. If, let's say, only the background color changes, the browser does not have to render the whole DOM element again. Instead, it only re-renders the layer associated with the background color.

Understanding the browser's composite process also means being aware of z-index, since it is the dimension involved in compositing. This third dimension exists to handle overlaying one element on top of another when the boundaries of the two elements overlap.
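A toy compositor makes the role of z-index concrete: layers are drawn back to front in z-index order, and where two layers cover the same pixel, the one drawn later wins. In this JavaScript sketch each layer is assumed to be a set of opaque pixels keyed by "x,y" strings, layers with equal z-index keep their original order (Array.prototype.sort is stable since ES2019), and transparency is ignored.

```javascript
// Toy compositor: draw layers back to front; later layers overwrite pixels.
function composite(layers) {
  const ordered = [...layers].sort((a, b) => a.zIndex - b.zIndex);
  const screen = new Map(); // "x,y" -> color
  for (const layer of ordered) {
    for (const [xy, color] of Object.entries(layer.pixels)) screen.set(xy, color);
  }
  return screen;
}

const screen = composite([
  { name: "modal", zIndex: 10, pixels: { "1,0": "black" } },
  { name: "page", zIndex: 0, pixels: { "0,0": "white", "1,0": "white" } },
  { name: "tooltip", zIndex: 5, pixels: { "0,0": "yellow", "1,0": "yellow" } },
]);
console.log(screen.get("0,0"), screen.get("1,0")); // yellow black
```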

If the web page then has a lot of data that frequently needs to be repainted due to expectation of a lot of scrolling, animating, or flashing, one can try promoting that area into its own layer within the stacking context.

CSS3 can leverage the hardware acceleration available on the device. When a CSS3 styling rule applies certain effects or animations, such as transforms, the browser can offload that rendering work to the GPU. GPUs are much more adept at pushing pixels than CPUs, and fortunately for the browser, these operations are also cheap on the GPU compared to the CPU.

For any elements found to be repainting quite often, we can also give the browser a hint that the element will be transformed often. In most modern browsers, the will-change CSS property with the value transform tells the browser ahead of time that the element's transform is expected to change, so it can promote the element to its own layer.

will-change: transform;

Figure 14. Using will-change: transform; to effectively promote the content to a new layer.

Another "hacky" solution for browsers that do not yet support will-change is to use the CSS rule transform: translateZ(0); to trick the browser into thinking that a 3D transform is involved. This causes the browser to promote the element to its own layer and offload its rendering to the GPU.

As tempting as it may be to promote everything to its own layer in the paint and composite process, it is not that simple. Creating too many layers has its own disadvantages, as it dumps a lot of data onto the GPU and can clog the rendering queue. This not only uses a lot of memory, but can also lead to performance penalties, essentially going backwards from what engineers are trying to achieve.

Figure 15. Forcing too many layers to the queue.

In summary, every time the browser consumes an HTML page to present to the user, it needs to create the DOM and CSSOM trees, construct the render tree, and then paint the elements onto the viewport. If an element changes, its style and layout are recalculated, which also forces a paint and composite step onto the browser. As we can see, painting is necessary and makes sense, because updated elements need to be reflected in the viewport.

Haptic Feedback is Important

This final section is not part of the browser rendering pipeline, but I have included it in this paper simply because it is important to a great user experience for your web application. It is a short section, but the advice given here has potential to go a long way.

As I mentioned before, a common way to create the illusion of a responsive user experience is through the timing of haptic feedback. On mobile devices with touch capability, the click that follows a tap has traditionally been delayed by about 300 ms so that the browser can detect a double-tap to zoom.

In most cases, a single-page, full-screen web application will make limited use of double-tap to zoom. By looking at the specific requirements of the application, we can eliminate this 300 ms delay. Removing it allows us to give real-time feedback on any button press.

The CSS property touch-action controls this delay. Simply add the following rule to the main stylesheet:

touch-action: manipulation;

This effectively removes the delay and allows buttons to be pressed with near-instant feedback. For web applications that do not need this 300 ms delay, this can make all the difference between something usable and something that is not.

Input handlers within HTML elements also affect the feedback received by the user. If your input handler changes style on your HTML element, the style change can cause a new layout calculation, and consequently a paint and composite by the browser.

An alternative is to first do the computation requested by the input handler, and then make any style changes in a callback function, such as one scheduled with requestAnimationFrame so that the change is applied right before the next frame is painted.
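A minimal sketch of that approach: the input handler only records the latest value, and a single callback per frame applies it, so a burst of input events results in one style write. In a browser, the requestFrame function passed in would typically be window.requestAnimationFrame, which runs its callback before the next repaint. FrameBatcher is a hypothetical name, and it takes the frame source as a parameter so it can be driven manually outside a browser.

```javascript
// Coalesce many input-driven updates into one style write per frame.
class FrameBatcher {
  constructor(requestFrame, apply) {
    this.requestFrame = requestFrame; // e.g. (cb) => requestAnimationFrame(cb)
    this.apply = apply; // performs the actual style change
    this.pending = null; // latest value waiting to be applied
    this.scheduled = false;
  }
  // Call from the input handler: cheap, no style change happens here.
  update(value) {
    this.pending = value;
    if (!this.scheduled) {
      this.scheduled = true;
      this.requestFrame(() => {
        this.scheduled = false;
        this.apply(this.pending); // only the most recent value is written
      });
    }
  }
}

// In a browser (hypothetical element `el`):
// const batcher = new FrameBatcher((cb) => requestAnimationFrame(cb),
//   (y) => { el.style.transform = `translateY(${y}px)`; });
// window.addEventListener("scroll", () => batcher.update(window.scrollY));
```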

Finally, avoid long-running event handlers on elements, since they can slow down how quickly the browser re-renders an element. Many long-running event handlers can cause the browser to queue up many elements to be re-rendered, which causes a lot of skipped frames.

In summary:

  1. Do not make style changes directly in input handlers.
  2. Handle style changes with a callback function.
  3. Avoid long-running handlers, as they can affect scrolling.


In conclusion, the whole point of this discussion is to remind us that the solution to a problem is not always the most obvious one, and that it is easy to fixate on using the latest and greatest library or technology to solve a problem. Understanding the existing tools and technology is what lets us push the boundaries of technology. Taking a step back, figuring out what we can do with what we currently have, and getting creative can also yield surprising and pleasant results. These types of problems were solved before, right?