The first layer of the performance onion is about asset delivery and caching. Every asset that you do not have to deliver twice makes your website faster. This blog post shows what you can do in this area without touching the content of your website.
The main secret of this layer is called “caching”. Caching is used in two areas: browser caching and proxy caching. Browser caching is probably well known to most of us: the browser saves assets locally so that it can reuse them.
Proxies are services that cache assets and complete pages and deliver them to the browsers as if they came from the origin server. There are (at least) two kinds of proxies in the world: forward and reverse proxies.
Forward proxies are installed on the client side of the web. They normally act as a gateway between the “bad” internet and the “good” intranet. Another reason for such a proxy may be to save bandwidth.
Reverse proxies act on the server side of the request. The goal is to keep traffic away from the CMS and thus reduce the server load. An open source player in this area is Varnish Cache. The last chapter of this series will dig a little deeper into Varnish.
Caching is a good thing, but it can be dangerous, too. Why? Because there is software (browsers and proxies) between your client and their users that you cannot control. You cannot ask the admins and other people behind that software to clear their caches manually.
This is where cache invalidation comes in. You need techniques that tell the browsers and the proxies when to use the asset from the cache and when to request or revalidate it again. This is done by setting the HTTP headers in the server response correctly.
The HTTP headers can be grouped into three categories: Allowance headers (1) define whether an asset or page may be cached at all. Cache-matching headers (2) identify the cached assets that can be delivered to the client. Freshness headers (3) are there to check whether an asset has expired.
The Cache-Control header field specifies directives that are honored by all caching mechanisms. Several directives can be set in this header. Important values are public, no-store, no-cache and max-age.
Most of them are self-explanatory. The “no-cache” directive, however, means that the asset may be cached, but must not be delivered to the client without revalidating it at the origin server. The max-age directive sets the maximum time, in seconds, for which the asset counts as fresh. After that, it must be revalidated.
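To make the difference between “no-store” and “no-cache” a bit more tangible, here is a minimal sketch of how a cache could read the directives. The function names are made up for this example, and splitting on commas is a simplification that ignores quoted directive values.

```python
def parse_cache_control(header_value):
    """Split a Cache-Control header value into a dict of directives."""
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        # Directives without a value (public, no-store, ...) map to None.
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def may_store(directives):
    """A cache must not keep a copy when no-store is present."""
    return "no-store" not in directives


def must_revalidate_before_use(directives):
    """no-cache allows storing, but requires revalidation before each reuse."""
    return "no-cache" in directives
```

With this, a response carrying “no-cache” may still be stored, but every reuse has to go through a revalidation request first.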
The “max-age” directive is quite similar to the “Expires” header (see below), as both determine the time after which an asset must be requested again. From what I have read, a max-age directive in the Cache-Control header always overrules the Expires header.
The MDN provides a good overview of all the possible options: https://developer.mozilla.org/de/docs/Web/HTTP/Headers/Cache-Control
The Pragma header is a legacy header that had a similar function to the Cache-Control header. It should not be used any more, but you can set it to comply with very old caching systems.
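The precedence of max-age over Expires could look roughly like this in code. This is only a sketch: the function name is hypothetical, the headers are assumed to be a plain dict with exactly these key spellings, and malformed dates are not handled.

```python
import re
from email.utils import parsedate_to_datetime


def freshness_lifetime(headers):
    """Return the freshness lifetime in seconds, or None if no header defines it."""
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"(?:^|,)\s*max-age=(\d+)", cache_control, re.IGNORECASE)
    if match:
        # max-age wins, even if an Expires header is present as well.
        return int(match.group(1))
    if "Expires" in headers and "Date" in headers:
        expires = parsedate_to_datetime(headers["Expires"])
        date = parsedate_to_datetime(headers["Date"])
        # An Expires value in the past means "already stale", hence the floor at 0.
        return max(0, int((expires - date).total_seconds()))
    return None
```

A response with both headers is fresh for the max-age value, and the Expires timestamp is only consulted when max-age is missing.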
The Vary header is there to distinguish between different versions of the same asset. A typical example is whether an asset can be delivered gzip-compressed or not. In that case the value of the header is “Accept-Encoding”. If you want to use this header, make sure that it does not inflate the cache too much and thereby reduce its efficiency.
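Why Vary can inflate the cache becomes clearer when you look at how a cache might build its lookup key. The following is a simplified sketch with a hypothetical function name. Real caches usually normalize header values further.

```python
def cache_key(url, request_headers, vary_header):
    """Build a cache key from the URL plus the request headers named in Vary."""
    # Header names are case-insensitive, so compare them in lower case.
    request = {name.lower(): value for name, value in request_headers.items()}
    varying = sorted(
        name.strip().lower() for name in vary_header.split(",") if name.strip()
    )
    return (url, tuple((name, request.get(name, "")) for name in varying))
```

Every distinct value of a header listed in Vary produces its own key and therefore its own cache entry. With “Accept-Encoding” that is a handful of entries, with something like “User-Agent” it can be thousands.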
ETags are another way to identify the uniqueness of an asset. The ETag header is usually created by the web server. Creating such an ETag is not trivial and may lead to duplicate cache entries, even though it is the same asset. A common scenario for this “risk” is when the inode of a file on disk is used as part of the ETag value. If you have only one file system, that is fine. But if the servers sit behind a load balancer and the files reside on different file systems, the inodes can differ for the same file. This will lead to two or more cache entries.
Because of the complex setup and testing, you should really consider whether it is necessary to use ETags, or whether the other mechanisms are sufficient.
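One way around the inode problem is to derive the ETag from the file content only. This is an alternative I am assuming here for illustration, not something the web server or TYPO3 does out of the box, and the function name is made up.

```python
import hashlib


def content_etag(body):
    """Derive an ETag from the response body (bytes) only."""
    digest = hashlib.sha256(body).hexdigest()
    # ETag values are quoted strings; 32 hex characters are plenty here.
    return '"' + digest[:32] + '"'
```

Two servers behind a load balancer that deliver the same bytes then produce the same ETag, no matter on which file system the file lives. The price is that the file has to be read and hashed, so you would want to cache the result.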
The If-None-Match header is used to match assets against the ETag header. The cache sends this header to the server. The answer will be “304 Not Modified” if the asset still exists with that ETag. If the ETag no longer matches, the client will receive the new file.
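On the server side, the decision between “304” and a full response can be sketched like this. The function name and the (status, body) return shape are assumptions of this example, and splitting the header on commas is a simplification.

```python
def respond_to_conditional(if_none_match, current_etag, body):
    """Return (status, body) for a GET request that may carry If-None-Match."""

    def opaque(tag):
        # If-None-Match compares weakly, so a W/ prefix is ignored.
        return tag[2:] if tag.startswith("W/") else tag

    if if_none_match is not None:
        candidates = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in candidates or opaque(current_etag) in map(opaque, candidates):
            # The cached copy is still valid: send no body at all.
            return 304, b""
    return 200, body
```

The 304 response carries no body, which is exactly where the saving comes from: the cache keeps using its stored copy.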
The Last-Modified header contains the timestamp of the last modification of the asset. It is useful together with the “If-Modified-Since” header and serves as a comparison value for the caches.
The If-Modified-Since header is sent by the client to check whether an asset was modified after a certain timestamp. If it was modified, the new asset is sent to the client. The timestamp for this comparison comes from the Last-Modified header of the earlier response.
The Expires header is there to mark assets as stale on the client side. Once the timestamp of an asset is reached, it will be requested from the server again. This timestamp should not be too far in the future.
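The interplay of Last-Modified and If-Modified-Since boils down to a date comparison. Here is a small sketch using Python's standard library. Both function names are hypothetical, and invalid date strings are not handled.

```python
from email.utils import formatdate, parsedate_to_datetime


def last_modified_value(mtime_seconds):
    """Format a Unix timestamp the way HTTP date headers expect it."""
    return formatdate(mtime_seconds, usegmt=True)


def not_modified_since(if_modified_since, last_modified):
    """True when the asset has not changed after the client's timestamp."""
    client_copy = parsedate_to_datetime(if_modified_since)
    server_copy = parsedate_to_datetime(last_modified)
    return server_copy <= client_copy
```

If `not_modified_since` returns true, the server can answer with a 304 and skip the body. Otherwise it sends the new asset together with a new Last-Modified value.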
Implementation in TYPO3
TYPO3 comes with an htaccess template file, which has reasonable defaults for the Expires header. Due to the discussions around ETags, these are disabled by default at the end of the htaccess file.
Versioned file names
The problem of invalidating a cache is mostly “gone” if you include a version number in the filename. An example: a simple “basic.css” will stay cached until an Expires or Cache-Control header forces a reload of the file. That is really bad if you want to update your CSS file within the caching period. With a version number in the name, the updated file gets a new URL, and the caches fetch it right away.
TYPO3 is able to include a version number in the filenames of files that are created within TYPO3. You can activate this in the Install Tool in the section “Frontend”. This only works correctly if you activate a rewrite rule in the htaccess file of your web server. The rule is included in the TYPO3 htaccess template, too.
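The idea behind versioned filenames and the matching rewrite rule can be sketched in a few lines. The naming scheme (version number in front of the extension) and both function names are assumptions of this example, not a copy of what TYPO3 does internally.

```python
import re


def versioned_name(filename, version):
    """Insert a version number before the extension: basic.css -> basic.42.css."""
    stem, dot, extension = filename.rpartition(".")
    if not dot:
        # No extension at all: simply append the version.
        return f"{filename}.{version}"
    return f"{stem}.{version}.{extension}"


def original_name(requested):
    """Undo versioned_name, which is the job of the rewrite rule on the server."""
    return re.sub(r"\.\d+(\.[A-Za-z0-9]+)$", r"\1", requested)
```

The page links to the versioned name, for example with the modification time of the file as the version. The server maps the request back to the real file, so nothing has to be renamed on disk.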
TYPO3 is able to send cache control headers. You can activate this in TypoScript using the setting
config.sendCacheHeaders = 1
The values are based on the internal cache settings of TYPO3. For the details, please have a look at the TypoScript documentation.
Custom HTTP header
If this is not enough and you would like to set individual headers, the TypoScript config
config.additionalHeaders
is your friend. Within this numerical TypoScript array, you can set any header you want, including the headers mentioned above.
All popular browsers like Firefox or Chrome come with developer tools, which reveal all request and response headers. The screenshot is from Google Chrome.
You find this view in the Chrome DevTools. Open them and click on the “Network” tab (1). If the list of files is empty, you must reload the current page. In a second step, click on a transferred asset. Finally, you will find the headers of this asset in the “Headers” tab.
If you prefer specialized desktop tools, you can check out the following:
This was the second post about TYPO3 performance. It covered the caching on the browser and proxy side and how to influence it.
The next post will be about optimizing the frontend code in order to shrink the transferred file size.
Overview of the series
Here is an overview of the past and upcoming articles.
- Asset Delivery and Caching (this post) – Layer 1
- Frontend Development – Layer 2
- TYPO3 Integration – Layer 3
- TYPO3 Extensions – Layer 4
- TYPO3 Core – Layer 5
- Services – Layer 6
- Hardware / VM – Layer 7
- What, if all fails?
This post is (partially) based on my talk at the TYPO3 University Days, which I prepared for my company in2code. Thanks for the idea and the time to prepare the talk.