Persisting data in modern Web-Apps

Adrian Stanek
Apr 25, 2022

It wasn’t long ago that cookies were the only viable option for storing data persistently in the browser. For a long time, the browser’s storage capabilities amounted to a short, size-limited string of characters.

But technology has evolved, and so have browsers. As a result, today’s options for storing data in the browser, temporarily or persistently, are far better than they were a few years ago and have become genuinely useful for application development.

Modern Web-Apps or PWAs aim to compete with native apps and websites alike. To accomplish that, a Web-App needs to be reliable and performant, and it must persist larger amounts of data such as text blobs or images. That is why the newly available Filesystem API, adopted by Apple WebKit this year, will be a gamechanger.

The following article discusses the different approaches to storing data, both temporarily and persistently, for websites and web apps.

Cookies — The secure grandfather of storage

Cookies are key-value pairs stored as a string of at most about 4KB per cookie. Cookies are mainly used for three purposes:

  • Token storage for authentication or session-identification
  • Tracking-Cookies for monitoring or marketing purposes
  • Simple App-States in a primitive way

Cookies are still viable because of their secure nature. Of course, you cannot store a lot, but you can store it safely in a browser environment and remove it from the JavaScript scope. This security gain comes from the HttpOnly attribute, which hides the cookie from JavaScript so that the browser only passes it to the server in the request header. Adding the Secure attribute ensures the cookie is only sent over an HTTPS connection.

Another security feature is the SameSite attribute, which helps prevent cross-site request forgery (CSRF) attacks by restricting the cookie to requests that originate from its own site.
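To make the attributes above more tangible, here is a minimal sketch of how a server might assemble a Set-Cookie header value for a session token. The helper name buildSessionCookie and its options are my own illustration, not part of any framework; most server frameworks ship their own cookie helpers that do the same job.

```typescript
// Hypothetical helper: builds the value of a Set-Cookie response header.
type SameSite = "Strict" | "Lax" | "None";

interface CookieOptions {
  maxAgeSeconds?: number; // omit for a session cookie
  path?: string;
  sameSite?: SameSite;
}

function buildSessionCookie(
  name: string,
  value: string,
  options: CookieOptions = {}
): string {
  const { maxAgeSeconds, path = "/", sameSite = "Strict" } = options;
  const parts = [`${name}=${encodeURIComponent(value)}`, `Path=${path}`];
  if (maxAgeSeconds !== undefined) {
    parts.push(`Max-Age=${maxAgeSeconds}`);
  }
  parts.push("HttpOnly"); // not readable via document.cookie
  parts.push("Secure"); // only sent over HTTPS
  parts.push(`SameSite=${sameSite}`); // limits cross-site sending
  return parts.join("; ");
}

const header = buildSessionCookie("sid", "abc 123", { maxAgeSeconds: 3600 });
// header === "sid=abc%20123; Path=/; Max-Age=3600; HttpOnly; Secure; SameSite=Strict"
```

Defaulting to SameSite=Strict is a deliberately conservative choice for this sketch; apps that rely on cross-site navigation with an active session often need Lax instead.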

Caching with Service Worker

Caching in the browser is an old technique for reducing requests to the server and, therefore, reducing payload while improving the overall user experience. Unfortunately, this basic caching reaches its limits as soon as the device loses connectivity, and it does not give the developer enough control over what is cached.

With the introduction of the service worker came the ability to implement runtime caching strategies for specific routes in your app. It’s possible to cache and persist all network traffic of your Web-App and reuse it with a Stale-While-Revalidate (SWR) approach.

This approach stores responses in the cache storage. On the next call of the Web-App, the service worker, which acts as a proxy, serves the stale data and decides when to re-fetch according to the current caching strategy. This basic technique provides an offline experience and reliability, two fundamentals of progressive web apps.
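The Stale-While-Revalidate idea can be sketched in a few lines. This is a simplified illustration and not how Workbox implements it: a real service worker would use the Cache API and the fetch event, while this sketch assumes a plain Map as the cache and an injected fetcher function. The class name SwrCache is made up for the example.

```typescript
// Assumption: responses are plain strings and the network is an injected function.
type Fetcher = (url: string) => Promise<string>;

class SwrCache {
  private cache = new Map<string, string>();

  constructor(private fetcher: Fetcher) {}

  async get(url: string): Promise<string> {
    // Always kick off a network request so the cache gets refreshed.
    const refresh = this.fetcher(url).then((fresh) => {
      this.cache.set(url, fresh);
      return fresh;
    });

    const cached = this.cache.get(url);
    if (cached !== undefined) {
      // Serve the stale copy right away; a failed refresh (e.g. offline)
      // is swallowed so the user journey continues.
      refresh.catch(() => undefined);
      return cached;
    }

    // Nothing cached yet: the caller has to wait for the network.
    return refresh;
  }
}
```

The first request for a URL waits for the network; every later request is answered from the cache immediately and updated in the background, which is why this strategy favors reliability over freshness.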

Strategies available in Google Workbox:

  • Stale-While-Revalidate — Reliability-First
  • Network-First — Prioritize freshness of data
  • Cache-First — Prioritize availability locally, fallback to network
  • Network-Only — No caching at all
  • Cache-Only — Pre-caching option, ignore network

This approach is not meant to store and retrieve specific data on demand; it is intended to passively persist network traffic and data that has already been received. A good example is a map app that downloads tiles once and keeps them in the cache for reliable offline usage, or a news app that caches already downloaded articles for later offline reading.

Caching is becoming more critical because mobile applications depend on their current network, which might have low bandwidth or poor quality, or might not be available at all when new data is requested. Still, the app should not show an error code or a failed state; instead, the user journey must continue smoothly. This behavior is standard for native apps and should be adopted on the web by using the Cache API.

Indexed DB — Asynchronous go-to storage

To store on-demand text-based data in the browser, it’s recommended to use IndexedDB, as long as there’s no sensitive data. With IndexedDB, the browser provides databases of key-value object stores. It’s important to mention that all operations are asynchronous and are driven from JavaScript. The native IndexedDB API is fairly involved, working with transactions and request events, but there are solid promise wrappers available: localForage offers a simple setItem, getItem, removeItem interface, and idb is a thin promise layer over the native API.
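To give an idea of what such promise wrappers do under the hood, here is a minimal sketch. IndexedDB hands back request objects that report their outcome through onsuccess and onerror callbacks; a wrapper turns each request into a promise. The RequestLike interface is a simplified stand-in I made up for the example, not the real IDBRequest type, and libraries like idb handle many more details.

```typescript
// Simplified stand-in for the request objects IndexedDB returns (assumption).
interface RequestLike<T> {
  result: T;
  error: unknown;
  onsuccess: ((ev: any) => any) | null;
  onerror: ((ev: any) => any) | null;
}

// Wraps a callback-style request in a promise.
function promisifyRequest<T>(request: RequestLike<T>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}
```

With a helper like this, a read inside a transaction can be awaited, roughly as in `await promisifyRequest(store.get(key))`, instead of nesting callbacks.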

Unlike its synchronous cousin LocalStorage, IndexedDB allows multiple databases, each with its own version. This makes it possible to structure your data in a “bucket” approach, which is recommended anyway.

Quota-Limitations of storage

IndexedDB is subject to the browser’s storage quota. This limit differs from browser to browser and even between operating systems. The exact quota rules are a topic of their own and ever-changing. However, it’s safe to say that the situation has improved in recent years. Even the installation state (or “Add to Homescreen”) makes a difference in what the app can store. I strongly recommend staying up to date if the quota is mission-critical for your company or customer.

At the time of writing, Chrome on Android lets you store around 60–80% of your free disk space (in private mode, it’s significantly lower). Meanwhile, Apple Safari raised its limit from 500MB to 1GB of data, and users are prompted for permission to raise the limit in 200MB steps.

Interestingly, the installed version of a WebApp or PWA on Apple Safari receives separate storage and quotas, even if it is the same domain. The installation state affects the persistence of data as well. On Android, the installed version shares the same storage as the browser version.
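Because the limits differ so much, it can help to check the available space before writing something large. Browsers that support the Storage API expose navigator.storage.estimate(), which resolves to an object with usage and quota in bytes. The helper below is a small sketch built around that shape; the name hasRoomFor and the 10% safety margin are my own assumptions, and the reported numbers are estimates, not guarantees.

```typescript
// Shape of the object navigator.storage.estimate() resolves to (both optional).
interface EstimateLike {
  usage?: number;
  quota?: number;
}

// Hypothetical helper: is there enough room left, keeping a safety margin?
function hasRoomFor(
  estimate: EstimateLike,
  bytesNeeded: number,
  safetyMargin = 0.1
): boolean {
  const { usage = 0, quota = 0 } = estimate;
  const available = quota * (1 - safetyMargin) - usage;
  return bytesNeeded <= available;
}
```

In the app, this could be called as `hasRoomFor(await navigator.storage.estimate(), photo.size)` before persisting a captured photo, so the app can warn the user instead of running into a quota error.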

LocalStorage — Looks better than cookies but should be avoided these days.

The synchronous LocalStorage feature looks great and is simple to use at first glance. But it has its downsides. Because it’s synchronous, it blocks the main thread, which is why it is not recommended. This type of storage can work for a single string, but storing larger amounts of data is a no-go. In addition, the quota is relatively tiny at 5MB.

Security is the second reason, although this point applies equally to IndexedDB and Session Storage. These storages are always accessible through JavaScript and cannot be secured with, for example, an HttpOnly flag. With that in mind, you should never store sensitive data like a JWT in them.

Session Storage — Sibling of LocalStorage with dementia

This storage type works like LocalStorage, but its lifetime is bound to the open tab in the browser. Each tab gets its own session for a given origin: the data survives page reloads but is discarded when the tab is closed. This storage is also limited to about 5MB of data and is therefore not meant to be used for files.

Filesystem — Solid file-based storage, now available on Safari as well

Early this year, I read a WebKit blog post, “The File System Access API with Origin Private File System”. The post amazed me because Apple had not been supporting the Web App idea much, but after some years of waiting, it finally started to move forward again.

What Filesystem seems to do better than IndexedDB

Frequently storing large blobs of up to several megabytes was possible with IndexedDB, but it has its downsides. IndexedDB seems to have issues with many transactions in a short amount of time when the payload to read or write is large. Using Sentry.io, we monitored many quota issues on Apple devices in one specific app, which we could reproduce with tools like local storage abuser (https://demo.agektmr.com/storage/).

Instead of prompting the user to extend the current quota, the database crashed. In some cases, restarting the webview restored the persisted data; in others, the data was lost entirely. Those problems occurred predominantly on the older iPhone 11 generation with iOS 15. We mitigated the problem by reducing the “stress” on the databases, both in the number of requests and in payload size. Additionally, we replaced packages like “ionic capacitor” for photo capturing to have more control over I/O operations.

To avoid that problem entirely, we started to implement the File System Access API in our Progressive-Web-Apps. This has been possible for us since iOS 15.4 and macOS 12.3, where the WebKit team implemented the API in a way that is usable for production-grade apps.

The Filesystem storage is persistent for real.

While clearing the website data deletes the other storage options, this won’t happen with the Filesystem. As a result, persisted files are harder to delete by accident; inexperienced users will be able to keep their data safe, even with improper app handling. (That’s the theory, at least! :) )

“Golden Data” is a term for mission-critical data in business apps, like photos or JSON data, that could only be captured or created with the app in a specific timeframe and at a particular place. Such data cannot be recreated and must stay safe on the device until the app can upload it to the servers. We used to accomplish that with IndexedDB, and we had a lot of problems there, especially with the two main caveats I already mentioned. With the Filesystem, the data still exists even if the application needs to be re-installed, as long as the user doesn’t delete it on purpose.

Synchronous and asynchronous Filesystem API

The Filesystem is accessed via file handles. This can be done asynchronously on the main thread of the webview or synchronously within a web worker. The latter approach is meant to be more performant but, by the nature of a web worker, more complicated to set up.

Within a worker, you obtain a synchronous access handle from a file handle in order to read and write, which was implemented as a layer of security. Unfortunately, the web worker implementation is only available on Safari at the time of writing. (Yep, Apple implemented it, and Google hasn’t yet.)

The asynchronous webview approach is easier to use, especially when reading a lot of files, for example to display them as thumbnails. You don’t need to make sure the web worker is running, and you can access the stored data via promises.
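As a rough sketch of the promise-based flow: in the browser, navigator.storage.getDirectory() resolves to the root directory of the origin private file system, from which you get file handles. The two helpers below are written against small structural interfaces (DirectoryLike and friends, names made up for this example) that mirror only the methods used here, so the same code can run against the real directory handle or a test double. Note that createWritable() is not available in every browser, so treat the write path as an assumption to verify for your targets.

```typescript
// Minimal shapes mirroring the parts of the file system handles used below.
interface WritableLike {
  write(data: string): Promise<void>;
  close(): Promise<void>;
}
interface FileLike {
  text(): Promise<string>;
}
interface FileHandleLike {
  createWritable(): Promise<WritableLike>;
  getFile(): Promise<FileLike>;
}
interface DirectoryLike {
  getFileHandle(
    name: string,
    options?: { create?: boolean }
  ): Promise<FileHandleLike>;
}

// Writes a text file, creating it if it does not exist yet.
async function writeTextFile(
  dir: DirectoryLike,
  name: string,
  contents: string
): Promise<void> {
  const handle = await dir.getFileHandle(name, { create: true });
  const writable = await handle.createWritable();
  await writable.write(contents);
  await writable.close(); // changes are committed on close
}

// Reads a text file; rejects if the file does not exist.
async function readTextFile(dir: DirectoryLike, name: string): Promise<string> {
  const handle = await dir.getFileHandle(name);
  const file = await handle.getFile();
  return file.text();
}
```

In the app, the directory would come from `await navigator.storage.getDirectory()`; reading many files for thumbnails then becomes a matter of calling a read helper per file and awaiting the results.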

Both ways are relevant, and the developer should understand both approaches before working with them in production. Web workers can also be complex to test (Test-Driven Development or Test-Last Development). If you care about testing, you should prepare carefully here.

The major browser developer tools currently lack tooling to interact with the Filesystem. In our case, we first developed a basic Filesystem-Explorer, based on our varied experience with debugging remote devices. With this tool, we confirmed that the Filesystem API works reliably.

We will be early adopters.

The File System Access API is usable but not yet fully implemented with every security aspect and detail. I appreciate the current development a lot; in my opinion, it is a huge milestone and gamechanger, especially for PWAs. We will use this API as an early adopter in production on the major platforms.

Since Progressive-Web-Apps are progressive, we are currently developing a filesystem wrapper that falls back to IndexedDB if the new API isn’t implemented. This approach feels solid and promising, and I will report back in the future.
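The general shape of such a wrapper could look like the sketch below. This is not our production code; the BlobStore interface, pickStore, and MemoryStore are hypothetical names for illustration. The idea is that every backend implements the same small interface and reports whether it is supported, and the app picks the first supported backend from a preference list.

```typescript
// Hypothetical common interface for all storage backends.
interface BlobStore {
  readonly name: string;
  isSupported(): boolean;
  put(key: string, data: string): Promise<void>;
  get(key: string): Promise<string | undefined>;
}

// Returns the first supported backend, in order of preference.
function pickStore(candidates: BlobStore[]): BlobStore {
  for (const candidate of candidates) {
    if (candidate.isSupported()) {
      return candidate;
    }
  }
  throw new Error("No supported storage backend found");
}

// In-memory stand-in so the sketch runs anywhere; real backends would
// wrap the Filesystem API and IndexedDB behind the same interface.
class MemoryStore implements BlobStore {
  private data = new Map<string, string>();

  constructor(readonly name: string, private supported: boolean) {}

  isSupported(): boolean {
    return this.supported;
  }
  async put(key: string, data: string): Promise<void> {
    this.data.set(key, data);
  }
  async get(key: string): Promise<string | undefined> {
    return this.data.get(key);
  }
}

// Prefers the filesystem backend, falls back to the second candidate.
const store = pickStore([
  new MemoryStore("filesystem", false),
  new MemoryStore("indexeddb", true),
]);
```

A filesystem backend would typically implement isSupported() with feature detection, for example by checking whether navigator.storage offers getDirectory, so the rest of the app never needs to know which storage it is talking to.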

Adrian Stanek

CTO @webbar & raion.io | Blogger | CTO-Newsletter | Advocates web-native technologies to become the leading platform for digital businesses