Building the Dagbanli Dictionary for Offline Use – Diff

Screenshot of the “Optimize for Offline Use?” splash screen with “Download Language Pack” and “Continue Online” buttons.

Many Dagbanli speakers live in areas with unreliable internet. We built the dictionary to work offline. Here is the architecture that makes that possible.

Introduction

What good is a digital dictionary if you need an internet connection to use it? For many Dagbanli speakers in rural communities, mobile data is expensive and signals are weak. A dictionary that only works online would be useless exactly when and where it is needed most.

From day one, building an offline-capable application was a core requirement, not an afterthought. We wanted the dictionary to be just as fast and functional in a village with no cell tower as it is on a university campus in Tamale. This meant designing an architecture where the entire word database lives on the user’s device, ready to be searched instantly, with or without an internet connection.

In our previous post, we covered how we built the audio pipeline. Now, we turn to the system that makes components of the dictionary work offline: from syncing data with Cloudflare R2, to storing it locally with IndexedDB, to turning the website into an installable Progressive Web App (PWA).

1. The Sync Architecture: From the Cloud to the Device

The Source of Truth in R2

All the dictionary’s data (the 11,000+ Lexemes with their Senses, Forms, audio URLs, and images) is compiled into a single JSON file we call dagbanli_master.json. This file is about 2 MB in size. It lives in a Cloudflare R2 bucket. R2 is Cloudflare’s object storage service, designed for fast, global access.

We chose Cloudflare R2 for several reasons. It is S3-compatible, which means our code is portable and not locked into a single provider. Crucially, Cloudflare has Points of Presence (PoPs) in Accra (Ghana), Ouagadougou (Burkina Faso), and Yamoussoukro (Ivory Coast), placing the Dagbanli-speaking regions right at the center of a triangle of low-latency network access. This significantly reduces the distance data must travel for our primary user base. When a user in Satani requests the latest dictionary data, they are not waiting for a round trip to Europe or North America. The connection stays on the continent (in fact, within West Africa), making the sync faster and more reliable than it would be with providers whose nearest servers are in Europe.

Alongside the master data file, we store a tiny file called manifest.json. This file contains only one piece of information: a version string (like v2026-03-15-123456). This version changes every time the data is updated by our automated harvest cron job.

Detecting Updates Without Downloading Everything

When you open the Dagbanli Dictionary, the app first does a quick, lightweight check. It fetches the manifest.json from R2 and compares the version string with the version of the data it already has stored locally.

If the versions match, the app knows its local data is up-to-date and can proceed immediately.
If the versions differ, or if this is your first visit, the app knows a new dataset is available and needs to be downloaded.

This simple manifest system means the app can check for updates with a tiny network request, without having to download the full 2 MB file every single time.
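
The check described above can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function names, the `version` field name, and the injected `fetchManifest` and `getLocalVersion` helpers are all assumptions, chosen so the comparison logic can be shown without tying it to a particular URL or storage layer.

```javascript
// Hypothetical helper: decides whether the full dataset must be downloaded.
// localVersion is null (or undefined) on a first visit, when nothing is stored yet.
function needsDownload(localVersion, remoteVersion) {
  return localVersion == null || localVersion !== remoteVersion;
}

// Sketch of the startup check. Both helpers are injected (assumed names):
//   fetchManifest():   resolves to the parsed manifest, e.g. { version: 'v2026-03-15-123456' }
//   getLocalVersion(): resolves to the stored version string, or null if none
async function checkForUpdate(fetchManifest, getLocalVersion) {
  const manifest = await fetchManifest();
  const localVersion = await getLocalVersion();
  return {
    remoteVersion: manifest.version,
    updateAvailable: needsDownload(localVersion, manifest.version),
  };
}
```

Only the small manifest is fetched here; the 2 MB master file would be requested afterwards, and only when `updateAvailable` is true.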

The User’s Choice: Explicit or Background Sync

On a user’s very first visit, we present a simple choice:

“Download Language Pack”: This triggers an immediate download of the full dagbanli_master.json file. The user sees a progress indicator, and once the download completes, components of the dictionary’s functionality work offline.
“Continue Online”: If the user is in a hurry or has low data, they can skip the download. The dictionary will still work by fetching data live (as explained in section 5), and the full dataset will be downloaded silently in the background later, for instance, while the user continues to perform searches. This ensures the app becomes offline-capable without interrupting the initial experience.

On subsequent visits, if the user already has data, the app simply checks the manifest. If a new version is detected, the new dagbanli_master.json is downloaded quietly in the background, ensuring the user always has the latest words without ever having to think about it.
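
One way to picture the first-visit and returning-visit behavior is as a single decision function. The sketch below is illustrative only: the function name, the state shape, and the returned labels are hypothetical, and the branch for an unreachable manifest is our assumption about what a sensible offline default would be.

```javascript
// Hypothetical decision helper for app startup.
//   state.localVersion:  version stored on the device, or null on a first visit
//   state.remoteVersion: version read from manifest.json, or null if the check failed
function decideSyncAction(state) {
  const { localVersion, remoteVersion } = state;
  if (localVersion == null) {
    // First visit: show the "Download Language Pack" / "Continue Online" choice.
    return 'prompt-user';
  }
  if (remoteVersion == null) {
    // Assumption: if the manifest is unreachable (for example, offline), keep the stored data.
    return 'use-local';
  }
  // Returning user: refresh quietly only when the version has changed.
  return localVersion === remoteVersion ? 'use-local' : 'background-download';
}
```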

2. IndexedDB via Dexie: The Local Database

Why Not localStorage?

If you are familiar with web development, you might wonder why we didn’t just use localStorage. The reason is simple: space. localStorage is typically limited to about 5 MB per origin, and it can only store strings. Our dictionary data is around 2 MB now, but as the language grows and we add more features, it could easily outgrow this limit. We needed a more robust solution.

Enter IndexedDB

IndexedDB is a low-level API for storing structured data, including files and blobs, directly in the browser. It is a transactional database with indexes, not just a string key-value store. Its quota is set by the browser and the device’s free disk space rather than by a small fixed cap, and typically runs to hundreds of megabytes or more. Its indexes allow efficient lookups and range queries, which is essential for a searchable dictionary.

Dexie Makes It Usable

IndexedDB’s native API is notoriously complex and callback-heavy. To make our lives easier, we use a library called Dexie.js. Dexie provides a simple, promise-based wrapper around IndexedDB, making it feel more like working with a standard database.

Our database schema is defined clearly in the code:

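```javascript
import Dexie from 'dexie';

// The first field in each schema string is the primary key; the rest are indexes.
const db = new Dexie('DagbanliDictionary');
db.version(1).stores({
    lexemes: 'wikidataId, lemma, lexicalCategory', // lookup by ID, search by lemma, filter by category
    audioIndex: 'wikidataId',                      // IDs of Lexemes that have audio
    examplesCache: 'lexemeId'                      // pre-built Usage Examples per Lexeme
});
```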

lexemes table: This is our main table. We index it by wikidataId (for looking up a specific word), lemma (for search), and lexicalCategory (for filters).
audioIndex table: This is a lightweight table, simply storing the IDs of Lexemes that have audio, enabling the super-fast “Has Audio” filter we discussed in the last post.
examplesCache table: This stores the pre-built Usage Examples, grouped by Lexeme, for instant display.

When we need to search for a word, we run a query like this:

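```javascript
// Case-insensitive prefix search on the indexed lemma field.
// Run inside an async function; searchTerm is the user's input.
const results = await db.lexemes
    .where('lemma')
    .startsWithIgnoreCase(searchTerm)
    .toArray();
```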

Dexie handles all the complex IndexedDB interactions behind the scenes, returning results in milliseconds.

3. PWA: Install It Like an App

A Progressive Web App (PWA) is a website that can be installed on a device and behave like a native mobile app. It can be launched from the home screen and work offline. This was a crucial goal for us: we wanted the Dagbanli Dictionary to feel like a first-class citizen on a user’s phone without having to build separate iOS and Android native apps for them to install.

The Technical Ingredients

To turn our site into a PWA, we needed two main components:

A Web Manifest (manifest.json): This is a simple JSON file that tells the browser about our app. It defines the app’s name, icons of various sizes, the theme color, and that we want the app to open in a “standalone” mode (hiding the browser’s URL bar). We generate this file automatically using a tool called vite-plugin-pwa.
A Service Worker: This is the real magic behind a PWA. A service worker is a script that runs in the background, separate from the webpage. It can intercept network requests and serve cached responses. For our dictionary, the service worker uses a cache-first strategy. This means that when the app asks for a file (like the app’s own JavaScript or CSS), the service worker checks the cache first. If the file is there, it serves it immediately, making the app load incredibly fast, even on a slow network. Network requests for data, however, are handled differently, as we want to ensure we have the latest version.
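
The cache-first strategy can be illustrated in isolation. The sketch below is not the service worker that vite-plugin-pwa generates; it is a hand-written approximation in which the cache and the network function are passed in as parameters (hypothetical names), with the cache object modelled loosely on the browser's Cache API (`match` resolves to `undefined` on a miss).

```javascript
// Cache-first lookup with injected dependencies (illustrative only).
//   cache:   object with async match(key) and async put(key, response)
//   fetchFn: async function that retrieves the resource from the network
async function cacheFirst(key, cache, fetchFn) {
  const cached = await cache.match(key);
  if (cached !== undefined) {
    return cached; // Cache hit: no network request is made.
  }
  const response = await fetchFn(key); // Cache miss: go to the network.
  await cache.put(key, response);      // Store the result for the next request.
  return response;
}
```

In a real service worker the stored value would be a copy made with `response.clone()`, because a Response body can only be read once.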

The User Experience

On an Android phone, when a user visits dagbanli.info, they get a prompt asking if they want to “Add to Home screen”. If they accept, an icon for the dictionary appears on their home screen. Tapping it opens the dictionary in its own window, just like any other app.

On iOS, the process is slightly different. Apple does not automatically prompt users to install PWAs. Instead, the user has to tap the “Share” button and then select “Add to Home Screen”. We help by detecting their browser and showing further installation instructions on the About page, but we cannot automate the install itself.
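
Browser detection of this kind is usually a heuristic. The following sketch shows one simple approach based on the user agent string; the function names and the returned labels are hypothetical, and the actual detection logic in the dictionary may differ.

```javascript
// Heuristic check for iOS devices based on the user agent string.
// User agent sniffing is fragile, so treat the result as a hint only.
function looksLikeIos(userAgent) {
  return /iPhone|iPad|iPod/.test(userAgent);
}

// Chooses which install hint to show (hypothetical labels).
function installHintFor(userAgent) {
  return looksLikeIos(userAgent)
    ? 'share-then-add-to-home-screen'
    : 'browser-install-prompt';
}
```

In a browser this would be called with `navigator.userAgent`. One known gap: recent iPadOS versions of Safari report a Mac-style user agent by default, so a check like this will miss those devices.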

Once installed, the app’s shell (the interface) loads instantly from the cache. The data is then fetched from IndexedDB, making the entire experience feel native and fast.

4. Media Caching for Saved Words

Our offline approach does not stop at words and definitions. We also wanted to make sure that a user’s favorite, or “saved”, words are as rich offline as they are online, including their audio pronunciations and any associated images. However, we also have to be practical about storage limits.

The Streaming Default

By default, when you view a word, its audio and images are streamed live from their respective sources (our R2 bucket or Wikimedia Commons). This saves a significant amount of storage space on the user’s device, as most people will only ever look at a fraction of the 11,000 words in detail.

What Happens When You Search Offline

When you are offline and you search for a word, the app can still find it. Because the entire list of Lexemes (the dagbanli_master.json file) is stored in IndexedDB, the search bar will show you results and even display preview definitions as you type. This works for any word, regardless of whether you have saved it.

However, clicking on a word to view its full detail card works differently depending on your history:

Recent Words: The app automatically remembers the last 10 words you viewed. If you click on one of these, you will see its full detail page, including definitions, Senses, and Forms, exactly as if you were online. This gives you a small, functional buffer of your recent browsing history.
Saved Words (Favorites): For a richer, permanent offline experience, users can click the heart icon on a word’s detail page to “save” it. When a word is saved, we proactively download and cache all of its associated media.

Caching on Save

When a user saves a word, we trigger a background process. The app fetches the audio file and any images associated with that word, converts them into binary large objects (blobs), and stores them in a dedicated IndexedDB table called favoriteMedia. This ensures that every time you view that word offline, the audio plays, and the images render exactly as they would online. Currently, a user can save up to 10 words in this way, striking a balance between utility and the storage constraints of a user’s device.
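
The save-time caching step can be sketched as below. This is not the contents of the project's media cache module: the function name and record shape are hypothetical, a plain `Map` stands in for the favoriteMedia table, and the fetch function is injected so the example stays self-contained.

```javascript
const MAX_SAVED_WORDS = 10; // limit stated in the text above

// Hypothetical sketch of caching a word's media when it is saved.
//   store:   a Map standing in for the favoriteMedia table (wordId -> records)
//   fetchFn: a fetch-like function whose responses offer `ok` and `blob()`
async function cacheMediaForWord(wordId, mediaUrls, store, fetchFn) {
  if (!store.has(wordId) && store.size >= MAX_SAVED_WORDS) {
    throw new Error('Saved word limit reached');
  }
  const records = [];
  for (const url of mediaUrls) {
    const response = await fetchFn(url);
    if (!response.ok) continue; // Skip media that failed to download.
    const blob = await response.blob();
    records.push({ url, blob, size: blob.size });
  }
  store.set(wordId, records);
  return records;
}
```

In the browser, `fetchFn` would be the built-in `fetch`, and the records would be written to IndexedDB rather than to a `Map`.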

The Saved Words panel shows users exactly how much storage their offline words consume: individual sizes next to each word, a subtotal in the header, and, at the bottom, the total cached amount, which can be cleared.

To make this process completely transparent, the UI provides immediate, multi-level feedback about storage usage. In the “Saved Words” section:

A header summary shows the total number of saved words alongside their combined storage footprint (e.g., “Saved Words (3) 114.8KB”).
Individual indicators appear next to each saved word (e.g., “kurugu 106.8KB”, “kuli 8.0KB”), showing exactly how much space each word’s associated media occupies. Words without media (like “takubsi” in the example) show no size, making it clear they consume no offline storage.
A footer summary (e.g., “114.8KB cached”) reinforces the total, reminding users of their overall offline footprint.

This layered transparency empowers users to manage their saved words intentionally. They can see at a glance which words are storage-heavy, clear the entire cache if needed, or remove some items to free up space, all without guesswork.
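
The three levels of feedback can be derived from a single list of per-word sizes. The sketch below is an illustration with hypothetical names; the byte counts in the usage example are invented so that the output matches the figures quoted above, and it assumes 1 KB = 1024 bytes.

```javascript
// Formats a byte count in the style of the examples above, e.g. "8.0KB".
// Assumption: 1 KB is treated as 1024 bytes.
function formatKB(bytes) {
  return (bytes / 1024).toFixed(1) + 'KB';
}

// savedWords: array of { lemma, bytes }, where bytes is 0 for words without media.
function summarizeSavedWords(savedWords) {
  const totalBytes = savedWords.reduce((sum, word) => sum + word.bytes, 0);
  return {
    header: `Saved Words (${savedWords.length}) ${formatKB(totalBytes)}`,
    rows: savedWords.map((word) =>
      word.bytes > 0 ? `${word.lemma} ${formatKB(word.bytes)}` : word.lemma
    ),
    footer: `${formatKB(totalBytes)} cached`,
  };
}

// Usage with illustrative byte counts:
const summary = summarizeSavedWords([
  { lemma: 'kurugu', bytes: 109363 },
  { lemma: 'kuli', bytes: 8192 },
  { lemma: 'takubsi', bytes: 0 },
]);
console.log(summary.header); // Saved Words (3) 114.8KB
```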

A dedicated module, media-cache.ts, manages this cache. It ensures that when a user is offline and views one of their saved words, the audio player and image components check this cache first. If a blob exists, it is loaded directly from the user’s device, providing a seamless offline experience. This system means that a student in a village can download the language pack, save a few key words, and have a fully functional, multimedia dictionary for those words, even without a signal.

5. The “Live Data” Fallback

We believe in giving users choices. While the offline-first experience is our primary goal, we also built a fallback system for those who want to use the dictionary immediately without waiting for a download.

The Live Toggle

If a user skips the initial “Download Language Pack”, or if they have not synced the latest data, the app can fall back to a “Live Data” mode. When this mode is active, searches are not run against the local IndexedDB. Instead, they are sent as live SPARQL queries directly to the Wikidata Query Service.
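
To give a sense of what such a live query might look like, here is a hedged sketch that builds a Wikidata Query Service URL for a lemma prefix search. The function names and the query shape are our own illustration, not the dictionary's actual query, and the language's Wikidata item ID is passed in as a parameter rather than hard-coded.

```javascript
const WDQS_ENDPOINT = 'https://query.wikidata.org/sparql';

// Escapes a string for use inside a double-quoted SPARQL literal.
function escapeSparqlString(text) {
  return text
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n')
    .replace(/\r/g, '\\r');
}

// Builds a query for lexemes in one language whose lemma starts with a prefix.
// languageItemId is the Wikidata item ID of the language, e.g. 'Q' followed by digits.
function buildLexemeSearchQuery(prefix, languageItemId, limit = 20) {
  if (!/^Q\d+$/.test(languageItemId)) {
    throw new Error('languageItemId must look like Q123');
  }
  const safePrefix = escapeSparqlString(prefix.toLowerCase());
  return [
    'SELECT ?lexeme ?lemma WHERE {',
    `  ?lexeme dct:language wd:${languageItemId} ;`,
    '          wikibase:lemma ?lemma .',
    `  FILTER(STRSTARTS(LCASE(STR(?lemma)), "${safePrefix}"))`,
    '}',
    `LIMIT ${limit}`,
  ].join('\n');
}

// Returns the full request URL, asking for JSON results.
function buildLexemeSearchUrl(prefix, languageItemId) {
  const query = buildLexemeSearchQuery(prefix, languageItemId);
  return `${WDQS_ENDPOINT}?format=json&query=${encodeURIComponent(query)}`;
}
```

A query like this scans lemmas on the server for every keystroke, which is one reason the live mode is slower than the local IndexedDB search.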

The Green Badge

When the dictionary is in this live mode, a small green badge appears in the interface, clearly stating “Live Data”. This transparency is important. We want users to know that their experience might be slightly slower and dependent on an internet connection, but they can still use the dictionary immediately.

How to Get the Offline Data

The “Force Re-sync” option is an admin tool and is not available to regular users, so regular users obtain the full offline dataset in other ways. For most users, the first search triggers a silent background sync that downloads the language pack for future use. A user who wants the data immediately should accept the initial “Download Language Pack” prompt on their first visit. Alternatively, a hard refresh of the page can prompt the background sync to run sooner. The live mode is essentially a no-friction on-ramp: the very first interaction with the dictionary is instant, while the path to a richer, offline experience is always just a search away.

6. The Gballi Browser: A Power Tool for Discovery

The offline-first architecture we have described enables another key feature of the dictionary: the Gballi browser. Named after the Dagbanli word for “a fence weaved by hard grass”, this is a powerful tool for exploring the language on its own terms.

While search is great for finding a specific word, the Gballi browser is designed for discovery. It presents the full Dagbanli alphabet, with each digraph (like “GB”, “KP”, and “ŊM”) as a clickable button. Tapping a letter brings up a list of all the words that start with that letter.

The real power, however, lies in its filtering system. A user can combine filters to create highly specific lists of words, all of which are possible because the full dataset is available locally. For example, a linguist could filter for only those entries that are nouns, start with “kp”, and have a pronunciation audio attached to a form. A learner could look for all verbs that have a Visual Context image. This allows users to find high-quality entries and study the language in a structured way, all without an internet connection.
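
The combination of a digraph-aware letter filter with other filters can be sketched over an in-memory list. Everything here is illustrative: the entry shape, the function names, and the placeholder lemmas in the usage example are hypothetical, and the digraph list contains only the three digraphs named above.

```javascript
// Digraphs named in the text above; the full alphabet may include more.
const DIGRAPHS = ['gb', 'kp', 'ŋm'];

// Returns the alphabet "letter" a lemma is filed under, treating digraphs as one letter.
function initialLetter(lemma) {
  const lower = lemma.toLowerCase();
  const digraph = DIGRAPHS.find((d) => lower.startsWith(d));
  return digraph ?? lower.charAt(0);
}

// entries: array of { lemma, lexicalCategory, hasAudio, hasImage } (hypothetical shape).
// filters: any subset of { letter, lexicalCategory, hasAudio, hasImage }.
// A filter that is left undefined is ignored.
function filterEntries(entries, filters) {
  return entries.filter((entry) =>
    (filters.letter === undefined || initialLetter(entry.lemma) === filters.letter) &&
    (filters.lexicalCategory === undefined || entry.lexicalCategory === filters.lexicalCategory) &&
    (filters.hasAudio === undefined || entry.hasAudio === filters.hasAudio) &&
    (filters.hasImage === undefined || entry.hasImage === filters.hasImage)
  );
}

// Usage with placeholder lemmas (not real dictionary entries):
const sample = [
  { lemma: 'kpxx', lexicalCategory: 'noun', hasAudio: true, hasImage: false },
  { lemma: 'kpyy', lexicalCategory: 'verb', hasAudio: true, hasImage: true },
  { lemma: 'kxx', lexicalCategory: 'noun', hasAudio: true, hasImage: false },
];
const nounsWithAudioUnderKp = filterEntries(sample, {
  letter: 'kp',
  lexicalCategory: 'noun',
  hasAudio: true,
});
```

Because a lemma beginning with "kp" is filed under the digraph rather than under "k", the third placeholder entry is excluded from the "kp" list and the first two are excluded from the "k" list.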

Conclusion

Building an offline-capable application is not just a technical exercise. It is a statement of values. It means designing for equity, ensuring that your tool is available to everyone, regardless of their connectivity. For the Dagbanli Dictionary, this meant architecting a system where data is synced from Cloudflare R2, stored locally in IndexedDB, and delivered through an installable PWA. It means that a student in a village in the Kumbuŋ District has the same access to their language’s digital resources as a researcher at the University of Ghana in Legon.

This architecture is now the backbone of our project. In the next post, we will look in depth at the Gballi browser. We will explore how it is designed specifically for the Dagbanli alphabet and its unique digraphs, and how its powerful filtering system makes it a discovery tool unlike any standard A-Z index.