Data Weight is a Bubble term we introduced in our book The Ultimate Guide to Bubble Performance in the chapter about Database Structuring, as a way to visualize how the setup of a Data Type can affect your app’s performance.
In essence, any Data Type that contains a larger volume of data will take Bubble’s server longer to search through and download, thus degrading your app’s performance. We call these Data Types heavy, i.e. they have a high data weight.
The term itself makes intuitive sense – we’re used to big files taking longer to download and long documents taking longer to search through after all, and a database is no different. But what exactly is it that adds to a Data Type’s weight? To understand that, we’ll first need to talk about two other common database terms: structured and unstructured data.
What is structured data?
Structured data is any kind of data that follows a set structure. We’ll use the Contacts app on your phone as an example: let’s say each contact has a field for first name, last name, phone number and birthday. Each of these fields is structured, in the sense that it follows a common pattern:
- First/last name: Short text
- Phone number: Short text
- Birthday: Datetime (saved as a 10-digit number in Unix Time)
These are fields that you could easily include and format correctly in a spreadsheet.
Structured data is lightweight. With the current fields we have set up, one record in this data type wouldn’t contain more than about 17 bytes of data.
First name: John (4 bytes)
Last name: Doe (3 bytes)
Birthday: a date and time (saved as a 10-digit Unix timestamp such as 1633028845, or 10 bytes)
If we were to download a list of 1,000 contacts, its total size would be a mere 17,000 bytes or 17 KB – smaller than a single average image. Bubble adds a bit of extra information to your data type for searching and other purposes, so this isn’t an exact size, but it’ll do for this example.
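The arithmetic above can be sketched in a few lines of Python. This is a rough illustration only: it assumes one byte per character (true for plain ASCII text encoded as UTF-8) and ignores the extra information Bubble stores with each record. The function name `record_size` and the field names are made up for this example.

```python
def record_size(fields: dict[str, str]) -> int:
    """Return the total UTF-8 byte length of all field values."""
    return sum(len(value.encode("utf-8")) for value in fields.values())


# Field names are hypothetical; the values are the ones used in the example.
contact = {
    "first_name": "John",
    "last_name": "Doe",
    "birthday": "1633028845",  # 10-digit Unix timestamp, counted as text
}

bytes_per_record = record_size(contact)  # 4 + 3 + 10
total_bytes = bytes_per_record * 1000    # a list of 1,000 contacts

print(bytes_per_record, total_bytes)  # 17 17000
```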
So what happens if you add something like an actual 17 KB image to the mix, like a portrait photo? Does that double the size? Well, keep in mind that the image file is not actually stored in the database – only the URL. So even adding a huge JPG would still only add a few bytes of extra information to the database (though the file download can of course still add to your page’s loading time, that is unrelated to the database).
It’s when we add unstructured data that the size of the database records starts to grow.
What is unstructured data?
Unstructured data, as you may have guessed, is any kind of data that does not follow any predetermined structure. The article you are currently reading is a typical example of this. This website uses WordPress, but the database works in a similar way – the article is saved as HTML code in a field on a Data Type, just like it would be in Bubble.
Now, an article is a pretty lightweight piece of data in itself – it’s just text. Compared to images, audio and video, it’s not much, and if we were talking about a file, we wouldn’t give it much thought. But when we’re talking about databases, we’re moving from singular to plural. The question is not how much data one article contains, but how much all the articles in a list take up. Let’s look at another example to illustrate. This time, we’re not setting up a phone catalogue, but a blog. We have a data type called Articles, which contains the following fields:
Header (short text)
Content (unstructured text)
Note that we’re saying unstructured, not long, since the article can of course be short. The point is, while an email address or a first name will always be short, an article can be a single word or the length of a book – we don’t know. In this way, unstructured data adds two challenges: long passages of text can take Bubble’s servers longer to search through, and the total size of each record can grow as the number of lengthy articles increases.
Let’s say we have an article of 5,000 characters (remember, this includes spaces). This is about the length of an average newspaper article.
Header: 20 bytes
Article: 5 kilobytes
Now, this may not seem much, but let’s again multiply this by a thousand articles, and you’ll have a total download size of more than 5 megabytes.
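To put the two examples side by side, here is a small sketch using the figures from the text (17 bytes per contact, 20 + 5,000 bytes per article). It assumes one byte per character and ignores any overhead Bubble adds; `list_download_size` is a hypothetical helper name.

```python
def list_download_size(bytes_per_record: int, record_count: int) -> int:
    """Approximate size of a list as record size times number of records."""
    return bytes_per_record * record_count


contact_bytes = 17          # structured example: first name, last name, birthday
article_bytes = 20 + 5_000  # unstructured example: header + content

contacts_total = list_download_size(contact_bytes, 1_000)
articles_total = list_download_size(article_bytes, 1_000)

print(contacts_total)  # 17000
print(articles_total)  # 5020000
# How many times heavier the article list is than the contact list:
print(round(articles_total / contacts_total))  # 295
```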
In most cases you will not download all articles of course – the whole purpose of using search constraints is to download only what you need – but it serves as an illustration that small amounts of data can quickly grow when multiplied by the number of records. Not everyone is aware that a plugin like Zeroqode’s Fuzzy Search, for example, will actually download all records in your initial search and then perform the fuzzy search client-side. Without realizing it, you may be downloading a lot of data on page load.
Data Weight and structuring
Going back to our original term Data Weight, we can say that our first example has a low data weight, and the second has a higher one, helping us to determine how we should set up our structure in an efficient way. A high data weight is not a bad thing on its own – saving unstructured data is of course a necessity for all sorts of purposes like blogs, product descriptions, news articles and cake recipes. It’s only a challenge once it starts affecting performance, in which case we can choose to set up our database structure in a different way. One method is the use of Satellite Data Types to speed up searches and minimize download size.
In the book, we go over a step-by-step process on how to plan and structure your database for optimal performance, and use the Data Weight concept and Satellite Data Types to help your app run smoothly.
Data weight and workload
When Bubble introduced workload, it added a new dimension to how we think about data. The concept of “data weight” provides a useful framework for understanding how downloading data impacts pricing. Looking at the workload activity table in the Bubble documentation, we can see there’s an activity that calculates the total amount of data returned by the database:
Each character of data returned from the database: 0.000003
The logic is fairly simple to understand: Bubble returns data from the database, and each character of data, which we treat here as roughly one byte, has a cost of 0.000003 workload units. In other words, if you download 50 KB of data (51,200 bytes), the cost would be approximately 0.1536 workload units.
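As a quick check of that figure, the calculation can be written out as below. This is a sketch under two assumptions: the rate is the one quoted above (Bubble may change it, so check the current documentation), and one character is counted as one byte. The names `WU_PER_BYTE` and `data_return_cost` are invented for this example.

```python
WU_PER_BYTE = 0.000003  # rate quoted above; assumed to apply per byte


def data_return_cost(bytes_returned: int) -> float:
    """Workload units charged for data returned from the database."""
    return bytes_returned * WU_PER_BYTE


cost_50_kb = data_return_cost(50 * 1024)  # 50 KB = 51,200 bytes
print(f"{cost_50_kb:.4f}")  # 0.1536
```

Note that this covers only the data-return activity; other activities in the workload table are not included.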
Still, there are a few things we should clear up to understand this correctly:
- Data returned from the database does not equal data sent to the user’s device: It means exactly what it says: the database returns data from a query, and it doesn’t make a difference whether it all happens server-side, or the data is sent to the client.
- In other words, you may find that returning data has a cost that you can’t see by counting bytes in DevTools, for example, since DevTools only reports what’s happening on the device.
- Files, such as images, are not technically stored in the database: only their URL. As such, the length of the URL determines the cost of returning data from the database. In other words, a 1 megabyte image doesn’t cost 3.145728 workload units.
So from that, we can conclude that we really are talking about the text and numbers that the database returns from a query. While you see it as a list of things in Bubble, it’s all returned as a chunk of text in JSON format with a set number of bytes.
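To illustrate the idea, the sketch below serializes a small list of records to JSON and counts the bytes. It is not Bubble’s actual response format, which this article does not show; the field names, the URL and the 1 MB image size are all hypothetical.

```python
import json

# A hypothetical list of things; the photo field holds only a URL, not the file.
records = [
    {
        "first_name": "John",
        "last_name": "Doe",
        "photo": "https://example.com/files/john-doe.jpg",
    },
]

payload = json.dumps(records)                # the "chunk of text"
payload_bytes = len(payload.encode("utf-8"))
url_bytes = len(records[0]["photo"].encode("utf-8"))
image_bytes = 1024 * 1024                    # assumed size of the image file itself

print(payload_bytes, url_bytes, image_bytes)
```

The URL contributes a few dozen bytes to the payload, while the image file itself never appears in it.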
Data structure and workload
Considering the sections above, the logic is straightforward: higher data weight can translate to increased workload costs, as the database returns more data. Consequently, reducing data weight—such as by leveraging satellite data types—can help lower workload costs.
But let’s take a moment to reflect.
A common concern I have when discussing topics like this is that people may implement workarounds by default, taking the advice at face value without fully assessing whether those changes are actually necessary or beneficial for their specific use case.
Over-engineering is a very real thing, and workload is not your only concern: you have an app to build, and your goal is to get it to the market with live users. So before overhauling and restructuring your database, consider the following points:
- Workload is not that expensive: while workload introduced a cost calculation based on data size, the cost per byte is relatively small. Adding complexity to your app or spending additional development time to optimize for minimal workload may result in higher indirect costs than the workload itself.
- Consider how much data you are really returning: the data weight of a single record or a small set of records is rarely significant in terms of workload cost. It’s when you query and return thousands of records or more that the data weight starts to add up, as illustrated in the earlier examples. Simply storing large amounts of data in your database does not directly increase workload costs—it’s only when you perform searches or download large numbers of records that the size becomes a factor.
For instance, a plugin like Fuzzy Search can meaningfully impact workload costs because it often involves downloading an entire list of records. If these records contain fields with high data weight, such as blog posts or articles, you may end up transmitting a large amount of data. In scenarios like these, using satellite data types can significantly reduce the weight of each query.
- A single record or a few records with a high data weight do not warrant a database restructure. I may be repeating myself by this point, but it is sometimes necessary to drive the point home: a high data weight that’s downloaded as a single record or a short list of records should not lead to adding additional data types in an effort to reduce workload cost.
All that being said, if you are confident that you are downloading large amounts of data with a high frequency (such as on every page load), then restructuring your database by use of satellite data types can drastically reduce the data returned and in turn bring costs down.
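A rough comparison shows the scale of the difference. The sketch assumes that a satellite data type is a lightweight companion type holding only the short fields needed for a list or search (here, just the header), while the heavy content stays on the main type. That reading, the names `query_cost` and `WU_PER_BYTE`, and the byte counts are assumptions for illustration, and only the data-return activity is counted.

```python
WU_PER_BYTE = 0.000003  # rate quoted earlier in this article


def query_cost(bytes_per_record: int, record_count: int) -> float:
    """Data-return cost of one query, ignoring all other workload activities."""
    return bytes_per_record * record_count * WU_PER_BYTE


full_article = 20 + 5_000  # header + content on the main data type
satellite = 20             # header only (assumed satellite fields)

heavy = query_cost(full_article, 1_000)
light = query_cost(satellite, 1_000)

print(f"{heavy:.2f} vs {light:.2f} workload units per query")
# 15.06 vs 0.06 workload units per query
```

The gap only matters when a query like this runs often, which is the situation described above.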