It’s time to look at data production vs. storage capability
You’ve likely heard the rumblings: exponentially more data will be created in the very near future. But how much? How much is too much, i.e. can we store it all? Do we need to? And how does this affect investors?
The state of data production
Internally, Holon has done research to estimate the volume of data that autonomous electric vehicles (EVs) will create over the next three decades as car owners replace internal combustion engine (ICE) vehicles. Holon estimates the global EV fleet will grow from 12 million vehicles in 2021 [1] to 569 million in 2035 and 970 million in 2040. With Level 4/5 autonomy (Figure 1, below) likely to become standard across EV manufacturers by 2030, we estimate that passenger EVs will generate 11,900 zettabytes (ZB) of data in 2035 and 26,200 ZB in 2040.
Figure 1: Automation levels of autonomous cars
Source: SAE International
We also need to consider data production from autonomous EV taxi and ride-hailing fleets (e.g., Uber), which Holon believes will reach a combined total of 78 million vehicles in 2035 and 90 million in 2040. With average usage of 30-40 hours per week versus just 5 for passenger vehicles, we believe both vehicle types will generate 8,000 ZB of data in 2035 and 11,900 ZB in 2040. This raises Holon’s autonomous EV data generation estimates for passenger and hire vehicles combined to 19,900 ZB in 2035 and 38,100 ZB in 2040.
We have also estimated data production from Internet of Things (IoT) devices and sensors, which are expected to be the second-largest source of data generation over the next decade. SoftBank-owned Arm Holdings, a UK-based semiconductor chip designer, estimates the number of IoT devices will surpass 1 trillion in 2035 [2].
Using our more conservative estimate of 208 billion devices in 2035 and a device data production estimate of 50 KiB (kilobytes) every 30 seconds, Holon believes that IoT devices will produce 10,900 ZB of data in that year. By 2040, our IoT device estimate of 622 billion units increases our data production estimate to 32,700 ZB.
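As a rough check on the arithmetic behind these IoT figures, the sketch below multiplies device count, payload size and messages per year. The function name and constants are illustrative assumptions, not Holon’s model. One point to note: the published totals are reproduced when the per-device payload is read as 50 megabytes every 30 seconds; read as 50 kilobytes, the same arithmetic gives totals 1,000 times smaller. The payload is therefore left as a parameter rather than fixed.

```python
# Hypothetical back-of-envelope check; not Holon's actual model.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60   # 31,536,000 seconds
BYTES_PER_ZB = 10**21                   # decimal zettabyte
KB = 10**3                              # decimal kilobyte
MB = 10**6                              # decimal megabyte


def iot_zb_per_year(devices, payload_bytes, interval_s=30):
    """Annual data volume in ZB when each device sends
    `payload_bytes` every `interval_s` seconds, all year round."""
    messages_per_year = SECONDS_PER_YEAR / interval_s
    return devices * payload_bytes * messages_per_year / BYTES_PER_ZB


# Device counts are the article's 2035 and 2040 estimates.
zb_2035 = iot_zb_per_year(208e9, 50 * MB)
zb_2040 = iot_zb_per_year(622e9, 50 * MB)
zb_2035_kb = iot_zb_per_year(208e9, 50 * KB)  # same inputs, kilobyte payload

print(f"2035 at 50 MB: {zb_2035:,.0f} ZB")
print(f"2040 at 50 MB: {zb_2040:,.0f} ZB")
print(f"2035 at 50 KB: {zb_2035_kb:,.1f} ZB")
```

Keeping the payload as an argument makes it easy to see how sensitive the total is to that single assumption.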
The sum of our EV and IoT data estimates is shown in Figure 2, below. This estimate excludes data production from 19 other technology platforms [3] emerging this decade, including virtual and augmented reality, 5G, blockchain, robotics, machine learning and artificial intelligence. We believe those technologies could add 50% or more on top of these estimates, reinforcing Holon’s firm view that current International Data Corporation (IDC) estimates understate global data production by a factor of 15-25x relative to IDC’s 2035 estimate of 2,140 ZB.
Figure 2: Holon estimate of total data created by autonomous EVs and IoT (ZB)
Source: Holon
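The comparison with IDC can be sketched from the figures quoted in the text. This is a minimal illustration only: the variable names are hypothetical, and the 50% uplift is the lower bound the article gives for the other technology platforms.

```python
# Figures quoted in the article, in zettabytes (ZB).
ev_zb = {2035: 19_900, 2040: 38_100}    # passenger + hire autonomous EVs
iot_zb = {2035: 10_900, 2040: 32_700}   # IoT devices and sensors
IDC_2035_ZB = 2_140                     # IDC estimate for 2035
OTHER_PLATFORMS_UPLIFT = 0.5            # "50% or more" from other platforms

# Combined EV + IoT data production per year.
total_zb = {year: ev_zb[year] + iot_zb[year] for year in ev_zb}

# Ratio of Holon's 2035 estimate to IDC's, without and with the uplift.
ratio_base = total_zb[2035] / IDC_2035_ZB
ratio_uplift = total_zb[2035] * (1 + OTHER_PLATFORMS_UPLIFT) / IDC_2035_ZB

for year, zb in total_zb.items():
    print(f"{year}: {zb:,} ZB")
print(f"2035 vs IDC: {ratio_base:.1f}x (EV + IoT only)")
print(f"2035 vs IDC: {ratio_uplift:.1f}x (with 50% uplift)")
```

On these inputs the ratio is a little over 14x before the uplift and a little under 22x with it, which is broadly in line with the 15-25x range quoted.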
The state of data storage capacity
Where we store our data will also undergo a fundamental change over the next decade. In 2021, approximately 5 ZB, or 65% of global data storage, was found in billions of end-user devices [4] like laptops and mobile phones, while just 3 ZB (35%) was found in dedicated data centres. As the source of data production shifts over the next five years from humans to machines, data storage will shift significantly towards data centres.
Filecoin, a decentralised data storage project developed by Protocol Labs, has a 2040 data storage target of 1,000 ZB [5], an increase of 330x over 2021’s global data centre capacity. To achieve this, data storage capacity must grow at a compound annual rate of 36.2% over the next 19 years, as seen in Figure 3 below.
Figure 3: Getting to 1,000 ZB data storage capacity by 2040
Source: Holon
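The required growth rate follows from the standard compound annual growth rate formula. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the 2021 base is the rounded 3 ZB data centre figure quoted above. With that rounded base the result comes out slightly under 36%, close to the 36.2% in the text, which presumably rests on an unrounded 2021 figure.

```python
def cagr(start, end, years):
    """Compound annual growth rate needed to go from `start` to `end`."""
    return (end / start) ** (1 / years) - 1


def project(start, rate, years):
    """Value after compounding `start` at `rate` for `years` years."""
    return start * (1 + rate) ** years


BASE_2021_ZB = 3       # rounded data centre capacity from the article
TARGET_2040_ZB = 1000  # Filecoin's 2040 target as quoted in the article
YEARS = 2040 - 2021    # 19 years

required_rate = cagr(BASE_2021_ZB, TARGET_2040_ZB, YEARS)
capacity_at_quoted_rate = project(BASE_2021_ZB, 0.362, YEARS)

print(f"Required growth from 3 ZB: {required_rate:.1%} per year")
print(f"3 ZB compounded at 36.2%: {capacity_at_quoted_rate:,.0f} ZB in 2040")
```

Small changes in the starting capacity move the required rate only slightly, because the growth is compounded over 19 years.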
Using these Filecoin network estimates, we can further estimate storage capability, i.e. the percentage of annual data production that can be stored by new capacity additions each year. Using Holon’s data estimates, global capability is set to fall dramatically from 0.6% in 2020 to just 0.1% in 2025, as seen in Figure 4 below. This occurs principally because data production from EVs and IoT devices grows much faster than storage capacity. Storage capability then improves over the period from 2030 to 2040, reaching 0.2% in 2035 and 0.4% in 2040.
Figure 4: Annual data storage capability, 2020 – 2040
Source: Holon
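Storage capability as defined above can be approximated from numbers already quoted in the text. The sketch below assumes, as an illustration only, that capacity follows the 36.2% compounding path ending at 1,000 ZB in 2040, and that annual data production is the sum of the EV and IoT estimates. The function names are hypothetical and this is not necessarily how Holon built Figure 4.

```python
GROWTH = 0.362        # compound annual capacity growth quoted in the article
TARGET_ZB = 1000      # capacity target for 2040
TARGET_YEAR = 2040


def capacity(year):
    """Capacity in ZB on a path that compounds at GROWTH
    and reaches TARGET_ZB in TARGET_YEAR (an assumption)."""
    return TARGET_ZB / (1 + GROWTH) ** (TARGET_YEAR - year)


def capability(year, production_zb):
    """Share of one year's data production that the capacity
    added during that year could hold."""
    added = capacity(year) - capacity(year - 1)
    return added / production_zb


# Annual production in ZB: EV plus IoT estimates from the article.
production = {2035: 19_900 + 10_900, 2040: 38_100 + 32_700}

results = {year: capability(year, zb) for year, zb in production.items()}
for year, share in results.items():
    print(f"{year}: {share:.2%} of data produced could be stored")
```

Under these assumptions the shares round to about 0.2% for 2035 and 0.4% for 2040, in line with the figures quoted in the text.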
This raises a critical question: how much data do we actually need to store?
The arrival of 19 additional technology platforms [6] over the next two decades will require substantial data storage capacity to manage the massive datasets needed to ‘train’ and develop their initial processes and skills. Think of an autonomous car that is simply required to drive in a circle in an empty car park, constantly checking its progress against stored datasets to improve outcomes.
Once this initial step is refined, increasingly complex datasets are required to refine processes further towards a complete working model. In the EV example, this would be analogous to improving driving capabilities towards unassisted driving in urban environments.
Very little publicly available research adequately addresses the key question of whether sufficient global data storage capacity will be available to meet Holon’s demand expectations over the next few decades. Without this storage capability, the improvements in productivity and quality of life that technological advances have delivered could be substantially inhibited.
What does it mean for investors?
Investors should consider building exposure to companies that provide the infrastructure for the global data storage network. While building global capacity towards 1,000 ZB by 2040 looks unachievable at first glance (330x larger than 2021’s capacity), anything less risks causing a dramatic slowdown in technological innovation over the next two decades. Without data storage, innovation simply cannot continue.
Few sectors offer as attractive a long-term growth profile as data storage, which Holon believes needs to maintain 35% annual growth for the next 20 years. Storage equipment providers (hard disk and solid-state drive manufacturers) offer the biggest opportunity.
Storage providers need to replace storage hardware every five years to ensure data is not lost. Using a 20% (straight-line) depreciation rate on this hardware, reaching 1,000 ZB of global capacity in 2040 requires annual growth above 70% for the next 20 years. This would allow just 0.1% to 0.4% of annual data generation to be saved.
1 https://www.statista.com/statistics/970958/worldwide-number-of-electric-vehicles/
2 https://www.itu.int/hub/2020/04/arm-predicts-1-trillion-iot-devices-by-2035-with-new-end-to-end-platform/
3 19 new technology platforms include Blockchain and distributed ledgers, Artificial Intelligence, Virtual Reality, Augmented Reality, Machine Learning, Cloud and edge computing, Digital Twin technology, Natural language processing, Voice interfaces and chatbots, Computer vision / facial recognition, Robotics, 5G & 6G networks, Genomics and Gene Editing, Digital platforms, Drones / UAVs, Cybersecurity, Quantum Computing, Robotic Process Automation (RPA), 3D Printing / additive manufacturing, Nanotechnology and materials science
4 https://firstsiteguide.com/big-data-stats/#:~:text=1.,worldwide%20would%20reach%2079%20zettabytes.
5 https://spec.filecoin.io/#section-systems.filecoin_token.block_reward_minting.baseline-minting
6 https://www.itu.int/hub/2020/04/arm-predicts-1-trillion-iot-devices-by-2035-with-new-end-to-end-platform/