Definition: Digital Decarbonisation – the responsible and efficient use of knowledge and data within organizations. It emphasizes the adoption of digital best practices in sustainability strategies, with the ultimate goal of minimizing data-related carbon emissions.
This approach involves optimizing how data is generated, processed, and stored, ensuring that it aligns with sustainable practices. By promoting digital best practices, organizations can reduce the carbon footprint associated with data management and contribute to a more eco-friendly digital ecosystem.
Accounting for Data Emissions
We have compiled a set of numbers that are currently utilized to assess different facets of digital decarbonisation. It is important to note that these numbers are subject to change as ongoing global research endeavors strive for increased precision in measuring various aspects of digital decarbonisation. As the understanding of this field evolves, we anticipate more refined metrics and measurements to emerge.
Location of data set – Carbon Costs
Import Data Set
Cloud Server Formula: (size of data set in GB × power consumption in kWh per hour) × 24 hours × number of days the data set will be stored × 0.23314 CO2 (1 kWh to carbon output) = the carbon emissions data transfer and storage cost in CO2e
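The formula above can be sketched in a few lines of code. The per-GB-per-hour power figure in the example is a hypothetical placeholder (the text does not fix one); only the 0.23314 kg CO2e per kWh conversion factor comes from the formula itself.

```python
# Sketch of the cloud-storage formula above (illustrative power figure).
KG_CO2_PER_KWH = 0.23314  # 1 kWh to carbon output, from the text

def storage_co2e(size_gb, kwh_per_gb_hour, days_stored):
    """Carbon cost (kg CO2e) of transferring and storing a data set."""
    return size_gb * kwh_per_gb_hour * 24 * days_stored * KG_CO2_PER_KWH

# Example: 50 GB stored for 30 days at an assumed 0.000065 kWh per GB per hour.
print(round(storage_co2e(50, 0.000065, 30), 3))  # → 0.546 kg CO2e
```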
Store at Source
If the dataset is to be stored at the host, i.e., the data analyst is not duplicating the data set and bringing it into their own cloud or on-premise set-up, there will be only a minimal carbon cost related to the transfer of data for processing, and the carbon cost of storing the data already sits with the host. For the purposes of this calculation, the associated carbon cost of the data transfer is taken to be zero.
Data Velocity and Data Storage – Carbon Costs
Velocity
This section concerns the cadence of the data sets to be used; the headings Static, Batch, Near Real-Time, and Real-Time refer to how frequently the data is required or should be processed. ‘Rate of size increase’ is calculated as the difference between the size of the last imported data set and the current version of the dataset to be imported. Note that Static incurs no further carbon costs. For example, a data set might offer real-time updates, but are these required, or could the dataset be updated only every few months when a user report is needed? Updating imported data sets only when a report is required yields a substantial carbon saving, whether the data is stored in the cloud or on-premises, relative to real-time updates that are not needed.
Cloud Server Formula: (new data set size (GB) – last imported data set size (GB)) / days since last import (days) = rate of size increase per day
The data carbon emissions in CO2e = (rate of size increase / 24 hrs) × power consumption kWh per hour × 24 hours × number of days the data set will be stored × 0.23314 CO2 (1 kWh to carbon output)
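As a minimal sketch, the two velocity formulas can be combined; as above, the per-GB-per-hour power figure is an illustrative assumption:

```python
KG_CO2_PER_KWH = 0.23314  # 1 kWh to carbon output, from the text

def rate_of_increase_gb_per_day(new_gb, last_gb, days_since_import):
    """Rate of size increase: (new size - last imported size) / days elapsed."""
    return (new_gb - last_gb) / days_since_import

def growth_co2e(rate_gb_per_day, kwh_per_gb_hour, days_stored):
    # (rate / 24) x power x 24 follows the text; the two 24s cancel out.
    return (rate_gb_per_day / 24) * kwh_per_gb_hour * 24 * days_stored * KG_CO2_PER_KWH

# Example: data set grew from 100 GB to 120 GB over 10 days.
rate = rate_of_increase_gb_per_day(120, 100, 10)  # 2 GB per day
print(growth_co2e(rate, 0.000065, 30))
```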
Storage
Until now, the assumption has been that the data set will be stored in the cloud. There are three storage options in total:
- Store at Host: Zero carbon cost
- Cloud: There are a number of cloud options e.g. Data Lake, Data Warehouse and Data Centre. Currently it is only possible to calculate carbon emissions for a Data Centre, but not for Data Lake and Data Warehouse storage options (these would need investigation in a future project).
- On-premises: If the data set is to be stored on-premises, this will affect the initial CO2 figure given thus far on the data carbon ladder, and a new calculation is needed. The on-premises CO2e calculation was formulated by comparing a cloud server using non-green electricity (487 kg CO2e per server per year) with an on-premises or data-centre server using non-green electricity (975 kg CO2e per server per year) (GoClimate, 2022). An on-premises server therefore emits roughly double that of a cloud server.
- Initial Data Set:
- On-premise Formula: (size of data set in GB × power consumption in kWh per hour) × 24 hours × number of days the data set will be stored × 0.23314 CO2 (1 kWh to carbon output) = the carbon emissions data transfer and storage cost in CO2e.
- Rate of Change:
- Formula: (new data set size (GB) – last imported data set size (GB)) / days since last import (days) = rate of size increase per day.
- The data carbon emissions in CO2e = (rate of size increase / 24 hrs) × power consumption kWh per hour × 24 hours × number of days the data set will be stored × 0.23314 CO2 (1 kWh to carbon output).
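The GoClimate figures quoted earlier (487 vs 975 kg CO2e per server per year) imply an on-premises server emits roughly double a cloud server, so one reading is to scale the cloud figure by that ratio. A minimal sketch, again with a hypothetical power figure:

```python
KG_CO2_PER_KWH = 0.23314       # 1 kWh to carbon output, from the text
ON_PREM_FACTOR = 975 / 487     # roughly 2x, per the GoClimate (2022) figures

def cloud_storage_co2e(size_gb, kwh_per_gb_hour, days_stored):
    return size_gb * kwh_per_gb_hour * 24 * days_stored * KG_CO2_PER_KWH

def on_prem_storage_co2e(size_gb, kwh_per_gb_hour, days_stored):
    # Scale the cloud figure to reflect the higher per-server footprint.
    return cloud_storage_co2e(size_gb, kwh_per_gb_hour, days_stored) * ON_PREM_FACTOR

print(on_prem_storage_co2e(50, 0.000065, 30))
```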
Data Analytics – Carbon Costs
Determining the CPU required for the various analytics methods, alongside the data to process, is not an exact science due to the many varying factors. These calculations are therefore approximations, based on the size of the data to be processed and the level of AI associated with the following techniques (e.g., training data). The CPU figures can be replaced depending on the CPU to be used.
All of the data analytic approaches require CPU time. CPU power consumption constitutes around 30% of the total energy of a cloud server (Lin et al., 2021). One hour of CPU processing equates to a power consumption of 0.0045 kWh.
Descriptive: This type of analytics assesses data, often historical, to answer the fundamental question “what happened?”. It looks at the events of the past and tries to identify specific patterns within the data. It is a model which focuses on generating insights through aggregating and analysing data. As this is very basic reporting, the CPU is likely to be used for less than an hour per report generated.
CO2e for single report run = (Use Figure 3 to determine time in hours given size of data set) * 0.0045 kWh (CPU energy consumption) * 0.23314 CO2 (1kWh to carbon output).
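The single-report formula can be expressed as a short sketch; the CPU-hours input stands in for the Figure 3 lookup, which is not reproduced here:

```python
CPU_KWH_PER_HOUR = 0.0045  # CPU energy consumption per hour, from the text
KG_CO2_PER_KWH = 0.23314   # 1 kWh to carbon output

def report_co2e(cpu_hours):
    # cpu_hours would come from Figure 3, given the size of the data set.
    return cpu_hours * CPU_KWH_PER_HOUR * KG_CO2_PER_KWH

# Example: half an hour of CPU time for one descriptive report.
print(report_co2e(0.5))
```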
Predictive: Once an organization can effectively understand what occurred and why it happened, it can move up to the next tier in analytics, predictive. Predictive analytics looks to use data and information to answer the question “What is likely to happen?”. It is a model which can be used to assist with real-time monitoring and prediction. Both of the following need to be combined to determine the CO2e for predictive analytics:
CO2e to train the system = (number of hours to train the system) * 0.0045 kWh (CPU energy consumption) * 0.23314 CO2 (1kWh to carbon output).
CO2e used to run the system to gain result = (Use Figure 3 to determine time (hrs) given size of data set) * 0.0045 kWh (CPU energy consumption) * 0.23314 CO2 (1kWh to carbon output) * number of times the system will be run.
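Combining the two predictive formulas above gives a single sketch; the training hours, per-run hours (the Figure 3 lookup), and run count in the example are illustrative values:

```python
CPU_KWH_PER_HOUR = 0.0045  # CPU energy consumption per hour, from the text
KG_CO2_PER_KWH = 0.23314   # 1 kWh to carbon output

def predictive_co2e(training_hours, hours_per_run, number_of_runs):
    """Total CO2e (kg): one-off training cost plus repeated run cost."""
    training = training_hours * CPU_KWH_PER_HOUR * KG_CO2_PER_KWH
    running = hours_per_run * CPU_KWH_PER_HOUR * KG_CO2_PER_KWH * number_of_runs
    return training + running

# Example: 100 hours of training, then 50 one-hour runs.
print(predictive_co2e(100, 1, 50))
```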
Prescriptive: A method of analytics that analyses data to answer the question “What should be done?”. This type of analytics is characterised by techniques such as graph analysis, simulation, complex event processing, neural networks, recommendation engines, heuristics, and machine learning. It is a model which utilises prescriptive analytics to propose interventions. Both of the following need to be combined to determine the CO2e for prescriptive analytics:
CO2e to develop the model = (number of hours to train the system) * 0.0045 kWh (CPU energy consumption) * 0.23314 CO2 (1kWh to carbon output).
CO2e used to run the system = (Use Figure 3 to determine time (hrs) given size of data set) * 0.0045 kWh (CPU energy consumption) * 0.23314 CO2 (1kWh to carbon output) * number of times the system will be run.
Cognitive: applies human-like intelligence to certain tasks, and brings together several intelligent technologies, including semantics, artificial intelligence algorithms, deep learning and machine learning. It’s a model that will autonomously take actions and interventions. This type of analytics is assumed to run 24/7 to process the real-time data to make real-time decisions.
CO2e for 24/7 cognitive analytics = 0.0045 kWh (CPU energy consumption) × 24 (hours) × number of days the system will be running × 0.23314 CO2 (1 kWh to carbon output).
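Because cognitive analytics is assumed to run continuously, its formula depends only on the number of days of operation:

```python
CPU_KWH_PER_HOUR = 0.0045  # CPU energy consumption per hour, from the text
KG_CO2_PER_KWH = 0.23314   # 1 kWh to carbon output

def cognitive_co2e(days_running):
    # Assumed to run 24/7 processing real-time data, as stated above.
    return CPU_KWH_PER_HOUR * 24 * days_running * KG_CO2_PER_KWH

# Example: one year of continuous running.
print(round(cognitive_co2e(365), 2))
```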
(Source: calculations are available open access at https://doi.org/10.1080/14778238.2023.2192580)
Data Calculator
This simple calculator has been developed to show the worst-case scenario of carbon emissions from data. To view only dark data carbon emissions, click the “Dark Data Cost” tab.
Our calculator has been developed using prior research and is grounded on two key factors: the rate at which data is generated every second, and the amount of CO2 emissions that are produced by data.
How much data do we create?
There’s no way around it: big data just keeps getting bigger. The numbers are staggering, but they’re not slowing down. It’s estimated that for every person on earth, 1.7 MB of data is created every second. Sources – IBM https://www.ibm.com/blog/netezza-and-ibm-cloud-pak-a-knockout-combo-for-tough-data/ and https://www.domo.com/assets/downloads/18_domo_data-never-sleeps-6+verticals.pdf
How much CO2 does data produce?
Saving and storing 100 gigabytes of data in the cloud per year would result in a carbon footprint of about 0.2 tons of CO2, based on the usual U.S. electric mix. Source: https://medium.com/stanford-magazine/carbon-and-the-cloud-d6f481b79dfe
How much does it cost to offset the CO2?
FT puts the nature-based carbon offset cost at $14.40 per tonne of carbon. Source: https://www.ft.com/content/29565f44-ba71-4a44-8e84-d1e421ddb958
We’ve converted metric tonnes to US tons to get the $ cost. 1 US ton = 907.185 kg, so 1 US ton of CO2 costs roughly $13.06 ($14.40 × 0.907185), and offsetting 100 gigabytes with a carbon footprint of 0.2 tons costs roughly $2.61 per year.
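The unit conversion can be checked in a couple of lines, taking the FT price of $14.40 per metric tonne and scaling to US tons:

```python
PRICE_PER_TONNE_USD = 14.40  # FT nature-based offset price per metric tonne
KG_PER_US_TON = 907.185      # kg in one US (short) ton

price_per_us_ton = PRICE_PER_TONNE_USD * KG_PER_US_TON / 1000
annual_offset_usd = price_per_us_ton * 0.2  # 100 GB stored ≈ 0.2 US tons of CO2

print(round(price_per_us_ton, 2), round(annual_offset_usd, 2))  # → 13.06 2.61
```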
Calculations
• 1.7MB per second x 60 seconds = 102MB per minute
• 102MB per minute per person x 60 minutes = 6120MB generated per hour per person
• 6120MB x 7.5 hours in a working day = 45900MB per day
• 45900MB per day x 5 days (1 working week) = 229500MB per person, per working week
• 229500MB x 48 working weeks a year = 11016000MB for one employee for the year
• 100000MB or 100GB of data per year = 0.2 tons of CO2
• 1MB = 0.000002 tons of CO2
• Annually 1 employee generates 11016000 MB × 0.000002 tons of CO2 per MB = 22.03 tons of CO2
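The bullet-point chain above can be reproduced directly in code:

```python
MB_PER_SECOND = 1.7              # data created per person per second (IBM)
TONS_CO2_PER_MB = 0.2 / 100_000  # 0.2 tons of CO2 per 100000 MB stored per year

mb_per_hour = MB_PER_SECOND * 60 * 60  # 6120 MB per hour
mb_per_day = mb_per_hour * 7.5         # 45900 MB (7.5-hour working day)
mb_per_week = mb_per_day * 5           # 229500 MB (5-day working week)
mb_per_year = mb_per_week * 48         # 11016000 MB (48 working weeks)

print(round(mb_per_year * TONS_CO2_PER_MB, 2))  # → 22.03 tons of CO2 per employee
```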
Ronaldo – Instagram
A study conducted as part of an investigation by Channel 4’s “Dispatches” current affairs programme determined that a single image posted online by Ronaldo is viewed hundreds of millions of times, and its impact can be seen on the national grid (source). We extrapolated this based on 45k posts per minute around the globe (source). The number of homes per city was taken from the United States Census Bureau and Statista.
DVD
One person creates 1.7 MB of data a second (source: IBM). That equates to roughly 10 full DVDs in a working day, given a DVD can hold 4.6 GB of data.
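This equivalence follows from the earlier 7.5-hour working-day figure:

```python
mb_per_working_day = 1.7 * 60 * 60 * 7.5  # 45900 MB in a 7.5-hour working day
dvd_capacity_mb = 4600                    # 4.6 GB per DVD, as stated above

print(round(mb_per_working_day / dvd_capacity_mb))  # → 10 DVDs
```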