Our journey navigating the technosphere

Author

### Katie Russell

Data Platforms, Data Science and Analytics @ OVO Energy

# Background

Hello from Marie and Katie. We’re Data Scientists at OVO. We’ve just been doing some work on energy data forecasting - to make sure that when we tell you how much you’ll probably spend next year, we’re getting it as good as we can. There is already an industry standard for this - we won’t bore you with the details - but it doesn’t work all the time. Anyway, as part of this analysis we were trying out a few techniques to analyse energy data and we thought this particular one was pretty interesting.

Heads up: this isn’t a post about deep learning. Sorry :D We actually do a lot of traditional modelling too.

# Signal decomposition

So, the idea with signal decomposition is to look for patterns and break up the signal (in this case, the record of the energy consumed over time) into parts. Some patterns repeat, some patterns are overall trends, others just seem to be random. We used it because we wanted to understand the patterns behind energy consumption in more detail. We wondered for example if there was a strong weekly pattern to some energy usage.
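To make the idea concrete, here is a minimal sketch that builds a made-up signal out of the three kinds of parts described above: a slow trend, a repeating weekly pattern, and random noise. All the numbers and names (`trend`, `weekly`, `noise`) are invented for illustration and are not taken from any real home.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: two years of made-up consumption values
rng = np.random.default_rng(seed=0)
dates = pd.date_range("2015-01-01", "2016-12-31", freq="D")
day_number = np.arange(len(dates))

# Slow trend: higher in winter, lower in summer
trend = 10 + 3 * np.cos(2 * np.pi * day_number / 365.25)
# Repeating weekly pattern: extra usage every Saturday (dayofweek == 5)
weekly = np.where(dates.dayofweek == 5, 4.0, 0.0)
# Everything else: random day-to-day variation
noise = rng.normal(loc=0.0, scale=1.0, size=len(dates))

# The signal we would observe is simply the sum of the parts
signal = pd.Series(trend + weekly + noise, index=dates, name="consumption")
```

Signal decomposition runs this in reverse: given only `signal`, it tries to recover something like the three parts.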

Before we start, it's important to note, all this work is on anonymised smart meter data. This is part of our commitment to doing what's right for customers.

# Getting familiar with the energy data

Here’s an example signal. It’s the daily energy (electric) usage for one random home from January 2015 to December 2016, with some things we noticed listed below:

• More energy is consumed in Winter than in Summer - some seasonality.
• There are some periods of lower energy usage.
• There doesn't appear to be a regular weekly pattern of use, it's quite variable.
• Christmas Day had higher consumption than normal :)

Here’s another home for the same period and some observations:

• There doesn't appear to be a strong annual trend
• For some lengths of time, there is one day when a lot more energy is used than other days (we checked and these were every Saturday). This looks like a strong weekly pattern.
• The daily energy usage is generally lower
• Christmas Eve and Boxing Day had higher consumption than normal

# Testing out the signal decomposition

Let's explore the underlying patterns in the energy usage data mathematically, with signal decomposition. We can test our assumption that there is a strong weekly pattern in some homes by setting a periodicity of 7 days. Here’s the first level decomposition, now back on the first home:

The decomposition shows the overall trend, the 7 day periodic pattern, and noise - a kind of residual. For this home, the 7 day pattern is fairly weak compared to the residual (as evidenced by its smaller numeric values), showing that the day-to-day variability of the home is stronger than any weekly pattern. In the trend section, the seasonality and the extended periods of lower energy usage have been extracted too. The Christmas Day spike shows up as a spike in the residual.

Let's look at the second example home for comparison, to see how different it is and how it compares with our observations.

Here the weekly pattern, with a single day of usage which is typically higher than other days, has been reflected in the periodic signal. And there was so much extra usage around the festive period that it manifested as both a trend and as residual noise!
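One rough way to put a number on "weak" versus "strong" is to compare the spread of the periodic component with the spread of the residual. This is only a sketch: `weekly_strength` is a hypothetical helper of our own, the ratio of standard deviations is just one possible measure, and the toy values below are made up.

```python
import pandas as pd

def weekly_strength(periodic: pd.Series, resid: pd.Series) -> float:
    """Spread of the periodic component relative to the residual (hypothetical helper)."""
    # pandas skips NaN values in std(), so NaNs at the ends of a residual are ignored
    return float(periodic.std() / resid.std())

# Toy components, invented for illustration only
periodic = pd.Series([4.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0] * 4)  # one big day per week
resid = pd.Series([0.5, -0.5] * 14)                            # small leftover wiggle

ratio = weekly_strength(periodic, resid)
```

A ratio well above 1 would suggest the weekly pattern dominates, as in the second home; a ratio well below 1 would look more like the first home.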

We think it's pretty cool that all of this can be extracted with just a very simple bit of analysis.

As a bit of further research, we're curious to extend this analysis and find out whether there are any further patterns in the kind of 'residual' that each home has. We find the residual data particularly interesting, because it truly defines the differences between homes once you take the standard behaviour patterns and seasons away.

# If you like, you can try it out

We used the function seasonal_decompose from statsmodels. Note: the output 'seasonal' is what we have called 'period', because we didn't want to cause confusion between seasonality in the trend and the so-called seasonality extracted by setting a frequency to search for.

This has been prepared using an additive decomposition. We tested, and this performed better than multiplicative decomposition. That makes sense: home energy consumption builds up as a sum rather than a product, because turning on more appliances doesn't amplify the usage of others.
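In an additive decomposition, each day's value is modelled as trend + period + residual. The sketch below hand-rolls a naive version of this in pandas, roughly following the classical moving-average approach. It is meant to show the idea only: `naive_additive_decompose` is a hypothetical helper, and the statsmodels implementation may differ in its details.

```python
import numpy as np
import pandas as pd

def naive_additive_decompose(series: pd.Series, period: int = 7) -> pd.DataFrame:
    """Split a series into trend + period + resid (illustrative sketch only)."""
    # Trend: centred moving average over one full cycle (NaN at both ends)
    trend = series.rolling(window=period, center=True).mean()
    detrended = series - trend
    # Periodic part: average detrended value at each position in the cycle,
    # shifted so that the pattern sums to zero over one cycle
    position = np.arange(len(series)) % period
    cycle_means = detrended.groupby(position).mean()
    cycle_means = cycle_means - cycle_means.mean()
    periodic = pd.Series(cycle_means.loc[position].to_numpy(), index=series.index)
    # Residual: whatever is left once trend and period are taken away
    resid = series - trend - periodic
    return pd.DataFrame({"trend": trend, "period": periodic, "resid": resid})

# Toy series (made up): a gentle linear trend plus a weekly pattern, no noise
days = np.arange(70)
pattern = np.array([3.0, -0.5, -0.5, -0.5, -0.5, -0.5, -0.5])  # sums to zero
toy = pd.Series(10 + 0.1 * days + pattern[days % 7],
                index=pd.date_range("2015-01-01", periods=70, freq="D"))
parts = naive_additive_decompose(toy, period=7)
```

On this noise-free toy series the weekly pattern is recovered in `parts["period"]` and the residual is essentially zero wherever the trend is defined. A multiplicative model would instead treat the value as trend × period × residual.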

If you want to try it for yourself, you can, by downloading some publicly available data from the Low Carbon London project, which as it happens was a trial designed by our colleague at OVO, James Schofield.

This data is half hourly, so it needs to be 'rolled up' to daily to be able to repeat the exact same analysis. Or perhaps you'd be interested to investigate on half hourly data, too.
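As a minimal sketch of that 'roll up', assuming the readings sit in a pandas Series indexed by timestamp, `resample` can sum the half-hourly values into daily totals. The readings below are made up; the real file needs the loading and cleaning steps further down first.

```python
import numpy as np
import pandas as pd

# Hypothetical half-hourly readings for three days (48 readings per day)
timestamps = pd.date_range("2013-01-01", periods=3 * 48, freq="30min")
half_hourly = pd.Series(np.full(len(timestamps), 0.25),
                        index=timestamps, name="consumption")

# Sum the 48 half-hourly readings that fall in each calendar day
daily = half_hourly.resample("D").sum()
```

If you stay on half-hourly data instead, a weekly cycle spans 7 × 48 = 336 readings, so the period to search for would change accordingly.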

Here is some code to do it with this data, which we tested with Python 3 in an ipython notebook. There are a few other dependencies, as you can see below.

## Set up libraries

``````
# Loading libraries
import plotly
import plotly.graph_objs as go
import datetime
from statsmodels.tsa.seasonal import seasonal_decompose
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
``````

## Load data and set up

Grab the single sample csv https://files.datapress.com/london/dataset/smartmeter-energy-use-data-in-london-households/UKPN-LCL-smartmeter-sample.csv and note down where you saved it. (There is a lot more data available, but for the purposes of this exercise, one home for a few months, is enough).

``````
# Set the directory (where you downloaded the data)
os.chdir('/path/to/data/')
# Change the style of graphics
plt.style.use('ggplot')
``````

## Format and clean up the data

A necessary evil. Data Scientists spend a large proportion of their time on this.
James did tell us later there is a cleaner data set from Low Carbon London, here.

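``````
# Loading and preparing (check what the csv saved as - ours was indeed csv.csv)
df = pd.read_csv('csv.csv')  # file name assumed from the note above; change it to match yours
df = df.drop_duplicates()
df.columns = ['stdorToU', 'datetime', 'consumption', 'Acorn', 'Acorn_grouped']
df = df[['datetime', 'consumption']].copy()

# Extract the date (the first 10 characters of the timestamp string)
df['date'] = df['datetime'].str[:10]

# NAs data treatment: 'Null' readings become NaN, then get the mean reading
# (only the consumption column is touched, so the dates are kept)
df['consumption'] = pd.to_numeric(df['consumption'], errors='coerce')
df['consumption'] = df['consumption'].fillna(df['consumption'].mean())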

# Data aggregation by day (only if you want to do it daily and not half-hourly)
df = pd.DataFrame(df.groupby('date')['consumption'].sum())
df.reset_index(inplace=True)
df.date = pd.to_datetime(df.date, format='%d/%m/%Y')
df = df.sort_values('date', ascending=True)
``````

## Do a quick eyeball

We love a good 'eyeball' of the data! Checking if the values are in the right ballpark, whether we missed anything like sorting, looking at how gappy it is ...

``````
# Plot the daily consumption
plotly.offline.init_notebook_mode() # run at the start of every notebook
plotly.offline.iplot({
"data": [{
"x": df.date,
"y": df.consumption
}],
"layout": {
"title": "Daily consumption"
}
})
``````
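The 'how gappy is it' part of the eyeball can also be checked in code. Here is a minimal sketch on a made-up frame with the same `date` and `consumption` columns as above, where one day has been removed on purpose.

```python
import pandas as pd

# Hypothetical daily totals with one day (3 January) missing
daily = pd.DataFrame({
    "date": pd.to_datetime(["2013-01-01", "2013-01-02", "2013-01-04", "2013-01-05"]),
    "consumption": [9.8, 11.2, 10.4, 12.1],
})

# Every calendar day we would expect between the first and last reading
expected = pd.date_range(daily["date"].min(), daily["date"].max(), freq="D")
# Days that are expected but have no row in the data
missing_days = expected.difference(pd.DatetimeIndex(daily["date"]))

# Quick summary to check the values are in the right ballpark
summary = daily["consumption"].describe()
```

Gaps matter for the next step: with a period of 7 rows, a missing day would shift the weekly cycle for everything after it.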

## Let's do science!

Finally we can apply this technique we are interested in!

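``````
df.index=df.date
del df['date']
to_decompose = df.consumption

# With a period of 7 days - additive method
# (older statsmodels releases call the `period` argument `freq`)
result = seasonal_decompose(to_decompose, model='additive', period=7)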
df['resid'] = result.resid
df['period'] = result.seasonal
df['trend'] = result.trend

# To plot the decomposition
result.plot()
plt.show()
``````

# Authors

Marie Peju and Katie Russell - we thank you and good luck!
Special thanks to Tali Ziskind for test driving our code.
