How to deal with NILMTK energy data sets using Pandas

Christoph Klemenjak
Aug 17, 2021

Datasets play a crucial role in the development of data-driven algorithms. Having access to one or more real-world datasets can be the deciding factor between success and failure. Before going further into the topic, we have to define NILM:

Non-Intrusive Load Monitoring (NILM) is the process of estimating the energy consumed by individual appliances given just a whole-house power meter reading. In other words, it produces an (estimated) itemised energy bill from just a single, whole-house power meter. — Source: [1]
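To make this definition concrete, here is the usual way the problem is written down. This is a standard textbook formulation, not something specific to NILMTK: the whole-house power P(t) is modeled as the sum of the powers p_i(t) of N individual appliances plus a residual e(t) that covers unmetered loads and measurement noise. NILM estimates each p_i(t) given only P(t).

$$
P(t) = \sum_{i=1}^{N} p_i(t) + e(t)
$$

Data sets that record individual appliances with their own submeters provide ground truth for the p_i(t), which is what makes them useful for evaluating NILM algorithms.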

For good reasons, many researchers rely on the NILM Toolkit (NILMTK) to evaluate the accuracy of their NILM algorithms (see [1] for details). Along with the toolkit, a custom data format was introduced in 2014 to ease experimentation and the handling of energy datasets. Since then, a respectable number of datasets has been released in this format, which builds on the Hierarchical Data Format (HDF). For the sake of completeness, here’s a list of datasets compatible with NILMTK:

AMPds, Caxe, Combed, Dataport, Deddiag, DRED, ECO, GREEND, HES, HIPE, iAWE, RAE, REDD, REFIT, Smart, SynD, and UK-DALE.


While these data sets are easy to handle as long as you use NILMTK, many people have wondered how NILMTK data sets can be used with more common Python libraries such as Pandas. This is exactly what this post is about: we will learn how to extract metadata as well as time series from NILMTK data sets using Pandas (plus PyTables, which Pandas relies on to read HDF5 files), so there is no need to install NILMTK. As an example, we will use the popular AMPds data set (see [2] for details).

Metadata

Besides time series, NILMTK data sets store metadata following the NILM Metadata schema (see [4]). To identify the time series of interest, it helps to explore a data set's metadata first. This can be done with a Pandas HDFStore object. To explore the energy meters of AMPds, we open the data set and print all available keys:
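A minimal sketch of this step, assuming the NILMTK-converted file is saved locally as `AMPds.h5` and that PyTables is installed. It relies on NILMTK's convention of storing each meter's time series under a key of the form `/building<i>/elec/meter<j>`. The helpers `list_keys` and `parse_meter_key` are illustrative names, not part of Pandas or NILMTK.

```python
import os
import re

import pandas as pd

# Assumption: the NILMTK-converted AMPds file sits in the working directory.
DATASET_PATH = "AMPds.h5"

# NILMTK stores each meter's time series under /building<i>/elec/meter<j>.
METER_KEY = re.compile(r"^/building(\d+)/elec/meter(\d+)$")


def list_keys(path):
    """Return the keys of all pandas objects stored in an HDF5 file."""
    with pd.HDFStore(path, mode="r") as store:
        return store.keys()


def parse_meter_key(key):
    """Split a NILMTK meter key into (building, meter); None if it is not one."""
    match = METER_KEY.match(key)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


# Only touch the file if it is actually present.
if os.path.exists(DATASET_PATH):
    for key in list_keys(DATASET_PATH):
        print(key, parse_meter_key(key))
```

Parsing the keys into (building, meter) pairs makes it easy to count meters per building before looking at any appliance metadata.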

The keys of AMPds.h5

Printing all keys gives us a good overview of the file's structure. However, the keys alone reveal little about which electrical appliances (fridges, dishwashers, etc.) are present. To access the metadata of building 1, we run:
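One way to do this, sketched under the assumption that the metadata was written the way NILMTK's HDF5 store writes it: as a pickled dictionary in the `metadata` attribute of the `/building1` group. `pd.HDFStore.get_node` returns the underlying PyTables node, whose `_v_attrs` holds such attributes. The helper names and the file path are hypothetical.

```python
import os

import pandas as pd

DATASET_PATH = "AMPds.h5"  # assumed local path to the converted data set


def load_building_metadata(path, building=1):
    """Read the NILM metadata dict attached to the /building<i> group."""
    with pd.HDFStore(path, mode="r") as store:
        node = store.get_node(f"/building{building}")
        if node is None:
            raise KeyError(f"no group /building{building} in {path}")
        # The metadata is a Python dict stored as a PyTables node attribute.
        return node._v_attrs.metadata


def summarize_appliances(metadata):
    """Return (type, instance, meters) tuples from the 'appliances' entry."""
    return [
        (app["type"], app.get("instance"), app.get("meters", []))
        for app in metadata.get("appliances", [])
    ]


if os.path.exists(DATASET_PATH):
    meta = load_building_metadata(DATASET_PATH, building=1)
    for appliance_type, instance, meters in summarize_appliances(meta):
        print(f"{appliance_type} #{instance}: meters {meters}")
```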

Metadata of building 1: appliances as well as meter keys

This code snippet gives us information on the appliances in building 1 as well as the corresponding meter ids. For instance, we can see that the heat pump is recorded by meter 14. Now that we know this meter id, we will extract the corresponding data frame (i.e. all data related to the heat pump) in the next step.
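The lookup from appliance to meter can also be done programmatically rather than by eye. The sketch below assumes the building metadata is a Python dictionary whose `appliances` list holds entries with `type` and `meters` fields, as in the NILM Metadata schema, and that AMPds labels its heat pump with the type string `"heat pump"`. `find_meters` is a hypothetical helper, and the example dictionary is a hand-written excerpt containing only the mapping stated above.

```python
def find_meters(building_metadata, appliance_type):
    """Return the sorted meter ids of all appliances of the given type."""
    meters = set()
    for appliance in building_metadata.get("appliances", []):
        if appliance.get("type") == appliance_type:
            meters.update(appliance.get("meters", []))
    return sorted(meters)


# Hypothetical, hand-written excerpt of a building metadata dict.
example_metadata = {
    "appliances": [
        {"type": "heat pump", "instance": 1, "meters": [14]},
    ]
}

print(find_meters(example_metadata, "heat pump"))
```

Returning a list rather than a single id covers buildings with several appliances of the same type, or an appliance that spans more than one meter.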

Loading Dataframes

At this point, we have successfully opened the data set and know which appliances are present. Since the heat pump is recorded by meter 14, we can load its data frame with this code:
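A minimal sketch, assuming the same local path as above and NILMTK's `/building<i>/elec/meter<j>` key layout. `pd.read_hdf` opens the file, reads the object stored under the given key, and closes the file again. `meter_key` and `load_meter` are illustrative helpers.

```python
import os

import pandas as pd

DATASET_PATH = "AMPds.h5"  # assumed local path to the converted data set


def meter_key(building, meter):
    """Build the NILMTK key under which a meter's time series is stored."""
    return f"/building{building}/elec/meter{meter}"


def load_meter(path, building, meter):
    """Load all columns recorded by one meter as a single DataFrame."""
    return pd.read_hdf(path, key=meter_key(building, meter))


if os.path.exists(DATASET_PATH):
    heat_pump = load_meter(DATASET_PATH, building=1, meter=14)
    print(heat_pump.head())
```

If a key was written in table format, `pd.read_hdf` also accepts `start` and `stop` row positions, which can help when only part of a long recording is needed.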

This gives us all data related to the heat pump in the form of a single Pandas DataFrame object.

By looking at the columns of the data frame, we get an overview of the available physical quantities: voltage, current, power, energy, etc. In other words, several time series are available for this one appliance, so the creators of the data set clearly did a thorough job.
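In NILMTK-formatted data, these columns form a two-level MultiIndex named `physical_quantity` and `type` (e.g. `("power", "active")`), with an empty second level for quantities that have no type. The sketch below builds a small synthetic stand-in with made-up values, so it runs without the data set, and lists the distinct physical quantities. `physical_quantities` is a hypothetical helper; on real data, you would pass the heat pump frame instead.

```python
import pandas as pd

# Synthetic stand-in for a NILMTK meter frame (values are made up).
index = pd.date_range("2012-04-01", periods=4, freq="min")
columns = pd.MultiIndex.from_tuples(
    [("voltage", ""), ("current", ""), ("power", "active"), ("power", "apparent")],
    names=["physical_quantity", "type"],
)
frame = pd.DataFrame(1.0, index=index, columns=columns)


def physical_quantities(df):
    """List the distinct physical quantities (first column level), in order."""
    return list(df.columns.get_level_values("physical_quantity").unique())


print(physical_quantities(frame))
```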

More good news: from here on, it’s business as usual in Pandas. To access the power readings, we run:
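A sketch of what "business as usual" looks like, again on a small synthetic frame with NILMTK-style columns and made-up values. Selecting the first column level returns all power types as a sub-DataFrame, while a full `(quantity, type)` tuple returns a single Series. On the real data, the same selections would apply to the heat pump frame, assuming it carries an `("power", "active")` column.

```python
import pandas as pd

# Synthetic meter frame with NILMTK-style MultiIndex columns (made-up values).
index = pd.date_range("2012-04-01", periods=6, freq="min")
columns = pd.MultiIndex.from_tuples(
    [("power", "active"), ("power", "apparent"), ("voltage", "")],
    names=["physical_quantity", "type"],
)
frame = pd.DataFrame(
    [
        [100.0, 110.0, 230.0],
        [120.0, 130.0, 231.0],
        [110.0, 115.0, 229.0],
        [0.0, 5.0, 230.0],
        [0.0, 5.0, 230.0],
        [90.0, 95.0, 231.0],
    ],
    index=index,
    columns=columns,
)

# All power readings (active and apparent) as a sub-DataFrame.
power = frame["power"]

# A single time series: active power.
active = frame[("power", "active")]

# Regular Pandas from here on, e.g. mean active power per 3-minute window.
print(active.resample("3min").mean())
```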

That’s all you need to know in order to extract metadata as well as time series from NILMTK data sets without actually using NILMTK.

We are done for today. How about a coffee?
