AMI meter data has great value, but it can also be convoluted and difficult to work with. For analytics applications, it is important to start from a clean data set. As our ambitions for new and innovative AMI analytics grow, our ability to foresee and deal with inevitable data issues becomes ever more crucial. Many analytics projects fail not because of an error in the analysis technique, but because the data was not accurate enough to support the methods. As the IT saying goes, "Garbage in, garbage out." We thought we'd share with you three of the most common AMI meter data issues we encounter in our analytics projects and what we do to solve them.
1. Time zones
One common mistake made with AMI readings data is to label time stamps with the wrong time zone. The easiest way to avoid such problems is to validate your data before performing analysis. One way to do this is to chart the aggregate load from a subset of meters. Is the set peaking at the expected time? If not, there may be an issue that you need to deal with before you continue with your analysis. It could be that your data is being stored in GMT and needs to be transformed, or that it is already being stored in local time and you are "double" transforming it.
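The peak check described above can be sketched in a few lines of Python. This is a minimal illustration, not a production validator: the `(meter_id, timestamp, kwh)` tuple layout, the function names and the sample values are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def peak_hour(readings):
    """Return the hour of day (0-23) with the highest aggregate load.

    `readings` is an iterable of (meter_id, timestamp, kwh) tuples
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(float)
    for _meter_id, ts, kwh in readings:
        totals[ts.hour] += kwh
    return max(totals, key=totals.get)


def peak_looks_plausible(readings, expected_hours):
    """True if the aggregate peak falls inside the hours you expect."""
    return peak_hour(readings) in expected_hours


# Made-up sample: two meters whose load peaks at the time stamp labelled 22:00.
sample = [
    ("A", datetime(2019, 1, 1, 21, 0), 1.2),
    ("A", datetime(2019, 1, 1, 22, 0), 2.9),
    ("A", datetime(2019, 1, 1, 23, 0), 1.4),
    ("B", datetime(2019, 1, 1, 21, 0), 0.8),
    ("B", datetime(2019, 1, 1, 22, 0), 2.1),
    ("B", datetime(2019, 1, 1, 23, 0), 1.0),
]

# If we expected a late-afternoon peak (labels 15:00 to 19:00), this fails
# and the time zone labelling deserves a closer look.
print(peak_hour(sample), peak_looks_plausible(sample, range(15, 20)))
```

A failed check does not prove a time zone problem, but a peak that is offset from the expected window by a whole number of hours is a strong hint that the stored zone is not what you assumed.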
One client we worked with was operating a load control program for electric water heaters. The load control software they had purchased was sending "switch-off" messages to the AMI meters on an automated schedule. Unfortunately, the messages were being sent at 3pm GMT instead of 3pm Eastern! This meant that the heaters were switching back on during peak, at exactly the wrong time. Nothing in the load control system would have indicated this as an issue. It was only when we analyzed the AMI meter data to measure the effectiveness of the program that we noticed the issue and were able to offer advice on remedial steps.
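To see how far apart "3pm GMT" and "3pm Eastern" really are, here is a small sketch using Python's standard `zoneinfo` module. We assume that "Eastern" maps to the IANA zone `America/New_York` and that time stamps are stored without zone information; the `to_local` helper is hypothetical and exists only for this example.

```python
from datetime import datetime
from zoneinfo import ZoneInfo

UTC = ZoneInfo("UTC")
EASTERN = ZoneInfo("America/New_York")  # assumption: "Eastern" means this zone


def to_local(ts, stored_tz, local_tz):
    """Interpret a naive time stamp as `stored_tz` and convert it to `local_tz`.

    Refuses time stamps that already carry a zone, as a guard against
    converting the same value twice ("double" transforming).
    """
    if ts.tzinfo is not None:
        raise ValueError("time stamp already has a time zone attached")
    return ts.replace(tzinfo=stored_tz).astimezone(local_tz)


# A message scheduled for 15:00 but interpreted as GMT/UTC.
summer = to_local(datetime(2019, 7, 1, 15, 0), UTC, EASTERN)
winter = to_local(datetime(2019, 1, 1, 15, 0), UTC, EASTERN)

print(summer.strftime("%H:%M %Z"))  # local wall-clock time in July
print(winter.strftime("%H:%M %Z"))  # local wall-clock time in January
```

Because the offset changes with daylight saving time, a fixed "subtract five hours" rule would be wrong for part of the year, which is why the sketch relies on a named zone rather than a constant offset.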
Another complication is how to correctly interpret AMI meter reading time stamps for your analysis needs. Reading time stamps can be considered to be "hour ending", such that each one captures the total amount of energy through the meter since the previous reading. So a reading labelled "01/01/2019 15:00" will typically cover the period from 2pm to 3pm. This is important to know when querying and summarizing your data. Failure to account for this simple data structure can lead to incorrect query results and invalid analyses.
For example, if you are trying to determine how much energy is being consumed between your peak hours of 2pm - 7pm, then your query only needs to consider the hours labelled 3pm, 4pm, 5pm, 6pm and 7pm. Including the hour labelled 2pm would introduce an extra hour of energy consumption (1pm to 2pm).
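In code, the 2pm - 7pm example comes down to using a strict "greater than" on the start of the window. The sketch below assumes hourly, hour-ending readings held as `(timestamp, kwh)` pairs; the `energy_in_window` name and the sample data are made up for illustration.

```python
from datetime import datetime


def energy_in_window(readings, start, end):
    """Sum kWh for hour-ending readings that cover the period start..end.

    A reading labelled T covers the hour ending at T, so we keep labels
    with start < T <= end (strictly after the start, up to and including
    the end).
    """
    return sum(kwh for ts, kwh in readings if start < ts <= end)


# Made-up data: 1 kWh in every hour, labels 13:00 through 20:00.
readings = [(datetime(2019, 1, 1, hour, 0), 1.0) for hour in range(13, 21)]

start = datetime(2019, 1, 1, 14, 0)  # 2pm
end = datetime(2019, 1, 1, 19, 0)    # 7pm

peak_kwh = energy_in_window(readings, start, end)

# The tempting "inclusive on both sides" filter picks up the 1pm-2pm hour too.
too_much = sum(kwh for ts, kwh in readings if start <= ts <= end)

print(peak_kwh, too_much)
```

If your data turns out to be "hour beginning" instead, the comparison flips to `start <= ts < end`, so it is worth confirming the convention with whoever owns the meter data before reusing a filter like this.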
3. Data gaps
AMI systems are quite good at recording, sending and storing meter readings. However, in any large system there will always be missing data points, particularly at higher frequency intervals (e.g. 5-minute and 15-minute). Regardless of whether your meters report readings or consumption, data gaps can plague analytics efforts.
Typically, when a particular time interval is missed, the energy consumption is rolled up and reported in the subsequent time slot. The total amount of energy, therefore, is preserved, but the precise breakdown of when that energy was consumed is lost. There are three primary ways of dealing with this issue, depending on your analysis needs:
- Ignore the gaps: if you are only interested in long term energy consumption trends and totals, then it is perfectly acceptable to ignore the gaps, as they would not change your outcomes.
- Simple interpolation: the simplest way to fill the data gaps and create a complete data set is to use linear interpolation to create a new data point for each timestamp that was missed. To do this, take the first kWh consumption value after the gap and divide it by the number of time stamps that were missed plus 1. So for a single missed value, take the next read and divide it by two. Then create a new data point for the gap and give both data points the same, interpolated value. This method will be sufficient for most analysis purposes.
- Advanced interpolation: if it is important that all of your data points are as close to the ground truth as possible, there are many options for improving upon a simple linear interpolation. This is more likely to matter for data sets with a large number of gaps (although in this case you should also consider investigating why you have so many gaps). One such improvement is to use the average consumption data from similar meters (e.g. those in the same rate class) to calculate the best allocation. If, for comparable meters, hour 1 averaged 10% higher consumption than hour 2, you should apportion your energy interpolation accordingly. This has been our preferred method since weather-driven variance is "baked-in" to the variation seen in the other meters, as is any other external factor.
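Both interpolation options can be sketched with one small function. This is an illustration under stated assumptions rather than a finished routine: we assume one meter's interval consumption is held in a list, that a missed interval appears as `None`, and that the first value after a gap carries the rolled-up energy. The `fill_gaps` name and the `profile` argument (average consumption of comparable meters for the same intervals) are hypothetical.

```python
def fill_gaps(values, profile=None):
    """Redistribute rolled-up energy across the intervals it really covers.

    values  -- interval kWh for one meter; None marks a missed interval and
               the next non-None value is assumed to hold the rolled-up total.
    profile -- optional reference shape for the same intervals, e.g. the
               average kWh of similar meters. Assumed to be positive.
               When omitted, the energy is split evenly.
    """
    filled = list(values)
    gap_start = None
    for i, value in enumerate(values):
        if value is None:
            if gap_start is None:
                gap_start = i  # remember where this run of gaps began
            continue
        if gap_start is not None:
            span = range(gap_start, i + 1)  # the gaps plus the rolled-up read
            if profile is None:
                weights = [1.0] * len(span)
            else:
                weights = [profile[j] for j in span]
            total = sum(weights)
            for j, weight in zip(span, weights):
                filled[j] = value * weight / total
            gap_start = None
    # A gap at the very end has no following read, so it stays as None.
    return filled


# One missed interval; the following read of 4.0 kWh covers two intervals.
values = [1.0, None, 4.0, 2.0]

even = fill_gaps(values)

# Comparable meters averaged 10% more in the missed interval than in the next.
weighted = fill_gaps(values, profile=[1.0, 1.1, 1.0, 1.0])

print(even)
print(weighted)
```

Either way the total energy is unchanged; only the split between the missed interval and the one after it differs. With the even split both intervals receive 2.0 kWh, while the profile-weighted version gives the missed interval slightly more than half.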
These issues can be systematically avoided with care and attention paid to the data model you use and the systems you use to interact with your data. However, it is important to be aware of the subtleties in your data whenever you perform analysis, especially when this analysis may be used to inform business decisions!