Overcoming the Challenge of Imperfect Data

There's no such thing as perfect data. Yet performance models need high-quality input data to produce high-quality outputs. What gives?

by Steve Hanawalt

This is the fifth article in an eight-part series about the top four challenges in solar performance monitoring and how to overcome these challenges.

The Imperfect Data Challenge

As we discussed in the first article in this series, the purpose of a solar monitoring system is to characterize the operational performance of the plant’s equipment so we can ensure the equipment is performing well.

Model developers attempt to provide this information by creating performance models that consume plant operating data and use it to estimate what the equipment should be doing. If the software identifies a large enough deviation from expected performance, an operating event is triggered, notifying operators of an apparent plant or equipment performance problem.
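The core idea can be sketched in a few lines. This is a deliberately minimal illustration with made-up names and thresholds, not Drive Pro's actual model:

```python
# Hypothetical sketch of deviation-based event triggering.
# The derate factor and 10% threshold are illustrative assumptions.

def expected_power_kw(irradiance_wm2: float, capacity_kw: float,
                      derate: float = 0.85) -> float:
    """Estimate expected AC power from plane-of-array irradiance."""
    return capacity_kw * (irradiance_wm2 / 1000.0) * derate

def check_for_event(actual_kw: float, expected_kw: float,
                    threshold: float = 0.10) -> bool:
    """Trigger an event when actual output falls more than
    `threshold` (as a fraction) below expected output."""
    if expected_kw <= 0:
        return False
    deviation = (expected_kw - actual_kw) / expected_kw
    return deviation > threshold

# A 1,000 kW plant at 800 W/m2 should produce about 680 kW;
# 500 kW actual is a >10% shortfall, so an event fires.
print(check_for_event(500.0, expected_power_kw(800.0, 1000.0)))  # True
```

Note that everything hinges on the inputs: if the irradiance reading feeding `expected_power_kw` is wrong, the event is a false positive, which is exactly the problem the rest of this article addresses.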

I say “apparent” for two reasons:

  1. No performance monitoring system can determine with 100% confidence that there is an actual performance problem.
  2. Historically, solar monitoring systems have generated a high percentage of “false-positive” performance events.

As I talked about in the first article in the series, this is due in part to the problem of solar’s imperfect data — in other words, most solar monitoring platforms work well when fed accurate, continuously cleansed data, but they often fail when fed the kind of messy data we see in the real world.

In the first article, we also discussed that it is not economically feasible to retrofit solar power projects with laboratory-quality sensors, meters and data acquisition systems to fix the problem at its source. So, if creating perfect operating data at the source is not feasible and performance models need high-quality input data to produce high-quality outputs, is there a commercially viable solution to the problem?

The Way Around the Problem

The answer is yes. But the industry needs a new approach to solar asset performance management (APM). This new approach is at the foundation of Power Factors’ Drive Pro asset performance management software platform.

A few of the key assumptions we designed into the product specifications include:

  • Noisy is normal: Throw out the expectation of getting a constant stream of clean operating data from solar power plants and work with the data you’re dealt
  • Models aren’t magicians: Don’t ask a performance model to detect a problem it cannot see; instead, guide users to the best available answers
  • One size doesn’t fit all: Apply the right model to the right asset at the right time to find the problem — there is no silver bullet in asset performance management
  • Support the entire workflow: Optimizing the operational performance of a solar asset touches many solar portfolio stakeholders. The industry needs a solution that supports the identification, analysis, assignment and restoration phases of the performance recovery process, providing an end-to-end, fully integrated solution for owners and operators

Drive Pro brings this innovative approach to solar APM. Our customers are enthusiastic about using a collaborative tool to optimize the performance of their solar assets.

Validation, Estimation and Editing (VEE)

When developing the data processing engine for Drive Pro, our fundamental assumption was that solar plant operating data is always imperfect. To assume solar data can be consumed by the Drive Pro event and analytics engines prior to contextualization, qualification and cleansing is a recipe for failure. Most performance monitoring applications fail at this critical first step in the data processing phase.

To solve this problem, Power Factors developed a powerful data validation, estimation and editing (VEE) engine that qualifies all the time-series data as it streams into the data processing engine. Coupling real-time VEE data processes with an innovative Data Capability scoring method creates an entirely new standard of what trustworthy solar operating data can and should be.
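To make the VEE idea concrete, here is a minimal sketch of one validation-estimation-editing pass over a stream of samples. The range check and last-good-value estimation strategy are illustrative assumptions, not the actual rules in Power Factors' engine:

```python
# Hypothetical sketch of a VEE pass: validate each raw value against a
# plausible range, estimate replacements for bad or missing points, and
# record the edit so downstream analytics know the value was imputed.

from dataclasses import dataclass
from typing import Optional

@dataclass
class Point:
    value: Optional[float]   # None means no usable value
    estimated: bool = False  # True if VEE replaced the raw value

def vee(series: list, lo: float, hi: float) -> list:
    """Validate raw values against [lo, hi]; estimate invalid or
    missing samples by carrying the last good value forward."""
    out, last_good = [], None
    for raw in series:
        valid = raw is not None and lo <= raw <= hi
        if valid:
            last_good = raw
            out.append(Point(raw))
        else:
            out.append(Point(last_good, estimated=True))
    return out

# A dropout (None) and a physically impossible reading (-40 W/m2)
# are both filled from the last good sample and flagged as estimated.
cleaned = vee([620.0, None, -40.0, 655.0], lo=0.0, hi=1500.0)
print([(p.value, p.estimated) for p in cleaned])
# [(620.0, False), (620.0, True), (620.0, True), (655.0, False)]
```

The key design point is the `estimated` flag: edited data is never silently mixed with measured data, so later stages can discount or suppress analytics built on imputed values.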

For example, one of the sensors that fails most frequently in a solar power plant is the irradiance meter. Without this data it is almost impossible to calculate the expected performance of a solar asset. To address this common problem, engineers developed a set of cascading data curation tests to estimate real-time plant irradiance if the primary meter stops communicating or delivers erroneous data.

For an irradiance meter, this is a six-tiered process of validating and estimating current plant irradiance. If the primary irradiance meter is not reporting properly, the confidence factor around expected performance is reduced so operators can factor that in prior to rolling a truck. If no sufficient estimate can be produced for the primary meter, metrics that depend on it are temporarily unavailable until a valid reading or estimate returns. Instead of generating bogus alarms, Drive Pro guides the user through how to recover the meter and get the full set of analytics back online.
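A cascading fallback like this can be sketched as a prioritized list of sources, each carrying a confidence factor. The sources, their order and the confidence values below are illustrative assumptions, not Power Factors' actual six-tier hierarchy:

```python
# Hypothetical sketch of cascading irradiance estimation: try sources
# in priority order, accept the first plausible value, and report a
# lower confidence the further down the cascade we had to go.

def plausible(ghi) -> bool:
    """Basic sanity check: a real irradiance reading in W/m2."""
    return ghi is not None and 0.0 <= ghi <= 1500.0

def estimate_irradiance(primary, backup, neighbor, satellite):
    """Return (irradiance, confidence) from the best available tier."""
    tiers = [
        (primary,   1.00),  # on-site primary pyranometer
        (backup,    0.95),  # on-site backup sensor
        (neighbor,  0.85),  # nearby plant's sensor
        (satellite, 0.70),  # satellite-derived estimate
    ]
    for value, confidence in tiers:
        if plausible(value):
            return value, confidence
    return None, 0.0  # no usable estimate: suppress dependent metrics

# The primary meter is stuck at a bogus negative value and the backup
# is offline, so the neighbor reading is used at reduced confidence.
print(estimate_irradiance(-10.0, None, 812.0, 790.0))  # (812.0, 0.85)
```

When every tier fails, the function returns zero confidence rather than a guess, which is what lets the platform suppress dependent metrics instead of raising false alarms.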

Data Capability

One of the novel approaches that the Drive Pro team came up with to overcome the fatal flaw of nuisance alarms was the introduction of a new data quality concept called Data Capability. Data Capability is an analysis done on data ingested into the VEE engine that qualifies the underlying process data and assigns it a score.

The score is assigned to each plant and informs Drive Pro users of which analytics are enabled based on the quality of the input data. The Data Capability tool runs over 260 tests across all asset types (plants, inverters, combiners, subarrays, sensors, etc.). Every single asset is given a score which rolls up to the plant’s “global score.” This global score is based on the scores of the child assets at the plant and provides users an easy way to prioritize field improvement action across a portfolio.
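The rollup described above might look something like the following. The per-asset scoring rule and the simple-average rollup are assumptions for illustration; the actual 260+ tests and the real rollup formula are Power Factors' own:

```python
# Illustrative sketch of rolling child-asset data-quality scores up to
# a plant-level "global score" (scoring and weighting are assumptions).

def asset_score(tests_passed: int, tests_run: int) -> float:
    """Score one asset as the fraction of its data tests passed."""
    return tests_passed / tests_run if tests_run else 0.0

def global_score(child_scores: dict) -> float:
    """Plant-level score: simple average of the child-asset scores."""
    return sum(child_scores.values()) / len(child_scores)

children = {
    "inverter-01": asset_score(48, 50),
    "inverter-02": asset_score(50, 50),
    "met-station": asset_score(10, 20),  # flaky irradiance meter
}
print(round(global_score(children), 2))  # 0.82
```

Sorting plants by a score like this is what turns scattered sensor problems into a prioritized field-improvement list across a portfolio.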

This important data processing step was missing prior to Drive Pro and accomplishes two things. First, it sets realistic expectations around which analytics can and cannot be generated by Drive Pro given the current time-series data streaming in from the plant. Second, it provides insight into which sensors, meters and metadata are reporting bad or missing data so actions can be taken to correct the source of the data.

Now That I Have Good Data, What’s Next?

Now that we have cleansed, contextualized and classified data with a Data Capability score, we are ready to analyze it and identify real equipment performance problems. Without these important initial steps, monitoring systems will either miss or improperly report plant performance anomalies.

Conditioned operating data feeds the Drive Pro analytics engine which then identifies performance losses, using a standard classification system, by asset. Losses are totalized at the plant and asset level and compared with modeled and expected performance. Underperformance events are generated, and users are notified of problems — with a high degree of certainty on the asset affected, the amount of lost energy and the underlying source of the problem.
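In pseudocode terms, loss totalization amounts to summing expected-versus-actual shortfalls per asset, bucketed by cause. The category names below are illustrative, not the standard classification system the article refers to:

```python
# Hypothetical sketch of loss totalization: sum (expected - actual)
# energy per asset, classified by an assumed loss category.

def totalize_losses(intervals):
    """intervals: (asset, expected_kwh, actual_kwh, cause) tuples.
    Returns {asset: {cause: total_lost_kwh}}."""
    losses = {}
    for asset, expected_kwh, actual_kwh, cause in intervals:
        shortfall = max(expected_kwh - actual_kwh, 0.0)
        by_cause = losses.setdefault(asset, {})
        by_cause[cause] = by_cause.get(cause, 0.0) + shortfall
    return losses

intervals = [
    ("inverter-01", 100.0, 60.0, "downtime"),
    ("inverter-01", 100.0, 95.0, "soiling"),
    ("inverter-02", 100.0, 100.0, "none"),
]
print(totalize_losses(intervals))
# {'inverter-01': {'downtime': 40.0, 'soiling': 5.0},
#  'inverter-02': {'none': 0.0}}
```

Totals like these, rolled up to the plant level and compared with the modeled baseline, are what back an underperformance event with a specific asset, a quantity of lost energy and a probable cause.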

Now that they can trust the results of the performance management platform, asset managers, performance analysts and operators can turn their focus to activities that add the most value to the portfolio — instead of chasing down data problems.


There is a way around the problem of imperfect operating data for the solar power asset class. The answer lies not in hoping and waiting for better data, but in incorporating a data processing engine that can generate high-quality, reliable results despite the faulty data received from solar power plants. The innovative approach used in the new Drive Pro solution is restoring owners’ and operators’ confidence in solar performance monitoring — and they’re using it to improve portfolio returns.

Want to learn more about how Power Factors’ Drive Pro asset performance management (APM) platform helps you overcome the challenge of imperfect data?

Steve Hanawalt is EVP and Founder at Power Factors.
