Summary: Request for Information on Performance Data for Solar Photovoltaic Systems – Acquisitions, Access, and Sharing

On October 14, 2022, the U.S. Department of Energy (DOE) Solar Energy Technologies Office (SETO) released a request for information (RFI) on Performance Data for Solar Photovoltaic Systems: Acquisitions, Access, and Sharing for public comment and response.  

The purpose of this RFI was to solicit feedback from various stakeholders, such as industry, academia, government agencies, and research laboratories.  

RFI Categories 

Respondents addressed questions in three different categories:  

  • Cost and Value of Data (from a data owner perspective)
  • Access, Availability, and Value of Data (from a data user perspective) 
  • Value-add Ancillary Datasets (from a system developer/owner perspective) 

SETO received and reviewed a total of 14 RFI responses. This document presents aggregated information from all RFI responses, organized by the categories above. 

 Note: This is only a summary of information gathered by DOE. None of the information in this summary is a commitment to perform work on any specific topic area. There is no potential funding tied to this summary. DOE will use the information gathered to determine how and whether to develop future programming. 

Cost and Value of Data from a Data Owner Perspective

Topic Area 
Key Inputs and Identified Issues  
Costs of Data Collection, Curation, and Storage  

Higher data volume leads to increased costs due to: 

  • Hardware procurement 
  • Hardware installation 
  • Hardware maintenance  
  • Data collection 
  • Data organization 
  • Data management 
  • Data analysis 

Granular data collection escalates costs, but economies of scale exist.  

Opportunity Cost of Public Data Sharing 

Publicly sharing historical performance data incurs an opportunity cost for two primary reasons:

       1. Lack of perceived financial or strategic benefits 

       2. Risk of harming a data owner’s brand  

Operations and Maintenance Records 

Sharing O&M records presents challenges due to: 

  • Proprietary information 
  • Confidentiality concerns 
  • Lack of standardization  

This results in high costs and potential for misinterpretation. 

 

  • In this section, we asked four questions to understand the costs of and barriers to data sharing from the data owner perspective. The questions covered the overall costs of curating a multi-year, high-quality dataset for a photovoltaic (PV) system, including the added cost associated with collecting, curating, and storing data. We explored respondents’ perceptions of the opportunity cost of sharing their data publicly. Finally, we assessed the costs of curating and sharing O&M records.

    Costs of data collection, curation, and storage 

    Data owners generally validated the assumption that a higher volume of data incurs a higher cost, which can be traced back to costs of hardware procurement, installation, and maintenance. There are additional costs associated with data collection, hosting, organization, management, and analysis, which all generally scale with volume, although at a slower rate. Granular data collection (e.g., from strings of PV modules or from individual trackers) will rapidly increase the volume and some of the associated costs; however, there are economies of scale to be realized. Data exchange can also be costly in the absence of standardization and automation.

    Opportunity cost of public data sharing

    Based on the answers from most data owners, we can confirm that there is an opportunity cost in publicly sharing historical performance data. One source of that cost is the absence of perceived financial or strategic benefits, especially regarding the real costs associated with the curation of the data and the approval process for sharing it. Another source is the risk of harming the data owner’s brand if the shared dataset creates a perception of underperformance. 

    The respondents proposed a few different mitigating factors. We asked whether the opportunity cost could be decreased by collecting less precise values or by sharing old data, and received input from six respondents. Collecting less precise data did not appear advantageous, but everyone acknowledged that sharing some form of anonymized data, particularly by rounding geographic coordinates or normalizing measurements, would decrease the opportunity cost. Two responses cautioned that too much obfuscation could decrease the value of the shared data, while a different respondent asserted that a partially anonymized dataset could still maintain enough precision to be useful. Respondents generally agreed that sharing historical data rather than current performance data would decrease the financial and opportunity costs. However, opinions differed on how much the opportunity cost would decrease for historical data compared with current data, ranging from a slight to a substantial improvement. A couple of respondents also suggested that better data standardization could decrease the overall costs of preparing and analyzing data, and could also decrease the likelihood of misinterpretation.
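    As an illustration of the two anonymization approaches respondents mentioned (rounding geographic coordinates and normalizing measurements), the following is a minimal Python sketch. The record layout, the field names, the choice of one decimal place, and the use of rated DC capacity as the normalization basis are all assumptions made for this example; none of them comes from the RFI responses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SystemRecord:
    """Hypothetical record layout; field names are illustrative only."""
    latitude: float
    longitude: float
    dc_capacity_kw: float
    ac_power_kw: List[float]


def anonymize(record: SystemRecord, coord_decimals: int = 1) -> dict:
    """Round coordinates and express power as a fraction of rated capacity."""
    if record.dc_capacity_kw <= 0:
        raise ValueError("dc_capacity_kw must be positive")
    return {
        # Assumption: one decimal place (roughly 11 km of latitude) is coarse
        # enough to hide the exact site location.
        "latitude": round(record.latitude, coord_decimals),
        "longitude": round(record.longitude, coord_decimals),
        # Normalized output hides the absolute size of the system.
        "normalized_power": [p / record.dc_capacity_kw for p in record.ac_power_kw],
    }


# Usage with made-up values
record = SystemRecord(39.7392, -104.9903, 200.0, [0.0, 50.0, 150.0])
shared = anonymize(record)
```

    The number of decimal places is the trade-off respondents described: coarser rounding protects the data owner more, but too much obfuscation can reduce the value of the shared data.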

    Operations and maintenance

    The challenges and risks associated with sharing operations and maintenance (O&M) records are even steeper than the ones associated with time series data. Concerns about proprietary information and confidentiality were widely expressed, as O&M records may contain sensitive information. In addition, numerous responses noted the lack of a common data schema to easily host information originating from monitoring platforms. This lack of standardization poses multiple challenges, including high costs, lack of scalability (partially due to the natural language often included in O&M records), and misinterpretation of data. 


Access, Availability, and Value of Data from a Data User Perspective 

Topic Area
Key Inputs and Identified Issues
Availability and Desired PV System Data 

Data users require: 

  • Longer historical records (five years minimum) 
  • Good quality metadata 
  • System specifications 
  • Maintenance records 
  • Weather data  
  • Information about data collection and correction

Granular information like DC-side data, multiple irradiance measurements, environmental data, and real and reactive power data is essential. 

Residential System Data 

High-resolution time-series data from residential systems, complemented by DC and environmental data, is valuable in areas with many small rooftop systems for: 

  • Accurate performance modeling  
  • Effective O&M 
  • Dispatch scheduling

Data Interfaces

  • Most respondents prefer accessing data via an Application Programming Interface (API) for long-term value.
  • Interactive interfaces and tools enhance data exploration and analysis, and support remote analyses.

Environmental Data

Respondents suggested additional environmental data that, if collected at high granularity, would help model and forecast the performance of PV systems more accurately. These included:

  • Particulate matter
  • Snow depth
  • Ground surface albedo

Respondents also highlighted existing datasets collected by other agencies such as the remote-sensing datasets collected by NASA and the ground measurements collected by state-sponsored meteorological sensor networks (mesonets). 

 

  • This section was designed to understand the needs of data users through six questions. First, we asked whether data users had access to the data they needed and what the minimum and optimum sets of PV data would be for their uses. Then, we explored the value of residential data and any unique parameters that cannot be estimated from larger systems and models. Further, we sought to understand what types of interactive interfaces exist for accessing time series data. Finally, we asked about the availability and necessity of environmental data.

    Availability and desired PV system data

    Responses from data users regarding the availability of data pointed to the spectrum of needs for R&D and analysis by different types of users. Still, recurring themes included the need for longer historical records (with five years mentioned as a minimum length), as well as the need for good quality metadata and system specification data. Respondents also identified maintenance records, weather records, and information about data collection and correction as important. Some pointed out that certain analyses require high sampling rates and/or time-aligned records of real and reactive power from the inverter. Others highlighted that while lengthy performance records for newly deployed technologies are not yet available, they will eventually be necessary to generate accurate analyses for such systems.

    There was broad agreement regarding the types of data that should be collected from a PV system to allow optimal analysis of its performance. While basic system specifications, basic weather information, and AC power output from the system are enough for a first-order analysis, many respondents highlighted the need for more granular information, such as DC-side data from the inverters and the combiners, multiple irradiance measurements, and enhanced environmental data such as wind speed, snow coverage, module soiling, and back-of-module temperature.

    Residential system data

    Respondents were asked about the value of high temporal resolution (at least as often as four times an hour) time series data from residential systems. There was a broad consensus about its value for more effective O&M practices in that space, especially if it is complemented with additional data beyond energy output. That additional data (mainly from the DC part of the array) is necessary to quantify the impact of soiling, snow, shading, and limited backside convection. Additionally, high-resolution data and high-fidelity information about system orientation and size were deemed essential for grid operators to generate efficient dispatch schedules in areas with a large number of small rooftop systems.
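    The resolution threshold in the question (at least four samples an hour, i.e., no more than 15 minutes between readings) can be checked with a short Python sketch. The function name and the strict rule that every gap must meet the threshold are assumptions for illustration; real datasets typically contain outages or nighttime gaps that would need separate handling.

```python
from datetime import datetime, timedelta
from typing import Sequence

# Four samples per hour corresponds to a 15-minute maximum interval.
MAX_INTERVAL = timedelta(minutes=15)


def meets_resolution(
    timestamps: Sequence[datetime],
    max_interval: timedelta = MAX_INTERVAL,
) -> bool:
    """Return True if every gap between consecutive readings is at most max_interval."""
    if len(timestamps) < 2:
        # A single reading (or none) cannot demonstrate any sampling rate.
        return False
    ordered = sorted(timestamps)
    return all(later - earlier <= max_interval
               for earlier, later in zip(ordered, ordered[1:]))


# Usage with made-up timestamps
start = datetime(2022, 1, 1, 12, 0)
quarter_hourly = [start + timedelta(minutes=15 * i) for i in range(4)]
hourly = [start + timedelta(hours=i) for i in range(4)]
```

    With these made-up inputs, the 15-minute series passes the check and the hourly series does not.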

    Further, respondents identified multiple parameters that are challenging to model without time series data. Numerous respondents commented that O&M events cannot be modeled for residential systems. Other environmental variables like snow cover, soiling, shading, and roof and equipment temperatures also pose a challenge, and modeling could benefit from collecting data on them.

    Data interfaces

    Most respondents agreed that the maximum long-term value from time series datasets is extracted when the data are accessed via an Application Programming Interface (API). Many highlighted that interactive interfaces add value to the datasets by allowing users to browse efficiently across many geographically distributed systems. Such interfaces allow the overlay of ancillary information, either as time series or geo-indexed, which also supports exploratory analyses and situational awareness. Finally, interactive tools can be used to run analyses remotely, which may decrease the costs and time associated with data ingestion.

    Among the recommended examples of public interactive interfaces to PV performance data are the Map of New York State Distributed Energy Resources Facilities and the Photovoltaic Geographical Information System (PVGIS) of the European Commission.

    Environmental data

    Finally, respondents suggested additional environmental data that, if collected at high granularity, would help model and forecast the performance of PV systems more accurately. Among them were particulate matter (including smoke and pollen), snow depth, and ground surface albedo. In addition, the respondents highlighted the opportunity to leverage existing datasets collected by other agencies, such as the remote-sensing datasets collected by NASA and the ground measurements collected by state-sponsored meteorological sensor networks (mesonets).


Value-Add Ancillary Datasets from a System Developer/Owner Perspective 

Topic Area
Key Inputs and Identified Issues
Potential Value-Add Data

Respondents stressed the need for granular information from the field. This information can be obtained via: 

  • Sensors at the string level 
  • Sensors on trackers that are accessible independently from the tracker controllers 
  • Documentation of inverter fault codes 
  • “As-built” records with precise geolocation  
  • Higher quality maintenance logs 

In general, a consensus emerged around the need for multi-level observability of the power plant.

Aerial Inspection Data

Respondents highlighted the value of annual aerial imaging for identifying faults at the module or cell level.

Extreme Weather Damage Data

Some system owners reported they collect data related to damage caused to PV systems by extreme weather events using: 

  • In-field metrology  
  • Aerial and satellite imagery  

Respondents mentioned the potential usefulness of the data to inform permitting and insurance policies. They also highlighted the opportunity for the National Labs to provide systematic data collection and analysis of storm impacts and responses to affected systems.

 

  • In this last section, we gained the perspective of system developers and owners through three questions. We asked questions about which data could provide additional value but is currently not collected, and further, what barriers to collecting it exist. We sought clarity on the use of aerial inspection for operational evaluation. Finally, we requested information on whether developers and owners collected data on damage from extreme weather events.  

    Potential value-add data 

    System owners stressed the need to collect additional, more granular, and higher-fidelity information from the field. That information can come from "upstream" sensors at the string level, sensors on trackers that are accessible independently from the tracker controllers, as-built records with precise geolocation, documentation of inverter fault codes, and higher quality maintenance logs. In general, a consensus emerged around the need for multi-level observability of the power plant with (a) time series data collected from the module strings all the way to the Point-Of-Common-Coupling (POCC) and (b) additional, high-quality information from the Engineering, Procurement, and Construction (EPC) phase, as well as the Operations and Maintenance (O&M) phase.

    While most respondents agreed that the additional information will improve insight into and efficiency of power plant operation, they conceded that the additional cost of collecting, analyzing, and storing the information needs to be justified by proof of its value.

    Aerial inspection data  

    In the same vein, there is agreement regarding the value extracted from aerial images of power plants: it is a cost-effective way to identify faults at the module or even cell level. The respondents identified annual aerial imaging as the minimum acceptable frequency, although it can be reduced further for systems with consistently above-average performance.

    Extreme weather damage data 

    Finally, some system owners reported they collect data related to damage caused to PV systems by extreme weather events. The collection techniques range from in-field metrology to aerial and satellite imagery. While most respondents use such data to assess the impact on the installed assets, some mentioned the potential usefulness of the data to inform permitting and insurance policies. Lastly, other respondents identified the risk of losing forensic data during the repair/replacement/recycling process, but also highlighted the opportunity for the National Labs to provide systematic data collection and analysis of storm impacts and responses to affected systems.

