Below is the text version of the webinar for the Building Performance Database: Analyze Building Energy Data, Understand Market Trends, Inform Decisions, presented in October 2015.

Elena Alschuler:
Presentation cover slide:

Hello, everyone, and welcome to the Building Performance Database webinar. We're excited to share with you all the ways that this project has grown and expanded in the past year. And for those of you who are new, we'll also be reviewing the basics of what it is and why we're doing this. So go ahead to the next slide, please.

Next slide:
So we're going to do a quick introduction to what the Building Performance Database program is. I'm going to talk a little bit about the philosophy and theory of change, how we built it, and things like that. Then we'll hand it over to Paul Mathew from Lawrence Berkeley National Lab, who's going to do a demo of the tool. In fact, we'll spend most of the time today demonstrating the tool, showing you how it works, and giving some examples. And then we'll wrap it up at the end, where Nancy's going to talk a little bit about how to contribute data. Just a reminder: we have everybody on mute, so please enter all of your questions through the chat box. We're going to try to save 15 or 20 minutes at the end to answer everyone's questions.

Next slide:
So as you think of questions during the presentation, go ahead and enter them. OK. So, we have two ways of thinking about buildings and data here in the Commercial Buildings office at the Department of Energy. One of the things we're trying to do is integrate data throughout the building lifecycle. From the design and simulation process all the way through construction, operation, and renovation of the building, you can track information about the physical condition of the building, how it's operated, and its energy and other sustainability performance. So you really have a living record of every building. I'm not going to go into too much detail here, but we have several tools available for the design and simulation phase. We have a new Energy Asset Scoring Tool that works like a light audit to assess the current conditions of an existing building and identify opportunities for improvements. And many of you on the phone are familiar with Portfolio Manager, which gives you something like a miles-per-gallon rating of how your building is actually performing. Next slide.

Next slide:
So our second goal is to increase the availability and consistency of this data across buildings and markets and between stakeholders. On the previous slide we were talking about the lifecycle of a single building, and now we want to talk about all the different actors who are exchanging and using information about buildings, and how we look at portfolios of buildings. At the bottom of the slide here you see the BEDES dictionary, the Building Energy Data Exchange Specification. It sounds really wonky, but those of you who work with data or software know the importance of interoperability and standardized terms and definitions to ensure that there's no loss of accuracy or content when you're translating between tools. That can be a really painful, time-consuming thing. So we're promoting this Rosetta stone of terms and definitions to be used across the industry, in federal tools as well as energy efficiency incentive programs and private-sector products. Building off of that, we have several different schemas, which are basically data formats defined for specific use cases. So there's Home Performance XML for residential audits. BuildingSync is a standard format for commercial audits. And many of you may be familiar with Green Button for energy data. The idea is that there's this big dictionary, but you don't use every word in the dictionary every time you write a letter. You just use the words that are relevant for what you're trying to say. Then we have the SEED platform, which we're not going to talk too much about today. That's a tool that state and local governments can use to manage data about buildings in their jurisdiction. A lot of times they have data coming in from many different sources, like tax assessor records, Portfolio Manager, audits, participation in incentive programs, building permits, and so on.
And then what we're going to spend most of our time on today is the Building Performance Database, which is the big, central, anonymized dataset of building performance information. Basically, we've gone around the country and said, "Hey, everybody, give us your data. We'll clean it, we'll anonymize it, and we'll put it all in one place for other people to analyze and look at." Next slide.

Next slide:
So this is what the BPD looks like. We'll be opening it up in just a few minutes. The BPD is the nation's, and I think maybe the world's, largest publicly accessible dataset of information about buildings. It has information about the physical and operational characteristics of buildings. Right now we've got about 870,000 records in there, and that number is growing all the time as we add new data. When we go into the tool, you'll see there are two main ways of looking at the data. There's an explorer interface, where you can define your own set of buildings that you want to look at. So you can zoom in on commercial buildings, or residential buildings, or homes in Pennsylvania, or big-box retail stores in California, or something like that, and look at the characteristics of that specific defined dataset. And then the compare tool allows you to compare two defined datasets based on a given variable. So you could look at the prevalence of different kinds of HVAC system types for commercial buildings in different climate zones. Or you could compare ENERGY STAR scores in different cities, or something like that. Paul will give a bunch of examples of this in a few minutes. Next slide, please.

Next slide:
So the purpose of the BPD is to make the data publicly available. It's all real data. We don't have any modeled or simulated data or assumptions; it's all actual collected data. And what we hope it helps you do is assess efficiency opportunities, forecast the likely savings from investments, and understand the performance risks. A lot of times people say, you know, if you replace this technology, it will save 10 percent. That's a modeling estimate with a single-point output. But we know that in reality, there's a range of likely savings. Maybe you have an 80 or 90 percent chance of saving at least 8 percent, or a 70 percent chance of saving 10 percent. So you want to get a sense of what that distribution of likely savings looks like, and that's really what we're trying to achieve here. It's very similar to what insurance companies do when they're pricing health insurance or car insurance. So we hope that this will both help you make better decisions and invest more intelligently in energy efficiency, and help generate more data in the future. Next slide.

Next slide:
OK, so actually, I think I'm going to hand it over to Paul, and he's going to talk through a few cool examples we've seen of people using the BPD in the past year.

Paul Mathew:
Thanks, Elena. Hi, everyone. Again, this is Paul Mathew with Lawrence Berkeley National Laboratory. We'll just highlight a couple of examples here, which were also featured in a Department of Energy blog post on the BPD that went out a few weeks ago.

Next slide:
So here's our first use case, and a pretty common one, actually, which is to use the BPD to benchmark and sanity-check where a particular existing building stands. This is a company, Facilities Dynamics, out here on the West Coast. They actually use it to train students to compare the facility they're looking at to its peers in the BPD, in order to understand where it stands and what the total savings potential might be, based on a peer group with real data.

Next slide:
Another example is sanity-checking things like savings estimates. There's a nonprofit called the Energy Coalition down in Southern California, an efficiency program administrator that uses the BPD to double-check savings calculations based on detailed models. So they want to say, all right, we're saving a lot, and we expect this building to be really efficient; how does the resulting figure actually compare to what its peers look like? And again, they like the idea that there's a data source out there with a lot of actual measured data.

Next slide:
A third use case is helping to prioritize which buildings to go after. The Vermont Energy Investment Corporation, for example, or consultants there, have used it to compare the EUI profile of a building against similar facilities and help determine which buildings they might want to go after. What was also interesting in this case is that we actually don't have a lot of buildings from Vermont. But because the BPD allows you to customize your peer groups, they could broaden the peer group to include other buildings in that climate zone and get a better sense of what the opportunities might be.

Next slide:
I should also mention that in addition to the user interface itself, which we'll be showing you in a second here, the BPD also has an application programming interface, or API, that allows third-party applications to query the database directly and display the results within their own applications. You do have to license the API, and we have about 15 licensees right now. I'll just highlight a couple.

Next slide:
There's a company called Choose Energy. I think it's based out of Texas, actually. They use the BPD's data on energy consumption to help validate some of their internal models. Their fundamental business is helping users pick a utility service in places where they actually have a choice of utility.

Next slide:
The second example of an API use is a small company called ResiSpeak, which is in the residential space. They use the BPD to compare home performance to peer groups. The person who started this company is based in North Carolina. They had a lot of data from North Carolina, but they wanted to expand into other states, as well, so they were able to use the BPD to provide peer group data for the other states where they're trying to offer their service.

Next slide:
So with that, I'll hand it back to Elena, and then I'll come back for the demo.

Elena Alschuler:
All right, thanks. Let's see ... So, a couple of things to keep in mind about the BPD before we dive into the demo. As I mentioned, it's all real data, not modeled data or anecdotal evidence. We do pretty rigorous cleansing and validation of the data, scrub out anything that looks suspicious, and translate it into the BEDES format that I mentioned earlier before we put it into the Building Performance Database. Paul talked a little bit about the application programming interface, so here at the bottom you see the flow of the data. We get data from lots of different places. It all goes into the BPD, where it can be accessed by the public or by other tools. Next.

Next slide:
Here's a quick overview of the data that we have in the BPD. As I mentioned, we have about 870,000 buildings today. We have 128,000 commercial buildings, which is about 2.3 percent of the U.S. commercial building stock, and 743,000 residential buildings, which is a little less than 1 percent of the residential stock. So even though we have more residential buildings, they're actually a smaller percentage of their total building stock. One thing to keep in mind about the BPD is that it is not meant to be representative. If you're looking for high-level representative statistics about the U.S. building stock, you should really look at CBECS and RECS. They use a very rigorous sampling technique, and they're really meant to be a representative sample of all the buildings in the country. The BPD is basically a crowdsourcing effort where we're trying to get data from as many places as we possibly can. What that means is that you can drill in and create control parameters for the things that matter to you. You can drill into a specific local market, or a specific kind of building of a specific age, at a level of granularity that the CBECS data doesn't have, and set the control variables so that you feel relatively comfortable that the buildings you're looking at are the kind of buildings that you want to see. I'm happy to take further questions on this after the demo. So with that, we should be handing off to the demo now. Paul, take it away.

Paul Mathew:
Screen demo, beginning with BPD home page:

All right. Thanks, Elena. You're looking at the DOE home page for the BPD, where you can find a link to the tool, a way to contact us via email, and some general information on the BPD, including an FAQ list. I'm just going to go ahead and click on the tool here, which brings you to the main tool site. The BPD is public domain. Anyone can create an account, and there's no charge or fee or anything like that. There's a fairly simple sign-up screen; I'll just quickly show you what that looks like. For those of you who haven't done this yet, you can see it's the usual sort of thing: your name, email, that sort of thing. There are some terms and conditions, and then you'll be good to go. So I'm just going to log in with my info here, and we will get started with the demo. OK.

When you first log in, you will be brought to this page called the explorer page, which, as the name suggests, allows you to explore the data by creating and analyzing a particular peer group that's of interest to you. So let's get you oriented. The first time you log in, in fact when you first create an account, it will bring up all commercial buildings. When you log in subsequently, it always goes back to the last dataset that you created. So I'm going to bring up all commercial buildings. As you can see, we have about 127,000 commercial buildings in the database. To get you oriented, the right pane here (hopefully you can see my mouse) is what we call the data filter pane, which allows you to customize your particular peer group. And on the left are three main displays of the data: a histogram, a scatter plot, and a table. I'll speak about each of these in a second with an example. I was trying to think of examples to do, and I always end up with a California example, so I was determined not to use one. So I'm going to show you office buildings in Massachusetts, as far away from California as we can get. Under building classification, you can see there's a whole bunch of different building types over here. This is similar to the CBECS list but not quite, because again, we get data from many different sources. I'm just going to deselect "All" and then select "Office." And you'll notice the counter went from 127,000 to about 40,000. So we have about 40,000 commercial office buildings in the database for the whole U.S., and it updates the chart. But now we need to home in on Massachusetts. So I'm going to select "Massachusetts" ... which is right there. And now we have the data for Massachusetts, so let's actually look at the data.
Oh, I should quickly mention that you can, of course, save this dataset. Let's just call it "Office Demo" and save that ... Oh, that name was already taken, because I did this previously. OK, now it's "Demo 1." You can save any number of datasets, and they show up under "My Saved Datasets" here. Right below that, we also have a bunch of example datasets that we've pre-created, so that you can start from one of those and edit it, as well.

OK. So let's look at the displays. The histogram, I think most of you should be familiar with. The X axis here shows the metric of interest, in this case source energy use intensity, or EUI, in various bins. There are a bunch of other variables; I'll say a little more about those in a second. The vertical axis shows the percentage of buildings, or you can show it as a count of buildings. I prefer percentage; I think it's more meaningful. The black line here with the bubble shows the median value of that metric for the particular peer group that we've selected. In this case, the median source EUI is 163 kBtu per square foot per year. The lines on either side indicate the 25th and 75th percentiles: the 25th percentile is 107 kBtu per square foot per year, and the 75th is 214 kBtu per square foot per year. So this is your classic, basic comparison chart that allows you to compare yourself against a peer group, and you can customize that peer group in various ways. And I'll just mention that you can certainly use it for existing buildings, to do a quick check and comparison, but also for new construction. In fact, we've had a lot of use cases where architects and engineers who are looking at simulated values during the design phase sanity-check them against measured data. So you can say, well, if I'm claiming to be 40 percent better than average, my EUI should definitely be on the lower side compared to a peer group.
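The median and quartile readout above can be reproduced for any list of EUI values. The following is a minimal Python sketch, not the BPD's own code: the function names are hypothetical, the sample EUI values are made up for illustration, and it assumes linear interpolation between data points (`method="inclusive"` in Python's `statistics.quantiles`), which may differ from how the tool computes its percentiles.

```python
from statistics import quantiles

def peer_group_quartiles(source_eui):
    """25th percentile, median, and 75th percentile of source EUI values
    (kBtu per square foot per year) for one peer group."""
    if len(source_eui) < 2:
        raise ValueError("need at least two buildings to compute quartiles")
    # "inclusive" interpolates linearly between the sorted data points.
    p25, p50, p75 = quantiles(source_eui, n=4, method="inclusive")
    return {"p25": p25, "median": p50, "p75": p75}

def percent_at_or_below(source_eui, my_eui):
    """Share of the peer group (in percent) with EUI at or below my_eui."""
    at_or_below = sum(1 for v in source_eui if v <= my_eui)
    return 100.0 * at_or_below / len(source_eui)

# Hypothetical peer group, for illustration only.
peers = [100, 120, 150, 180, 200]
print(peer_group_quartiles(peers))      # {'p25': 120.0, 'median': 150.0, 'p75': 180.0}
print(percent_at_or_below(peers, 110))  # 20.0
```

A designer checking a simulated EUI of 110 against this toy peer group would see that only 20 percent of the peers are at or below it, which is the kind of sanity check described above.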

I just want to quickly mention a little bit more about the other metrics. One of the things we did in this latest version of the BPD, which we beta-released in January and then fully released in July, was add a whole bunch more variables that you can analyze. So in addition to the usual suspect, EUI, you can also look at the distribution of floor area, year built, number of people, ENERGY STAR® rating, and so on. I'm not going to get into all of those, since we don't have time. But I would like to show you a few comparisons with the scatter plot, which of course allows you to plot two variables against each other. Let me show you source EUI versus floor area. You see the typical pattern, which is that there's actually no strong relationship between EUI and floor area, or building size, except that there tends to be more scatter in smaller buildings than in larger ones. Another one that might be of particular interest is year built. People always want to know whether buildings are getting more efficient, or at least less energy-intensive, and often we find they're not. That's not necessarily because they're less efficient, but because newer buildings often have more services: more elevators and escalators, and so on. So the overall energy intensity doesn't necessarily come down, and we've seen this anecdotally in many other analyses, as well. Another one is source EUI against operating hours. There's a lot of scatter here, as well, as you can see. For any given range of operating hours, like 60 or 70 hours per week, there's still huge variation, and not a very strong relationship in terms of more operating hours necessarily meaning higher EUI.
It is true, generally, that the lower the operating hours, the more dots there are at the lower EUIs, as you can see on this chart. And as you go to more operating hours, there are fewer dots at the lower EUIs.
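One way to put a number on "no strong relationship" in a scatter plot is a correlation coefficient. The demo does not say the BPD reports one; this is a hypothetical Python sketch of the Pearson coefficient, where values near 0 mean little linear relationship and values near +1 or -1 mean a strong one. Skewed data such as EUI may be better served by a rank-based measure, which is not shown here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences,
    e.g. floor area (xs) and source EUI (ys) for the same buildings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of co-deviations and squared deviations from the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical points: perfectly proportional data gives r = 1.0.
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))  # 1.0
```

A cloud of points like the EUI versus operating hours chart would give an r well below 1 in magnitude even if higher operating hours shift the distribution upward a little.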

And finally, another one that's of interest is the ENERGY STAR rating. You can plot source EUI against ENERGY STAR rating. And here you do, in fact, have a relationship pretty much as you would expect: lower EUIs go with higher ENERGY STAR scores. But of course there's quite a range, because the ENERGY STAR rating also normalizes for various other parameters. So that's the scatter plot. The final display that we have on the explorer page is the table. Let me just collapse that window so you can see this. This essentially gives you a tabular view of the data. You can analyze the EUI metrics (site, source, electric, and fuel) by various categories, and it gives you high-level statistics. So for example, I can categorize it by year built, and it shows me the mean, standard deviation, and percentile values of source EUI for each year-built category. Or I can do it by (inaudible), categories of floor area, and so on. And then you can download this data to CSV files and do further analysis on it as you'd like.
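The table view groups the peer group by a category and reports summary statistics per group. The Python sketch below illustrates that idea under stated assumptions: the record fields (`year_built`, `source_eui`), the year-built bin edges, and the CSV column names are all hypothetical and are not taken from the BPD, and records missing the metric are simply skipped.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, quantiles, stdev

FIELDS = ["category", "count", "mean", "std", "p25", "median", "p75"]

def year_built_bin(record):
    """Hypothetical year-built categories; the BPD's own bins may differ."""
    year = record["year_built"]
    if year < 1950:
        return "pre-1950"
    if year < 1980:
        return "1950-1979"
    if year < 2000:
        return "1980-1999"
    return "2000+"

def summary_table(records, categorize, metric="source_eui"):
    """Group records by category and summarize one metric per group."""
    groups = defaultdict(list)
    for rec in records:
        value = rec.get(metric)
        if value is not None:  # skip records that lack this metric
            groups[categorize(rec)].append(value)
    rows = []
    for category in sorted(groups):
        values = groups[category]
        if len(values) < 2:
            continue  # stdev and quartiles need at least two values
        p25, p50, p75 = quantiles(values, n=4, method="inclusive")
        rows.append({"category": category, "count": len(values),
                     "mean": mean(values), "std": stdev(values),
                     "p25": p25, "median": p50, "p75": p75})
    return rows

def to_csv(rows):
    """Render the summary rows as CSV text, e.g. for download."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical records for illustration.
records = [
    {"year_built": 1960, "source_eui": 100},
    {"year_built": 1970, "source_eui": 140},
    {"year_built": 1985, "source_eui": 120},
    {"year_built": 1990, "source_eui": 160},
    {"year_built": 1995, "source_eui": None},  # missing EUI: ignored
]
print(to_csv(summary_table(records, year_built_bin)))
```

Swapping `year_built_bin` for a function that bins floor area would give the "categories of floor area" view in the same way.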

One thing that I want to point out before we go on to the compare part of the tool and some of the other filters is the counters on each of these displays. The counter in the dataset filter pane shows you the total number of records we have for that particular set of filters. So it's 482 records of offices in Massachusetts, and we have source EUI for 476 of them. That's why you see "476" here. And for only 285 of those do we have both source EUI and ENERGY STAR ratings. So the concept is that each counter shows how many records have the data needed for that particular display.
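The counter logic described above amounts to counting, for each display, only the records that carry every field that display needs. A minimal Python sketch, with hypothetical field names (`source_eui`, `energy_star`) and `None` standing in for missing data:

```python
def display_counts(records):
    """Count the records usable by each display. Each display needs certain
    fields to be present; None means the field is missing for that record."""
    def has(rec, *fields):
        return all(rec.get(f) is not None for f in fields)

    return {
        "filter_pane": len(records),  # every record matching the filters
        "histogram": sum(has(r, "source_eui") for r in records),
        "eui_vs_energy_star": sum(has(r, "source_eui", "energy_star") for r in records),
    }

# Hypothetical records: one lacks EUI, one has EUI but no ENERGY STAR score.
sample = [
    {"source_eui": 150, "energy_star": 72},
    {"source_eui": 210, "energy_star": None},
    {"source_eui": None, "energy_star": 55},
]
print(display_counts(sample))  # {'filter_pane': 3, 'histogram': 2, 'eui_vs_energy_star': 1}
```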

OK. So now I'd like to show you a few more of these filters. We actually have five categories of filters and about 35 different filters that you can use to customize your particular peer group. But I'll make a caveat about availability right away: that's not to say that we have data on all those parameters for all these buildings. In fact, the data can be quite sparse for some of these parameters. I'm going to select a different dataset now, back here on the left coast: California retail facilities. I just want to give you a quick flavor for some of these filters. I already showed you the building classification, where you can select commercial or various commercial types or residential types. Under location, there are four ways you can filter. I showed you state. You can also filter by climate zone, and view the actual climate zone. You can filter by city; we have a handy-dandy type-ahead list, so you can select just buildings in San Francisco, for example. As you select these filters, you can see how the count changes up here. That shows you that we only have about 124 retail buildings in San Francisco in the database, for example. So I'm going to take off that filter. And you can also filter by zip code if you want to home in on a particular zip code.

Then on the building information side, at the high level, you can filter by year built. Say you only want to look at buildings built since 1980, for whatever reason. Or if you only want to look at buildings that have more than 80 or 90 operating hours per week, you can play with that filter. There's number of people and occupant density. You can, of course, filter by the size of the building; you can choose to include just buildings that are less than 250,000 square feet, and so on. As you set these filters, if you want to quickly turn them on and off, you can just check this "enabled" box, and it takes care of that. You can also, for instance, choose to look only at ENERGY STAR labeled buildings, or buildings that have an ENERGY STAR rating, and so on.

We also have a set of filters related to building systems. One of the key features of the BPD, when we started out on this adventure four years ago, is that we didn't want to collect just whole-building data; we also wanted to collect system-level data, so that you can look at the relationships between building systems and overall building energy use in an empirical, or epidemiological, way. So we set out right from the start to collect as much system-level data as we could. Having said that, we have also been really humbled over the last four years. Building system data are very, very hard to come by, so they're much more limited compared to other filters. But we continue to add them, and as the database grows, you'll be able to do more of these kinds of analysis. It so happens that we have fairly rich building systems data from California. So for example, you can say, I only want to look at retail buildings that have packaged DX units for cooling, and it shows you that there are 286 of these. Then you apply that filter, and you get the data for just that particular cohort, which is California retail buildings with packaged DX units. You can see the median EUI for that is 199, which is actually a little higher than the median for all retail buildings, for example.

And the final filter I'll point out here is energy use intensity. For instance, you can choose to look only at buildings within a certain range of source EUI or site EUI. What we find it's useful for is when there's a really odd outlier or something: you can say, well, I want to ignore the outliers, so only show me buildings with source EUI less than 1,000, or something. That way we get rid of the outliers, and the chart is a little easier to look at and process. So that's the explorer page. I just want to point out again that as you constrain the filters, the dataset does get smaller. As with any data analysis of this sort, there's a tradeoff between the specificity of your peer group and the total number of data points that you have to analyze. OK.
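The filter behavior described in the demo (combining location, type, and range filters, switching a filter off without deleting it, and capping source EUI to drop outliers) can be sketched as follows. This is a hypothetical Python illustration, not the BPD's implementation; the field names, the class names, and the rule that a record missing a filtered field drops out of the peer group are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class RangeFilter:
    """Keep records whose field lies in [low, high); None means unbounded."""
    field: str
    low: Optional[float] = None
    high: Optional[float] = None
    enabled: bool = True  # a disabled filter lets every record through

    def accepts(self, record):
        if not self.enabled:
            return True
        value = record.get(self.field)
        if value is None:
            return False  # assumption: missing values drop out of the peer group
        if self.low is not None and value < self.low:
            return False
        if self.high is not None and value >= self.high:
            return False
        return True

@dataclass
class ChoiceFilter:
    """Keep records whose field is one of the allowed values (e.g. state)."""
    field: str
    allowed: frozenset
    enabled: bool = True

    def accepts(self, record):
        return not self.enabled or record.get(self.field) in self.allowed

def apply_filters(records, filters):
    """A record stays in the peer group only if every filter accepts it."""
    return [r for r in records if all(f.accepts(r) for f in filters)]

# Hypothetical records for illustration.
buildings = [
    {"state": "CA", "type": "Retail", "year_built": 1975, "source_eui": 180},
    {"state": "CA", "type": "Retail", "year_built": 1992, "source_eui": 240},
    {"state": "CA", "type": "Retail", "year_built": 2005, "source_eui": 4200},  # outlier
    {"state": "MA", "type": "Office", "year_built": 1988, "source_eui": 160},
]
filters = [
    ChoiceFilter("state", frozenset({"CA"})),
    ChoiceFilter("type", frozenset({"Retail"})),
    RangeFilter("source_eui", high=1000),  # drop extreme outliers
    RangeFilter("year_built", low=1980),   # built in or after 1980
]
print(len(apply_filters(buildings, filters)))  # 1
filters[3].enabled = False  # toggle the year-built filter off
print(len(apply_filters(buildings, filters)))  # 2
```

Each enabled filter can only shrink the result, which is the specificity versus sample-size tradeoff mentioned above.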

With that, I would like to show you the compare page of this tool. When you come to the compare page, it first brings in the last dataset you were analyzing in explorer. Remember we were looking at California retail? So that's what it has over here. This page essentially allows you to compare two datasets by overlaying them on top of each other. As you can see, the layout is very similar to the explorer page, except that now you have two filter panes: one for Dataset 1, the purple dataset, and one for Dataset 2, the yellow dataset. You can compare pretty much any two datasets; the tool doesn't restrict what sort of comparisons you can do. So you have full license to do pretty meaningless comparisons if you really wanted to, as well. But there are two common use cases. One is to compare buildings with different system types: I want to compare buildings with HVAC Type A to buildings with HVAC Type B. The other, as Elena alluded to a little earlier in the intro, is to compare buildings that are in different categories or in different locations. If I'm a program administrator, I want to compare what my office stock looks like to my retail stock. Or small versus large. Or ENERGY STAR rated buildings in San Francisco versus those in New York, for example. You can do any of those kinds of comparisons.

I'm going to give you a quick example of a technology comparison. I'm going to compare office buildings in California that have constant volume systems with office buildings in California that have variable volume systems. The purple dataset is constant volume systems; the yellow dataset is variable volume systems. As you can see, the histogram overlays them both, so you can see the medians and so on, and the same goes for the scatter plot. As is sort of to be expected, the variable volume buildings as a cohort have a slightly lower median EUI, but there's quite a bit of overlap here, because obviously there are a bunch of other confounding factors, as well. So that's the histogram and the scatter plot.

In addition to those two, we have a third, slightly different display, or way of viewing this data. This one is admittedly a little more wonky, but we think it's rather valuable for those who want to really try to understand the differences between these distributions. It's essentially a distribution of the relative differences between the two datasets. In fact, I'm going to pull up a tool tip. We have tool tips throughout the tool, so hopefully you can read this. It might be a little small on your screen, but I'll read it out to you. It says here that the difference histogram is a "distribution of the relative differences between the two datasets," calculated as follows: Dataset 2 minus Dataset 1, divided by Dataset 1. So the X axis shows you the range of the difference, expressed as a percentage, and the Y axis is the number of all the (inaudible) buildings in each of those bins. So bars to the left of 0 indicate that Dataset 2 has smaller values than Dataset 1, and bars to the right of 0 indicate that Dataset 1 has smaller values than Dataset 2. As you can see in this particular case, this is where 0 is, and most of the bars are in fact to the left of 0. The median difference is minus 21 percent. What that means is that there's a 50 percent chance that a building in Dataset 2 has at least 21 percent lower source EUI than a building in Dataset 1. You can also select any of the bars here to get a sense of the cumulative probability of achieving a certain percentage difference. For example, if I select the bar immediately to the left of 0, you can see a little writeup up there that says, "There is a 71 percent chance that a building in Dataset 2, variable volume dataset, has lower source EUI than a building in Dataset 1." So it's not a 100 percent chance, but it's a 71 percent chance.
The complementary way of expressing that, of course, is to look at the cumulative probability to the right of 0: 100 minus 71 is 29, so there's a 29 percent chance that a building in Dataset 2 has higher source EUI than a building in Dataset 1.
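To make the difference histogram arithmetic concrete, here is a small Python sketch that computes the relative difference (Dataset 2 minus Dataset 1, divided by Dataset 1) for every possible pair of buildings, one from each dataset, and reads off the probabilities described above. Enumerating all pairs is a simplification chosen for small toy datasets, not the BPD's method; the function names are hypothetical and the EUI values are made up.

```python
from itertools import product
from statistics import median

def pairwise_relative_differences(dataset1, dataset2):
    """(d2 - d1) / d1 for every pairing of one building from each dataset."""
    return [(d2 - d1) / d1 for d1, d2 in product(dataset1, dataset2)]

def share(diffs, condition):
    """Fraction of differences that satisfy the condition."""
    return sum(1 for d in diffs if condition(d)) / len(diffs)

# Hypothetical source EUI values (kBtu/sq ft/yr).
constant_volume = [100, 200]  # Dataset 1
variable_volume = [80, 150]   # Dataset 2

diffs = pairwise_relative_differences(constant_volume, variable_volume)
p_lower = share(diffs, lambda d: d < 0)          # Dataset 2 lower
p_higher = share(diffs, lambda d: d > 0)         # Dataset 2 higher
p_25_lower = share(diffs, lambda d: d <= -0.25)  # at least 25% lower
print(p_lower, p_higher, p_25_lower)  # 0.75 0.25 0.5
print(median(diffs))                  # median relative difference
```

When a pair has a difference of exactly 0, the lower and higher shares need not add up to 100 percent; without ties they do, which is the 71/29 split read off in the demo.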

Likewise, you can look at the cumulative probability of any percentage difference of interest. Say you're particularly interested in the chance that there's at least 10 percent lower EUI. I can go to the 10 percent bar, which is right around there, and it says that there's about a 64 percent chance that it's at least 10 percent lower. OK? There are two methods of calculating this distribution. One is called the actuarial method, and the other is a regression-based approach. I don't have time to get into the details of how these are calculated, but we do have some technical documentation on that, if you're interested. Very quickly, the actuarial approach repeatedly samples pairs of points, one from each dataset, and uses the differences to generate a distribution. For the regression approach, we combine the two datasets, create a regression model, look at the parameter for the difference, and then predict those differences as a distribution. Again, we have some technical documentation on that if you'd like to get into the weeds on that particular issue.
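The actuarial approach, as described here, repeatedly samples pairs of buildings, one from each dataset, and builds the distribution from their differences. The Python sketch below is a generic Monte Carlo version of that idea under its own assumptions (sampling with replacement, a fixed number of draws, the same relative-difference formula as the tool tip). It is not the BPD's implementation, which is covered in its technical documentation, and it does not attempt the regression-based approach; the function names and data are hypothetical.

```python
import random
from statistics import median

def sample_relative_differences(dataset1, dataset2, n_samples=10_000, seed=None):
    """Draw n_samples random pairs (one building from each dataset, with
    replacement) and return (d2 - d1) / d1 for each pair."""
    rng = random.Random(seed)  # seeded generator for reproducible runs
    diffs = []
    for _ in range(n_samples):
        d1 = rng.choice(dataset1)
        d2 = rng.choice(dataset2)
        diffs.append((d2 - d1) / d1)
    return diffs

def summarize(diffs, threshold=0.10):
    """Median difference, P(Dataset 2 lower), and P(at least `threshold` lower)."""
    n = len(diffs)
    return {
        "median": median(diffs),
        "p_lower": sum(d < 0 for d in diffs) / n,
        "p_at_least_threshold_lower": sum(d <= -threshold for d in diffs) / n,
    }

# Hypothetical source EUI values, for illustration only.
dataset1 = [100, 140, 200, 260]
dataset2 = [80, 120, 150, 210]
print(summarize(sample_relative_differences(dataset1, dataset2, seed=42)))
```

Sampling trades exactness for scale: with hundreds of buildings on each side, drawing a fixed number of pairs is cheaper than enumerating every combination, and the estimated probabilities tighten as `n_samples` grows.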

OK, so a few other quick points before I wrap up here: some ancillary features. We have a map that shows you the density of data for the whole country, because people often wonder, hey, do you have any data from Montana? And, well, it turns out, not a lot. We have a lot of data for Washington state, California, Texas, the Northeast, and the Southeast, but we're kind of thin on the Midwest and the Mountain states. So anyone who knows of or has good data, please do let us know. We will try to reward you in any way that doesn't involve monetary compensation, essentially, and you'll be helping the public cause of growing a nice dataset. We also have a map of the ASHRAE climate zones, for those of you who may not be familiar with those. And then we have some aids. There's a guided walk-through; in fact, the first time you log in, you'll be kind of forced into the guided walk-through, but you can dismiss it if you don't need it. There's a bunch of videos on the various features. And, very importantly, we have a feedback button right here. We very much welcome your feedback, not just on bugs you may find or annoying features, but also on new features you'd like to see included in the tool. That helps us prioritize our development plans.

So before I hand it back -- oh, I should also mention, very quickly: in addition to DOE and LBNL, we had some great companies working with us on this tool, including Earth Advantage from Portland and Sustainable IQ from Massachusetts. So before I hand it back to Elena and Nancy to wrap up, I'd like to close with two final points on what the tool is and is not. Our intent here really is to make data broadly available and allow the user to explore and compare data in various ways, for various purposes. There's a range of different use cases for these data, so the tool is not directed at a specific, narrow use case. It's not going to walk you through Step 1, Step 2, to get you to a particular result. Frankly, a commissioning agent looking to benchmark existing EUI would use the tool quite differently than a program administrator looking to identify systemic shifts in EUI for buildings with different technologies. So we've intentionally kept the user experience somewhat open-ended, as opposed to really directed. There are pros and cons to this. Obviously, it gives the user a lot of flexibility in the analysis. The con, of course, is that you could also do analyses that aren't particularly meaningful. But hopefully you'll be able to get the hang of it fairly quickly and figure out how to use it effectively for your particular use case. And we're quite gratified that the BPD now has over 10,000 users, and a good chunk of them are in fact consultants, architects, engineers, and so on. The second final point I want to make, and perhaps the most important: you've probably heard the adage about "lies, da--ed lies, and statistics." There's no doubt that a core value of the BPD is that it's real data on hundreds of thousands of buildings.
So it really is an incredible resource in that respect. But it is also true, and those of you who have worked with empirical data know this, that empirical data can be very messy and sometimes have a poor signal-to-noise ratio. So as with any data-driven analysis, especially when you're working with empirical data, you should interpret results with a careful assessment of the data's characteristics and a sound understanding of the principles of building energy use. So again, we think it's a great tool that can be used for a bunch of use cases, and we look forward to growing it. And with that, I'll segue into what Nancy will now talk about. Let me pull up your slide, Nancy, and let you get started.

Nancy Gonzalez:
Thanks, Paul. So just to give folks a little bit more information. I see some questions coming in, talking about where the data comes from, what the data is.

Next slide:
I think you've seen all the parameters that are included in the BPD, as Paul walked through in the demonstration. But as Elena mentioned earlier, all the data is voluntary data from multiple private and public sources. Of course, we do require some fields within your dataset in order for it to be imported into the BPD. We ask that all contributed datasets contain a minimum of 50 building records; the cleansing process that the LBNL team goes through is quite rigorous, and so we do like to have at least 50. In terms of the fields that must be included to contribute your data: we need an address, so city, state, and zip code; the type of your facility, such as commercial, residential, hospital, home, office, or retail; your square footage; one year of energy consumption data; and when the building was completed. Of course, as Paul was mentioning, we are also collecting more in-depth data about the building and its systems. Portfolio Manager does a good job of collecting some of the operational data fields, such as occupants, operating hours, and the activities associated with the floor space. But we also want to collect energy equipment and asset information: your heating and cooling equipment types and efficiencies, hot water equipment type and efficiency, building envelope information, and lighting systems. Next slide.
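
The minimum contribution requirements listed above can be expressed as a simple pre-submission check. This is a hypothetical sketch for contributors, not an LBNL tool: the column names (`city`, `state`, `zip_code`, `facility_type`, `floor_area_sqft`, `annual_energy_kbtu`, `year_built`) are invented stand-ins for the required fields, and the actual intake format should be confirmed with the BPD team.

```python
MIN_RECORDS = 50  # minimum building records per contributed dataset (from the webinar)

# Hypothetical column names standing in for the required fields:
# address (city, state, zip), facility type, floor area,
# one year of energy consumption, and year completed.
REQUIRED_FIELDS = (
    "city", "state", "zip_code", "facility_type",
    "floor_area_sqft", "annual_energy_kbtu", "year_built",
)


def check_submission(records: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means the basic checks pass."""
    problems = []
    if len(records) < MIN_RECORDS:
        problems.append(f"only {len(records)} records; at least {MIN_RECORDS} are requested")
    for i, rec in enumerate(records):
        # treat absent keys, None, and empty strings as missing
        missing = [f for f in REQUIRED_FIELDS if rec.get(f) in (None, "")]
        if missing:
            problems.append(f"record {i}: missing {', '.join(missing)}")
    return problems


sample = {"city": "Berkeley", "state": "CA", "zip_code": "94720",
          "facility_type": "office", "floor_area_sqft": 25_000,
          "annual_energy_kbtu": 1_500_000, "year_built": 1985}
print(check_submission([sample] * 50))  # → []
```

A check like this only screens for completeness; the range checks and de-duplication happen during LBNL's own cleansing.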

Next slide:
So how do you contribute data? The first step is expressing interest by contacting our BPD team. Then we will work with you to make sure that your data meets the minimum requirements. We have a very stringent privacy agreement that we ask all contributors to review and accept; it basically treats all contributed data as proprietary information. Once you agree to the privacy terms, you will be sharing your data with the LBNL team, and there are different ways you can transfer it. Portfolio Manager is, of course, one of the most popular ways of sharing information. There are certain fields that are not captured in Portfolio Manager, so if you have equipment or asset information, that will require a separate transfer. We work with you on whatever electronic transfer serves your needs, and we just ask that any file we collect, whether an Excel file or a CSV file, be labeled as proprietary. LBNL's team will work with you, whether that means setting up a file transfer upload or having you email your dataset to them directly, depending on your needs. The team then cleanses your data to anonymize it and remove any errors or outliers. The team will also provide you with the cleansed version of your dataset, if that's of interest, and we'll let you know once your data has been posted to the BPD. So if you're interested in checking out the BPD, we have the website here for you to do that. One thing to note, too: there's also an API. If you're interested in querying the BPD data, you can contact our team to request a license.

Next slide:
And with that, I think we'll just open it up for questions. One of the first questions is, where does the BPD draw its data from? I can go ahead and answer that. As I mentioned, we collect data from both public and private sources. This includes federal program data; data from state and local governments; and data from (inaudible), housing authorities, real estate and asset management firms, schools, and utility programs. A slew of stakeholders have voluntarily contributed information.

Elena Alschuler:
And you can check out -- we have a full list of our data contributors on our website, buildings.energy.gov/bpd.

Nancy Gonzalez:
Let's see ... So one question for Paul on accessing data, which is twofold: Can raw data be accessed, and can the data be downloaded for analysis in other software, such as Excel or XPSF?

Paul Mathew:
Yeah, good question. That's a very common one; in fact, one of the most common questions we get. Unfortunately, we cannot provide access to the raw data, meaning the individual records, because that would violate our privacy constraints. Actually, I realize there's something I forgot to mention in the demo: if your query results in fewer than 10 buildings, we in fact don't display the data, precisely for that reason, because we want to protect the privacy of the data. So you can't download individual records. But essentially all the same queries that you can make through the user interface, you can also make through the API, and then get, for instance, all the table data and do the analysis at that level. But not the individual records, unfortunately.
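
The privacy rule mentioned above, not displaying results when a query matches fewer than 10 buildings, amounts to a minimum-count check before anything is shown. The sketch below is a hypothetical illustration of that rule applied to a summary, not the BPD's actual code; the function name and the returned summary fields are assumptions.

```python
from statistics import median
from typing import Optional

MIN_BUILDINGS_TO_DISPLAY = 10  # threshold mentioned in the webinar


def summarize_for_display(euis: list[float]) -> Optional[dict]:
    """Return aggregate statistics only when enough buildings match; otherwise suppress."""
    if len(euis) < MIN_BUILDINGS_TO_DISPLAY:
        return None  # too few buildings: showing results could expose individual records
    return {"count": len(euis), "median_eui": median(euis)}


print(summarize_for_display([80.0] * 9))                         # → None
print(summarize_for_display([float(e) for e in range(60, 70)]))  # → {'count': 10, 'median_eui': 64.5}
```

Only aggregates ever leave the function, which mirrors the point that API queries return the same kinds of summaries as the user interface rather than individual records.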

Nancy Gonzalez:
Great, thank you. Another question is about how to see hourly energy use profiles, if possible, and whether they could be viewed through the API.

Paul Mathew:
OK, yeah. You know, when the BPD started out four years ago, we wanted to make it a repository for all sorts of building data, everything from hourly or even minute-level control system data all the way up to aggregate annual data. Then, as we proceeded through the data collection and so on, we honed in first on use cases that look at annual data. So we really don't have -- maybe we have only one data source with hourly data, and hourly data is not actually in the database itself, because we don't allow that level of analysis right now. At some point, again, our development is going to be very data-driven. If we get a lot of hourly data, there's undoubtedly enormous value in that, and hopefully some interesting analysis and decision-making one could do based on it. When we get to that point, we'd be happy to extend the capability of the BPD to accommodate that. But right now, we don't have that.

Nancy Gonzalez:
OK, thanks. A number of folks are wondering how the team manages the accuracy of the data, and how you keep duplicate data from being uploaded.

Paul Mathew:
Aha, good question. If you want all the gory details, I'm going to point you to the main DOE website (let me just pull it up right now) ... It's taking a second to come up here; I guess the Internet's a little slow at the Department of Energy this morning. Anyway, I'll just mention it. There's a technical report on the data-cleansing procedures, which has all the gory details of how we cleanse the data, as well as how we deal with data duplication. There's a link on the right over here (hopefully you can see my mouse) that says "BPD Data Preparation Process," and that's a PDF file with the technical report. But yes, we do run de-duplication of all the data based on various criteria, and we do data cleansing, again, on all the data. What we do not do, and I should be clear about this, is primary confirmation of the data. We obviously don't visit any sites to confirm it. When we get data from a contributor, we talk to them about the data, where it came from, and what the data issues might be, and then we run our own checks on each of the fields, such as whether they're within ranges, and so on and so forth.
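
The two kinds of checks described above, de-duplication and per-field range checks, might look something like this in simplified form. The actual criteria are documented in the "BPD Data Preparation Process" report; the address-based duplicate key, the field names, and the numeric ranges below are placeholder assumptions for illustration only.

```python
# Placeholder plausibility ranges (assumptions, not the BPD's published limits).
FIELD_RANGES = {
    "floor_area_sqft": (100.0, 10_000_000.0),
    "site_eui": (1.0, 1_000.0),  # kBtu/sq ft/yr
}


def address_key(rec: dict) -> tuple[str, str]:
    """Normalize address text (case, commas, extra spaces) and pair it with the zip code."""
    street = " ".join(rec["address"].lower().replace(",", " ").split())
    return street, str(rec["zip_code"]).strip()


def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each normalized address."""
    seen = set()
    unique = []
    for rec in records:
        key = address_key(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def out_of_range(rec: dict) -> list[str]:
    """List fields that are present but fall outside their plausibility range."""
    flagged = []
    for field, (low, high) in FIELD_RANGES.items():
        value = rec.get(field)
        if value is not None and not (low <= value <= high):
            flagged.append(field)
    return flagged


# Invented records: the first two are the same building written two ways.
records = [
    {"address": "1 Main St, Suite 2", "zip_code": "94720", "floor_area_sqft": 20000, "site_eui": 85},
    {"address": "1 main st  suite 2", "zip_code": "94720", "floor_area_sqft": 20000, "site_eui": 85},
    {"address": "9 Oak Ave", "zip_code": "02139", "floor_area_sqft": 15000, "site_eui": 5000},
]
clean = deduplicate(records)
print(len(clean), [out_of_range(r) for r in clean])  # → 2 [[], ['site_eui']]
```

Matching on a normalized address only works because, as noted in the discussion, the team often knows identifying information that end users never see.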

Elena Alschuler:
And in many cases, we actually know the address and other identifying information about the building, so we can remove duplicates that way, even though that's not visible to the user.

Nancy Gonzalez:
And folks are asking about building information models and whether those are posted in the BPD.

Paul Mathew:
It is not. Well, maybe I should qualify that a little bit, because "BIM" can mean different things to different people. But building information models in the classic sense, these large schemas, are not what's in the BPD. The BPD hosts, again to the extent that we have these data, high-level building characteristics and building system characteristics consistent with the BEDES data format, or data dictionary. That's essentially what we store right now. But we don't store schemas of buildings, like IFC or XML, for example.

Nancy Gonzalez:
Some questions about the tool itself. A couple of folks want to know if you can talk about the Explore tool and its relationship to energy audits, and the BCN identification.

Paul Mathew:
Sure. If I understand the question correctly, it's about how the tool could be used in an audit process. I would say the first thing you want to do, in any attempt at using the tool in an audit process, is simply benchmark where your building stands relative to its peer group. And you can compare it with different peer groups: maybe first by building type and location, then by size, then by year built, and so on. So you compare your building to various custom peer groups. And if, in that analysis, you consistently see that your building is not performing well, that it's average or worse, then you probably know that it is in fact a good candidate for an audit, for example. The second thing is that, to the extent data are available in the BPD (and this, again, varies quite a bit across locations), you could do some basic system-level analysis by comparing cohorts of systems, just like I showed you. You know, how do buildings with HVAC Type A compare to buildings with HVAC Type B? But again, to be honest, our system-level data are somewhat limited, so you'll just have to see the extent to which that's available for your location and building type.
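
The first audit step described above, benchmarking a building against progressively narrower peer groups, can be sketched as a percentile-rank calculation. The peer-group EUI values and the `percentile_rank` helper are hypothetical; in practice the peer-group distributions would come from the BPD interface or API, since individual records are not released. Treating a rank at or above 50 as "average or worse" follows the rule of thumb in the answer above.

```python
def percentile_rank(building_eui: float, peer_euis: list[float]) -> float:
    """Percent of peers using less energy per square foot than this building.
    Higher numbers mean worse relative performance, since lower EUI is better."""
    if not peer_euis:
        raise ValueError("peer group is empty")
    below = sum(1 for e in peer_euis if e < building_eui)
    return 100.0 * below / len(peer_euis)


# Illustrative peer groups of increasing specificity (invented numbers, not BPD data).
peer_groups = {
    "office, same climate zone": [55.0, 60.0, 68.0, 72.0, 80.0, 95.0, 110.0, 120.0],
    "+ similar floor area": [58.0, 63.0, 70.0, 75.0, 90.0],
    "+ similar year built": [62.0, 66.0, 71.0, 88.0],
}
my_eui = 85.0
for name, euis in peer_groups.items():
    rank = percentile_rank(my_eui, euis)
    verdict = "average or worse: possible audit candidate" if rank >= 50 else "better than median"
    print(f"{name}: higher EUI than {rank:.0f}% of peers ({verdict})")
```

Looking at several peer definitions guards against a single, arbitrary grouping making a building look better or worse than it really is.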

Nancy Gonzalez:
One user, going back to the tool itself, was wondering why there are two different counters at the bottom of the Compare tool.

Paul Mathew:
Correct. Yeah, so -- oh, in the Compare tool? Yes. It's essentially the same reason -- oh, I see. These two different counters are probably what the person is referring to. The purple counter refers to the number of buildings from the purple dataset, and the yellow counter refers to the number of buildings from the yellow dataset, that are displayed. I'm assuming --

Elena Alschuler:
Yeah, and it's the buildings for which we know these variables, right? So there's a little bit of a difference here. In Dataset 2 on the right, you see it has 870,000. That is the number of records that meet the limiting criteria that were selected. But then when you actually go to display it by source EUI, there might be some records for which we don't know the source EUI, and that's why it drops down from 870 to 770. And it looks like for Dataset 1, for all of those records that are California office buildings with constant volume, we know the source EUI. So there are the two datasets you're comparing, and then the question of whether those two datasets actually have the variables you want to compare on. We know it's really confusing. We went back and forth a bunch about how many counters to have and stuff like that.

Paul Mathew:
And actually, one of the reasons we added these counters was that the old version of the BPD had only the one counter showing the total number of buildings, and people would wonder why a bunch of them didn't show up in the displays. It was because that particular variable wasn't available for those buildings; we don't have all variables for all buildings. That's why we decided to show multiple counters here. And there is, again, a handy-dandy tool tip that explains what these are.
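
The two counters described above correspond to two different counts: buildings that match the selected filters, and the subset of those for which the displayed variable is actually known. A minimal sketch, with hypothetical field names and invented records (not how the BPD computes them internally):

```python
def counters(records: list[dict], filters: dict, display_variable: str) -> tuple[int, int]:
    """Return (buildings matching the filters, matching buildings that have display_variable)."""
    matching = [r for r in records if all(r.get(k) == v for k, v in filters.items())]
    displayable = [r for r in matching if r.get(display_variable) is not None]
    return len(matching), len(displayable)


records = [
    {"state": "CA", "facility_type": "office", "hvac": "constant volume", "source_eui": 190.0},
    {"state": "CA", "facility_type": "office", "hvac": "constant volume", "source_eui": None},
    {"state": "CA", "facility_type": "office", "hvac": "constant volume", "source_eui": 175.0},
    {"state": "WA", "facility_type": "office", "hvac": "constant volume", "source_eui": 160.0},
]
filters = {"state": "CA", "facility_type": "office", "hvac": "constant volume"}
print(counters(records, filters, "source_eui"))  # → (3, 2)
```

The gap between the two numbers is exactly the set of matching buildings that are missing the plotted variable.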

Nancy Gonzalez:
So I think we should probably talk a little bit more about the anonymization. Some folks are saying that you only see the histogram or the scatter plot instead of being able to see the real data in an Excel or CSV file. So it sounds like folks are wondering how to get access to the raw data. As Paul already mentioned, if you want to query the data, you can go ahead and request access to the API.

Paul Mathew:
Correct.

Elena Alschuler:
Yeah, because of the privacy restrictions, we're not able to give out the raw data. We have the CBECS and RECS full detailed raw data. We have data from the state of California. We have data from energy efficiency programs and energy efficiency program evaluations. We have data from private real estate companies and contractors, and all sorts of proprietary business information. And the way we've been able to get all this data is by promising folks that we will aggregate and anonymize it. So, of course, ideally we'd all like to see all the raw data, but the only way we've been able to reach this scale and make this level of knowledge and information available is by aggregating and anonymizing it. We do have the histogram and the scatter plot, and we also have the table view, so you can still get a sense of the number of buildings that fall into different types and categories. And then we also have the API connection, as Paul mentioned. So if you know anyone who's a whiz at software coding, you can submit your own queries and not have to build every one through the user interface, if you want to do a bunch of them.

Paul Mathew:
Yeah, and I'll just mention that we have a site for the API documentation. Those of you who are interested might want to check out the API functions and the documentation there.

Nancy Gonzalez:
So, some questions about certain fields. One is about roof type, and whether there's consideration of collecting information on kinds of roofs, such as reflective roofs or green roofs. And then other questions about whether the BPD could possibly be expanded in the future to include energy cost information or water usage information.

Paul Mathew:
Yeah, good questions. So on the roof category, quickly: if you log in, you'll see these are the roof type categories that we have in here. Frankly, I don't remember off the top of my head how much of this info we have, but obviously we have at least one building for each, because we only include variables here if we have at least some buildings with them. So yes, we do cover some of these, and you can see one of them is, in fact, green roof. And then, sorry, Nancy, what was the second question?

Nancy Gonzalez:
It was whether there's any consideration of expanding the BPD to collect fields on water usage and energy cost information.

Paul Mathew:
Oh, right, right, right. Well, certainly anything's technically possible. I'll go back to this: whether we do it or not is really driven by you, the users. If there's, A, demand for it, and B, data available for it, we'll do it; we can begin to make a case to fund that, and so on. So it's really going to be driven at this point by demand and data. ... (inaudible -- multiple voices)

Nancy Gonzalez:
One question is about being able to print or download charts for reports.

Paul Mathew:
Yes. You can download the table data by clicking this little button I'm pointing to right now. For the charts, yes, we need to add a little download feature. You can, of course, just take a screenshot, but we have it on our little punch list to add a download feature for the charts, as well. We'll probably try to get to that this year at some point.

Nancy Gonzalez:
And then there's some interest in whether there are public documents that summarize any of the data.

Paul Mathew:
There is ... (public documents that summarize any of the data?) ... I know we have an article -- let me go back to the DOE main page. Nancy or Elena, maybe you remember this more off the top of your head. I believe we have an article that talks about the data.

Elena Alschuler:
Yeah, I think we published a paper for the ACEEE Summer Study two years ago, or a year and some ago? So that's already a little bit out of date. I think we were talking about doing some publications this year, letting the smart folks at Lawrence Berkeley go wild and see what fun things they can discover in the data. So keep an eye out for some publications coming out.

Paul Mathew:
Yeah, and actually we did publish a paper back in December in Energy Policy. Sorry -- not Energy Policy; Applied Energy. So if whoever asked the question could just send us an email, we can forward that paper.

Elena Alschuler:
Paul, let's get a link of that paper put up on our website, too.

Paul Mathew:
Yeah, we should probably do that, too.

Elena Alschuler:
In the screenshot Paul is showing, at the bottom right, "Getting Real With Energy Data," all the way at the bottom of that list, is the one from about a year ago. It has some comparisons of BPD data to other data sources and some analysis of the data.

Nancy Gonzalez:
Thanks. A question on why the API has to be registered.

Paul Mathew:
Registered? I'm not sure what that means.

Nancy Gonzalez:
Licensed.

Paul Mathew:
Oh, licensed. Gosh, I guess I'm stumped there. I'm sure there is a reason, because I have no desire to --

Elena Alschuler:
It's because there's a code of conduct. Well, actually, there are two reasons. One is that we would like to be able to justify continuing to fund this project, and in order to do that, we want to keep track of who's using it, right? That way we can go back to the senior people at DOE and Congress and say that your tax dollars are being well spent on this project, because people are using it. The second reason is that there is a code of conduct that's part of the API agreement. Basically, if you're making a whole bunch of calls to the API that look fishy to us, as if you're trying to isolate and identify individual records, you're violating your license agreement, and we'll terminate it.

Paul Mathew:
Right.

Elena Alschuler:
And I think we're actually at the hour. So if people have additional questions, they should feel free to email Paul or me at the general BPD inbox. It was great to get so many questions from folks. Sometimes on these webinars it's hard to tell if people are engaged. But these have all been really good -- dare I call it discussion? -- and input from folks. So thanks so much for tuning in. We hope you use the BPD, send us data, and send us feedback, and we look forward to continuing to grow this project.

Paul Mathew:
Thank you, everyone.

Elena Alschuler:
Thanks. Bye.