Program Evaluation: Glossary

Evaluation and performance measurement professionals use terms that are common to the field of program evaluation. Knowledge of how these terms are used by evaluators will help program managers communicate their evaluation needs and expectations to a contractor. The definitions are listed alphabetically.

The terminology used reflects usage by federal evaluation experts. In some cases it differs from that used by energy-program evaluation experts in the private sector. For example, the private sector has not adopted the distinction between "outcome" and "impact" used by logic modelers and this site. Instead, the private sector uses "gross" and "net" impacts (or outcomes) to describe the concepts intended by "outcome" and "impact" on this site.

A

Accuracy. The degree of correspondence between the measurement made on an indicator and the true value of the indicator at the time of measurement. (Cochran 1977, p. 16)

Activities. The action steps necessary to produce program outputs. (McLaughlin and Jordan 1999)

B

Benchmark. A measurement or standard that serves as a point of reference by which process performance is measured. (GAO, Glossary)

Bias. The extent to which a measurement or a sampling or an analytic method systematically underestimates or overestimates a value. (GAO, Designing Evaluations 1991, p. 92.)
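
As a minimal illustration of the Accuracy and Bias entries, the sketch below compares repeated measurements with a known true value. The readings and the true value are made-up numbers, and `mean_error` is a hypothetical helper name rather than a term from the cited sources.

```python
from statistics import mean

def mean_error(measurements, true_value):
    """Average signed error: positive means systematic overestimation,
    negative means systematic underestimation."""
    return mean(m - true_value for m in measurements)

# Hypothetical meter readings of a quantity whose true value is assumed to be 100.
true_value = 100.0
readings = [102.0, 103.0, 101.0, 102.0]

bias = mean_error(readings, true_value)
print(f"Estimated bias: {bias:+.1f}")
```

A mean error near zero suggests little systematic bias, although individual readings can still be inaccurate.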

C

Comparison Group. A group of individuals or organizations that have not had the opportunity to receive program benefits and that is measured to determine the extent to which its members have taken actions promoted by the program. Like a control group, the comparison group is used to measure the level to which the promoted actions would have been taken if the program did not exist. However, unlike a control group, a comparison group is chosen through methods other than randomization, e.g., selection on the basis of similar demographic characteristics. (OMB 2004. Campbell and Stanley 1963, p. 13.) See also "representative sample." Some sources also use this definition for "control group." (European Commission, RTD Evaluation Toolbox 2002, p. 256)

Construct. An attribute, usually unobservable, such as attitude or comfort, that is represented by an observable measure. (GAO, Designing Evaluations 1991, p. 92)

Control Group. A randomly selected group of individuals or organizations that have not had the opportunity to receive program benefits and that is measured to determine the extent to which its members have taken actions promoted by the program. The control group is used to measure the level to which the promoted actions would have been taken if the program did not exist. (Campbell and Stanley 1963, p. 13.) See also "comparison group."

Cost-Benefit and Cost-Effectiveness. Comparison of a program's outputs or outcomes with the costs (resources expended) to produce them. Cost-effectiveness analysis assesses the cost of meeting a single goal or objective, and can be used to identify the least costly alternative to meet that goal. Cost-benefit analysis aims to identify and compare all relevant costs and benefits, usually expressed in dollar terms. The two terms are often interchanged in evaluation discussions. (GAO, Definitions 1998, p. 5)
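
The distinction between the two analyses can be sketched in a few lines. The function names, the two program alternatives, and all dollar and kWh figures below are hypothetical and only illustrate the arithmetic.

```python
def cost_effectiveness(cost, units_of_outcome):
    """Cost per unit of a single outcome (e.g., dollars per kWh saved)."""
    return cost / units_of_outcome

def net_benefit(benefits, costs):
    """Cost-benefit view: all monetized benefits minus all costs."""
    return sum(benefits) - sum(costs)

def benefit_cost_ratio(benefits, costs):
    """Cost-benefit view expressed as a ratio of benefits to costs."""
    return sum(benefits) / sum(costs)

# Hypothetical alternatives for the same goal: (program cost in dollars, kWh saved).
alternatives = {"rebate": (50_000, 400_000), "audit": (30_000, 200_000)}

# Cost-effectiveness analysis: pick the alternative with the lowest cost per kWh saved.
least_costly = min(
    alternatives, key=lambda name: cost_effectiveness(*alternatives[name])
)

# Cost-benefit analysis: compare all benefits (in dollars) with all costs.
benefits = [40_000, 25_000]   # e.g., energy bill savings and avoided maintenance
costs = [50_000]
print(least_costly, net_benefit(benefits, costs), benefit_cost_ratio(benefits, costs))
```

Cost-effectiveness needs only one outcome measure in its natural units, whereas cost-benefit analysis requires every benefit to be expressed in dollar terms.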

Cross-Sectional Data. Observations collected on subjects or events at a single point in time. (GAO, Designing Evaluations 1991, p. 92)

D

Deemed Savings. An estimate of an energy savings or energy-demand savings outcome (gross savings) for a single unit of an installed energy-efficiency or renewable-energy measure that (1) has been developed from data sources and analytical methods that are widely considered acceptable for the measure and purpose, and (2) will be applied to situations other than that for which it was developed. That is, the unit savings estimate is "deemed" to be acceptable for other applications. Deemed savings estimates are more often used in program planning than in evaluation. They should not be used for evaluation purposes when a program-specific evaluation can be performed. When a deemed savings estimate is used, it is important to know whether its baseline is an energy-efficiency code or open-market practice. The most extensive database of deemed savings is California's Database for Energy Efficiency Resources (DEER). The deemed savings in DEER are tailored to California.
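
A deemed savings estimate is typically applied by multiplying a per-unit value by the number of units installed. The sketch below assumes that simple multiplication. The measure names and per-unit kWh values are invented for illustration and are not taken from DEER or any other database.

```python
# Hypothetical deemed annual savings per installed unit (kWh per unit per year).
DEEMED_KWH_PER_UNIT = {
    "led_lamp": 30.0,
    "heat_pump_water_heater": 1500.0,
}

def gross_deemed_savings(installed_units):
    """Sum of (deemed per-unit savings x units installed) across measures."""
    return sum(
        DEEMED_KWH_PER_UNIT[measure] * count
        for measure, count in installed_units.items()
    )

# Hypothetical installation counts reported by a program.
installs = {"led_lamp": 1000, "heat_pump_water_heater": 20}
total_kwh = gross_deemed_savings(installs)
print(total_kwh)
```

The result is a gross savings estimate. It says nothing about what would have happened without the program, and it inherits whatever baseline the deemed values were built on.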

Defensibility. The ability of evaluation results to stand up to scientific criticism. Defensibility is based on assessments by experts of the evaluation's validity, reliability, and accuracy. See also Strength.

Direct customers. The individuals or organizations that receive the outputs of a program. (McLaughlin and Jordan 1999)

E

External Factor. A factor that may enhance or nullify underlying program assumptions and thus the likelihood of goal achievement. Goal achievement may also be predicated on certain conditions (events) not happening. External factors are introduced by external forces or parties, and are not of the agency's own making. The factors may be economic, demographic, social, or environmental, and they may remain stable, change within predicted rates, or vary to an unexpected degree. (OMB, Circular No. A-11 2003, p. 3)

External Validity. The extent to which a finding applies (or can be generalized) to persons, objects, settings, or times other than those that were the subject of study. (GAO, Designing Evaluations 1991, p. 92)

Evaluation. Evaluations are systematic, objective studies conducted periodically or on an ad hoc basis to assess how well a program is working. They help managers determine if timely adjustments are needed in program design to improve the rate, or quality, of achievement relative to the committed resources. (GAO, Definitions 1998, p. 3. OMB Circular No. A-11 2003, Section 200-2) In EERE, Evaluation Guide 2005, this definition applies to general program evaluations.

G

Generalizability. Used interchangeably with "external validity." (GAO, Designing Evaluations 1991, p. 92)

I

Impact Evaluation. The application of scientific research methods to estimate how much of the observed results, intended or not, are caused by program activities and how much might have been observed in the absence of the program. When external factors are known to influence the program's outcomes, this form of evaluation is employed to isolate the program's contribution to achievement of its objectives. (GAO, Definitions 1998, p. 5)
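
One simple way to picture this estimate is to subtract the average outcome in a control or comparison group from the average outcome among participants. The sketch below assumes that difference-in-means approach, which is only one of several possible designs. The function name and data are hypothetical.

```python
from statistics import mean

def estimated_impact(participant_outcomes, comparison_outcomes):
    """Return (observed, without_program, attributable) average outcomes.

    observed:        average outcome among program participants
    without_program: average outcome in the comparison group, used as the
                     estimate of what would have happened without the program
    attributable:    the difference, i.e., the estimated program impact
    """
    observed = mean(participant_outcomes)
    without_program = mean(comparison_outcomes)
    return observed, without_program, observed - without_program

# Hypothetical percentage energy savings per site.
participants = [12.0, 10.0, 14.0, 12.0]
comparison = [4.0, 5.0, 3.0, 4.0]

observed, without_program, attributable = estimated_impact(participants, comparison)
print(observed, without_program, attributable)
```

In the private-sector vocabulary mentioned in the introduction, the observed value corresponds roughly to a gross result and the attributable value to a net result.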

Indicator (also Performance Indicator). A particular characteristic used to measure outputs or outcomes; a quantifiable expression used to observe and track the status of a process. An indicator constitutes the observable evidence of accomplishments, changes made, or progress achieved. (Wisconsin Extension 1996, p. 8. OMB, Circular A-11 2003, Section 200-2.)

Internal Validity. The extent to which the causes of an effect are established by an inquiry. (GAO, Designing Evaluations 1991, p. 92)

L

Logic Model. A plausible and sensible diagram of the sequence of causes (resources, activities, and outputs) that produce the effects (outcomes) sought by the program. (McLaughlin and Jordan 1999)

Longitudinal Data. Observations collected over a period of time. The sample (instances or cases) may or may not be the same each time but the population remains constant. Longitudinal data are sometimes called "time series data." (GAO, Designing Evaluations 1991, p. 92)

M

Measurement. A procedure for assigning a number to an observed object or event. (GAO, Designing Evaluations 1991, p. 93)

N

Needs/Market Assessment Evaluation. An evaluation that assesses market baselines, customer needs, and target markets, and how the program in question can best address them. Findings help managers decide who constitutes the program's key markets and clients and how to best serve the intended customers. When performed at the beginning of a program, needs/market assessment evaluations also establish baselines against which to compare future progress. (EERE, Evaluation Guide 2005, p. 2. California Framework 2004, p. 429)

O

Outcome. Changes or benefits resulting from activities and outputs. Programs typically have multiple, sequential outcomes, sometimes called the program's outcome structure. First, there are "short-term outcomes," those changes or benefits that are most closely associated with or "caused" by the program's outputs. Second, there are "intermediate outcomes," those changes that result from an application of the short-term outcomes. "Longer-term outcomes," or program impacts, follow from the benefits accrued through the intermediate outcomes. (McLaughlin and Jordan 1999)

Outcome Evaluation. Measurement of the extent to which a program achieves its outcome-oriented objectives. It measures outputs and outcomes (including unintended effects) to judge program effectiveness but may also assess program process to understand how outcomes are produced. (GAO, Definitions 1998, p. 5)

Output. The product, good, or service offered to a program's direct customers. (McLaughlin and Jordan 1999)

P

Panel Data. A special form of longitudinal data in which observations are collected on the same sample of respondents over a period of time. (European Commission, RTD Evaluation Toolbox 2002, p. 263)

Peer Review. Objective review and advice from peers. EERE defines peer review as: "A rigorous, formal, and documented evaluation process using objective criteria and qualified and independent reviewers to make a judgment of the technical/scientific/business merit, the actual or anticipated results, and the productivity and management effectiveness of programs and/or projects." (EERE, Peer Review Guide 2004, p. 5)

Performance Measure. An indicator, statistic, or metric used to gauge program performance. Also referred to as a performance indicator. (OMB, Circular A-11 2003, Section 200-2)

Performance Measurement. The process of developing measurable indicators that can be systematically tracked to assess progress made in achieving predetermined goals and using such indicators to assess progress in achieving these goals. (GAO Glossary)

Persistence. The estimated or described changes in net program impacts over time taking into consideration all known factors that degrade the performance of a desired outcome, including retention in use and technical degradation of equipment performance. (ORNL 1999, p. 4; California Framework 2004, p. 435)
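
A rough sketch of how persistence might be projected is shown below. It assumes, purely for illustration, that retention and technical degradation each act as a constant annual rate and that their effects multiply. The cited sources do not prescribe this formula, and the function name and rates are hypothetical.

```python
def persisting_savings(first_year_savings, retention_rate, degradation_rate, years):
    """Projected savings for each year under assumed constant annual rates.

    retention_rate:   fraction of measures still in use after each year
    degradation_rate: fraction of per-measure performance lost each year
    """
    return [
        first_year_savings * (retention_rate ** t) * ((1 - degradation_rate) ** t)
        for t in range(years)
    ]

# Hypothetical program: 1,000 MWh in year one, 90% annual retention, no degradation.
profile = persisting_savings(1000.0, 0.9, 0.0, 3)
print([round(value, 1) for value in profile])
```

A real persistence study would estimate these rates from field data instead of assuming them.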

Probability Sampling. A method for drawing a sample from a population such that all possible samples have a known and specified probability of being drawn. (GAO, Designing Evaluations 1991, p. 92)

Process (or Implementation) Evaluation. Assessment of the extent to which a program is operating as intended. Process evaluation examines the efficiency and effectiveness of program implementation processes. It assesses program activities' conformance to program design, to professional standards or customer expectations, and to statutory and regulatory requirements. (GAO, Definitions 1998, p. 5)

Program. "Program" refers to a collection of activities that are unified with respect to management structure and overall goal. (EERE, Peer Review Guide 2004, p. 2)

Portfolio. A collection of projects. A single individual or organization can have multiple R&D portfolios. (DOE, R&D Portfolio Management)

Q

Qualitative Data. Information expressed in the form of words. (GAO, Designing Evaluations 1991, p. 93)

Quantitative Data. Information expressed in the form of numbers. Measurement gives a procedure for assigning numbers to observations. See Measurement. (GAO, Designing Evaluations 1991, p. 93)

R

Random Assignment. A method for assigning subjects to one or more groups by chance. (GAO, Designing Evaluations 1991, p. 93)
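
A minimal sketch of random assignment into two groups follows. It shuffles the subjects and splits the list in half. The function name, the site labels, and the fixed seed (used only to make the example repeatable) are assumptions for illustration.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects by chance and split them into two groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)      # copy so the caller's list is not modified
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical list of ten eligible sites.
sites = [f"site-{i}" for i in range(10)]
treatment, control = randomly_assign(sites, seed=1)
print(treatment, control)
```

Because chance alone decides group membership, the two groups should differ only randomly on characteristics other than the treatment itself.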

Reliability. The quality of a measurement process that would produce similar results on: (1) repeated observations of the same condition or event; or (2) multiple observations of the same condition or event by different observers. (GAO, Designing Evaluations 1991, p. 93)

Replication. An outcome effect that occurs when energy savings identified at a site are implemented elsewhere, e.g., at a different location, whether internal or external to the site. (As used in ORNL 1999.) The replication process usually is initiated at a program-participant site. (See also "spillover.")

Representative Sample. A sample that has approximately the same distribution of characteristics as the population from which it was drawn. (GAO, Designing Evaluations 1991, p. 93)

Resources. Human and financial inputs as well as other inputs required to support the program's activities. (McLaughlin and Jordan 1999)

Retention. An outcome effect that describes the degree to which measures or practices are retained in use after they are installed or implemented. (California Framework 2004, p. 438)

S

Simple Random Sample. A method for drawing a sample from a population such that all samples of a given size have equal probability of being drawn. (GAO, Designing Evaluations 1991, p. 93)
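
In Python, the standard library's `random.sample` draws without replacement so that every subset of the requested size is equally likely, which matches this definition. The wrapper name, the population of customer IDs, and the seed below are hypothetical.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(list(population), n)

# Hypothetical sampling frame of 500 customer IDs.
customers = range(1, 501)
sample = simple_random_sample(customers, 25, seed=42)
print(sorted(sample))
```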

Significance Level. The probability of getting a particular value in a sample result (e.g., a mean of 43.0, a proportion of 0.6, a difference between two means of 3.0, or a quantitative relationship between the program treatment and an outcome) when, in fact, the hypothesized true value is some other value (that you must specify beforehand, e.g., a zero difference). The probability is often expressed using the Greek letter alpha (α) and should also be chosen before the data are collected. Probabilities (significance levels) of less than 0.1, 0.05, or 0.01 are typically selected for tests of significance. (Kachigan 1986, p. 165)
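
The sketch below shows one way a pre-chosen significance level is used in practice. It runs an exact permutation test on the difference between two group means and compares the resulting probability with alpha. The permutation test is a technique chosen for this illustration, not one named by the cited source, and the data are made up.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(treated, untreated):
    """Two-sided exact permutation test on the difference in group means.

    Returns the share of all possible regroupings of the pooled data whose
    absolute difference in means is at least as large as the observed one,
    assuming a hypothesized true difference of zero.
    """
    pooled = list(treated) + list(untreated)
    observed = abs(mean(treated) - mean(untreated))
    total = 0
    at_least_as_extreme = 0
    for chosen in combinations(range(len(pooled)), len(treated)):
        chosen_set = set(chosen)
        group_a = [pooled[i] for i in chosen_set]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen_set]
        total += 1
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / total

ALPHA = 0.05  # significance level, fixed before looking at the data

# Hypothetical outcome measurements for four treated and four untreated sites.
treated = [10.0, 11.0, 12.0, 13.0]
untreated = [1.0, 2.0, 3.0, 4.0]

p_value = permutation_p_value(treated, untreated)
significant = p_value < ALPHA
print(round(p_value, 4), significant)
```

With these completely separated groups, only 2 of the 70 possible regroupings are as extreme as the observed one, so the difference is significant at the 0.05 level but would not be at 0.01.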

Spin-off. Savings estimates that are based on verbal or undocumented recommendations from an energy-efficiency program output. (As used in ORNL 1999, p. 14)

Spillover. The benefit of a program intervention that accrues to individuals or organizations that are not direct recipients of the program's outputs. (EAO, Toolkit 2003, chapter 9. California Framework 2004, p. 441.)

Strength. A term used to describe the overall defensibility of the evaluation as assessed by use of scientific practice, asking appropriate evaluation questions, documenting assumptions, making accurate measurements, and ruling out competing evidence of causation. (GAO, Designing Evaluations 1991, pp. 16-18)

Structured Interview. An interview in which the questions to be asked, their sequence, and the detailed information to be gathered are all predetermined; used where maximum consistency across interviews and interviewees is needed. (GAO, Designing Evaluations 1991, p. 94)

T

Treatment Group. The subjects of the intervention being studied. (GAO, Designing Evaluations 1991, p. 94) See also "direct customers."

V

Validity. See "internal validity" and "external validity."