Program evaluation is the systematic collection, assessment, and dissemination of information to provide constructive feedback about a particular program, project, or policy. Information gleaned from a program evaluation can determine whether and why a program is needed, whether and how a program is being implemented correctly, whether and to what extent a program is actually making an impact, and other specifics useful to facilitate a program’s development, implementation, or improvement. These specifics are then used by the program's clients or stakeholders in making decisions about whether to continue or modify the program.
A form of applied research with wide application, program evaluation is most commonly used in health and human services, education, business administration, economic development, and public policy.
Types of program evaluation
The broad scope of program evaluation has inspired the creation of multiple classification schemes dividing evaluations by strategy or purpose. The most basic distinction in evaluation types is that between formative and summative evaluation. Formative evaluations are conducted concurrently with the program's implementation and are intended to assess ongoing program activities and provide feedback to monitor and improve the program; summative evaluations are conducted after the program has been implemented and are intended to assess the program's outcomes or impacts.
Formative and summative evaluations can be further subdivided into other evaluation types, but their classification is only important insofar as it helps evaluators clarify key questions, such as for what purposes an evaluation is being done and what kinds of information are needed for it. After all, different types of program evaluation yield different types of information. Although programs are most often evaluated to measure their effects, program evaluation may be conducted at any stage of a program’s life to assess the program’s necessity or goals, logic or theory, process or implementation, outcomes or impact, and cost-benefit ratio or cost-effectiveness.
When conducted in chronological order for a single program, these different types of evaluation are better thought of as stages in a multi-step evaluation process. First, a needs assessment determines whether and to what extent a program is necessary. Second, a program theory, also known as a logic model, details how and why a needed program’s activities will bring about the program’s outcomes. Third, the implementation of those activities is assessed in a process evaluation. Fourth, once the program's activities have been implemented for a long enough time, the program's outcomes may be evaluated to see what it has achieved, through either outcome evaluation or impact evaluation. Finally, after the outcomes (both the costs and the benefits) are known, a cost-benefit analysis or cost-effectiveness analysis may be done.
A framework for program evaluation
By conducting a program evaluation in logical steps or stages, evaluators at least partially adhere to an evaluation framework. An evaluation framework is useful to "summarize and organize the essential elements of program evaluation, provide a common frame of reference for conducting evaluations, and clarify the steps in program evaluation."
Treating a program evaluation as sequential steps is only one of two important components in following an evaluation framework; the other is abiding by a set of predefined standards when carrying out each step.
Standards in program evaluation
Standards for program evaluation address important concerns such as whether an evaluation yields accurate information and whether it is done in an ethical manner. Although the standards are not unified, the American National Standards Institute approved a set of standards published in 1994 by the Joint Committee on Standards for Educational Evaluation (JCSEE). The JCSEE established thirty standards and divided them into four categories:
- Utility standards, which are designed to ensure that an evaluation is useful to clients or stakeholders by providing timely, clear, and above all pertinent information.
- Feasibility standards, which are designed to ensure that an evaluation is practical, politically viable, and cost-effective.
- Propriety standards, which are designed to ensure that an evaluation is done legally and ethically.
- Accuracy standards, which are designed to ensure that information gleaned from an evaluation is carefully documented and considered for its validity and reliability.
Measurement in program evaluation
Among the standards proposed for accuracy are common standards of measurement. Measurement is essential to program evaluation. A government agency conducting a needs assessment of a financial assistance program for the indigent must measure the number of needy people to define the target population. A private school conducting a cost-benefit analysis of a merit-based promotion program for its teachers must measure both the costs and the benefits of the program to calculate the cost-benefit ratio.
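The private school's calculation can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it assumes that every cost and benefit has already been expressed in the same currency unit, it takes the ratio as total benefits divided by total costs (one common convention), and the function names and all figures are hypothetical.

```python
def benefit_cost_ratio(benefits, costs):
    """Total benefits divided by total costs, both in the same currency unit."""
    total_costs = sum(costs)
    if total_costs <= 0:
        raise ValueError("total costs must be positive")
    return sum(benefits) / total_costs


def net_benefit(benefits, costs):
    """Total benefits minus total costs."""
    return sum(benefits) - sum(costs)


# Hypothetical annual figures, invented for illustration only.
program_costs = [40_000.0, 10_000.0]     # e.g. salary increases, administration
program_benefits = [45_000.0, 30_000.0]  # e.g. monetized retention and achievement gains

print(benefit_cost_ratio(program_benefits, program_costs))  # 1.5
print(net_benefit(program_benefits, program_costs))         # 25000.0
```

Under this convention, a ratio above 1 means the measured benefits exceed the measured costs. The result is only as trustworthy as the measurements that feed it.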
When doing these evaluations, however, both the government agency and the private school must ensure that their measurement instruments are valid, reliable, and sensitive, or risk distorting the results of the evaluation. A measurement is valid if it measures what it is intended to measure, reliable if it produces the same results when used repeatedly to measure the same thing, and sensitive if it can detect real changes in the quantity being measured.
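Of these three properties, reliability is the easiest to illustrate numerically. One common approach, assumed here rather than drawn from the standards themselves, is to apply the same instrument twice to the same subjects and correlate the two sets of scores. The sketch below computes a Pearson correlation by hand; the function name and the scores are hypothetical.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length lists with at least two scores")
    mean_first = sum(first) / len(first)
    mean_second = sum(second) / len(second)
    covariance = sum((a - mean_first) * (b - mean_second)
                     for a, b in zip(first, second))
    spread_first = sum((a - mean_first) ** 2 for a in first)
    spread_second = sum((b - mean_second) ** 2 for b in second)
    if spread_first == 0 or spread_second == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return covariance / sqrt(spread_first * spread_second)


# Hypothetical scores from the same five subjects, measured twice.
first_round = [10, 12, 15, 20, 22]
second_round = [11, 12, 14, 21, 22]

print(pearson_r(first_round, second_round))
```

A correlation close to 1 suggests that the instrument gives consistent results across administrations. It says nothing about validity: an instrument can be highly consistent and still measure the wrong thing.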
Consider, for example, the government agency that measures the number of needy people in a given area. Is the U.S. federal government’s historical poverty measure, which is based exclusively on family cash income, a valid measure of neediness? The Department of Commerce’s Rebecca Blank denies that it is, citing both the expansion of federal safety net programs (ignoring their benefits may overstate a household’s neediness) and relative increases in necessary expenditures like out-of-pocket medical costs (ignoring these may understate a household’s neediness). Because it fails to account for many potential changes to disposable income (e.g., Medicaid assistance for a disabled individual), the historical poverty measure is also insensitive to the corresponding changes in neediness.
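The difference between the two kinds of measure can be made concrete with a small sketch. Everything in it is an assumption for illustration: the threshold, the household figures, and the simplified "disposable income" formula are hypothetical and do not reproduce any official federal calculation.

```python
def is_poor_cash_only(cash_income, threshold):
    """Classify a household using cash income alone."""
    return cash_income < threshold


def is_poor_disposable(cash_income, in_kind_benefits, necessary_expenses, threshold):
    """Classify a household using a simplified disposable-income figure."""
    disposable_income = cash_income + in_kind_benefits - necessary_expenses
    return disposable_income < threshold


THRESHOLD = 20_000  # hypothetical poverty line, not an official figure

# Household A: low cash income, but receives in-kind assistance.
a_cash_only = is_poor_cash_only(18_000, THRESHOLD)
a_disposable = is_poor_disposable(18_000, 4_000, 500, THRESHOLD)

# Household B: cash income above the line, but high out-of-pocket medical costs.
b_cash_only = is_poor_cash_only(21_000, THRESHOLD)
b_disposable = is_poor_disposable(21_000, 0, 3_000, THRESHOLD)

print(a_cash_only, a_disposable)  # True False
print(b_cash_only, b_disposable)  # False True
```

In this sketch the cash-only measure counts household A as poor despite its assistance and misses household B despite its medical costs. If household A's assistance were doubled, its cash-only classification would not change at all, which is the insensitivity described above.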
A brief history of program evaluation
Standardized measurements and formalized procedures for program evaluation are relatively new and, in the United States at least, correspond to, and were practically necessitated by, the explosive growth of major domestic programs in the 1960s, such as John F. Kennedy's New Frontier and Lyndon B. Johnson's Great Society. In 1993, Congress passed the Government Performance and Results Act (GPRA), which is partly overseen by the Office of Management and Budget (OMB) and which mandates that all federal agencies "report annually on their achievement of performance goals, explain why any goals were not met, and summarize the findings of any program evaluations conducted during the year." A 2004 report from the Government Accountability Office (GAO) remarked that the GPRA's requirements had "established a solid foundation of results-oriented performance planning, measurement, and reporting in the federal government," but concluded that more work remains to achieve the lofty goal of a truly "results-oriented government."
- Centers for Disease Control and Prevention (CDC) detailing the purpose of an evaluation framework
- Although the JCSEE is primarily concerned with educational evaluation, its set of standards may apply to all types of program evaluation.
- Brookings Institution testimony by Rebecca Blank
- GAO Report 2000
- GAO Report 2004