In my last post, I introduced the idea that measurement should first and foremost be a tool to help organizations improve their impact. A natural question follows that assertion: Which type of measurement best accomplishes this purpose, monitoring or evaluation?
To get at an answer, think about the differences between the two. Performance monitoring (sometimes called performance management) is the continuous tracking and analysis of program, operational, and financial data, usually done by an organization’s own staff through a data system. Evaluation consists of periodic studies or assessments, typically conducted by third parties to achieve a deeper understanding of programs. (The most common type of evaluation is impact evaluation, which seeks to prove the effectiveness of a program, though there are other types as well, including implementation evaluation and needs assessment.)
Which type is better suited to helping organizations improve their impact? I think the answer is both, and neither. Performance monitoring is great because, done well, it lets an organization learn in real time and use what it learns immediately to guide decision making. Monitoring also builds the capacity of staff to integrate measurement into how they do their work, and it is generally cost effective. The problem is that monitoring is very often not rigorous enough for organizations to confidently use its findings to change their model, for example to decide which beneficiaries to target or which program components to strengthen, continue, or eliminate. A staff member at one leading evaluation firm recently told me that across his firm’s evaluations, he has not seen a correlation between the outcomes a program reports (from performance monitoring) and the outcomes the firm finds when it conducts a rigorous study. Why? Because organizations often lack the expertise and resources to do deep analytics on the data they collect, to compare their performance against a carefully matched comparison group, and so on.
As you can probably guess, evaluation has the opposite problem. While it can provide deep learning that enables organizations to improve their program models, evaluations are typically more expensive and take longer to deliver data and insights. Those organizations lucky enough to undertake them usually get data back far too late to make real-time decisions; what’s more, the findings are often presented in reports that are difficult for the organization’s staff to understand. One organization I worked with got an evaluation study back so late that it had already changed the model it had originally contracted to evaluate!
Now imagine there were something to fill the "missing middle": a type of measurement rigorous enough to provide the right signals, affordable enough for organizations to undertake in the first place, fast enough to inform timely decisions, and understandable enough to transfer skills and build capacity.
Over the past few months, I’ve gotten some "weak signals" suggesting this missing middle might finally be getting filled. TCC Group recently shared a new form of evaluation it calls "research and development," which it describes as quick and inclusive (program leaders and evaluators meet often to analyze data, identify meaningful insights, and make improvements). A new organization focused on rigorous evaluation in global development, IDinsight, describes its approach as "client-centered, timely, affordable, and guiding managerial decision-making." And a leading evaluation firm recently shared that one of its biggest growth opportunities is in "mini-RCTs" (randomized controlled trials), which it described as gold-standard impact evaluations completed in under six months.
How do all three of these approaches get the "best of both worlds"? They focus on evaluating "proximate outcomes," the outcomes that the program most directly achieves. For a tutoring program, that might mean improvements in mathematics grades rather than high school graduation rates. For a malaria prevention program, it might mean proper bed net use by clients rather than a reduction in malaria incidence. These evaluations would then rely on literature reviews to show how the shorter-term outcomes studied lead to the longer-term outcomes not studied. Of course, therein lies the big limitation: such evaluations could provide misleading signals when the literature doesn’t adequately demonstrate clear ties between short- and long-term outcomes in the specific context in which an organization operates.
A panacea? Definitely not. An exciting development to pay attention to? Absolutely.
What are your reactions? Would you give this form of evaluation a try? I look forward to your thoughts!
Matt Forti is Bridgespan’s Performance Measurement Practice Area Manager. He can be reached at [email protected]