Collaborative Problem Solving: Developing inclusive criteria for meta-evaluation in education
In February 2017, we were asked to participate in a series of calls with USAID’s Office of Education. The Office wanted to develop a tool for assessing the quality of USAID-funded evidence in the education sector, and it wanted the international education community to validate the tool and take part in applying it.
Finally, they wanted us to identify a body of reports from which to draw lessons learned on topics of interest to the Office of Education related to the 2011 Education Strategy, which was coming to an end.
We certainly had our work cut out for us in producing such a synthesis.
USAID education programming is wide-ranging, from improving the quality of basic and higher education to fostering youth workforce development and increasing access to education in conflict and crisis settings. Every year, dozens of reports are published summarizing findings of USAID-funded interventions.
However, this body of evidence is seldom reviewed systematically.
To get started, we had to set criteria for inclusion in the meta-evaluation. As indicated by the Office of Education, the reports needed to be USAID-funded evaluations related to the 2011 USAID Education Strategy and published between 2013 and 2016. We also agreed to use the single, most recently published report for each activity, so that the review was based on the most up-to-date evidence.
We initially identified 27 impact evaluations that met the inclusion criteria. Then USAID clarified that it wanted the synthesis of lessons learned to draw on performance evaluations as well as impact evaluations. Systematic reviews often focus on impact evaluations, so the decision to include performance evaluations was somewhat unusual.
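The rule of keeping only the latest published report for each activity is easy to apply mechanically. The sketch below assumes each report record carries an activity identifier, a title and a publication date; the `Report` class, its field names and the sample records are hypothetical, not the data structure used in the actual review.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Report:
    # Hypothetical fields, assumed for illustration only.
    activity_id: str
    title: str
    published: date


def latest_per_activity(reports):
    """Keep only the most recently published report for each activity.

    If two reports for the same activity share a date, the first one seen is kept.
    """
    latest = {}
    for report in reports:
        current = latest.get(report.activity_id)
        if current is None or report.published > current.published:
            latest[report.activity_id] = report
    return list(latest.values())


# Made-up sample records: activity A1 has two reports, B2 has one.
reports = [
    Report("A1", "Midterm evaluation", date(2014, 3, 1)),
    Report("A1", "Final evaluation", date(2016, 5, 1)),
    Report("B2", "Performance evaluation", date(2015, 9, 1)),
]

for report in latest_per_activity(reports):
    print(report.activity_id, report.title)
# A1 Final evaluation
# B2 Performance evaluation
```

A dictionary keyed by activity keeps this to a single pass over the reports, however many versions each activity has.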
On the one hand, this more than tripled the evidence base to 92 evaluation reports. On the other hand, developing a tool that could be used to assess the quality of different evaluation methodologies was going to be a challenge, as we couldn’t rely on a pre-existing evidence rating system like the What Works Clearinghouse.
Now evaluation quality needed to be defined in a way that was responsive to both evaluation types.
Enter BE2 and Principles of Evaluation Quality
Luckily for us, a few years earlier the UK Department for International Development (DFID) had proposed a framework for assessing the principles of quality of impact and performance evaluations, and a donor working group known as Building Evidence in Education (BE2) had since incorporated it.
The BE2 framework, described in the BE2 guidance note Assessing the Strength of Evidence in the Education Sector, breaks evaluation quality into multiple dimensions, which it calls “principles of quality.”
Looking at quality from multiple dimensions was exciting because it allowed a comprehensive assessment of where the body of reports was weak and where future evaluations could therefore improve. And because BE2’s members included USAID, DFID, the World Bank and United Nations agencies, building on its framework grounded the tool in an internationally recognized standard.
For each principle, we worked with the Office of Education to develop assessment items based on the USAID Evaluation Policy and the relevant evaluation sections of USAID’s Automated Directives System (ADS).
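Scoring reports principle by principle is what makes it possible to see where a body of evidence is weak. As a rough sketch, assuming each report receives a simple pass/fail score per principle, the share of reports meeting each principle can be tallied and sorted from weakest to strongest. The principle names and scores below are placeholders made up for illustration; they are not the items or results of the USAID tool.

```python
# Toy data, made up for illustration only: 1 = the report met the items for
# that principle, 0 = it did not. Principle names are placeholders, not the
# actual items in the USAID tool.
scores = {
    "report_1": {"conceptual_framing": 1, "transparency": 0, "validity": 1},
    "report_2": {"conceptual_framing": 1, "transparency": 0, "validity": 0},
    "report_3": {"conceptual_framing": 1, "transparency": 1, "validity": 0},
}


def share_meeting_each_principle(scores):
    """Return, for each principle, the share of reports that met it.

    Assumes every report was scored on every principle.
    """
    principles = sorted({p for report in scores.values() for p in report})
    n_reports = len(scores)
    return {
        p: sum(report[p] for report in scores.values()) / n_reports
        for p in principles
    }


# List principles from weakest to strongest across the body of reports.
shares = share_meeting_each_principle(scores)
for principle, share in sorted(shares.items(), key=lambda kv: kv[1]):
    print(f"{principle}: {share:.0%}")
# transparency: 33%
# validity: 33%
# conceptual_framing: 100%
```

A real tool would likely use several items per principle and graded rather than binary responses; the aggregation idea stays the same.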
Crowdsourcing the Review of Evaluations
Armed with our data set and a working analytical framework, we developed the tool in February 2017. We co-presented a pilot version with USAID at a workshop at the Comparative and International Education Society’s annual conference in March.
Workshop participants not only supported the framework but also offered useful feedback on the items and expressed interest in taking part in the review process. The Education Office was thrilled by their enthusiasm, and in April it asked us to devise a plan for a crowdsourced review process.
Working closely with our USAID counterparts throughout the spring, we identified organizations to invite to volunteer staff and set minimum qualifications for volunteer reviewers. Then, in June, the Education Office asked those organizations to nominate staff to serve as reviewers.
At the same time, we developed an online platform through which each evaluation would be reviewed by two reviewers, and we gave the reviewers an online orientation. Reviews began in July.
As part of the process, each pair of reviewers also met virtually to reconcile any differences in scoring and produce consensus responses. In collaboration with the Education Office, we provided online training and support and hosted a full-day validation workshop where reviewers gave feedback on all items and item descriptors in the tool. A total of 36 reviewers from 21 organizations took part in the review process.
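Before each pair meets to reach consensus, it helps to list exactly which items they scored differently. The sketch below assumes each reviewer's responses are stored as a mapping from item ID to response; the item IDs and response options are hypothetical. The simple percent-agreement figure is an optional extra for monitoring the process and is not something the review is described as using.

```python
def items_to_reconcile(reviewer_a, reviewer_b):
    """Return item IDs where the two reviewers' responses differ.

    An item answered by only one reviewer also counts as a difference.
    """
    items = sorted(set(reviewer_a) | set(reviewer_b))
    return [item for item in items if reviewer_a.get(item) != reviewer_b.get(item)]


def percent_agreement(reviewer_a, reviewer_b):
    """Share of items on which both reviewers gave the same response."""
    items = set(reviewer_a) | set(reviewer_b)
    if not items:
        return 1.0
    agreed = sum(1 for item in items if reviewer_a.get(item) == reviewer_b.get(item))
    return agreed / len(items)


# Hypothetical item IDs and response options, for illustration only.
reviewer_a = {"item_1": "yes", "item_2": "no", "item_3": "partly"}
reviewer_b = {"item_1": "yes", "item_2": "yes", "item_3": "partly"}

print(items_to_reconcile(reviewer_a, reviewer_b))          # ['item_2']
print(f"{percent_agreement(reviewer_a, reviewer_b):.0%}")  # 67%
```

Only the flagged items need discussion in the reconciliation call; the agreed responses carry straight into the consensus record.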
This collaborative process achieved three of the Education Office’s objectives for this study: 1) validating the tool with the international education community, 2) disseminating the BE2 framework, and 3) providing an opportunity for community members to read and discuss each other’s evaluations.
At the end of this process, we worked closely with the Education Office team to incorporate the reviewers’ feedback into a final tool. The Assessing the Quality of Education Evaluations tool has now been posted by the USAID Learning Lab as a public resource.
The results of our meta-evaluation showed that:
- Between 2013 and 2016, USAID published approximately 92 education-related evaluations;
- Out of these, 27 reports were based on impact evaluations;
- Only 19 of those 27 impact evaluation reports met the minimum criteria for evaluation quality set by the USAID Office of Education; and
- If systematic reviews were limited to impact evaluations that meet those quality criteria, about 80 percent of the reports would be left on the cutting room floor.
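A quick arithmetic check of the last bullet, assuming the 19 reports that met the quality criteria are all impact evaluations and that the share is taken over the full set of 92 reports:

```latex
\frac{92 - 19}{92} = \frac{73}{92} \approx 0.79 \approx 80\%
```

In other words, only about one report in five (19 of 92) would remain in the evidence base.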
Making the review more inclusive by also considering performance evaluations captured important lessons learned that would otherwise have been left out of the evidence base for the synthesis on topics of interest related to the 2011 Education Strategy.
However, while including performance evaluations made sense in this case, doing so can also produce a less rigorous evidence base. There is a fine balance to strike in the trade-off between rigor and utility.
Blog posts on the MSI blog represent the views of the authors and do not necessarily represent the views of MSI.