Elementary mathematics interventions illustrate the challenges the What Works Clearinghouse faces in using research to evaluate the effectiveness of educational programs. As of August 2011, the WWC website listed 73 interventions in elementary math. WWC found that two of these showed "evidence of positive or potentially positive effects for at least one improvement outcome" and five showed "evidence of mixed or no discernible effects for all improvement outcomes." For the remaining sixty-six, there were either no studies or "no studies meeting evidence standards."

The following discussion analyzes the Intervention Reports on two textbook series, *Everyday Math* and *Saxon Math*, which were among the seven interventions with research that met the WWC evidence standards. Each of these two interventions has strong (and mutually exclusive) proponents. Both are also widely used, as found in a study of New York City charter schools (this study identified no statistically significant differences in student outcomes among the three math programs):

Curriculum | Saxon Math | Everyday Math | Scott Foresman Math |
---|---|---|---|
Percent usage | 39% | 30% | 8% |

### Everyday Math

*Everyday Math*, a widely adopted series, was one of the two interventions that WWC placed in the first group, those showing evidence of positive or potentially positive effects.

To reach this classification, WWC looked at 72 studies. Only one met its evidence standards, and that one only with reservations. This study was a PhD thesis examining results in an unnamed "large North Texas urban school district." The district had received an NSF grant of $1 million per year for five years to introduce *Everyday Math* at 17 schools. This grant included 40 hours of training for teachers and administrators. Only seven of the schools were included in the study, however. The others were excluded because they had not implemented *Everyday Math*; they cited costs or decisions by new principals to stick with the district's standard program. Several classes in the seven included schools were also excluded from the study because teachers were using other books.

The comparison group consisted of students using the district's standard program. They were closely matched to the experimental group on baseline math achievement scores, student demographics, and geographical location. The study offers little information on this program other than that it was "a more traditional mathematics curriculum approved by the school district." There is no mention of training for teachers in the comparison group. It appears there was no attempt to ensure that teachers in the comparison group were actually using the standard program.

WWC concluded that while the effect was not statistically significant, "the effects on math achievement were large enough to be considered substantively important" (that is, an effect size, the ratio of the difference between group means to the standard deviation, of 0.25 or greater*). Based on this one study, the WWC categorized the effect of "*Everyday Mathematics*® on overall math achievement as being a substantively important positive effect."

### Saxon Math

*Saxon Math* was one of five programs classified as having "mixed or no discernible effects." This conclusion was based on three studies (of 16 examined). Two studies were classified as "meeting standards with reservations" and were found to show no difference between *Saxon Math* and the comparison groups (which were not listed).

The third study was deemed to meet evidence standards. WWC concluded that it showed "that impacts for *Saxon Elementary School Math* were significantly greater than the three comparison curricula considered jointly." In this study, 110 elementary schools were recruited and randomly assigned to one of four math programs: *Investigations in Number, Data, and Space*; *Math Expressions*; *Saxon Math*; and *Scott Foresman-Addison Wesley Mathematics*. While the Intervention Report on *Saxon Math* does not list the individual comparisons, a Quick Review (also from WWC) does.

For second graders, the ranking (from top to bottom) is *Saxon*, *Expressions*, *Investigations*, and *Scott Foresman*. The difference between *Saxon* and *Scott Foresman* is statistically significant.

For first graders, the ranking (from top to bottom) is *Expressions*, *Saxon*, *Investigations*, and *Scott Foresman*. The difference between *Expressions* and each of the latter two is statistically significant. (In the first year of the study, *Saxon* was also significantly ahead of the latter two programs for first graders.)

### Comments

One concern about many of the studies is the lack of information on the comparison interventions: both what they were and whether they were given the same support and attention as the intervention being studied. So far, at least, it seems unlikely that the Intervention Reports will resolve the question of what works best.

\* One interpretation of this difference is that if students in the comparison group are considered to be performing at the 50th percentile, a difference with an effect size of 0.25 would put students in the target program at around the 60th percentile.
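
The percentile reading of an effect size can be checked with a short calculation. The following is a minimal sketch, not WWC's own procedure: it assumes test scores in both groups are normally distributed with the same standard deviation, and the function name `percentile_for_effect_size` is a hypothetical helper introduced here for illustration.

```python
from statistics import NormalDist


def percentile_for_effect_size(effect_size: float) -> float:
    """Percentile rank, within the comparison group, of the average
    student in the target program.

    Assumption (not from the WWC reports): scores are normally
    distributed with equal standard deviation in both groups, so the
    answer is the standard normal CDF evaluated at the effect size.
    """
    return 100 * NormalDist().cdf(effect_size)


if __name__ == "__main__":
    # An effect size of 0 leaves the average student at the 50th percentile;
    # 0.25 is the WWC threshold for "substantively important".
    for d in (0.0, 0.25):
        print(f"effect size {d:.2f} -> percentile {percentile_for_effect_size(d):.1f}")
```

Under these assumptions, an effect size of 0.25 corresponds to roughly the 60th percentile, which matches the interpretation given in the note above.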