The school district’s Academic Performance Index (API) for 2012 has been released, and it breaks through an important psychological barrier: 800. The state has set that number as the target for all schools, and last year the district fell just shy at 796. This year — 807.
“Surpassing the 800 API mark is a huge milestone for our city and our schools,” Superintendent Carranza was quoted as saying in the school district’s press release on the API data (PDF). “San Francisco can count itself among only a few large urban school districts in the State that have exceeded the 800 target for academic performance.”
Out of 98 schools reporting, 51 have an API score of 800 or above; of the schools with an API of 799 or less, most met their state “growth targets” — the minimum level of improvement expected by the state.
Of course, it’s important to keep these things in perspective: many schools did not meet their growth targets for all subgroups (African American students, Latino students, Samoan students, students with disabilities), and the school district continues to have a broad achievement gap between racial groups, between English speakers and English learners, and between students with disabilities and their non-disabled peers. Still, the state has set the yardstick: an API over 800 means that more students are achieving at grade level than not, and that is something to pause (briefly) and celebrate.
Dennis wrote: “4). Redistributing CMA results to CST BB and FBB results in a much larger downward adjustment (805 -> 789, difference = 16).”
If you look at Chart #5 (in Doug’s EdSource article: http://www.edsource.org/today/wp-content/uploads/5-table-v2.jpg ) you’ll see the cumulative effect of API inflation … statewide API scores are reported to be 38.5% higher than they are in reality for grades 2-6, and 26.7% higher for grades 7-8.
Inflation of 16 points a year may not seem like a big deal to some, but over a 5-year period it makes a big difference in the scores.
Honesty about the data is what is needed.
What is alarming is that 27% of African American students of testing age in SFUSD do not take the CST. About 12-13% take the CMA, a bit more than 1% take the CAPA … and the rest? They are just forgotten. Are they suspended from school during the testing days, or truant? Is anybody paying attention? Does anybody care?
I posted the below at Doug McRae’s EdSource article and wanted to fill you in. The conclusion is that the Scale Calibration Factor (SCF) does adjust the API score lower for CMA results, but the adjustment is small (5 points lower). If the CMA results are redistributed to CST performance levels of below basic (BB) and far below basic (FBB), then the adjustment is larger (16 points lower). I’m not sure if this redistribution is fair since there are no studies on calibrating CMA results against the CST. My API calculations are based on the 2012 SFUSD STAR results. I calculated an API of 801 for SFUSD vs. the actual API of 807 (close enough – the calculation is complex and I didn’t expect to calculate the exact same API).
“Hi Doug, Thanks again for providing the details behind your calculations. I was thinking that the Scale Calibration Factor (SCF) would provide some calibration between CMA and CST scores. Since the SCF is negative for SWD results and positive for non-SWD results, the same proficiency level on the CMA and the CST would result in a lower API score for the CMA case. I used the 2012 CST & CMA results for a large urban school district and calculated the following API scores:
1) 801 – CST results entered into non-SWD worksheets (positive SCF) & CMA results entered into SWD worksheets (negative SCF)
2) 810 – CST results entered into non-SWD worksheets and CMA results thrown out
3) 805 – CST and CMA results both entered into non-SWD worksheets
4) 789 – CST results entered into non-SWD worksheets and CMA results entered into non-SWD worksheets with the following mapping: CMA Advanced and Proficient distributed to CST Below Basic (BB); CMA Basic, Below Basic and Far Below Basic distributed to CST Far Below Basic (FBB) (similar to how you redistributed CMA results)
So indeed, the SCF does adjust the API score lower for CMA results, but the adjustment is small in my example (805 -> 801, difference = 4). Redistributing CMA results to CST BB and FBB results in a much larger downward adjustment (805 -> 789, difference = 16).”
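The four scenarios above can be summarized side by side. Here is a minimal sketch (the labels are my shorthand for Dennis’s scenarios; the baseline choice of scenario 3 follows his comparison):

```python
# The four API scenarios from the comment above, with the adjustment
# each produces relative to scenario 3 (CST + CMA, no SCF split).
scenarios = {
    "1: CMA in SWD worksheets (negative SCF)": 801,
    "2: CMA thrown out": 810,
    "3: CST + CMA in non-SWD worksheets": 805,
    "4: CMA redistributed to CST BB/FBB": 789,
}
baseline = scenarios["3: CST + CMA in non-SWD worksheets"]
for name, api in scenarios.items():
    print(f"{name}: API {api}, adjustment {api - baseline:+d}")
```

The printout makes the two key differences visible at a glance: the SCF-only adjustment (-4) versus the much larger redistribution adjustment (-16).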
Doug McRae’s response:
“Yes, Dennis, your calculations confirm my analysis on the effect of the Scale Calibration Factor. The SCF does partially correct the APIs for the effect of the initiation of CMAs, but does not provide a full correction and as a result over time the API trend data has been artificially boosted due to the introduction of the easier tests without adequate adjustments to keep API trend data apples-to-apples over time.”
API scores aren’t broken down by gender, but CST results are. For the 2012 CST, there were 20681 male and 19624 female test takers in the District, or roughly 50-50. Some statistics by gender (percentage at proficient or advanced levels):
ELA – female: 64%; male: 57%
Math – female: 58%; male: 56%
Life Science – female: 62%; male: 61%
Biology – female: 52%; male: 50%
Chemistry – female: 43%; male: 43%
Earth Science – female: 47%; male: 45%
Physics – female: 44%; male: 49%
There are wider spreads in ELA and physics, but otherwise, the percentages between females and males are close in the other subjects.
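Tabulating the female-minus-male gaps makes the ELA and physics spreads stand out; a minimal sketch using the percentages as given (the dictionary layout is mine):

```python
# Percent proficient-or-advanced by subject and gender on the 2012 CST,
# as (female, male) pairs, and the female-minus-male gap in points.
pct = {
    "ELA":           (64, 57),
    "Math":          (58, 56),
    "Life Science":  (62, 61),
    "Biology":       (52, 50),
    "Chemistry":     (43, 43),
    "Earth Science": (47, 45),
    "Physics":       (44, 49),
}
for subject, (female, male) in pct.items():
    print(f"{subject}: gap (female - male) = {female - male:+d} points")
```

Every gap is within 2 points except ELA (+7, favoring females) and physics (-5, favoring males).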
The calculation of the API is documented here: http://www.cde.ca.gov/ta/ac/ap/documents/infoguide12.pdf.
From the bottom of page 8 – “The addition of CMA into the API does not change the API test weights, and the same test weights and calculation rules used for the CST also apply to the CMA.”
It’s clear from this statement that the CST and CMA are given the same weights as defined on pages 42-44.
What I can’t quite figure out is whether the Scale Calibration Factor (SCF) also affects the “weighting”. From page 46 – “The purpose of the SCF is to preserve the API scale and maintain consistency in the statewide average API from one reporting cycle to the next. The SCF provides a positive or negative adjustment to each API each year.” A table on the same page shows negative SCFs for students with disabilities (i.e., CMA test takers, e.g., -33.81 for grades 3-5) and positive SCFs for students with no disabilities (i.e., CST test takers, e.g., +22.32 for grades 3-5). The way I interpret this table is that the same proficiency level on the CST and CMA would result in a higher API score for the CST case.
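To get a feel for the size of that adjustment, here is a toy illustration (not the CDE’s full API formula; the "unadjusted" score of 750 is hypothetical) applying the grade 3-5 SCFs from page 46 additively, as the guide describes:

```python
# Toy illustration: apply the grade 3-5 Scale Calibration Factors from
# the CDE table to the same hypothetical unadjusted score, to show why
# identical proficiency yields a lower API on the CMA (SWD) side.
scf_swd = -33.81      # students with disabilities (CMA takers), grades 3-5
scf_non_swd = +22.32  # students with no disabilities (CST takers), grades 3-5
unadjusted = 750      # hypothetical weighted score before calibration

api_cma = unadjusted + scf_swd
api_cst = unadjusted + scf_non_swd
print(round(api_cst - api_cma, 2))  # 56.13
```

In this toy case the same underlying performance lands more than 56 points apart, which is consistent with the reading that the SCF pushes CMA-based results lower.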
Doug McRae wrote an excellent piece for EdSource on how CMA testing inflates API scores:
“In addition, CMA scores count the same as CST scores for API calculations, even though the state Department of Education acknowledges that the CMA is an easier test. The result has been to inflate reporting of API trend data over the past few years, and more importantly to cause a subtle but substantial lowering of academic standards that we expect for our students with disabilities in California.”
“It’s true that each year more of the lowest-scoring special education students now take a different assessment (the California Modified Assessment) that removes them from the API calculation…, but I don’t know how much or how little this phenomenon affects the district’s API scores…”
I think the CMA scores are still included in the API calculation. The CDE provides a spreadsheet for estimating API (http://www.cde.ca.gov/ta/ac/ap/documents/calc11b12g.xls), and there are separate worksheets for students with and without disabilities. There is a negative scale calibration factor for CMA and a positive factor for CST. In other words, the same proficiency level on the CMA and the CST would result in a lower API score for the CMA case. So it looks like the API calculation does include the CMA and it also tries to calibrate the results between the CST and the CMA.
I took a look at the SFUSD STAR again after reading the article, “SFUSD STAR Test Scores and Student Achievement: Another Look.” Here’s what I found:
2012 SFUSD African American
3374 took CST ELA
1179 scored proficient or advanced on the CST ELA
519 took the CMA ELA
% proficient or advanced (CST only) = 1179/3374 = 35%
% proficient or advanced (all) = 1179/3986 = 30%
2007 SFUSD African American
4964 took CST ELA
1092 scored proficient or advanced on the CST ELA
% proficient or advanced (CST only) = 1092/4964 = 22%
% proficient or advanced (all) = 1092/5083 = 21%
The data above corroborates the claim in the District’s press release (http://www.sfusd.edu/en/assets/sfusd-staff/news-and-calendars/files/8%2031%2012%20SF%20Students%20Show%20Academic%20Achievement%20Gains.pdf) that the “five-year proficiency growth in English Language Arts for African-American students was 13 percentage points.” One (pessimistic) way to factor in the CMA is to assume that no CMA results are proficient or advanced. In this case the 5-year growth in ELA is 9 percentage points instead of 13.
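The calculation above is straightforward to reproduce; a small sketch using the counts from the comment (the helper function is mine):

```python
# Recompute percent proficient-or-advanced with and without CMA takers
# in the denominator, treating all CMA results pessimistically as not
# proficient. Counts are the 2012/2007 SFUSD African American CST ELA
# figures from the comment above.
def pct_proficient(proficient, denominator):
    return round(100 * proficient / denominator)

p2012_cst = pct_proficient(1179, 3374)  # CST takers only -> 35
p2012_all = pct_proficient(1179, 3986)  # all takers      -> 30
p2007_cst = pct_proficient(1092, 4964)  # CST takers only -> 22
p2007_all = pct_proficient(1092, 5083)  # all takers      -> 21

print(p2012_cst - p2007_cst)  # 13-point growth, CST only
print(p2012_all - p2007_all)  # 9-point growth, counting CMA as not proficient
```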
A similar analysis for the SFUSD Latino student population shows a 5-year growth in ELA of 8 percentage points instead of 10 when factoring in the CMA.
One other statistic that I found interesting is that the number of African American students in the District has declined by over 1,000 over the last 5 years. The Latino student population, by contrast, has increased by about 350 over the same period.
Matt and Rachel,
You might be interested in reading this article: http://www.beyondchron.org/articles/School_Beat_SFUSD_STAR_Test_Scores_and_Student_Achievement_Another_Look_10611.html
Unfortunately, while the CDE has gender breakdowns for some measurements (dropout rate, course enrollment, etc.), they do not report test results by gender. Too bad; that would have been interesting data.
I’m going to see if the stats for enrollment in advanced math and science classes could be a reasonable proxy for math and science performance, but even if it is, I don’t know that that would tell me anything about the situation in middle school.
@M – Thanks! Glad you were able to test your hypothesis 🙂 Did you happen to look at the gender question in the CDE data?
Thanks, Rachel, for that informative link. It turns out that my scenario was completely wrong: not only were the demographic changes in the district trivial, but the scores of all of the subgroups increased by similar amounts, on the order of 12 points. I’m still not convinced that the change from any given year to the next is significant, but the small gains have been steady over the past five years.
My fear had been that economically disadvantaged families had moved out of the city as the economy worsened starting in 2007. But that has not been the case, at least as reflected by the numbers of students taking the test.
So modest kudos to the district and board!
Hi Matt – Thanks for the comments. Generally I would say the district (basically the administration and the Board) are not that keen on evaluating schools solely on the basis of testing, and are clear-eyed on the limited usefulness of API as a measurement of student progress. Still, 800 is something, and it’s important to let parents and staff at school sites know that the hard work they put in every day is making a difference and is being noted by the state.
I don’t have the statistical acumen nor the detailed data that would be needed to analyze whether the increase in API is due to demographic changes — the district’s overall demographics have not changed all that much in recent years. It’s true that each year more of the lowest-scoring special education students now take a different assessment (the California Modified Assessment) that removes them from the API calculation — I’m concerned about this and have raised this question publicly a number of times, but I don’t know how much or how little this phenomenon affects the district’s API scores for the better.
On the gender question, I don’t recall seeing that analysis, but it is a great question. I will ask it. In the meantime, you can analyze the API, demographic changes, and achievement on the California Standards Test over time yourself by using data at this link: http://dq.cde.ca.gov/dataquest/
I notice that the API scores, like many of the measures of achievement, break out results by various factors that include ethnicity, language, SES, etc. I don’t recall seeing such breakouts by gender. Is there any concern on the board about the very well-documented gender gap in math and science, evident beginning in middle school? Or is the scale of this problem not large or dire enough compared to other gaps the district is addressing?
I wonder just how much we should be celebrating these achievements. As you say, the 800-barrier is entirely a psychological one. Moreover–and this is something I see frequently in school district program assessments–there is no indication given of the statistical significance of changes in these scores. Does the gain of 5 points that a particular school made have any meaning whatsoever? Absent some notion of variance, point estimates such as these are not useful for decision making.
In addition, I think the district should be considering that even these modest gains might be due to processes outside of their control. It is just as likely that changing demographics are completely responsible for the change, for example, as that there has been any improvement in educational technique. You’ve noted that there are subgroups that perform differently. Assuming that these differences are statistically significant, can you say anything about how the proportions of children in these groups have changed since last year? A very simple model that assumes static test scores within demographic groups just might give increases similar to what you show here.
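The "very simple model" described above is easy to sketch. All subgroup scores and shares below are hypothetical, chosen only to show that a shift in student mix alone, with every subgroup's score held static, can move the district-wide average by several points:

```python
# Toy model: hold each subgroup's score fixed and change only the mix
# of students; the weighted district-wide average still moves.
def weighted_api(scores, shares):
    return sum(score * share for score, share in zip(scores, shares))

scores = [870, 750, 700]                # hypothetical static subgroup APIs
shares_last_year = [0.40, 0.35, 0.25]   # hypothetical subgroup shares
shares_this_year = [0.45, 0.33, 0.22]   # slight demographic shift

last = weighted_api(scores, shares_last_year)
this = weighted_api(scores, shares_this_year)
print(f"{last:.1f} -> {this:.1f}")  # gains about 7.5 points from mix alone
```

No subgroup improved at all in this toy example, yet the aggregate score rose by roughly the same margin the district is celebrating.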
Educational outcomes are the products of many factors. With this summary, the district seems to be ignoring all of those complexities and assuming what is most convenient to assume–that their policies are responsible for the observed increase. I can only trust that there is a deeper analysis, to which rigorous standards have been applied, that is used to drive actual decision-making. A statistician drawing inferences from something this simplistic would be guilty of professional malpractice.
As to the focus on API scores more generally, I can see the utility in having SOME metric for school performance. But do the Board and the District ever consider that they might have been drawn too deeply into the Bush-era assumptions of NCLB? Have there been assessments of the costs and benefits of buying into the test-administration machinery? Might there be better, more cost-effective ways to assess YOUR performance, than by administering a test to every student in the district, and reading a 1.5% change as something positive?