MESQUITE, Tex. — Reading is 9-year-old Kristin Hernandez’s thing. She pores over mystery books, stories about vampires and even a college-level anatomy textbook that her mother is studying to become an X-ray technician.
So when her parents, Jessica and Alberto Hernandez, found out last summer that she had scored below grade level on the reading section of Texas’ annual high-stakes standardized test, they figured she had just had a bad test day. After all, Kristin is prone to nervousness, pushing her tortoiseshell glasses onto her forehead and rubbing her temples.
But now a group of prominent state school superintendents and education experts is arguing that Texas has mistakenly identified Kristin and thousands of other students as falling short, when in fact their performance on the state test is well within grade-level reading standards.
The test, the State of Texas Assessments of Academic Readiness, or Staar, can have profound consequences not just for students but for schools across the state, hundreds of which have been deemed inadequate and are subject to interventions that critics say are undue.
Many Texas students who were told they had not reached grade-level reading expectations on the Staar test also received separate scores that are at grade level. In addition, experts have raised concerns about the quality of questions on the exams, and whether they are appropriate for children in the tested grades.
Facing growing pressure from educators, the State Legislature has scheduled a hearing this week to consider the future of the test. But the Texas Education Agency continues to stand by Staar, saying the system is fair and supported by research.
The battle over reading in Texas is the latest in a national war over the future of education reform. From teacher picket lines to the halls of state capitols, public school educators and their political allies are pushing back against decades of laws they say have been punitive to traditional schools.
A persistent narrative of failure, backed by low student test scores, has undermined the public’s trust in local education systems, critics say, and has paved the way for policies that shift students and taxpayer dollars toward charter schools and private school vouchers.
On the other side of the debate are school reformers who contend that tough accountability systems like Staar are a civil rights imperative, and that they protect low-income students and students of color from what President George W. Bush famously called “the soft bigotry of low expectations.”
The 2018 Staar tests found that 58 percent of Texas third graders are not reading at grade level. On the 2017 National Assessment of Educational Progress, given to a sample of fourth graders across the country, 72 percent of Texas students were not proficient in reading — a fact the state has cited as evidence that tough local standards are warranted.
More than half of the state’s public school students are Hispanic and nearly 60 percent come from low-income families. About a fifth are still learning English.
Texas is, in many ways, the birthplace of the American education reform movement. It was among the first to use student test scores to rate schools. But the state has also been accused, repeatedly, of lowering standards to inflate performance, and has made a concerted effort in recent years to raise them. Now it is being accused of overcorrecting.
“Every parent wants their kid to do better,” said Jeff Cottrill, deputy commissioner of the Texas Education Agency. “When you hear maybe it’s the test’s fault, it makes you feel a little bit better.”
But Mr. Cottrill defended the Staar exams, and warned against a false sense of complacency. Any attack on standardized testing, he said, “has the ability to destabilize high expectations for students.”
One Reading Test, Two Scores
Kristin took the Staar last spring, when she was in third grade. Her parents, who live in this working-class suburb east of Dallas, were later told in a report that their daughter was “approaching” grade level in reading, like a third of Texas third graders who took the test — some 128,000 students.
There are four categories of performance on the Staar: “did not meet grade level,” which means a student failed the test and, in some grades, could be held back; “approaches grade level,” which means a student like Kristin did not meet all expectations and will be targeted for extra help; “meets grade level”; and “masters grade level.”
When Ms. and Mr. Hernandez turned the page on the score report, they saw a second reading score, called a Lexile measure. Teachers and administrators across the country regularly use Lexile measures, developed by a company called MetaMetrics, to help them match students to reading materials. With a score of 680L, Kristin was considered ready for books like “The Boxcar Children.”
Kristin’s Lexile measure was considered on grade level. In order to stay on track for college and the job market, third graders should be reading and understanding texts at a measure between 520L and 820L, according to national goals developed in 2012 as part of the Common Core State Standards Initiative.
That this number even appeared on Kristin’s score report is surprising: Texas was one of a handful of states that rejected the Common Core, a national set of reading and math standards.
For years, Texas’ conservative elected officials, including Gov. Greg Abbott, denounced the Common Core initiative as an attempt to diminish state and local control of education. In 2013, lawmakers voted to ban its use.
Mr. Cottrill, the education agency’s deputy commissioner, warned against comparing the two scores, saying the Lexile measure did not capture the full scope of the state’s reading standards.
The two scores, though, have resulted in confusion among educators.
“We are always looking to improve,” said Andrea Bailey, coordinator of elementary English and language arts for the Mesquite School District. “If we don’t understand what the mark is, it’s really hard to reach that place.”
It is up to schools how to provide extra help to students in the “approaches grade level” category. Students might get assistance in the classroom, or in pullout sessions that in some cases mean they will miss art or music instruction.
Schools have every incentive to raise “approaches” students like Kristin into the higher categories. Texas grades its districts on an A through F scale, in part based on how many students are meeting or exceeding grade-level standards.
The Mesquite district received a C last year. This year, individual schools will also receive a letter grade. If Kristin’s school, Ruby Shaw Elementary, performs as it did last year, it will receive a high D.
Persistently failing schools, and districts with just a single such school, can be shut down or taken over by the state — a threat facing the state’s largest school system, in Houston.
The pressure is especially acute in schools like Shaw Elementary, where most children come from low-income families and are more likely to struggle on standardized tests.
Shaw Elementary asks some students to come to class on Saturdays. All week long, teachers build lessons around concepts tested on Staar. Mesquite is one of several districts lobbying the state to place less weight on Staar.
Kim Dumaine-Banuelos, the principal of Shaw Elementary, said that not only was the state grading the test too harshly, but that some of the reading passages on the tests themselves appeared to be harder than they should have been.
Two academic papers, published in 2012 and 2016, concluded that, on average, reading passages on Staar tests were written one to three grade levels above the tested grade level. The Texas Education Agency has said it is in the process of making changes to the exam.
The New York Times asked an independent expert on reading and testing, Peter Afflerbach of the University of Maryland, to examine last year’s third-grade Staar reading test. He found that the test held a risk of underestimating students’ capabilities.
Several of the reading passages were longer than average for a third-grade test, Professor Afflerbach said. The content of two of the passages, about making sand sculptures at the beach and stargazing with a telescope, could disadvantage low-income students, who would be less likely to have had such experiences. One of the questions appeared to have two potentially correct answers, while another question seemed to have no correct answer, he said.
A spokesman for the Educational Testing Service, the nonprofit that produces Staar, referred questions to the Texas Education Agency. The agency said it stood by the test questions, which had been approved by a panel of teachers and field-tested on Texas students.
Just a few questions on the test can make a big difference for a student. One of Kristin’s schoolmates, Jacob Weempe, missed the cutoff between “approaches grade level” and “meets grade level” by a single question. When his mother, Joanne Nagahiro, a medical assistant, found out his score, she signed him up for private tutoring. It cost $200 per month and, along with Saturday classes at the school, took up much of Jacob’s weekend for a time.
Jacob’s Lexile score was 710L, according to his report, which was squarely on grade level according to the Common Core measures.
Now that Ms. Nagahiro knows there are questions about the test’s quality and grading, “I have guilt,” she said. “I feel like I put a lot of pressure on my child.”