Your Data Is Incomplete. That's the Point

Good Systems Don’t Replace Human Judgement - They Extend It

I don’t have a philosophical issue with standardised testing. I understand why it’s contentious, but when used well, it is one of the most efficient and least biased ways to understand patterns, gaps, and program efficacy at a school level over time. It allows for longitudinal analysis, comparability across cohorts, and direct mapping to curriculum outcomes in ways that internal assessment alone cannot reliably do.

This isn’t a popular view. I know that. But I’ve seen firsthand what happens when we treat this data not as a judgement, but as a signal. And it has changed everything about how I see school improvement and how I consult with schools.

A Case Study in Mathematical Methods

A case in point: while analysing a Victorian Certificate of Education (VCE) cohort in a P-12 school (for readers outside Australia, the VCE covers our final two years of secondary education), I noticed persistently lower outcomes in Mathematical Methods and limited enrolment in Specialist Mathematics. Anecdotally, teachers spoke about skill gaps and readiness issues, but rather than relying on perception, I returned to the most defensible evidence available: NAPLAN and PAT standardised testing.

Each dataset offers different insights. NAPLAN provides curriculum-linked strand performance, assessed every two years from Year 3 to Year 9; PAT offers adaptive, scaled insight into student readiness and growth. Analysing two years of data across both primary and secondary year levels revealed clear patterns. Measurement and Geometry were consistently flagged in NAPLAN at specific year levels in both primary and secondary, and this finding was mirrored in the PAT data.
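For readers who want to try this kind of cross-check on their own exports, here is a minimal sketch of the idea, not the process used at the school. Everything in it is an assumption for illustration: the record layout, the strand labels, the gap values (invented placeholders, not real results) and the thresholds. NAPLAN and PAT report on different scales, so each source gets its own threshold, and a strand only counts when both sources flag it at the same year levels.

```python
from collections import defaultdict

# Hypothetical records: (year_level, strand, gap), where gap is the cohort
# mean minus a reference benchmark. All values are invented placeholders.
naplan = [
    (5, "Measurement and Geometry", -18.0),
    (5, "Number and Algebra", 4.0),
    (7, "Measurement and Geometry", -22.0),
    (7, "Statistics and Probability", -3.0),
    (9, "Measurement and Geometry", -15.0),
]
pat = [
    (5, "Measurement and Geometry", -6.0),
    (7, "Measurement and Geometry", -7.5),
    (7, "Number and Algebra", -1.0),
    (9, "Measurement and Geometry", -5.0),
]


def flagged(records, threshold):
    """Return {strand: set of year levels} where the gap is at or below threshold."""
    out = defaultdict(set)
    for year, strand, gap in records:
        if gap <= threshold:
            out[strand].add(year)
    return dict(out)


def corroborated(naplan_flags, pat_flags, min_years=2):
    """Keep strands flagged by both sources in at least min_years shared year levels."""
    result = {}
    for strand, years in naplan_flags.items():
        shared = years & pat_flags.get(strand, set())
        if len(shared) >= min_years:
            result[strand] = sorted(shared)
    return result


# Assumed thresholds; each source uses its own scale.
naplan_flags = flagged(naplan, threshold=-10.0)
pat_flags = flagged(pat, threshold=-4.0)
shared_flags = corroborated(naplan_flags, pat_flags)
print(shared_flags)
```

Requiring agreement across both sources is the point of the sketch: a strand flagged by only one test is a prompt for a closer look, not yet a pattern.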

When these findings were mapped against the skill demands of Mathematical Methods and Specialist Mathematics, the connection became explicit: gaps in Measurement and Geometry directly undermined the conceptual foundations required for higher-level mathematics.

In this context, standardised testing didn’t replace professional judgement; it disciplined it. It allowed us to see what intuition alone could not, and to move from blame or anecdote to precise instructional, curriculum, and assessment reform.

 

"Again, this is about making the time and space for collaborative conversations and evidence-based practice. Testing isn’t useful if only one person looks at it... The data didn’t undermine teacher expertise; it protected it."

 

Dr Ingrid H Lee

 

From Data to Systemic Reform

Working with curriculum leaders, we traced these gaps back to curriculum mapping and delivery design. Inconsistencies in scope and sequence, uneven emphasis on key outcomes, and assessment designs that conflated marks masked underlying weaknesses. Internal assessments were not consistently aligned with curriculum standards, giving students inflated indicators of readiness that were not borne out in standardised measures.

The data didn’t undermine teacher expertise; it protected it. It prevented us from relying on anecdote, bias, or assumptions and instead allowed us to ask better questions: Where exactly is the breakdown? When does it begin? Who is affected? And why?

These collaborative approaches helped me define the strategic plan for numeracy across the school. More importantly, they developed the data literacy of leaders and teachers and demonstrated the value of evidence-based documentation for recording and measuring our approaches to growth.

The Duty of Care in Student Placement

In some cases, I have also used standardised data to support student placement into programs. This decision wasn’t taken lightly. Where internal assessment data was inconsistent or misaligned, its validity as a comparative measure was compromised. In those circumstances, I turned to the only remaining data sources that are externally benchmarked, longitudinal, and standardised: NAPLAN and PAT. My focus was on creating pathways for student success, so I designed a placement process that carefully triangulated standardised and school-based data. This provided a well-rounded understanding of each student's abilities and potential, allowing the curriculum coordinator, the Head of Mathematics, and me to recommend programs that genuinely aligned with student readiness and set students up for a positive and challenging learning experience.
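One way to picture triangulation is as a simple agreement check across sources. The sketch below is illustrative only and is not the placement process described above: the `Evidence` fields, the idea of a yes/no readiness indicator per source, and the three outcome labels are all assumptions. What it does show is the principle that disagreement between sources goes to people for discussion, not to an automatic decision.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical readiness indicators for one student, one per data source."""
    naplan_ready: bool   # assumed: meets an agreed NAPLAN benchmark
    pat_ready: bool      # assumed: meets an agreed PAT benchmark
    school_ready: bool   # assumed: meets the school-based assessment standard


def triangulate(evidence: Evidence) -> str:
    """Return a recommendation only when all three sources agree."""
    votes = sum([evidence.naplan_ready, evidence.pat_ready, evidence.school_ready])
    if votes == 3:
        return "ready"
    if votes == 0:
        return "needs support"
    # Mixed signals: hand the case to staff for a professional conversation.
    return "review"


print(triangulate(Evidence(naplan_ready=True, pat_ready=True, school_ready=False)))
```

A student whose school-based results look strong but whose external results do not (or the reverse) lands in "review", which is where the conversation between teachers and leaders belongs.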

This approach reduces bias. It counters assumptions about student capacity in junior secondary classes that can unintentionally arise, particularly when teachers have not taught senior secondary courses and may not fully appreciate the cognitive load and pacing required. The intention was not to override professional judgement, but to calibrate it against shared, external reference points.

This tension is important. When readiness is overstated, students are set up for disappointment later. Using robust, external data earlier allows for honest conversations, targeted support, and more ethical pathway guidance. It protects students from being placed into programs where the likelihood of success is low, not because of effort or attitude, but because of unmet prerequisite skills. This reframes assessment not as a gatekeeping tool, but as a duty of care.

 

Author: Dr Ingrid H Lee. Making space for possibility in education. I write about curriculum, learning, governance, and leadership in education - examining accountability, systems, and what holds up when pressure hits. When I'm not thinking about systems, I'm usually hand-milling flour for sourdough, sketching and painting in the countryside, or being supervised by my two miniature poodles, Monty and Ivy.