
Rose Wang, PhD Candidate at Stanford University, on The Effect of Tutor CoPilot for Virtual Tutoring Sessions: Testing an AI Intervention to Improve Tutor Instruction
Research Learning Community
This year’s theme was “Proximate Measures of Engagement, Learning, and Relationships for High-Impact Tutoring.” Research continues to demonstrate that high-impact tutoring meaningfully improves student academic outcomes. However, the status quo is to measure the impact of tutoring with distal outcomes, such as those available in district administrative data. As a group, researchers considered how to learn more quickly what is happening in tutoring sessions and how to gain more nuanced insights into the tutoring experience.
Learnings from the field
The breakout focused on how engagement, relationships, and learning can be measured on a spectrum from proximate to distal outcomes. Proximate outcomes are immediate, direct measures that can be observed or collected in close temporal proximity to the intervention. Distal outcomes are long-term, broader measures that reflect the overall impact of an intervention over an extended period.



Carly Robinson, NSSA’s Director of Research, discussed how observations, surveys, formative assessments, and exit tickets are a step toward more nuanced and proximate measures of a student’s tutoring experience.
Dora Demszky, an Assistant Professor at the Stanford Graduate School of Education, then introduced how natural language processing (NLP) methods can help us examine tutor-student discourse at scale. She provided an overview of how researchers can think about fine-tuning large language models (LLMs) for measurement, and gave examples of measures her team has created to date.
Jing Liu, Assistant Professor of Education at the University of Maryland, concluded the presentation with a discussion of the potential for audio and video data sources to inform how we measure engagement, relationships, and learning. He talked specifically about using audio data to identify instances of rapport building and how video data can be used to detect emotions. Overall, researchers were left considering the why, the how, and the trade-offs of using these measures.
Exciting new directions
Ultimately, researchers engaged in discussions around these questions:
- What are the compelling proximate measures of engagement, relationships, and learning?
- What exciting questions do these measures enable?
- How can we better share these measures with one another?
Some highlights of the discussion included the following topics:
- Building upon measures and approaches used in other fields
- Considering what surveys can and cannot answer (and how they might backfire)
- Developing measures of tutor-student rapport using audio and video data
- Creating measures of student attention from videos
- Considering how progress monitoring can be used more effectively and in real time
- Measuring culturally responsive actions and considering how we account for differences in context
- Identifying areas of the curriculum that are not engaging or that cause unproductive struggle
- Using these methods to overhaul tutor development and coaching
- Developing more open-source materials
- Considering when and if it is appropriate to use proximate measures for evaluation
