By Dave DeFusco
At the Katz School’s Graduate Symposium on Science, Technology and Health, a team of mathematics and occupational therapy students presented a project that could reshape how we understand, assess and support children’s play. Their project, “AI-Powered Play Assessment Using Video Language Models,” promises to automate one of the most nuanced tasks in pediatric care: evaluating joint play between children and their caregivers.
Dengyi Liu, a student in the Ph.D. program in Mathematics, and Vanessa Murad and Chana Cunin, both students in the Occupational Therapy Doctorate program, developed and tested an artificial intelligence model that analyzes parent-child interactions with a speed and consistency that manual review can't match.
“Observation-based assessments are powerful but time-consuming,” said Liu. “We wanted to build a system that could take in a 10-minute video and identify, track and evaluate play behaviors using the same criteria clinicians use—only faster and without fatigue or inconsistency.”
At the heart of this work is the Parent/Caregiver Support of Children’s Playfulness (PC-SCP), a validated observational tool developed in 2023 by Dr. Amiya Waldman-Levi, a clinical associate professor of occupational therapy at the Katz School, and Dr. Anita Bundy, a professor in the College of Health and Human Sciences at Colorado State University. The PC-SCP evaluates the quality of joint play experiences, which are crucial to children’s social, emotional and cognitive development.
“Even with trained raters, scoring PC-SCP manually can take hours, and the results can vary depending on who’s watching the video,” said Dr. Waldman-Levi. “We needed a better way to scale our work while keeping the rigor.”
That’s where AI comes in. Liu and the team fine-tuned Qwen2.5-VL, a cutting-edge video language model designed to understand both visual and textual data. “The model’s architecture allows it to process video frames and textual prompts together, much like how a human therapist watches, listens and interprets at once,” said Liu.
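The team's exact pipeline hasn't been published, but for readers curious about the mechanics, the sketch below shows how Qwen2.5-VL is typically prompted with a video and a question through the Hugging Face transformers library. The video path, frame rate and question here are illustrative placeholders, not the study's actual inputs.

```python
# Minimal sketch: prompting Qwen2.5-VL with a video plus a text question,
# following the standard Hugging Face interface. Paths/prompts are illustrative.
from transformers import Qwen2_5_VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper shipped with the Qwen2.5-VL release

model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2.5-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")

# One "message" interleaves the video with the scoring question, so the model
# attends to frames and text together, as Liu describes.
messages = [{
    "role": "user",
    "content": [
        {"type": "video", "video": "file:///data/session_clip.mp4", "fps": 1.0},
        {"type": "text", "text": "Does the caregiver respond to the child's "
                                 "bid to play? Answer yes or no, then explain."},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(text=[text], images=image_inputs, videos=video_inputs,
                   padding=True, return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=128)
# Strip the prompt tokens and decode only the model's answer.
answer = processor.batch_decode(
    [out[len(inp):] for inp, out in zip(inputs.input_ids, generated)],
    skip_special_tokens=True,
)[0]
print(answer)
```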
By feeding the model annotated video Q&A datasets, the team trained it to recognize key elements of joint play, including cooperation, initiation and responsiveness. The study recruited 39 mother-child pairs, encompassing both neurotypical and neurodiverse children between ages 2 and 6. Most mothers were college-educated, English-speaking and married, with moderate to high household income. Importantly, both manual and AI assessments were conducted on the same 60 video clips. Manual PC-SCP scoring, which showed strong inter-rater reliability (75%–100%), served as the gold standard for evaluating the AI's performance.
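The article doesn't reproduce the annotation schema, but a video Q&A record for this kind of fine-tuning typically pairs a clip with a question and a clinician-validated answer. The field names below are hypothetical, offered only to make the idea concrete.

```python
# Hypothetical shape of one annotated video Q&A training record; the actual
# PC-SCP annotation schema is not public, so every field name is illustrative.
example_record = {
    "video": "clips/pair_012_freeplay.mp4",        # one of the 60 scored clips
    "question": "Who initiates the play episode, the child or the caregiver?",
    "answer": "The child initiates by offering a toy; the caregiver joins in.",
    "pcscp_item": "initiation",                    # the PC-SCP element being probed
    "rater_score": 3,                              # clinician's manual rating (gold standard)
}
```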
“The AI model achieved a top-five accuracy of 61.3% and 40.7% precision on key scoring items,” said Murad. “That’s a solid result considering the complexity of what it’s being asked to do, which is essentially to replicate a trained clinician’s observational judgment.”
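To read those numbers: top-five accuracy counts a prediction as correct when the clinician's score appears among the model's five highest-ranked answers. The team's evaluation code isn't published, but the standard definition of the metric is a few lines of NumPy:

```python
import numpy as np

def top_k_accuracy(scores: np.ndarray, gold: np.ndarray, k: int = 5) -> float:
    """scores: (n_items, n_candidates) model confidence per candidate answer;
    gold: (n_items,) index of the clinician-assigned answer."""
    top_k = np.argsort(scores, axis=1)[:, -k:]     # k highest-scoring answers per item
    hits = (top_k == gold[:, None]).any(axis=1)    # is the gold answer among them?
    return float(hits.mean())

# Toy usage: 4 items, 6 candidate answers each, random confidences.
rng = np.random.default_rng(0)
print(top_k_accuracy(rng.random((4, 6)), np.array([2, 5, 0, 3])))
```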
Cunin emphasized the real-world impact. “This could change how occupational therapists engage with families,” she said. “Instead of spending hours reviewing video and scoring interactions frame by frame, clinicians can rely on AI to do the heavy lifting, freeing their time for intervention and counseling.”
Dr. Waldman-Levi, who served as an advisor on the project, noted the broader vision: “This is not about replacing therapists; it’s about extending their reach,” she said. “With automated scoring, we can conduct larger studies, reach more diverse populations and reduce human bias in assessment. That’s an ethical win as much as a scientific one.”
Dr. Honggang Wang, professor and chair of the Department of Graduate Computer Science and Engineering, said the project is the kind of interdisciplinary collaboration the Katz School aims to foster. “Combining deep learning with behavioral science opens entirely new frontiers in healthcare diagnostics,” he said. “It’s not just innovation—it’s mission-driven innovation.”
Still, the team is aware of limitations. Variability in video quality and diversity in interaction styles remain challenges. “Generalizing the model across different socioeconomic and cultural contexts will require more diverse training data,” said Liu. “Our next step is to expand the dataset, refine the model’s sensitivity and begin real-world testing in clinical settings.”
The potential is enormous. Automated scoring can cut analysis time from hours to minutes and scale longitudinal studies of child development in ways previously impractical. In a healthcare system strained by labor shortages and rising costs, such a tool offers both efficiency and consistency.
“This is just the beginning,” said Murad. “Our vision is to create AI tools that can assist in early diagnosis, track developmental progress and support interventions—all while honoring the human relationships at the heart of pediatric care.”