A Survey of Pose-Based Deep Learning Techniques for Student Behavior Recognition in Educational Environments
Abstract
This survey reviews pose-based deep learning methods for classroom behavior recognition, comparing recurrent (PoseRNN, Bi-LSTM), graph-convolutional (ST-GCN, Edge-ST-GCN), attention-augmented (AGCN, TSST-GCN), and transformer (PoseFormer, ActionFormer) architectures. We highlight the advantages of skeleton data (privacy, robustness, and efficiency) and summarize each model’s trade-offs in accuracy, latency, and interpretability. Across benchmark datasets, graph-based approaches offer the best real-time performance, while transformers excel at capturing complex, long-duration actions, though at a higher computational cost. Finally, we identify key research directions: multimodal fusion, personalized adaptation, bias mitigation, and edge-optimized deployment.


