Classification of Student Actions Through 2D Pose-Based CNN-LSTM Networks
Keywords:
CNN-LSTM, 2D Pose Estimation, Student Action Classification

Abstract
Automated analysis of student behavior in classrooms offers educators a reliable method to enhance engagement and assess participation. This study introduces a 2D pose-based CNN-LSTM model designed to classify student actions—hand raising, writing, and reading—from video data using the EduNet dataset. Video frames were processed with Mediapipe to extract pose landmarks, focusing on upper-body features. The proposed architecture combines Convolutional Neural Networks (CNN) for spatial analysis with Long Short-Term Memory (LSTM) units for modeling temporal sequences. Despite the constraints of a limited dataset, the model achieved a validation accuracy of 98.83%. These findings suggest that pose-based approaches offer a precise and efficient alternative to traditional behavior-analysis methods. Future work, such as expanding the dataset and modeling multi-person scenarios, is recommended to improve applicability in diverse classroom environments.
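To make the pipeline described in the abstract concrete, the sketch below illustrates one plausible realization of the two stages: per-frame upper-body landmark extraction with Mediapipe, followed by a CNN-LSTM classifier over the resulting pose sequences. The exact landmark subset, sequence length, layer sizes, and hyperparameters are assumptions for illustration only and are not taken from the paper.

```python
# Minimal sketch of a 2D pose-based CNN-LSTM action classifier.
# Assumed: 30-frame clips, 25 upper-body Mediapipe landmarks per frame
# (x, y, visibility), and three classes (hand raising, writing, reading).
import cv2
import mediapipe as mp
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_FRAMES = 30      # frames sampled per video clip (assumed)
NUM_LANDMARKS = 25   # upper-body subset of Mediapipe's 33 pose landmarks (assumed)
NUM_CLASSES = 3      # hand raising, writing, reading

def extract_pose_sequence(video_path):
    """Extract (NUM_FRAMES, NUM_LANDMARKS, 3) pose features from a clip."""
    pose = mp.solutions.pose.Pose(static_image_mode=False)
    cap = cv2.VideoCapture(video_path)
    frames = []
    while len(frames) < NUM_FRAMES:
        ok, frame = cap.read()
        if not ok:
            break
        result = pose.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if result.pose_landmarks:
            # Keep the first NUM_LANDMARKS landmarks as the upper-body subset.
            lm = result.pose_landmarks.landmark[:NUM_LANDMARKS]
            frames.append([(p.x, p.y, p.visibility) for p in lm])
    cap.release()
    pose.close()
    # Pad short clips with zeros so every sequence has NUM_FRAMES steps.
    while len(frames) < NUM_FRAMES:
        frames.append([(0.0, 0.0, 0.0)] * NUM_LANDMARKS)
    return np.asarray(frames, dtype=np.float32)

def build_cnn_lstm():
    """CNN over landmarks within each frame, LSTM over the frame sequence."""
    return models.Sequential([
        layers.Input(shape=(NUM_FRAMES, NUM_LANDMARKS, 3)),
        # Spatial stage: 1D convolutions applied frame-by-frame across landmarks.
        layers.TimeDistributed(layers.Conv1D(32, 3, padding="same", activation="relu")),
        layers.TimeDistributed(layers.Conv1D(64, 3, padding="same", activation="relu")),
        layers.TimeDistributed(layers.GlobalMaxPooling1D()),
        # Temporal stage: LSTM over the per-frame feature vectors.
        layers.LSTM(64),
        layers.Dropout(0.3),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])

model = build_cnn_lstm()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

In this arrangement the `TimeDistributed` wrapper applies the convolutional layers independently to each frame's landmark vector, so spatial and temporal modeling stay separated, mirroring the CNN-for-space, LSTM-for-time division described in the abstract.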


