Human Action Recognition Using CNN and LSTM
This project focuses on human action recognition using advanced deep learning techniques, specifically Convolutional LSTM (ConvLSTM) and Long-term Recurrent Convolutional Networks (LRCN). The goal is to accurately classify human actions in video sequences, leveraging both spatial and temporal features.
This project uses UCF50, a popular benchmark dataset for human action recognition.
Description: The UCF50 dataset contains 50 action categories with a total of 6,618 videos. Each video is labeled with one of the 50 action categories, such as "Jumping Jack," "Horse Riding," and "Pull Ups." The dataset covers a wide range of human actions and is diverse in terms of viewpoints and backgrounds.
Organization: The dataset is organized into directories, where each directory corresponds to a specific action class and contains the video files for that action.
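To make this organization concrete, here is a minimal sketch (not the project's actual loader) of how frames could be sampled from each class directory with OpenCV. The dataset path, image size, and sequence length below are assumed values, not settings taken from the project.

```python
import os
import cv2
import numpy as np

# Assumed preprocessing settings; the project's actual values may differ.
IMAGE_HEIGHT, IMAGE_WIDTH = 64, 64
SEQUENCE_LENGTH = 20
DATASET_DIR = "UCF50"  # hypothetical path to the extracted dataset

def frames_extraction(video_path):
    """Return SEQUENCE_LENGTH evenly spaced, resized, normalized frames from a video."""
    frames = []
    video_reader = cv2.VideoCapture(video_path)
    frame_count = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))
    skip_window = max(int(frame_count / SEQUENCE_LENGTH), 1)
    for frame_index in range(SEQUENCE_LENGTH):
        # Jump to the next sampled position and read one frame.
        video_reader.set(cv2.CAP_PROP_POS_FRAMES, frame_index * skip_window)
        success, frame = video_reader.read()
        if not success:
            break
        resized = cv2.resize(frame, (IMAGE_WIDTH, IMAGE_HEIGHT))
        frames.append(resized / 255.0)  # scale pixel values to [0, 1]
    video_reader.release()
    return frames

# Example: each subdirectory of DATASET_DIR is one action class.
# classes = sorted(os.listdir(DATASET_DIR))
# for cls in classes:
#     for video_file in os.listdir(os.path.join(DATASET_DIR, cls)):
#         sequence = frames_extraction(os.path.join(DATASET_DIR, cls, video_file))
```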
- Video Processing: `cv2`, `moviepy.editor`, `pafy`, `youtube-dl`, `pytube`
- Data Manipulation: `numpy`, `datetime`, `collections.deque`
- Machine Learning: `tensorflow`, `tensorflow.keras.layers`, `tensorflow.keras.models`
- Utilities: `sklearn.model_selection.train_test_split`, `matplotlib.pyplot`, `tensorflow.keras.utils.to_categorical`, `tensorflow.keras.utils.plot_model`, `tensorflow.keras.callbacks.EarlyStopping`
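For reference, the imports behind this list would look roughly as follows; the exact symbols pulled from `tensorflow.keras.layers` depend on the model definitions, and the YouTube helpers are only needed when fetching demo videos.

```python
# Video processing
import cv2
from moviepy.editor import VideoFileClip  # common entry point; exact usage may differ
# pafy, youtube-dl, and pytube are optional helpers for downloading test videos from YouTube

# Data manipulation
import numpy as np
import datetime
from collections import deque

# Machine learning
import tensorflow as tf
from tensorflow.keras.layers import ConvLSTM2D, LSTM, Dense, Dropout, Flatten, TimeDistributed
from tensorflow.keras.models import Sequential

# Utilities
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from tensorflow.keras.utils import to_categorical, plot_model
from tensorflow.keras.callbacks import EarlyStopping
```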
The ConvLSTM model integrates convolutional operations with LSTM units to capture both spatial and temporal information. This approach is well-suited for recognizing actions in video sequences by processing frames with spatial filters and then learning temporal dependencies.
Architecture: The ConvLSTM model consists of several ConvLSTM layers followed by fully connected layers for classification.
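A minimal Keras sketch of such an architecture is shown below; the filter counts, kernel sizes, and input dimensions are illustrative assumptions rather than the project's exact configuration.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (ConvLSTM2D, MaxPooling3D, TimeDistributed,
                                     Dropout, Flatten, Dense)

# Assumed dimensions: 20 frames of 64x64 RGB images, 50 output classes.
SEQUENCE_LENGTH, IMAGE_HEIGHT, IMAGE_WIDTH, NUM_CLASSES = 20, 64, 64, 50

def create_convlstm_model():
    model = Sequential([
        # ConvLSTM layers apply convolutional filters inside the recurrent cell,
        # so each layer learns spatial and temporal patterns jointly.
        ConvLSTM2D(filters=4, kernel_size=(3, 3), activation='tanh',
                   recurrent_dropout=0.2, return_sequences=True,
                   input_shape=(SEQUENCE_LENGTH, IMAGE_HEIGHT, IMAGE_WIDTH, 3)),
        MaxPooling3D(pool_size=(1, 2, 2), padding='same'),
        TimeDistributed(Dropout(0.2)),
        ConvLSTM2D(filters=8, kernel_size=(3, 3), activation='tanh',
                   recurrent_dropout=0.2, return_sequences=True),
        MaxPooling3D(pool_size=(1, 2, 2), padding='same'),
        # Fully connected classification head.
        Flatten(),
        Dense(NUM_CLASSES, activation='softmax'),
    ])
    return model
```

Such a model would then be compiled with a categorical cross-entropy loss and trained with the EarlyStopping callback listed above.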
The LRCN model combines Convolutional Neural Networks (CNN) with LSTM networks. The CNN extracts spatial features from individual frames, and the LSTM captures temporal dependencies across frames.
Architecture: The LRCN model first applies CNN layers to each frame to obtain feature vectors, which are then fed into an LSTM network for action classification.
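A comparable Keras sketch of an LRCN-style model is given below, again with illustrative layer sizes: TimeDistributed wraps the CNN so it runs on every frame independently, and the resulting per-frame feature vectors are passed to an LSTM for classification.

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (TimeDistributed, Conv2D, MaxPooling2D,
                                     Dropout, Flatten, LSTM, Dense)

# Assumed dimensions: 20 frames of 64x64 RGB images, 50 output classes.
SEQUENCE_LENGTH, IMAGE_HEIGHT, IMAGE_WIDTH, NUM_CLASSES = 20, 64, 64, 50

def create_lrcn_model():
    model = Sequential([
        # CNN feature extractor applied to each frame via TimeDistributed.
        TimeDistributed(Conv2D(16, (3, 3), padding='same', activation='relu'),
                        input_shape=(SEQUENCE_LENGTH, IMAGE_HEIGHT, IMAGE_WIDTH, 3)),
        TimeDistributed(MaxPooling2D((4, 4))),
        TimeDistributed(Dropout(0.25)),
        TimeDistributed(Conv2D(32, (3, 3), padding='same', activation='relu')),
        TimeDistributed(MaxPooling2D((4, 4))),
        TimeDistributed(Flatten()),
        # LSTM captures temporal dependencies across the per-frame feature vectors.
        LSTM(32),
        Dense(NUM_CLASSES, activation='softmax'),
    ])
    return model
```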