COMPARISON BETWEEN FACE AND GAIT HUMAN RECOGNITION USING ENHANCED CONVOLUTIONAL NEURAL NETWORK

Identifying people at a distance is an important task in daily life because of the increase in terrorism. Biometrics is a good solution to personal identification problems, and this applies to soft biometrics as well. Soft biometric features are characteristics that can be extracted remotely and do not require cooperation from people. This paper introduces a comparison between human face recognition and human gait recognition using soft biometric features. Nine face attributes and nine gait attributes are taken from a dataset built by the researchers. The constructed dataset is composed of 66 videos of 33 persons. Features are extracted using Haar and MediaPipe methods, and the extracted features are classified using an enhanced convolutional neural network. This work achieves an accuracy of 95.832% in human face recognition and 89.583% in human gait recognition. These results show that the proposed method achieves promising performance in recognizing people remotely.


Introduction
Identification from a distance has become important due to the ever-increasing surveillance infrastructure being deployed in society (Reid et al., 2013). Biometrics is used in several surveillance applications such as human recognition or re-identification (Prakash et al., 2023). The difficulty of identifying people in varied positions is one of the main challenges that modern person-recognition techniques must overcome (Gharghabi et al., 2016). Substantial advancements in the field of biometrics have led to the development of a novel identification method known as soft biometrics (Arigbabu et al., 2015). A need has emerged for biometric characteristics that enable the identification of people in difficult circumstances without demanding their cooperation (Reid & Nixon, 2013). Soft biometric features are defined as characteristics that give enough distinctiveness and permanence to sufficiently differentiate any two individuals (Sikkandar & Thiyagarajan, 2020). Developing a standalone soft biometric system for recognition or retrieval is currently one of the biggest challenges, due to certain critical factors that directly affect accuracy (Hassan et al., 2023).
Biometrics aims to recognize a person. Traditional biometrics has excellent accuracy and great versatility, but it is difficult to collect physical data from a distance, and cooperation is often required, as with lifting a fingerprint (Kim et al., 2012). For this task, soft biometrics are valuable, since they provide useful information for identifying individuals (Romero et al., 2020). Soft biometrics is a form of biometric identification that utilizes labeled physical or behavioral traits (Reid & Nixon, 2011). The term soft biometrics was introduced in 2004 by Jain et al. to describe a set of characteristics that provide some information about the individual (Jain et al., 2004). Soft biometric traits include characteristics such as eye, hair, beard, and skin color; the shape and size of the head; general discriminators like height or weight; and descriptions of indelible marks such as birthmarks, scars, or tattoos (Dantcheva et al., 2011). Anthropometric attributes are one type of soft biometrics that refer to geometric and shape features of the face, body, and skeleton (Kang et al., 2018). The human face has various traits with different levels of distinctiveness (Almudhahka et al., 2026). Usually, the overall accuracy of any soft biometrics system depends upon multiple factors, one of which is attribute correlation, i.e., the supportive or non-supportive behavior of attributes towards each other during the recognition process (Hassan & Izquierdo, 2022).
The link between soft biometrics and human description is one of their key benefits; people intuitively use soft biometric qualities to identify and describe one another. Soft biometrics enable retrieval as well as identification, which is accomplished by bridging the semantic gap between human descriptions and biometric measures (Reid & Nixon, 2011). Systems that use soft biometrics have several advantages: they may be partially derived from the most widely used classical biometric identifiers; their acquisition is unobtrusive and requires no enrollment; and training can be carried out beforehand on people who are not part of the target identification group (Dantcheva et al., 2011). Evaluating recognition performance in soft biometric recognition poses a challenge due to the absence of a standardized dataset, particularly when dealing with soft biometrics at varying distances. However, a notable advantage of soft biometrics is flexibility regarding the resolution of captured images: soft biometric labels can be collected even from a distance, eliminating the need for stringent image-resolution requirements (Guo et al., 2019). While face and gait are the primary biometrics applicable at a distance, they can encounter challenges such as low frame rate and/or low resolution in surveillance situations. Nevertheless, even with low-resolution images, a comprehensive human description of the subjects can still be provided (Reid et al., 2013). In contrast, soft biometrics present numerous advantages over alternative distance-identification methods: they can be extracted from videos with low resolution and low frame rate, and they possess robust attributes that remain unaffected by factors like camera viewpoint, sensor aging, and scene illumination (Tome et al., 2014).
This paper uses soft biometric features, and excellent accuracy was achieved in recognizing people. Two different methods were used. The first method uses facial features. The face was detected using Haar classifiers; a trained Haar cascade analyses a picture and decides whether the target item is present (Rudinskaya & Paringer, 2020). Features were then extracted using MediaPipe techniques. MediaPipe is an open-source framework with a hybrid platform that creates pipelines for processing perceptual data (Subramanian et al., 2022). Nine soft biometric features are extracted for the face: left eye distance, right eye distance, mouth distance, nose distance, area left, area right, area 2D mouth, area 2D region left, and area 2D region right.
The second method uses gait features extracted using MediaPipe techniques. Nine soft biometric features are extracted for the gait: angle elbow, angle hip, length hip, angle mid hip, angle ankle, length ankle, distance ankle 1, distance ankle 2, and distance mid pinky. Classification was done using a convolutional neural network (CNN) for both methods. A CNN is a supervised method consisting of an input layer and an output layer, as well as multiple hidden layers (Sakib et al., 2019). Finally, the two methods were compared.
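Angle features such as the elbow and hip angles are typically computed from three pose landmarks. A minimal sketch, assuming 2D (x, y) landmark coordinates such as those returned by MediaPipe Pose (the function name and example points are illustrative, not from the paper):

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by points a-b-c, each an (x, y) pair."""
    # Vectors from the joint b to the two neighboring landmarks
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    n1 = math.hypot(*v1)
    n2 = math.hypot(*v2)
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))  # clamp for numerical safety
    return math.degrees(math.acos(cos_angle))

# Example: shoulder-elbow-wrist landmarks forming a right angle at the elbow
print(joint_angle((0, 1), (0, 0), (1, 0)))  # 90.0
```

The same function covers all of the angle features (elbow, hip, ankle, knee) by passing the appropriate triple of landmarks.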

Literature Review
In the past years (2014-2019), deep learning algorithms, and specifically deep Convolutional Neural Networks (CNNs), have led to breakthroughs in many application domains including biometric recognition (Minaee et al., 2023). Although deep learning research in biometrics has achieved good results, there is still great room for improvement in different directions, such as using soft biometrics. Soft biometrics is a set of features extracted from the whole human body (Garg et al., 2018). Becerra-Riera et al. (2019) presented an overview of describable visual features of the face, and in particular of so-called soft biometrics (e.g., facial marks, gender, age, skin color, and other physical characteristics). Terhörst et al. (2020) proposed NFR (Negative Face Recognition), a new face recognition approach that enhances soft-biometric privacy at the template level by representing face templates in a complementary (negative) domain; negative templates describe facial properties that do not exist for a person, while ordinary templates characterize the facial properties the person does have. Sun et al. (2023) used convolutional neural networks (CNNs) for facial point detection; global high-level features are extracted over the whole face region at the initialization stage, which helps to locate keypoints with high accuracy. Fard et al. (2021) proposed ASMNet, a lightweight CNN structure with multi-task learning to detect facial landmark points and estimate head position. Vukadinovic & Pantic (2005) proposed a robust, highly accurate method for detecting 20 facial points using a GentleBoost classifier learned on features extracted with Gabor filters. Arca et al. (2006) introduced a completely automatic face recognition system; this method works on color images, extracts 24 facial feature points, extracts features of those points with Gabor filters, and recognizes faces by comparing the similarity of the points' features. Saleem et al. (2023) proposed a system to identify a person from a video clip using the FaceNet algorithm; the system analyzed the facial features of a person and classified the images accordingly.
In addition to facial features, there are soft biometric features extracted from human gait. Nithyakani et al. (2019) proposed a scheme for gait recognition; they used a deep convolutional neural network to extract the gait features of a person by training the network architecture with Gait Energy Images (GEIs). Sharif et al. (2020) presented a method for human gait recognition based on shape features extracted using histograms of oriented gradients (HOGs), geometric features, and texture features with local binary patterns (LBPs); principal component analysis (PCA) was applied for feature reduction, and a support vector machine (SVM) performed the classification. Wang & Yan (2020) introduced a novel gait classifier that takes full advantage of deep learning (DL) technology and proposed a gait-recognition method using a convolutional LSTM approach named Conv-LSTM. Elharrouss et al. (2021) presented a gait recognition method for person re-identification; they used a multitask CNN and extracted gait energy images (GEIs), from which features were extracted and later classified using a CNN. Chao et al. (2019) presented a new network named GaitSet to learn identity information from a set of silhouettes; this method exploits a deep learning model to compute a cross-view gait representation from a set of independent frames. Sahak et al. (2017) presented a method of human recognition based on oblique and frontal gait using features extracted from a Kinect sensor; orthogonal least squares (OLS) was used for feature selection and a multi-layer perceptron (MLP) for classification, and the optimized MLP with two feature sets was evaluated for gait recognition using a neural network classifier that provides better classification results. Zhang et al. (2019) presented an autoencoder-based method termed GaitNet that can disentangle appearance and gait feature representations from raw RGB frames and utilizes a multi-layer LSTM structure to further explore temporal information, generating a gait representation for each video sequence.

Dataset
The authors locally constructed the dataset for this work. The dataset contains videos of various people, recorded in real time as they passed in front of and beside the camera. The people photographed in this dataset are 33 university students whose ages range between 18 and 22. A Nikon D7200 camera was used for filming, positioned 130 cm above the ground and 450 cm away from the subject. Two videos were taken of each person; therefore, sixty-six videos are used in this research. There are two types of dataset (front and side); table (1) shows information about the dataset. The area of a polygon is the area of the region it occupies; figure (2) shows an example of the area of a polygon in a two-dimensional plane.
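The polygon-area features mentioned above can be sketched as follows: the shoelace formula computes the area of an arbitrary simple polygon from 2D landmark vertices, and a closed-form expression covers the regular-polygon case used by the model (the example coordinates are illustrative, not from the paper's dataset):

```python
import math

def polygon_area(points):
    """Area of a simple 2D polygon via the shoelace formula; points are (x, y) vertices in order."""
    n = len(points)
    s = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

def regular_polygon_area(n_sides, side_length):
    """Closed-form area of a regular polygon with n_sides sides of the given length."""
    return (n_sides * side_length ** 2) / (4.0 * math.tan(math.pi / n_sides))

print(polygon_area([(0, 0), (4, 0), (4, 3), (0, 3)]))  # 12.0 (a 4x3 rectangle)
print(round(regular_polygon_area(4, 2), 6))            # 4.0 (a square with side 2)
```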
A regular polygon is a polygon with equal sides and equal angles; the proposed model uses regular polygons. Thus, the area of each regular polygon is calculated using the closed-form formula associated with it. The pre-processing step consists of two sub-steps: cleaning the data using Exploratory Data Analysis (EDA) and balancing the data using the Synthetic Minority Oversampling Technique (SMOTE). Cleaning the data involves analyzing the dataset and handling its outliers; exploratory data analysis is a method of sifting through large amounts of information to find outliers. The EDA handles outliers in the data using algorithm (1).

Algorithm (1) EDA Outlier Handling
Begin
Step 1: For each column in the dataset:
a. Retrieve the lower range and upper range values from the column.
b. Sort the items of the column in ascending order.
c. Calculate the quantiles for the column: Q1 = column.quantile(0.25), Q3 = column.quantile(0.75).
d. Determine the lower range and upper range values based on the interquartile range (IQR): IQR = Q3 - Q1; lower range = Q1 - (1.5 * IQR); upper range = Q3 + (1.5 * IQR).
Step 2: For each value in the column:
a. If the value is less than the lower range, set the new value to the lower range value and update the value in the column.
b. If the value is greater than the upper range, set the new value to the upper range value and update the value in the column.
End

The CNN algorithm implements a highly effective approach known as global average pooling, which replaces the traditional Flatten layers commonly used in conventional CNN architectures. The last convolution layer of the CNN generates a feature map for each class involved in the classification task. Instead of employing fully connected layers on top of these feature maps, the CNN takes the average of each individual feature map; this technique helps address the issue of overfitting. To further enhance the model's performance, Dropout with a rate of 0.5 is applied after the global average pooling layer. Subsequently, two dense layers are incorporated into the CNN, using the 'ReLU' and 'SoftMax' activation functions respectively, to identify the target variable. For a comprehensive overview of the CNN model's architecture, refer to Figure (4).

Step 4: Save the features including the left angle of the knee, the right angle of the knee, the left angle of the elbow, and the right angle of the elbow.
End.
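The outlier-handling steps of algorithm (1) can be sketched in Python. This is a minimal illustration, not the paper's code; the quantile uses linear interpolation, matching pandas' default `quantile` behavior:

```python
def clip_outliers(column):
    """Clip values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] to those bounds."""
    ordered = sorted(column)

    def quantile(q):
        # Linear-interpolation quantile on the sorted values
        pos = q * (len(ordered) - 1)
        lo, hi = int(pos), min(int(pos) + 1, len(ordered) - 1)
        return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)

    q1, q3 = quantile(0.25), quantile(0.75)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    # Replace each out-of-range value with the nearest bound
    return [min(max(v, lower), upper) for v in column]

values = [10.0, 12.0, 11.0, 13.0, 12.0, 95.0]  # 95.0 is an outlier
print(clip_outliers(values))  # [10.0, 12.0, 11.0, 13.0, 12.0, 15.0]
```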
The preprocessing step for gait features includes cleaning the data, as done for the face features, and balancing the data using the SMOTE algorithm as in algorithm (2).
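The SMOTE balancing step creates synthetic minority-class samples by interpolating between a sample and one of its nearest neighbors. A minimal sketch of that interpolation step (illustrative; a full implementation, such as imbalanced-learn's SMOTE, also performs the k-nearest-neighbor search):

```python
import random

def smote_sample(x, neighbor):
    """Create one synthetic sample on the line segment between x and a nearest neighbor."""
    r = random.random()  # random number in [0, 1)
    return [xi + r * (ni - xi) for xi, ni in zip(x, neighbor)]

random.seed(0)
# Each synthetic feature value lies between the two input samples' values
synthetic = smote_sample([1.0, 2.0], [3.0, 6.0])
print(synthetic)
```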
Human gait recognition is based on the gait pattern obtained in the preceding steps; the resulting features are utilized as input to the proposed enhanced CNN (ECNN) algorithm shown in figure (4).
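The global average pooling enhancement used in the ECNN replaces a Flatten layer with a per-channel mean over each feature map. A minimal numpy sketch of that pooling operation (the shapes and values are illustrative, not the paper's actual feature maps):

```python
import numpy as np

def global_average_pooling_1d(feature_maps):
    """Collapse each 1D feature map to its mean.
    feature_maps: array of shape (timesteps, channels) -> returns shape (channels,)."""
    return feature_maps.mean(axis=0)

# Four timesteps, three channels: each channel is reduced to one averaged value,
# instead of the 12-element vector a Flatten layer would produce.
fm = np.array([[1.0, 2.0, 3.0],
               [3.0, 4.0, 5.0],
               [5.0, 6.0, 7.0],
               [7.0, 8.0, 9.0]])
print(global_average_pooling_1d(fm))  # [4. 5. 6.]
```

Because the pooled output has far fewer parameters feeding the dense layers than a flattened map would, this layer acts as a structural regularizer, which is the overfitting benefit described above.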

Results and Discussions
Results of the face and gait recognition are given separately.

Results of Face Recognition
Figures (6) and (7) show the accuracy and loss for both training and testing with the face front features.

Results of Gait Recognition
Figures (8) and (9) show the accuracy and loss for both training and testing with the gait side features.

Comparison with Previous Works
Several studies have addressed human identification based on soft biometric features, using different methods and techniques adopted in previous years.
For performance evaluation, the proposed system achieved an accuracy higher than (Terhörst et al., 2020; Fard et al., 2021; Vukadinovic & Pantic, 2005) for facial features. For gait, the accuracy of previous works was higher than ours, but our method is distinguished by using different features than previous methods.
This section shows a comparison between the proposed human-identification classification system and related methods, as illustrated in Table (2). Although our results are valid, we cannot conclude much about soft biometrics in general; further evaluation on a much larger and more varied database is still needed. Previous research compared face templates using the entire face, whereas this research used features extracted from the face; therefore, both processing time and storage requirements are lower. In gait recognition, geometric features were used that were not used by previous researchers. This research also shows that recognizing a person through the face is more reliable, while recognition through gait is less reliable.

Conclusion
Recognition of people from a distance is one of the most requested topics these days due to the complexity of life, the evolution of global crime, and the emergence of diseases such as the Corona pandemic. This research proves that the human face and gait can both be used to identify people from a distance, but with different accuracies. It shows that recognizing a person through the face is more reliable, while recognition through gait is less reliable. The research also opens the way to developing better methods for human gait recognition.
Human face recognition includes four main steps: face detection, feature extraction, feature preprocessing, and face recognition. Face detection aims to locate the face regions in the images. The Haar Cascade classification method is used to detect the person's face in the videos; it is implemented in Python using OpenCV's CascadeClassifier function. In the feature extraction step, nine features are extracted by detecting the region of interest (ROI) using the MediaPipe model via the areas of the face triangles. For ROI determination, MediaPipe gets the bounding box and the main points, then draws a line between each pair of points to obtain the area of the face triangle. The output of the MediaPipe model is the nine major coordinates for every face, which are presented in figure (1). Four features are extracted by computing the distances of the four 2D face landmarks shown in figure (1): left eye distance, right eye distance, mouth distance, and nose distance. The distance is calculated using the Euclidean distance equation.
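The four distance features are Euclidean distances between pairs of 2D landmarks; a minimal sketch (the landmark names and coordinates are illustrative, not from the paper's dataset):

```python
import math

def landmark_distance(p1, p2):
    """Euclidean distance between two 2D landmarks given as (x, y) pairs."""
    return math.hypot(p1[0] - p2[0], p1[1] - p2[1])

# Illustrative normalized landmark coordinates for one eye
left_eye_outer, left_eye_inner = (0.40, 0.35), (0.46, 0.35)
print(round(landmark_distance(left_eye_outer, left_eye_inner), 4))  # 0.06
```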

Fig. 2. Area of a Polygon in a Two-Dimensional Plane

In this paper, the SMOTE over-sampling technique is applied to the dataset, as clarified in algorithm (2).
Algorithm (2) SMOTE Algorithm
Input: dataset, SMOTE percentage (N), number of nearest neighbors (k)
Output: balanced training dataset
Begin
Step 1: For each minority-class sample in the dataset, get the data and its class name.
Step 2: For each row X in the minority-class data:
a. Find the k nearest neighbors of the row.
b. per = N / 100
c. While per ≠ 0:
- Select one of the k nearest neighbors, call it Y.
- Select a random number R in [0, 1].
- Add a new synthetic sample to the minority class: Synthetic = X + R * (Y - X).
- per = per - 1
Step 3: Return the synthetic data.
End

Face recognition is the last step in the human face recognition methodology. This step depends on the Enhanced Convolutional Neural Network (ECNN). The enhancement of the CNN is achieved by using four one-dimensional convolution layers and one-dimensional global average pooling, as shown in figure (3).

Fig. 3. The Block Diagram of the Proposed ECNN

Fig. 4. The Detailed Proposed ECNN

Fig. 5. Names of the Key Joint Positions

The extraction of the left and right leg angles is explained in algorithm (3).

Fig. 6. Accuracy in Training and Testing of the Proposed ECNN with Front Features

Fig. 8. Accuracy of the Proposed ECNN Algorithm with Gait Side Features

Figure (8) shows that the training accuracy is 0.96392 and the testing accuracy is about 0.89583 after 400 epochs. Figure (9) shows that the training loss is about 0.00179 and the testing loss is about 0.00519 after 400 epochs. Figure (10) describes the accuracy values of the proposed ECNN algorithm based on face front features and gait side features.

Fig. 10. Values of Accuracy of the Proposed ECNN Algorithm based on Face Front Features and Gait Side Features

Table 1 - Information of the Dataset

Table 2 - Comparison between the Proposed System and Related Methods (Accuracy Values)