Team: J. Wu, P. Ishwar, J. Konrad
Funding: National Science Foundation (CISE-SATC)
Status: Completed (2016-2018)

Summary: This project studies learning gesture “styles” (of both the hand and the body) for user authentication. Unlike the BodyLogin and HandLogin projects, which treat a gesture as a “password” and associate each user with a single, chosen gesture motion, this project aims to learn a user’s gesture “style” from a set of training gestures. It leverages the datasets developed in both of those prior projects.

Overview of two-stream convolutional networks for both identification and verification

We have developed a deep-learning framework for learning gesture style based on a two-stream convolutional neural network, and have adapted it for both identification and verification. We benchmarked this approach against a state-of-the-art method that uses a temporal hierarchy of covariance matrices of features extracted from a sequence of silhouette-depth frames.
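At prediction time, a two-stream network combines evidence from its two branches. The minimal sketch below (plain Python, hypothetical raw scores) illustrates late score fusion of a spatial and a temporal stream for identification; the actual network in the paper fuses learned features, and the stream names and weighting here are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(scores):
    """Convert raw per-class scores to probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_streams(spatial_scores, temporal_scores, w=0.5):
    """Late fusion: weighted average of per-user probabilities from a
    spatial (appearance) stream and a temporal (motion) stream.
    The weight w is a hypothetical tuning parameter."""
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    return [w * a + (1 - w) * b for a, b in zip(p_spatial, p_temporal)]

def identify(spatial_scores, temporal_scores):
    """Identification: return the index of the most likely user."""
    fused = fuse_streams(spatial_scores, temporal_scores)
    return max(range(len(fused)), key=fused.__getitem__)
```

For example, if both streams favor user 0, `identify([2.0, 0.1, 0.0], [1.5, 0.2, 0.1])` returns 0; when the streams disagree, the weight `w` controls which stream dominates.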

Results: We have validated our approach on data extracted from both the HandLogin and BodyLogin datasets.

Compared to a state-of-the-art method based on covariance features (“Baseline [26]” in the table above), the convolutional neural network shows significantly improved performance on both the user identification and verification tasks. Particularly impressive is the performance when the network is trained on a subset of gestures but tested on a different, unseen gesture. For example, for “All but Piano / Piano” the network is trained on “Compass”, “Fist” and “Push”, but tested on “Piano”. One exception is the case of the “Push” gesture (scenario #4 for HandLogin). We believe that performance is lower in this case because the “Push” gesture differs significantly in form and dynamics from the training gestures; in particular, it contains variations in scale that the other gestures do not. For the most part, this type of result is to be expected: it will always be difficult to generalize to a completely unknown gesture that shares little to nothing with the training gestures. However, for gestures that are similar in form and dynamics, generalization works quite well.

A key practical outcome of this approach is that, for authentication and identification, the CNN does not need to be retrained as long as users do not switch to dramatically different gestures. A similar new gesture can still be used for convenience, at the cost of some degradation in performance.
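One way to realize this no-retraining property is to use the trained network as a fixed feature extractor: each user enrolls with an embedding, and a probe gesture is verified by comparing its embedding to the enrolled one. The sketch below (plain Python) shows such a comparison using cosine similarity and a fixed acceptance threshold; the similarity measure and threshold value are illustrative assumptions, not details from the paper.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def verify(probe_embedding, enrolled_embedding, threshold=0.8):
    """Verification: accept the claimed identity if the probe's
    embedding is close enough to the user's enrolled embedding.
    The threshold trades off false accepts vs. false rejects."""
    return cosine_similarity(probe_embedding, enrolled_embedding) >= threshold
```

Enrolling a new, similar gesture then only requires computing and storing its embedding; the network weights stay frozen, which is why performance degrades gracefully rather than requiring retraining.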

The dataset and code used in this project have been made available here.


For complete results, discussion and details of our methodology, please refer to our paper below.


  1. J. Wu, P. Ishwar, and J. Konrad, “Two-Stream CNNs for Gesture-Based Verification and Identification: Learning User Style,” in IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Workshop on Biometrics, June 2016.