Gesture Recognition

Gesture recognition is an AI algorithm that detects 21 keypoints on the hand from images or video and recognizes 26 predefined gesture types (hand shapes and movements). It enables new forms of human-computer interaction (HCI) such as contactless operation, sign language recognition, AR/VR interaction, and device control.

Algorithm Overview

The algorithm consists of the following two stages:

  1. Hand Pose Estimation (Gestures Pose): Detects 21 hand keypoints (fingertips, joints, etc.)
  2. Gesture Classification (Gestures Classify): Identifies 26 gesture types from keypoint configurations
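
The two-stage flow above can be sketched as a single function that chains the stages. This is an illustrative sketch only: the model callables are placeholders, not the actual SDK API.

```python
from typing import Callable, List, Tuple

# A hand keypoint as (x, y) image coordinates; the pose model emits 21 per hand.
Keypoint = Tuple[float, float]

def recognize_gesture(
    image,
    pose_model: Callable[[object], List[Keypoint]],
    classifier: Callable[[List[Keypoint]], int],
) -> int:
    """Stage 1: estimate 21 hand keypoints. Stage 2: map keypoints to a gesture index."""
    keypoints = pose_model(image)      # stage 1: hand pose estimation
    if len(keypoints) != 21:
        raise ValueError("expected 21 hand keypoints")
    return classifier(keypoints)       # stage 2: gesture classification
```

Because the classifier operates only on keypoint coordinates rather than raw pixels, the second stage stays very small and fast, which is reflected in the timing figures below.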

Recognizable Gestures (26 types)

  Index  Gesture          Meaning / Use
  0      call             Phone / Calling
  1      dislike          Dislike / Reject
  2      fist             Fist / Confirm
  3      four             Number 4
  4      grabbing         Grabbing
  5      grip             Gripping
  6      like             Like / OK
  7      little_finger    Little finger
  8      middle_finger    Middle finger
  9      no_gesture       Neutral
  10     ok               OK sign
  11     one              Number 1
  12     palm             Open palm
  13     peace            Peace sign
  14     peace_inverted   Inverted peace
  15     point            Pointing
  16     rock             Rock sign
  17     stop             Stop
  18     stop_inverted    Inverted stop
  19     three            Number 3
  20     three_gun        Three-gun
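
In application code, the class index returned by the classifier is typically mapped back to a label. A minimal mapping for the indices enumerated in the table above (0-20; the remaining classes of the 26 are not listed in this table) might look like this — the variable and function names are illustrative:

```python
# Index-to-label mapping for the gesture classes listed in the table above.
GESTURE_LABELS = [
    "call", "dislike", "fist", "four", "grabbing", "grip", "like",
    "little_finger", "middle_finger", "no_gesture", "ok", "one",
    "palm", "peace", "peace_inverted", "point", "rock", "stop",
    "stop_inverted", "three", "three_gun",
]

def label_of(index: int) -> str:
    """Translate a classifier output index into its gesture label."""
    return GESTURE_LABELS[index]
```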

Edge AI Board (RV1126B) Execution Efficiency

  Processing Stage                             Model Size  Processing Time
  Hand Pose Estimation (Gestures Pose)         11.6 MB     58 ms
  Gesture Classification (Gestures Classify)   2.81 MB     5 ms
  Total                                        14.41 MB    Approx. 63 ms
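
Since the two stages run sequentially, the end-to-end latency is simply their sum, which bounds the achievable frame rate. A quick back-of-the-envelope check using the figures in the table:

```python
# Per-stage latencies from the table above (RV1126B NPU).
POSE_MS = 58.0
CLASSIFY_MS = 5.0

def pipeline_latency_ms(pose_ms: float = POSE_MS, classify_ms: float = CLASSIFY_MS) -> float:
    """Total latency when the two stages run back to back."""
    return pose_ms + classify_ms

def max_fps(latency_ms: float) -> float:
    """Upper bound on sequential throughput in frames per second."""
    return 1000.0 / latency_ms
```

At 63 ms per frame this works out to roughly 15-16 FPS of sustained, fully sequential throughput on the board.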

Key Features

  • 21 keypoints + 26-type classification: High-precision hand movement recognition
  • High-speed processing: Pose estimation 58ms + classification 5ms, approximately 63ms total
  • Lightweight model: Compact total size of 14.41MB
  • Real-time capable: Low-latency recognition on edge AI boards

Use Cases

  • Contactless operation interfaces (medical settings, cleanrooms)
  • Alternative input for AR/VR controllers
  • Sign language recognition systems
  • Smart home gesture control (lighting, appliances)
  • Interactive digital signage operation
  • Contactless communication in nursing care facilities
  • Hands-free device operation in factories

Edge AI Board Implementation

Using the RV1126B NPU, hand pose estimation (58 ms) and gesture classification (5 ms) complete in approximately 63 ms end to end. Paired with a USB or MIPI camera, a fully edge-based gesture recognition system can be built.
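
When driving per-frame predictions from a live camera, single-frame misclassifications can cause the recognized gesture to flicker. A common mitigation, sketched here as a standalone helper (not part of any documented SDK), is a majority vote over a short sliding window of recent predictions:

```python
from collections import Counter, deque

class GestureSmoother:
    """Majority vote over the last `window` per-frame gesture indices.

    Suppresses single-frame flicker in a live camera stream at the cost
    of a few frames of extra reaction latency.
    """

    def __init__(self, window: int = 5):
        self.history = deque(maxlen=window)

    def update(self, gesture_index: int) -> int:
        """Record the newest prediction and return the smoothed gesture index."""
        self.history.append(gesture_index)
        return Counter(self.history).most_common(1)[0][0]
```

With a window of 5 frames at ~63 ms per frame, the smoothing adds at most a few hundred milliseconds of reaction delay, which is usually acceptable for the contactless-control use cases listed above.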