Track human poses in real-time on Android with TensorFlow Lite
August 06, 2019
Posted by Eileen Mao and Tanjin Prity, Engineering Practicum Interns at Google, Summer 2019

We are excited to release a TensorFlow Lite sample application for human pose estimation on Android using the PoseNet model. PoseNet is a vision model that estimates the pose of a person in an image or video by detecting the positions of key body parts. As an example, the model can estimate the position of a person’s elbow or knee in an image. The pose estimation model does not identify who is in an image; it only detects the positions of key body parts.

TensorFlow Lite is sharing an Android sample application that utilizes the device’s camera to detect and display key body parts of a single person in real-time. Check out the source code!

Why is this exciting?

There are many possibilities with pose estimation. To name a few, developers can augment reality based on images of the body, animate computer graphic characters, and analyze the gait of athletes in sports. At Google I/O’19, TensorFlow Lite showcased Dance Like, an app that helps users learn how to dance using the PoseNet model.

This sample application will make it easier for app developers and machine learning experts to explore the possibilities of a lightweight mobile model.

The PoseNet sample application

In contrast with the existing Android examples that are written in Java, the PoseNet sample app was developed in Kotlin. The goal of developing the app was to make it easy for anyone to use the PoseNet model with minimal overhead. The sample app includes a PoseNet library that abstracts away the complexities of the model. The diagram below shows the workflow between the application, PoseNet library, and TensorFlow Lite library.
PoseNet App workflow

The PoseNet library

The PoseNet library provides an interface that takes a processed camera image and returns information about where the person’s key body parts are. This functionality is provided by estimateSinglePose(), a method that runs the TensorFlow Lite interpreter on a processed RGB bitmap and returns a Person object. This page explains how to interpret PoseNet’s inputs and outputs.
// Estimate the body part positions of a single person.
// Pass in a Bitmap and obtain a Person object.
fun estimateSinglePose(bitmap: Bitmap): Person {...}
The Person class contains the locations of the key body parts with their associated confidence scores. Each key point's confidence score indicates the probability that a key point exists in that position, and the confidence score of a person is the average of the confidence scores of its key points.
// Person class holds a list of key points and an associated confidence score.
class Person {
  var keyPoints: List<KeyPoint> = listOf()
  var score: Float = 0.0f
}
Each KeyPoint holds information on the Position of a certain BodyPart and the confidence score of that key point. A list of all the defined key points can be accessed here.
// KeyPoint class holds information about each bodyPart, position, and score.
class KeyPoint {
  var bodyPart: BodyPart = BodyPart.NOSE
  var position: Position = Position()
  var score: Float = 0.0f
}

// Position class contains the x and y coordinates of a key point on the bitmap. 
class Position {
  var x: Int = 0
  var y: Int = 0
}

// BodyPart class holds the names of seventeen body parts.
enum class BodyPart {
  NOSE, 
  LEFT_EYE, 
  RIGHT_EYE, 
  ... 
  RIGHT_ANKLE
}
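
To make the relationship between the two confidence scores concrete, here is a minimal sketch of how a person's score could be derived from its key points. It is not the library's code: the classes are simplified stand-ins for the ones above (immutable data classes, a shortened BodyPart enum), and personFrom() is a hypothetical helper introduced only for illustration.

```kotlin
// Simplified stand-ins for the classes shown above (assumption: data classes
// with constructor parameters, and only three of the seventeen body parts).
enum class BodyPart { NOSE, LEFT_EYE, RIGHT_EYE }

data class Position(val x: Int = 0, val y: Int = 0)

data class KeyPoint(
    val bodyPart: BodyPart,
    val position: Position,
    val score: Float
)

data class Person(val keyPoints: List<KeyPoint>, val score: Float)

// Hypothetical helper: builds a Person whose score is the mean of its
// key point scores. An empty list yields 0.0f instead of NaN.
fun personFrom(keyPoints: List<KeyPoint>): Person {
    val mean = if (keyPoints.isEmpty()) {
        0.0f
    } else {
        keyPoints.map { it.score }.average().toFloat()
    }
    return Person(keyPoints, mean)
}
```

For example, a person built from two key points with scores 0.9 and 0.5 would end up with a score of about 0.7.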

The PoseNet sample app

The PoseNet sample app is an on-device camera app that captures frames from the camera and overlays the key points on the images in real-time.

The application performs the following steps for each incoming camera image:
  1. Capture the image data from camera preview and convert it from YUV_420_888 to ARGB_8888 format.
  2. Create a Bitmap object to hold the pixels from the RGB format frame data. Crop and scale the Bitmap to the model input size so that it can be passed to the model.
  3. Call the estimateSinglePose() function from the PoseNet library to get the Person object.
  4. Scale the Bitmap back to the screen size. Draw the new Bitmap on a Canvas object.
  5. Use the position of key points obtained from the Person object to draw a skeleton on the canvas. Display the key points with a confidence score above a certain threshold, which by default is 0.5.
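
Two of the steps above come down to plain arithmetic, which the following sketch illustrates without any Android classes. It is not the sample app's implementation: yuvToArgb() assumes full-range BT.601 coefficients (the app's converter may use a different variant), ScoredPoint is a hypothetical flattened stand-in for KeyPoint, and visiblePoints() assumes that mapping back to the screen is a simple per-axis scale.

```kotlin
import kotlin.math.roundToInt

// Hypothetical flattened stand-in for the library's KeyPoint:
// a position on the model-input bitmap plus its confidence score.
data class ScoredPoint(val x: Int, val y: Int, val score: Float)

// Step 1, for a single pixel: convert one YUV sample to a packed ARGB_8888 Int.
// Assumes full-range BT.601 coefficients; alpha is always opaque.
fun yuvToArgb(y: Int, u: Int, v: Int): Int {
    val cb = u - 128
    val cr = v - 128
    val r = (y + 1.402 * cr).roundToInt().coerceIn(0, 255)
    val g = (y - 0.344136 * cb - 0.714136 * cr).roundToInt().coerceIn(0, 255)
    val b = (y + 1.772 * cb).roundToInt().coerceIn(0, 255)
    return (0xFF shl 24) or (r shl 16) or (g shl 8) or b
}

// Step 5: keep the key points whose score is above the threshold and map
// them from model-input coordinates to screen coordinates.
fun visiblePoints(
    points: List<ScoredPoint>,
    modelWidth: Int,
    modelHeight: Int,
    screenWidth: Int,
    screenHeight: Int,
    minConfidence: Float = 0.5f
): List<ScoredPoint> {
    val scaleX = screenWidth.toFloat() / modelWidth
    val scaleY = screenHeight.toFloat() / modelHeight
    return points
        .filter { it.score > minConfidence }
        .map {
            it.copy(
                x = (it.x * scaleX).roundToInt(),
                y = (it.y * scaleY).roundToInt()
            )
        }
}
```

Keeping the threshold as a parameter with a default of 0.5 mirrors the behavior described in step 5 while leaving room to tune it for noisier scenes.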

In order to synchronize pose rendering with the camera frame, a single SurfaceView was used for the output display instead of separate View instances for the pose and the camera. SurfaceView takes care of placing the surface on the screen without a delay by getting, locking, and painting on the View canvas.

Running on-device

We encourage you to try out the app by downloading the source code from GitHub and referencing the README for instructions on how to run it.

On the roadmap

In the future, we hope to explore more features for this sample app, including:
  1. Multi-pose estimation
  2. GPU acceleration with the GPU delegate
  3. NNAPI acceleration with the NNAPI delegate
  4. Post-training quantization of the model to decrease latency
  5. Additional model options, such as the ResNet PoseNet model

It was a pleasure developing the PoseNet sample app this summer! We hope this app makes on-device machine learning more accessible. If you use the app, please share it with us using #TFLite, #TensorFlow, and #PoweredByTF.

Acknowledgements

Special thanks to Nupur Garg and Pulkit Bhuwalka, our hosts and TensorFlow Lite software engineers; Tyler Zhu, creator of the PoseNet model; Pavel Senchanka, fellow intern; Clément Julliard, Pixel Camera software engineer; and the TensorFlow Lite team.