Real-Time Object Detection using ml5.js and React

Eliran Elnasi
5 min read · May 9, 2021

Last time, I really liked sharing my Angular vs React experience, so today I decided to write a new article on another topic that I find really amazing — Machine learning in the browser 🔥.

It’s just getting easier and easier to perform machine learning tasks — not long ago, few of us could have imagined running machine learning models in the browser. But the time has come, thanks to TensorFlow.js, a JavaScript library for machine learning that makes use of the GPU and lets us run and retrain existing models directly in the browser or Node.js. The beauty of it is that you can train, export, and import machine learning models easily, making your model smarter and smarter with little effort, no Ph.D. needed.

In this article, you will learn how to get started with ml5.js by creating an object detection application in React. We will capture a live video stream from our webcam, and run some detection logic on it, based on a machine learning model called coco-ssd. We will also explore some useful React hooks and real use cases for them. Don’t worry, I will guide you step by step, just follow me!

When I was holding my phone horizontally, it was detected as a remote (time to upgrade? 😜)


Running a machine learning model on a webcam video element

Part 1 — Create a new React app

Feel free to use an existing application.

npx create-react-app my-app
cd my-app
npm start

Part 2 — Import ml5.js and Webcam

In this example, we will simply include ml5.js in our index.html as follows:

<script src="https://unpkg.com/ml5@0.6.1/dist/ml5.min.js" type="text/javascript"></script>

Alternatively, you can install the ml5 npm package and import it in your component (App.js):

import * as ml5 from "ml5";

Next, we will install the Webcam component (via react-webcam):

// with npm
npm install react-webcam

// with yarn
yarn add react-webcam

All that’s left is to put the Webcam component somewhere in our app. For simplicity, I just put it inside my App.js:

<div className="App">
  <Webcam />
</div>
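
For reference, here is roughly what a minimal App.js could look like at this point (a sketch, assuming the default create-react-app structure and the ml5 script tag from index.html):

import React from "react";
import Webcam from "react-webcam";
import "./App.css";

function App() {
  return (
    <div className="App">
      <Webcam />
    </div>
  );
}

export default App;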

The Webcam component will handle the permission and connect our webcam with the HTML video element for us. Now it’s time to connect the video stream with some machine learning model… 😌

Part 3 — Running ml5.js Object Detector

First, we will have to get a reference to the video element of the Webcam component. To do so, we will make use of the useRef React hook.

Let’s add a new line to our App.js:

const webcamRef = useRef();

Then, connect the reference to the Webcam JSX element:

<Webcam ref={webcamRef}/>

Now we can access the Webcam component instance through webcamRef, which (once initialized) exposes a video property: the underlying HTML video element. We will pass this element as a parameter to the detect method of the ml5.js object detector. Almost there!
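
Just to illustrate that access pattern, a quick (hypothetical) snippet:

// webcamRef.current is the Webcam component instance;
// webcamRef.current.video is the underlying HTML video element
const videoElement = webcamRef.current.video;
console.log(videoElement.videoWidth, videoElement.videoHeight);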

Next, we need to run some code once the component has loaded. We can do it easily using the useEffect hook with an empty dependency array:

React.useEffect(() => {
  // Code that runs on component load here
}, []);

Inside our useEffect, we will:

  1. Create the ml5 object detector and pass it a callback that runs once the model has loaded
  2. Inside that callback, set an interval to execute the model’s detection logic
  3. Clear the interval when the component unmounts to avoid memory leaks (the cleanup function returned from useEffect will help us with that)

We should end up with the following code (Feel free to check the comments in the code if something feels unfamiliar):

useEffect(() => {
  let detectionInterval;

  // 1. Once the model has loaded, update the dimensions and start the detection interval
  const modelLoaded = () => {
    // `dimensions` is assumed to be defined elsewhere in the component
    // (not shown here), e.g. const dimensions = { width: 640, height: 480 };
    const { width, height } = dimensions;
    webcamRef.current.video.width = width;
    webcamRef.current.video.height = height;
    detectionInterval = setInterval(() => {
      detect();
    }, 200);
  };

  // 2. Create the object detector (coco-ssd model) and pass the callback
  const objectDetector = ml5.objectDetector('cocossd', modelLoaded);

  const detect = () => {
    // readyState 4 means the video has enough data to run detection on
    if (webcamRef.current.video.readyState !== 4) {
      console.warn('Video not ready yet');
      return;
    }

    objectDetector.detect(webcamRef.current.video, (err, results) => {
      console.log(results);
    });
  };

  // 3. Clear the interval when the component unmounts
  return () => {
    if (detectionInterval) {
      clearInterval(detectionInterval);
    }
  };
}, []);

Luckily, I got detected as a person and not as a monster! 👻

If you are asking yourself, “Why are all the functions declared within the useEffect hook?” — the answer is that my useEffect has an empty dependency array, which means it runs only once (on init), and this has a few advantages in this case:

  1. It will run after the browser has already painted (no need to check if an element exists in the DOM)
  2. Better loading performance (doing extra work is less of a problem after the browser has painted)
  3. It will run only once. If this code were in the component body, our functions would be re-created on every render. Since the performance impact of this (in this case) is minor, it’s up to you to decide whether you want to make those kinds of optimizations that early.

Bonus: Drawing bounding boxes of the detected objects

In this section, we are going to use the HTML canvas element to draw on top of the Webcam component. We will add a floating canvas over the video and then use the bounding boxes received from the object detector results to know where to draw on the canvas.

Part 1 — Creating the canvas

Create a canvas element just below the webcam component and add a reference to it using useRef:

// JSX
<canvas ref={canvasRef} className="floating"/>
// Inside the component
const canvasRef = useRef();
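
The floating class is not defined anywhere above, so here is one possible way to style it (just a sketch; the exact CSS depends on your layout). The idea is to position the canvas absolutely on top of the video, relative to the wrapping App div:

/* App.css */
.App {
  position: relative;
}

.floating {
  position: absolute;
  top: 0;
  left: 0;
}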

Inside our useEffect hook, let’s get the canvas 2D context:

const ctx = canvasRef.current.getContext('2d');

Next, we will update the canvas dimensions to be the same as the webcam element by updating our modelLoaded() function:

webcamRef.current.video.width = width;
webcamRef.current.video.height = height;
canvasRef.current.width = width;
canvasRef.current.height = height;

Part 2 — Draw!

We can leverage the canvas API to draw rectangles. The object detector already gives us the position and dimensions of every object detected by the coco-ssd model (the results variable we received earlier has everything in it). Let’s update our detect() method:

const detect = () => {
  if (webcamRef.current.video.readyState !== 4) {
    console.warn('Video not ready yet');
    return;
  }

  objectDetector.detect(webcamRef.current.video, (err, results) => {
    const { width, height } = dimensions;
    // Clear the previous frame's drawing before drawing the new detections
    ctx.clearRect(0, 0, width, height);
    if (results && results.length) {
      results.forEach((detection) => {
        ctx.beginPath();
        ctx.fillStyle = "#FF0000";
        const { label, x, y, width, height } = detection;
        ctx.fillText(label, x, y - 5);
        ctx.rect(x, y, width, height);
        ctx.stroke();
      });
    }
  });
};
  • clearRect — clears the previous drawing (we only want the latest detections on screen, so we clear the canvas before each pass)
  • We loop over the results. Each result is a detection: a person, a chair, a cell phone, etc.
  • We draw a rectangle (only its stroke) and a text label, based on the detection’s x, y position and dimensions.

That’s it! Hope you enjoyed it and managed to get it working. If you don’t have a webcam or find it slow, you can just try it out on a picture. Feel free to explore additional cool APIs from ml5.js and TensorFlow.js such as Handpose, FaceApi, and more 😎.
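
For example, here is a minimal sketch of running the same detector on an image element instead of the webcam (the image id and file name are hypothetical):

// A plain <img id="my-image" src="office.jpg" /> somewhere on the page
const image = document.getElementById('my-image');

// detect() works the same way on an image element as on the video element
const objectDetector = ml5.objectDetector('cocossd', () => {
  objectDetector.detect(image, (err, results) => {
    if (err) {
      console.error(err);
      return;
    }
    console.log(results); // label, confidence, x, y, width, height per detection
  });
});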

GitHub example: https://github.com/theunreal/react-ml5js

Let me know what you think, feel free to hit me up in the comments with any questions.
