Python fast screenshots and locateOnScreen

Taking screenshots with Python is easy. Performance, however, is often an issue, depending on the package you start with (see, for example, this question).

In my previous article (/reinforcement-learning/nintendo/reinforcement-learning-nintendo-nes-tutorial/), I noted that the limiting factor would be the method that looks for an element on the screen (the “Game Over” message) as often as possible.

Getting rid of bottlenecks is a fun thing to do as a developer. For an unexplained reason, I find it particularly satisfying. Python, with its plethora of libraries, introduces many of them. So it is time for a quick tour of the possible solutions.

Benchmarks

I do not want to mess up the code of my previous article, and it is often a good idea, when possible, to run benchmarks in separate files with similar inputs.

Screenshots

In the previous article, I relied on pyautogui. I realized it was built on pyscreeze, so I also tried this library. After some browsing, I learned that PIL also offers this feature.

I discovered it after writing this article, but d3dshot claims to be the fastest way to take screenshots in Python. I’ll keep that in mind if I face new bottlenecks in the future, but let’s stick with the first 3 packages for now.

from PIL import ImageGrab
import numpy as np
import pyautogui as pg
import pyscreeze
import time


REGION = (0, 0, 400, 400)


def timing(f):
    def wrap(*args, **kwargs):
        time1 = time.time()
        ret = f(*args, **kwargs)
        time2 = time.time()
        print('{:s} function took {:.3f} ms'.format(
            f.__name__, (time2-time1)*1000.0))

        return ret
    return wrap


@timing
def benchmark_pyautogui():
    return pg.screenshot(region=REGION)


@timing
def benchmark_pyscreeze():
    return pyscreeze.screenshot(region=REGION)


@timing
def benchmark_pil():
    return np.array(ImageGrab.grab(bbox=REGION))


if __name__ == "__main__":

    im_pyautogui = benchmark_pyautogui()
    im_pyscreeze = benchmark_pyscreeze()
    im_pil = benchmark_pil()

As expected, pyscreeze is slightly faster than pyautogui, but PIL is more than 10 times faster than both!

benchmark_pyautogui function took 157.669 ms
benchmark_pyscreeze function took 152.185 ms
benchmark_pil function took 13.198 ms
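
Each figure above comes from a single call, so it may include one-off costs and noise. Below is a minimal sketch of a repeated measurement. The benchmark() helper and the fake_grab() stand-in are hypothetical and not part of the original scripts. time.perf_counter() is used because it is a monotonic, high-resolution clock meant for measuring intervals.

```python
import statistics
import time


def benchmark(func, repeats=20, warmup=2):
    """Return (median, best) duration of func() in milliseconds."""
    for _ in range(warmup):
        func()  # warm-up calls are executed but not measured
    durations_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(durations_ms), min(durations_ms)


def fake_grab():
    """Hypothetical stand-in for a real call such as ImageGrab.grab(bbox=REGION)."""
    return sum(range(10_000))


if __name__ == "__main__":
    median_ms, best_ms = benchmark(fake_grab)
    print(f"fake_grab: median {median_ms:.3f} ms, best {best_ms:.3f} ms")
```

To time a real capture, pass a zero-argument callable, for instance a lambda wrapping the screenshot call with its region.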

Locate an element on screen

import pyautogui as pg
import numpy as np
import cv2 as cv
from PIL import ImageGrab, Image
import time

REGION = (0, 0, 400, 400)
GAME_OVER_PICTURE_PIL = Image.open("./balloon_fight_game_over.png")
GAME_OVER_PICTURE_CV = cv.imread('./balloon_fight_game_over.png')


def timing(f):
    def wrap(*args, **kwargs):
        time1 = time.time()
        ret = f(*args, **kwargs)
        time2 = time.time()
        print('{:s} function took {:.3f} ms'.format(
            f.__name__, (time2-time1)*1000.0))

        return ret
    return wrap


@timing
def benchmark_pyautogui():
    res = pg.locateOnScreen(GAME_OVER_PICTURE_PIL,
                            grayscale=True,  # should provide a speed-up
                            confidence=0.8,
                            region=REGION)
    return res is not None


@timing
def benchmark_opencv_pil(method):
    img = ImageGrab.grab(bbox=REGION)
    img_cv = cv.cvtColor(np.array(img), cv.COLOR_RGB2BGR)
    res = cv.matchTemplate(img_cv, GAME_OVER_PICTURE_CV, method)
    # print(res)
    return (res >= 0.8).any()


if __name__ == "__main__":

    im_pyautogui = benchmark_pyautogui()
    print(im_pyautogui)

    methods = ['TM_CCOEFF', 'TM_CCOEFF_NORMED', 'TM_CCORR',
               'TM_CCORR_NORMED', 'TM_SQDIFF', 'TM_SQDIFF_NORMED']

    # cv.TM_CCOEFF_NORMED actually seems to be the most relevant method
    for method in methods:
        print('cv.' + method)
        # getattr looks the constant up by name, which avoids eval()
        im_opencv = benchmark_opencv_pil(getattr(cv, method))
        print(im_opencv)

And the results!

benchmark_pyautogui function took 175.712 ms
False
cv.TM_CCOEFF
benchmark_opencv_pil function took 21.283 ms
True
cv.TM_CCOEFF_NORMED
benchmark_opencv_pil function took 23.377 ms
False
cv.TM_CCORR
benchmark_opencv_pil function took 20.465 ms
True
cv.TM_CCORR_NORMED
benchmark_opencv_pil function took 25.347 ms
False
cv.TM_SQDIFF
benchmark_opencv_pil function took 23.799 ms
True
cv.TM_SQDIFF_NORMED
benchmark_opencv_pil function took 22.882 ms
True

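pyautogui, once again, is super slow. The cv-based methods are an order of magnitude faster, though some see “Game Over” when it is not there. Those false positives are expected: the same >= 0.8 threshold is applied to every method, but the scores of TM_CCOEFF, TM_CCORR and TM_SQDIFF are not normalized, and for both SQDIFF variants the best match is the lowest score, not the highest. I made sure that TM_CCOEFF_NORMED also returned True when the element was in the region before updating the following class: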

from PIL import Image, ImageGrab
from helpers import fast_locate_on_screen
import cv2 as cv
import numpy as np
import os
import pyautogui as pg
import time


class BalloonTripEnvironment:

    def __init__(self):
        self._game_filepath = "../games/BalloonFight.zip"
        self._region = (10,10,300,300)
        self._game_over_picture = cv.imread("./balloon_fight_game_over.png")

    def _custom_press_key(self, key_to_press):
        pg.keyDown(key_to_press)
        pg.keyUp(key_to_press)

    def turn_nes_up(self):
        os.system(f"fceux {self._game_filepath} &")
        time.sleep(1)

    def start_trip(self):
        keys_to_press = ['s', 's', 'enter']
        for key_to_press in keys_to_press:
            self._custom_press_key(key_to_press)

    def observe_state(self):
        return pg.screenshot(region=self._region)

    def capture_state_as_png(self, filename):
        pg.screenshot(filename, region=self._region)

    def step(self, action):
        self._custom_press_key(action)
       
    def is_game_over(self):
        img = ImageGrab.grab(bbox=self._region)
        img_cv = cv.cvtColor(np.array(img), cv.COLOR_RGB2BGR)
        res = cv.matchTemplate(img_cv, self._game_over_picture, cv.TM_CCOEFF_NORMED)
        return (res >= 0.8).any()

    def rage_quit(self):
        os.system("pkill fceux")
        exit()


if __name__ == "__main__":

    env = BalloonTripEnvironment()

    env.turn_nes_up()
    time.sleep(10)
    env.step('enter')
    env.start_trip()
    print("Started")
    is_game_over = False
    i = 0

    while not is_game_over:
        i += 1
        is_game_over = env.is_game_over()
        env.step('f')

    print("Game over!")

    env.rage_quit()
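
Why the 0.8 threshold in is_game_over() goes with TM_CCOEFF_NORMED can be seen on a single patch. The following is a minimal pure-Python sketch, not OpenCV's implementation. match_scores, the pixel lists and the dictionary keys are names made up for this illustration, and the formulas are my reading of the definitions in the OpenCV documentation for one template position.

```python
import math


def match_scores(patch, template):
    """Score one screen patch against a same-sized template (flat pixel lists)."""
    n = len(template)
    mean_p = sum(patch) / n
    mean_t = sum(template) / n
    # Mean-centred pixels, as used by the correlation coefficient.
    dp = [p - mean_p for p in patch]
    dt = [t - mean_t for t in template]

    ccorr = sum(p * t for p, t in zip(patch, template))           # raw, unbounded
    sqdiff = sum((p - t) ** 2 for p, t in zip(patch, template))   # 0 is a perfect match
    norm = math.sqrt(sum(a * a for a in dp) * sum(b * b for b in dt))
    ccoeff_normed = sum(a * b for a, b in zip(dp, dt)) / norm if norm else 0.0

    return {"ccorr": ccorr, "sqdiff": sqdiff, "ccoeff_normed": ccoeff_normed}


# Hypothetical 4-pixel grayscale template and two candidate patches.
template = [0, 255, 0, 255]
same = match_scores(template, template)
other = match_scores([200, 10, 180, 30], template)

if __name__ == "__main__":
    print("identical patch :", same)
    print("mismatched patch:", other)
```

The mismatched patch passes a >= 0.8 test under the raw correlation and the squared difference, while its normalized coefficient is negative. Only the normalized coefficient lies in [-1, 1] with 1 meaning a perfect match, so a fixed threshold is meaningful for it.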

Below, you can see the GIFs of the loop. On the left is the previous version, where each call to is_game_over() needed so much time that the “agent” could not press the button often enough. Now that the frequency is high enough, the “agent” just bounces on the top of the screen until it dies!

Fig. 1: On the left (Before), the previous version of is_game_over(); on the right (After), the new version. Note that the beginning of the GIF is just the demo mode of the game.
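
As a rough back-of-the-envelope check, the per-call timings measured earlier bound how often the loop can poll the screen. The sketch below assumes the check dominates each iteration and ignores the time spent in step(). max_polls_per_second is a name made up for this illustration.

```python
def max_polls_per_second(check_ms):
    """Upper bound on loop iterations per second if one check takes check_ms."""
    return 1000.0 / check_ms


if __name__ == "__main__":
    # Timings reported by the benchmarks in this article.
    for label, check_ms in [("pyautogui", 175.712), ("opencv + PIL", 23.377)]:
        print(f"{label}: at most {max_polls_per_second(check_ms):.1f} checks/s")
```

With these numbers the old check allows fewer than 6 polls per second, against more than 40 for the new one.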

Hope you liked it, stay tuned for the next articles if you like the project!
