An Over-Engineered Halloween Decoration
Creeping people out, with software.
Halloween is a great excuse to build decorations, whether it’s something simple like a motion-activated speaker or a ghost on a pulley system that strafes the yard when someone walks by. I actually wanted to build the latter, and might in the future, but some napkin math told me that driving a pulley across my yard would require a belt speed on the order of 15 m/s, which means I would need either a giant drive wheel and a beefy motor, or a smaller drive wheel and a faster, still beefy, motor.
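For a rough sense of what that implies for the motor, here’s a hedged back-of-the-envelope sketch; the 15 m/s figure comes from my estimate above, and the wheel diameters are just illustrative:
import math

# Napkin math: drive-wheel RPM needed to hit a target belt speed.
# The 15 m/s target is the estimate from above; the wheel sizes are made up.
belt_speed = 15.0  # m/s
for diameter_cm in (5, 10, 30):
    circumference_m = math.pi * diameter_cm / 100.0  # meters per revolution
    rpm = belt_speed / circumference_m * 60.0
    print("{} cm wheel -> {:.0f} RPM".format(diameter_cm, rpm))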
Instead, I settled on something more feasible with the parts I had on hand and took on a small animatronics project: a disembodied eyeball that follows you across the room.
The original eyebot running on a Jetson Nano
All code and STL files are available at https://gitlab.com/ss32/creepycam
Stealing Designs
An old post on /r/3Dprinting showcased this very cool animatronic eyeball that I used as the starting point, but I wanted something bigger. To that end, I (painfully) modified the STLs in Blender to support a scaled-up version of the original eyeball and eyelid assembly. The end result is an arguably unstable base with the ~10 cm eyeball cantilevered off the front. The new design also required some custom servo horns for the X and Y axes of the eyeball itself and a scaled-up version of a great little u-joint.
Remote Control
The end goal of the project was to control the eyeball via an Arduino, but I didn’t want a static, boring animation routine; it had to be something interactive. I stumbled on the Firmata library, a wonderful tool that comes bundled with the Arduino IDE and allows remote control of the microcontroller over serial. Coupled with the PyFirmata library, this makes the path to more complex software just a matter of coding on the Python side.
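As a taste of what that looks like, here’s a minimal sketch of driving a single servo with PyFirmata; the serial port and pin number are assumptions, since the real pin assignments live in the project’s config:
from time import sleep
from pyfirmata import Arduino

# Assumes StandardFirmata is flashed to the board and it shows up at /dev/ttyUSB0
board = Arduino("/dev/ttyUSB0")
# "d:9:s" means digital pin 9 in servo mode; the actual pins come from the config
servo = board.get_pin("d:9:s")

# Sweep the servo through a few positions
for angle in (0, 90, 180, 90):
    servo.write(angle)
    sleep(0.5)

board.exit()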
I chose to separate the image processing code from the control code so that the image processing can run remotely on a machine with more horsepower when face tracking and person detection are running simultaneously.
The data flow is:
Image Processing --(servo commands over ZMQ)--> PyFirmata --(USB)--> Arduino
The main vision processing routine produces an array of [person_x, person_y, bbox_w, bbox_h, img_dims, detection_time]
and publishes it to a ZMQ topic that the Arduino control script subscribes to, using a simple PUB/SUB model:
Server
import zmq

def zmq_setup(config):
    context = zmq.Context()
    socket = context.socket(zmq.PUB)
    host_addr = "tcp://{}:{}".format(config["server"], config["port"])
    socket.bind(host_addr)
    print("Successful bind at {}".format(host_addr))
    return socket

config = get_config()
socket = zmq_setup(config)

# Send information about the person in the frame to the Arduino control device
socket.send_string(config["servo_topic"], zmq.SNDMORE)  # topic frame
socket.send_pyobj(
    [person_x, person_y, bbox_w, bbox_h, img_dims, detection_time]
)
Client
import zmq

def zmq_setup(config):
    context = zmq.Context()
    socket = context.socket(zmq.SUB)
    client_addr = "tcp://{}:{}".format(config["client"], config["port"])
    socket.connect(client_addr)
    socket.setsockopt(zmq.SUBSCRIBE, "{}".format(config["servo_topic"]).encode())
    socket.setsockopt(zmq.SUBSCRIBE, "killsig".encode())
    print("Successful ZMQ subscription at {}".format(client_addr))
    return socket

config = get_config()
socket = zmq_setup(config)

# The topic frame arrives first, followed by the pickled payload
topic = socket.recv_string()
if topic == "servo":
    data = socket.recv_pyobj()
The great thing about publishing data on different topics is that it allows for concise code paths that accomplish different tasks. The main loop of the client waits for information from the vision pipeline and moves the servos accordingly, but it can also wait for a kill signal to shut itself down. Keeping that signal on a separate topic partitions the tasks and also allows the server to fully control the client process when both are running on the same machine.
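Putting the two topics together, the client’s main loop ends up looking something like this sketch (hedged; the real handling code is shown in the snippets below):
while True:
    # Each message is a topic frame followed by a pickled payload
    topic = socket.recv_string()
    payload = socket.recv_pyobj()
    if topic == config["servo_topic"]:
        person_x, person_y, bbox_w, bbox_h, img_dims, detection_time = payload
        # ...map image coordinates to servo angles and move the eye...
    elif topic == "killsig" and payload:
        break  # shut down cleanly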
Server
from subprocess import PIPE, Popen

# Check to make sure the server and client are the same machine
if config["lauch_arduino_process"]:
    if config["server"] != config["client"]:
        print(
            bcolors.WARNING
            + "WARNING: Cannot start Arduino process if client and host are not on the same machine.\n"
            + "Set server and client to 127.0.0.1 in order to use this feature."
            + bcolors.ENDC
        )
    else:
        process = Popen(["python3", "arduino_control.py"], stdout=PIPE, stderr=PIPE)

# When the camera process shuts down, tell the client to exit as well
socket.send_string("killsig", zmq.SNDMORE)
socket.send_pyobj(True)
Client
if topic == "killsig":
    data = socket.recv_pyobj()
    if data:
        print("Got kill signal from camera, exiting")
        sleep(0.25)
        # Return the eye and eyelids to their rest positions before exiting
        ypin.write(90)
        xpin.write(X_REST)
        top_eyelid_pin.write(TOP_REST)
        bottom_eyelid_pin.write(BOTTOM_REST)
        sys.exit(0)
Vision Processing
The original plan for the eyeball was to mount it outside as a decoration so it could follow people as they walked up to the house. Tracking someone both far away and up close with a fixed focal length falls under two problem regimes: person tracking and face tracking. The vision pipeline can do both, but as currently written it can only do person tracking when running on an Nvidia Jetson platform, where it makes use of the Jetson Inference library. The precompiled models there make it trivially easy to run a forward pass of a person detection network on the onboard GPU.
import time

import cv2
import jetson.utils

def find_people(frame, net, previous_coords, previous_detection_time):
    # Copy the frame to the GPU (BGR -> RGB) and run the detection network
    img = jetson.utils.cudaFromNumpy(frame[:, :, ::-1])
    detections = net.Detect(img, overlay="none")
    people = []
    x, y, w, h = previous_coords
    # Carry the previous detection time forward if nobody is found this frame
    detection_time = previous_detection_time
    for d in detections:
        if d.ClassID == 1:  # class 1 is "person"
            w = int(d.Width)
            h = int(d.Height)
            x = int(d.Left)
            y = int(d.Top)
            people.append([x, y, w, h])
    # Find the closest person, i.e. the largest bounding box
    if len(people) > 0:
        person_index = find_largest_rectangle(people)
        x, y, w, h = people[person_index]
        detection_time = time.time()
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 225, 0), 2)
    return frame, x, y, w, h, detection_time
To account for multiple people in the frame, I pick the closest one by selecting the largest bounding box, on the assumption that bigger equals closer.
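find_largest_rectangle isn’t shown in the post; a minimal sketch of the idea, assuming the width and height are the third and fourth fields of each detection, would be:
def find_largest_rectangle(boxes):
    # Return the index of the detection with the largest area (width * height)
    areas = [box[2] * box[3] for box in boxes]
    return areas.index(max(areas))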
Once a person is closer to the camera, tracking gets handed off to OpenCV’s YuNet, a highly efficient face detection network capable of running at 10 fps even on a Raspberry Pi 4. The same bounding box area method is used here to return only the closest person, or rather the one with the biggest head.
import numpy as np

def find_faces(frame, model, previous_coords, previous_detection_time):
    person_x, person_y, bbox_w, bbox_h = previous_coords
    detection_time = previous_detection_time
    try:
        faces = model.infer(frame)
    except Exception:
        faces = []
    # If more than one face is found, focus on the largest (closest) one
    if len(faces) != 0:
        detection_time = time.time()
        if len(faces) > 1:
            face_index = find_largest_rectangle(faces)
            faces = [faces[face_index]]
        person_x, person_y, bbox_w, bbox_h = faces[0][0:4].astype(np.int32)
        cv2.rectangle(
            frame,
            (person_x, person_y),
            (person_x + bbox_w, person_y + bbox_h),
            (0, 0, 255),
            2,
        )
    return frame, person_x, person_y, bbox_w, bbox_h, detection_time
The vision pipeline also has options to save every frame it captures or, more fun, to record only when it finds a person or face. The end result of the second option is a supercut of people walking up to the decoration looking confused or creeped out.
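The actual recording logic lives in the repo; a hedged sketch of the “only record when someone is seen” option, with a made-up output directory and time window, might look like:
import os

# Hypothetical values; the real ones would come from the YAML config
OUTPUT_DIR = "recordings"
RECORD_WINDOW_SECONDS = 1.0

# Save the frame only if a detection happened recently
if time.time() - detection_time < RECORD_WINDOW_SECONDS:
    filename = os.path.join(OUTPUT_DIR, "{:.3f}.jpg".format(detection_time))
    cv2.imwrite(filename, frame)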
Motion Control
The information generated by the vision pipeline is sent off to the Arduino control routine, which handles the requisite calculations for the eyeball and eyelids themselves, like mapping image coordinates to servo coordinates. To figure out the min/max ranges of the servos and eyeball, I wrote a little servo tool that uses Firmata to issue commands interactively.
$ python3 servo_tool.py
Found Arduino at /dev/ttyUSB0
Commands are issued as: <pin> <servo_position>
Pin options are: x - eye x axis
y - eye y axis
b - bottom eyelid
t - top eyelid
Example: x 125
Moves the X-axis servo to 125 degrees
Type exit as a command or hit CTRL+C to quit
----------------------------------------------
Command: x 95
Command: exit
Exiting
These values are stored in a YAML config file, which is parsed and handed off to a function that does the image-to-servo coordinate mapping. Inputs to the servos are durations in microseconds, so an integer value is required.
def map_to_range(value, x1, x2, y1, y2):
    # Linearly remap value from the range [x1, x2] to the range [y1, y2],
    # returning an integer suitable for the servo
    xRange = x2 - x1
    yRange = y2 - y1
    remapped_value = float(value - x1) / float(xRange)
    return int(y1 + (remapped_value * yRange))
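As a hedged example of how this gets used, mapping the horizontal center of a detection onto the eye’s X servo might look like the following, where X_MIN_ANGLE and X_MAX_ANGLE stand in for whatever limits the servo tool found and img_dims is assumed to be (width, height):
# Hypothetical servo limits found with servo_tool.py
X_MIN_ANGLE = 60
X_MAX_ANGLE = 120

img_width = img_dims[0]
face_center_x = person_x + bbox_w / 2
x_angle = map_to_range(face_center_x, 0, img_width, X_MIN_ANGLE, X_MAX_ANGLE)
xpin.write(x_angle)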
To make the eye a little more realistic I added a blink function and a “suspicious” mode to the motion control. A base blink interval is set in the config, and while the eye is open (it only opens when it sees someone) it waits a random amount of time based on that interval and then blinks while it is watching them.
# Only blink if we're looking at someone
if time.time() - tStart > tBlink and top_eyelid_position != TOP_REST:
    blink(top_eyelid_pin, top_eyelid_position)
    # Pick the next blink interval at random
    tBlink = config["BLINK_MIN_TIME_SECONDS"] + random.randint(
        0, config["BLINK_MIN_TIME_SECONDS"]
    )
    tStart = time.time()
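blink() itself isn’t shown here; a minimal sketch of what it might do, with a made-up closed position and delay, is:
# Hypothetical closed position for the top eyelid
TOP_CLOSED = 40

def blink(pin, open_position, closed_position=TOP_CLOSED, delay_seconds=0.15):
    # Snap the eyelid shut, pause briefly, then reopen to where it was
    pin.write(closed_position)
    sleep(delay_seconds)
    pin.write(open_position)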
Similarly, the suspicious mode will randomly fire while the eye is open, squinting at whoever it is following.
# Randomly toggle "suspicious" mode
if time.time() - tSus > tSusTimer:
    tSus = time.time()
    tSusTimer = random.randint(0, config["SUSPICIOUS_TIME_SECONDS"])
    suspicious = not suspicious
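The squint itself isn’t shown either; one hedged reading, with a made-up offset (the sign depends on how the eyelid servos are mounted, and the *_position variables are assumed to track the current open angles), is:
# Hypothetical squint: pull both eyelids partway toward each other
SQUINT_DEGREES = 15

if suspicious:
    top_eyelid_pin.write(top_eyelid_position - SQUINT_DEGREES)
    bottom_eyelid_pin.write(bottom_eyelid_position + SQUINT_DEGREES)
else:
    top_eyelid_pin.write(top_eyelid_position)
    bottom_eyelid_pin.write(bottom_eyelid_position)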
If the camera doesn’t see anyone for a few seconds (also a configurable value), the eye closes.
The net effect of the three behaviors is much more realistic movement, pushing it a little further toward the uncanny valley.
Notes on Testing
The whole system was originally designed for a Jetson Nano and ran reliably at 15 fps using the old face detection pipeline. The YuNet pipeline is more efficient and runs much faster: it achieves 10 fps on a Pi 4, and an Nvidia Orin’s CPU is capable of at least 30 fps; I haven’t tested with a camera that can run any faster.