2026-06-12原文3 min readUpdated: 2026-06-12

Stereo Vision-Based Fall Prediction and Detection using Human Pose Estimation on the AMD Kria K26 SOM

This paper presents a portable, low-power, battery-operated vision-based fall prediction and detection system using human pose estimation on an AMD Kria K26 SOM. The system uses an Intel RealSense D455 camera and a three-stage pipeline (quantized YOLOX, A2J, and CNN) to achieve real-time, privacy-preserving fall detection on the edge. Results show 4.5 FPS throughput with 75.85% classification accuracy.

SourcearXiv Computer VisionAuthor: Shreyas Narasimhiah Ramesh, P. D. Rathika, Mahasweta Sarkar, Kristen Wells, Michel Audette, Christopher Paolini

[2606.12473] Stereo Vision-Based Fall Prediction and Detection using Human Pose Estimation on the AMD Kria K26 SOM

[Submitted on 10 Jun 2026]

Title:Stereo Vision-Based Fall Prediction and Detection using Human Pose Estimation on the AMD Kria K26 SOM

View a PDF of the paper titled Stereo Vision-Based Fall Prediction and Detection using Human Pose Estimation on the AMD Kria K26 SOM, by Shreyas Narasimhiah Ramesh and 5 other authors

View PDF HTML (experimental)

Abstract:Background and Objective: Falls among elderly people can cause serious injury and reduce quality of life. Timely prediction and detection are essential to prevent harm and support well-being. We propose a portable, low-power, battery-operated, vision-based fall prediction and detection system using HPE on an AMD Kria K26 System-on-Module (SOM). The objective is a non-intrusive, privacy-preserving system for real-time fall detection.

Methods: The system uses an Intel RealSense D455 range-sensing camera connected to the K26 SOM by USB. It captures synchronized RGB and depth frames, 640 x 480 x 3 and 640 x 480 pixels, at 60 FPS. The SOM runs a three-stage pipeline with quantized YOLOX, Anchor-to-Joint (A2J), and fall-detection models. YOLOX identifies human bounding boxes from RGB frames, then discards the RGB frames to preserve privacy. A2J uses depth frames to estimate 15 joint keypoints per person. A CNN uses selected joint coordinates (x, y, z) to classify fall activity. YOLOX was trained on CrowdHuman; A2J on ITOP, MP-3DHP, UR Fall Detection, and a custom SDSU PSG dataset; and the CNN on UR Fall Detection and SDSU PSG. The design used a single-core DPU with a serial pipeline and a dual-core DPU running YOLOX and A2J with multiple threads.

Results: Quantized accuracy was evaluated using IoU >= 50% for YOLOX, mAP with a 10-cm rule for A2J, and classification accuracy, (TP + TN)/(TP + TN + FP + FN), for the CNN. Accuracies were 74%, 84.13%, and 75.85%. Throughput improved from 2.5 FPS for the single-threaded pipeline to 4.5 FPS for the multi-threaded version.

Conclusion: Results demonstrate the feasibility of privacy-preserving fall detection on an AMD Kria K26 edge device. On-device HPE and fall classification runs without cloud dependency, supporting elderly monitoring and assistive healthcare. Future work will improve model accuracy and speed.

Comments: 19 pages; 31 figures

Subjects:

Computer Vision and Pattern Recognition (cs.CV)

Cite as: arXiv:2606.12473 [cs.CV]

(or arXiv:2606.12473v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2606.12473

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Christopher Paolini [view email] [v1] Wed, 10 Jun 2026 05:08:35 UTC (154,009 KB)

Full-text links:

Access Paper:

View a PDF of the paper titled Stereo Vision-Based Fall Prediction and Detection using Human Pose Estimation on the AMD Kria K26 SOM, by Shreyas Narasimhiah Ramesh and 5 other authors

View PDF

HTML (experimental)

TeX Source

view license

Current browse context:

cs.CV

new | recent | 2026-06

Change to browse by:

References & Citations

NASA ADS

Google Scholar

Semantic Scholar

Data provided by:

Bibliographic Tools

Bibliographic and Citation Tools

Bibliographic Explorer Toggle

Bibliographic Explorer (What is the Explorer?)

Connected Papers Toggle

Connected Papers (What is Connected Papers?)

Litmaps Toggle

Litmaps (What is Litmaps?)

scite.ai Toggle

scite Smart Citations (What are Smart Citations?)

Code, Data, Media

Code, Data and Media Associated with this Article

alphaXiv Toggle

alphaXiv (What is alphaXiv?)

Links to Code Toggle

CatalyzeX Code Finder for Papers (What is CatalyzeX?)

DagsHub Toggle

DagsHub (What is DagsHub?)

GotitPub Toggle

Gotit.pub (What is GotitPub?)

Huggingface Toggle

Hugging Face (What is Huggingface?)

ScienceCast Toggle

ScienceCast (What is ScienceCast?)

Demos

Replicate Toggle

Replicate (What is Replicate?)

Spaces Toggle

Hugging Face Spaces (What is Spaces?)

Spaces Toggle

TXYZ.AI (What is TXYZ.AI?)