2026-06-19 | Original | 2 min read | Updated: 2026-06-19

TeleMorpher: Toward Robust Simultaneous Motion-Location Editing

A new paper proposes TeleMorpher, a one-shot, diffusion-based framework for editing a video protagonist's motion and location simultaneously. It disentangles the protagonist from the background, applies training-free pose warping guided by a motion prior, and introduces two LPIPS-based evaluation metrics. Experiments on in-the-wild videos and the TaiChi dataset show superior performance.
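
To make the disentangle-then-relocate idea concrete, the sketch below shows only the data-level compositing step: given a protagonist mask (which a segmentation model would supply) and a background with the protagonist removed (which an inpainting model would supply), it pastes the protagonist's pixels at a shifted location. This is an illustrative assumption, not TeleMorpher's method: the paper edits motion and location with a diffusion-based motion editor, whereas this is a hard pixel shift. The function name `relocate_protagonist` is hypothetical.

```python
import numpy as np


def relocate_protagonist(frame, mask, background, dx, dy):
    """Paste the masked protagonist onto an inpainted background, shifted by (dx, dy).

    frame:      H x W x 3 source frame
    mask:       H x W boolean protagonist mask (assumed to come from a segmentation model)
    background: H x W x 3 frame with the protagonist removed (assumed to come from inpainting)
    dx, dy:     integer pixel shift; positive values move right and down

    Returns the composited frame and the protagonist mask at its new location.
    """
    h, w = mask.shape
    out = background.copy()
    new_mask = np.zeros((h, w), dtype=bool)
    ys, xs = np.nonzero(mask)          # coordinates of protagonist pixels
    ty, tx = ys + dy, xs + dx          # their coordinates after the shift
    # Drop pixels that would land outside the frame.
    keep = (ty >= 0) & (ty < h) & (tx >= 0) & (tx < w)
    out[ty[keep], tx[keep]] = frame[ys[keep], xs[keep]]
    new_mask[ty[keep], tx[keep]] = True
    return out, new_mask
```

Applied frame by frame with a per-frame (dx, dy), this yields a crude location edit. Returning the shifted mask lets a background-consistency check exclude both the old and the new protagonist regions.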

Source: arXiv Computer Vision | Author: Haengbok Chung

[Submitted on 18 Jun 2026]

Abstract: Diffusion models have achieved remarkable success in image and video generation and editing. While recent studies have extended these efforts to motion editing, simultaneously transforming both motion and location, despite its practical importance, remains largely unexplored. To better understand robust motion-location editing, we first analyze the fundamental factors that degrade its quality. Based on this analysis, we propose TeleMorpher, to the best of our knowledge one of the first one-shot frameworks for simultaneous motion-location editing. Our approach leverages motion priors, i.e., a target-motion-centric video generated by an off-the-shelf model, as both motion-editing guidance and ground-truth motion, enabling more controllable and precise motion-location editing. The framework works as follows: (1) we first disentangle the protagonist and the background using pre-trained segmentation and inpainting models; (2) we then introduce a training-free pose warping that edits the protagonist's motion using the motion prior as guidance; (3) the warped motion video is injected directly into a baseline motion editor during inference, mitigating the difference between the source and target motions while preserving the appearance of the source video; and (4) to make quantitative evaluation more reliable, we propose two new LPIPS-based metrics: one measures background consistency before and after motion editing, and the other measures motion-editing fidelity via the difference between the protagonist's skeletons extracted from the source and target videos. Experiments on in-the-wild videos and the TaiChi dataset demonstrate that TeleMorpher achieves superior performance in both quantitative and qualitative evaluations (including a human study), underscoring its effectiveness.
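
The two evaluation metrics in step (4) can be sketched as follows, under stated assumptions. The abstract only says they are LPIPS-based, so the masking rule (ignore pixels covered by the protagonist in either video), the skeleton format (2D joints as (x, y) pixel coordinates plus a list of bone index pairs), and the simple line rasterizer are all assumptions; the helper names are hypothetical. The perceptual distance is passed in as `dist`, and the default `mean_abs_distance` is only a stand-in so the sketch runs without model weights. For real evaluation, `dist` would wrap an LPIPS model such as `lpips.LPIPS(net='alex')` from the `lpips` package, which takes RGB tensors of shape N x 3 x H x W scaled to [-1, 1]. Lower scores mean closer matches for both metrics.

```python
import numpy as np


def mean_abs_distance(a, b):
    """Stand-in perceptual distance (assumption); replace with an LPIPS model for real use."""
    return float(np.mean(np.abs(a.astype(np.float64) - b.astype(np.float64))))


def background_consistency(src_frames, edit_frames, src_masks, edit_masks,
                           dist=mean_abs_distance):
    """Average per-frame distance between backgrounds, ignoring protagonist pixels.

    Pixels covered by the protagonist in either video are zeroed in both frames,
    so only background content that should stay unchanged is compared.
    """
    scores = []
    for s, e, ms, me in zip(src_frames, edit_frames, src_masks, edit_masks):
        keep = ~(ms | me)[..., None]  # H x W x 1, True on shared background
        scores.append(dist(s * keep, e * keep))
    return float(np.mean(scores))


def rasterize_skeleton(joints, bones, size, radius=1):
    """Draw straight-line bones between (x, y) joints into an H x W binary image."""
    h, w = size
    img = np.zeros((h, w), dtype=np.float64)
    for a, b in bones:
        (x0, y0), (x1, y1) = joints[a], joints[b]
        steps = int(max(abs(x1 - x0), abs(y1 - y0))) + 1
        for t in np.linspace(0.0, 1.0, steps):
            x = int(round(x0 + t * (x1 - x0)))
            y = int(round(y0 + t * (y1 - y0)))
            # Clip the square brush to the image bounds.
            r0, r1 = max(0, y - radius), min(h, y + radius + 1)
            c0, c1 = max(0, x - radius), min(w, x + radius + 1)
            if r0 < r1 and c0 < c1:
                img[r0:r1, c0:c1] = 1.0
    return img


def skeleton_fidelity(src_skeletons, tgt_skeletons, bones, size,
                      dist=mean_abs_distance):
    """Average per-frame distance between rendered source and target skeletons."""
    scores = [dist(rasterize_skeleton(s, bones, size),
                   rasterize_skeleton(t, bones, size))
              for s, t in zip(src_skeletons, tgt_skeletons)]
    return float(np.mean(scores))
```

Rendering skeletons as images is what lets an image-space metric such as LPIPS compare poses; a single-channel rendering would need to be repeated to three channels before being passed to an LPIPS network.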

Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)

Cite as: arXiv:2606.19676 [cs.CV]

(or arXiv:2606.19676v1 [cs.CV] for this version)

https://doi.org/10.48550/arXiv.2606.19676

arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Haengbok Chung Ms
[v1] Thu, 18 Jun 2026 01:00:28 UTC (8,119 KB)
