PDT leverages a diffusion transformer-based architecture to transform Gaussian noise into semantically meaningful point distributions, guided by input reference points. We demonstrate the effectiveness of our approach across three structural representations: surface keypoints for artist-inspired meshes, inner skeletal joints for character rigging, and continuous feature lines for garment analysis.
Results and denoising process for mesh keypoint prediction.
Results and denoising process for skeletal joint prediction.
Results and denoising process for feature line extraction.
We pair each noisy point sampled from a Gaussian distribution with one input point, which serves as its per-point reference. Our diffusion model is then trained to drag and denoise the Gaussian noise into the desired distribution of structural points.
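The pairing can be illustrated with a minimal NumPy sketch. It assumes a DDPM-style forward process with a linear beta schedule and one target structural point per reference point; the text above does not specify the schedule or the loss. The function name `make_training_pair` and all array sizes are hypothetical.

```python
import numpy as np

def make_training_pair(reference, target, t, alpha_bar, rng):
    """Build one training example (assumed DDPM-style noising, not confirmed by the source).

    reference: (N, 3) input points, one per noisy point.
    target:    (N, 3) structural points the noise should be dragged to.
    Row i of the returned noisy array is paired with row i of `reference`.
    """
    noise = rng.standard_normal(target.shape)
    a = alpha_bar[t]
    # Blend the clean target with Gaussian noise according to timestep t.
    noisy = np.sqrt(a) * target + np.sqrt(1.0 - a) * noise
    return reference, noisy, noise

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)            # assumed linear schedule
alpha_bar = np.cumprod(1.0 - betas)

reference = rng.uniform(-1.0, 1.0, size=(2048, 3))  # placeholder input points
target = rng.uniform(-1.0, 1.0, size=(2048, 3))     # placeholder structural points

ref, noisy, noise = make_training_pair(reference, target, 500, alpha_bar, rng)
```

Under these assumptions, a denoiser would receive `ref`, `noisy`, and the timestep, and would be trained to predict `noise` (or the clean target), so that each noisy point stays tied to its own reference throughout denoising.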
Architecture overview of PDT. The model extracts per-point features from the input reference points and associates them with the corresponding noisy points by adding the noisy points' positional encoding features. The combined features and the timestep embedding are then processed by a series of DiT layers to learn the distribution transformation.
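The feature association described in this caption can be sketched as follows. This is an illustration only: the random linear map `W` stands in for the actual per-point feature extractor, the sinusoidal encoding and the feature width of 48 are assumptions, and the DiT layers themselves are omitted. All names here are hypothetical.

```python
import numpy as np

def sinusoidal_encoding(x, dim):
    """Encode x of shape (N, C) into (N, dim); dim must be divisible by 2 * C."""
    n, c = x.shape
    n_freq = dim // (2 * c)
    freqs = 2.0 ** np.arange(n_freq)                 # (n_freq,)
    angles = x[:, :, None] * freqs                   # (N, C, n_freq)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(n, -1)                        # (N, 2 * C * n_freq)

def combine_tokens(reference, noisy, t, W):
    """Add each noisy point's positional encoding to its reference point's feature."""
    dim = W.shape[1]
    ref_feat = reference @ W                         # stand-in feature extractor
    tokens = ref_feat + sinusoidal_encoding(noisy, dim)
    t_emb = sinusoidal_encoding(np.array([[float(t)]]), dim)[0]
    return tokens, t_emb

rng = np.random.default_rng(0)
reference = rng.uniform(-1.0, 1.0, size=(2048, 3))  # placeholder input points
noisy = rng.standard_normal((2048, 3))               # placeholder noisy points
W = rng.standard_normal((3, 48)) * 0.1               # hypothetical projection

tokens, t_emb = combine_tokens(reference, noisy, 500, W)
```

In this sketch, `tokens` (one row per point pair) and `t_emb` are what a stack of DiT layers would consume; how the timestep embedding conditions those layers is not stated in the caption and is left out here.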