CaPa: Carve-n-Paint Synthesis
for Efficient 4K Textured Mesh Generation
The paper is released! Stay tuned for the online demo!
Graphics AI Lab,
NCSOFT Research
TL;DR: We propose CaPa, a novel method that generates a high-quality 4K textured mesh in under 30 seconds,
providing 3D assets ready for commercial applications such as games, movies, and VR/AR.
Abstract
The synthesis of high-quality 3D assets from textual or visual inputs has become a central
objective in modern generative modeling.
Despite the proliferation of 3D generation algorithms, they frequently grapple with challenges such as multi-view inconsistency, slow generation times, low fidelity, and surface reconstruction problems.
While individual studies have addressed some of these issues, a comprehensive solution remains elusive.
In this paper, we introduce CaPa, a carve-and-paint framework that generates high-fidelity 3D assets efficiently.
CaPa employs a two-stage process, decoupling geometry generation from texture synthesis.
Initially, a 3D latent diffusion model generates geometry guided by multi-view inputs, ensuring structural consistency across perspectives.
Subsequently, leveraging a novel, model-agnostic Spatially Decoupled Attention, the framework synthesizes high-resolution textures (up to 4K) for a given geometry.
Furthermore, we propose a 3D-aware occlusion inpainting algorithm that fills untextured regions, yielding cohesive textures across the entire model.
This pipeline generates high-quality 3D assets in less than 30 seconds, providing ready-to-use outputs for commercial applications.
Experimental results demonstrate that CaPa excels in both texture fidelity and geometric stability, establishing a new standard for practical, scalable 3D asset generation.
Methodology
Overview
Pipeline of CaPa
- 1. Geometry Generation: First, we generate the geometry (a polygonal mesh) using a 3D latent diffusion model. Using the 3D latent space learned with ShapeVAE, we train a 3D latent diffusion model that generates 3D geometry, guided by multi-view images from a multi-view diffusion model to ensure alignment between the generated shape and texture.
- 2. Texture Generation: Second, we render four orthogonal views of the mesh, which serve as inputs for texture generation. To produce a high-quality texture while preventing the Janus problem, we design a novel, model-agnostic spatially decoupled attention:
  - This mechanism ensures that each spatial region independently attends to its corresponding view, preserving view-specific details and enhancing multi-view consistency.
  - Its model-agnostic nature allows integration with any diffusion model. This lets CaPa use SDXL for texture synthesis and outperform other 3D generation or texturing methods, which are typically limited to SD1.5.
- Final Output: A high-quality textured mesh is obtained through back projection and a 3D-aware occlusion inpainting algorithm. The entire 3D asset generation process completes in less than 30 seconds using a fully feed-forward approach.
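The idea that each spatial region attends only to its corresponding view can be illustrated with a masked attention. The following is a minimal NumPy sketch, not the CaPa implementation. It assumes the four rendered views are tiled 2 x 2 in a single latent grid and that the conditioning tokens carry a view id. The function names (`quadrant_ids`, `spatially_decoupled_attention`) and this layout are hypothetical and are not taken from the paper.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def quadrant_ids(h, w):
    """Label each cell of an h x w latent grid with its quadrant (0..3).

    Assumption: the four views are tiled 2 x 2 inside one latent.
    """
    rows = (np.arange(h) >= h // 2).astype(int)
    cols = (np.arange(w) >= w // 2).astype(int)
    return (2 * rows[:, None] + cols[None, :]).reshape(-1)


def spatially_decoupled_attention(q, k, v, q_region, kv_view):
    """Attention in which a query only sees keys/values of its own view.

    q: (Nq, d) queries; q_region: (Nq,) region id of each query.
    k, v: (Nk, d) keys/values; kv_view: (Nk,) view id of each token.
    Every region must have at least one token with a matching view id.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    # Mask out every (query, key) pair that belongs to different views.
    allowed = q_region[:, None] == kv_view[None, :]
    scores = np.where(allowed, scores, -np.inf)
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Toy usage: a 4 x 4 latent grid and 3 conditioning tokens per view.
rng = np.random.default_rng(0)
h, w, d, tokens_per_view = 4, 4, 8, 3
q = rng.normal(size=(h * w, d))
k = rng.normal(size=(4 * tokens_per_view, d))
v = rng.normal(size=(4 * tokens_per_view, d))
out, weights = spatially_decoupled_attention(
    q, k, v, quadrant_ids(h, w), np.repeat(np.arange(4), tokens_per_view)
)
print(out.shape)
```

Because the mask only changes which keys a query may see, a sketch like this can wrap the attention of any backbone without retraining-specific changes to the formula itself, which is one plausible reading of "model-agnostic" here.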
Comparison: Image to 3D Asset Generation
We compare CaPa with state-of-the-art image-to-3D methods. All assets are converted to polygonal meshes using each method's official code. CaPa significantly outperforms these methods in both geometric stability and visual fidelity, especially for the back and side views.
Scalability & Adaptability
PBR-aware 3D Asset Generation
Texture Editing
Figure: original asset vs. asset edited with the text prompt ("orange sofa, orange pulp").
Citation
BibTeX
@article{heo2025capa,
title = {CaPa: Carve-n-Paint Synthesis for Efficient 4K Textured Mesh Generation},
author = {Hwan Heo and Jangyeong Kim and Seongyeong Lee and Jeong A Wi and Junyoung Choi and Sangjun Ahn},
journal = {arXiv preprint arXiv:2501.09433},
year = {2025},
}
Related Project
Texture Copilot