ByteDance’s native audio-visual joint generation model built on a dual-branch DiT architecture, producing synchronized video and audio in a single pass with multilingual lip-sync, cinematic camera control, and narrative coherence.
| Provider | ByteDance |
| Tasks | text-to-video · image-to-video |
| Starting from | 0.0552 USD / call |
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
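As a minimal sketch of the header format described above, the helper below builds the request headers in Python. The function name `build_auth_headers` is hypothetical, and the `Content-Type: application/json` entry is an assumption about the request body format, not something this page states. `Authorization` is the standard HTTP header name for Bearer tokens.

```python
def build_auth_headers(token: str) -> dict[str, str]:
    """Return HTTP headers carrying the auth token in Bearer form.

    Hypothetical helper: the JSON content type is an assumption.
    """
    token = token.strip()
    if not token:
        raise ValueError("auth token must not be empty")
    return {
        # Standard Bearer scheme: "Authorization: Bearer <token>"
        "Authorization": f"Bearer {token}",
        # Assumed body format; adjust if the API expects something else.
        "Content-Type": "application/json",
    }
```

The returned dictionary can be passed as the `headers` argument of whichever HTTP client is in use.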
| Parameter | Constraint | Allowed values |
| model | is required | seedance-v1.5-pro |
| prompt | length must be <= 2000 | max length: 2000 |
| resolution | must be 480p or 720p | 480P, 720P |
| duration | must be between 4 and 12 | 4, 5, 6, 7, 8, 9, 10, 11, 12 |
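The constraints above can be checked client-side before a call is made. The sketch below is one possible way to do that; it is not the provider's own validation logic. The function name `validate_request` is hypothetical, the keys `prompt`, `resolution`, and `duration` are inferred from the validation messages, and it assumes that only `model` is mandatory, that prompt length is counted in characters, and that resolution matching is case-insensitive.

```python
ALLOWED_MODELS = {"seedance-v1.5-pro"}
ALLOWED_RESOLUTIONS = {"480P", "720P"}
MAX_PROMPT_LENGTH = 2000
MIN_DURATION, MAX_DURATION = 4, 12


def validate_request(params: dict) -> list[str]:
    """Return a list of validation error messages (empty if none found)."""
    errors = []

    model = params.get("model")
    if not model:
        errors.append("model is required")
    elif model not in ALLOWED_MODELS:
        errors.append(f"unsupported model: {model}")

    # Assumption: length is measured in characters.
    prompt = params.get("prompt", "")
    if len(prompt) > MAX_PROMPT_LENGTH:
        errors.append("prompt length must be <= 2000")

    # Assumption: "480p" and "480P" are treated as the same value.
    resolution = params.get("resolution")
    if resolution is not None and str(resolution).upper() not in ALLOWED_RESOLUTIONS:
        errors.append("resolution must be 480p or 720p")

    # Duration must be a whole number in the documented range.
    duration = params.get("duration")
    if duration is not None:
        is_int = isinstance(duration, int) and not isinstance(duration, bool)
        if not is_int or not MIN_DURATION <= duration <= MAX_DURATION:
            errors.append("duration must be between 4 and 12")

    return errors
```

For example, `validate_request({"model": "seedance-v1.5-pro", "prompt": "a cat on a beach", "resolution": "720P", "duration": 5})` returns an empty list, while a payload with `duration` set to 13 returns a single error for that field.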