Alibaba’s multimodal video generation model series. It supports role-play (reference-to-video), multi-shot narrative, audio-visual sync, and output of up to 15 seconds, so creators can star in AI-generated videos with their own appearance and voice.
| Property | Value |
| --- | --- |
| Provider | Alibaba |
| Tasks | text-to-video · image-to-video |
| Starting from | 0.3049 USD / call |
`Authorization` (header): Bearer authentication header of the form `Bearer <token>`, where `<token>` is your auth token.
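A minimal sketch of building the Bearer header in Python. The `Content-Type: application/json` entry and the `VIDEO_API_TOKEN` environment variable are hypothetical assumptions, not taken from this page; the helper `build_auth_headers` is illustrative only.

```python
import os


def build_auth_headers(token: str) -> dict[str, str]:
    """Return request headers carrying a Bearer token (illustrative sketch)."""
    if not token:
        raise ValueError("auth token must be non-empty")
    return {
        # Format stated on this page: "Bearer <token>".
        "Authorization": f"Bearer {token}",
        # Assumption: the API accepts JSON request bodies.
        "Content-Type": "application/json",
    }


if __name__ == "__main__":
    # Hypothetical env var name; read the token from wherever you store it.
    headers = build_auth_headers(os.environ.get("VIDEO_API_TOKEN", "demo-token"))
    print(headers["Authorization"])
```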
| Parameter | Constraint |
| --- | --- |
| `model` | Required. Value: `wan2.6` |
| `prompt` | 1–2000 characters |
| `audio_url` | Must be a valid http/https URL |
| `duration` | 5, 10, or 15 (seconds) |
| `first_frame_image` | Required |
| `resolution` | `720P` or `1080P` |
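The constraints listed for the request parameters can be checked client-side before a call is spent. This is a minimal sketch that mirrors only the rules stated on this page; the function name `validate_request` is hypothetical, and three behaviors are assumptions: `prompt`, `audio_url`, `duration`, and `resolution` are checked only when present (the page does not say whether they are required), `model` is assumed to accept only `wan2.6`, and `resolution` is matched case-sensitively.

```python
from urllib.parse import urlparse

# Constraints mirrored from this page's parameter list.
ALLOWED_MODEL = "wan2.6"          # assumption: the only accepted value here
PROMPT_MIN, PROMPT_MAX = 1, 2000  # characters
ALLOWED_DURATIONS = (5, 10, 15)   # seconds
ALLOWED_RESOLUTIONS = ("720P", "1080P")  # exact-case match is an assumption


def validate_request(body: dict) -> list[str]:
    """Return constraint violations found in `body`; an empty list means none."""
    errors: list[str] = []

    if "model" not in body:
        errors.append("model is required")
    elif body["model"] != ALLOWED_MODEL:
        errors.append(f"model must be {ALLOWED_MODEL}")

    # Optional fields below are validated only when supplied.
    if "prompt" in body:
        prompt = body["prompt"]
        if not isinstance(prompt, str) or not PROMPT_MIN <= len(prompt) <= PROMPT_MAX:
            errors.append("prompt must be between 1 and 2000 characters")

    if "audio_url" in body:
        parsed = urlparse(str(body["audio_url"]))
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("audio_url must be a valid http/https URL")

    if "duration" in body and body["duration"] not in ALLOWED_DURATIONS:
        errors.append("duration must be 5, 10, or 15 seconds")

    # Stated as required on this page; missing or empty values are rejected.
    if not body.get("first_frame_image"):
        errors.append("first_frame_image is required")

    if "resolution" in body and body["resolution"] not in ALLOWED_RESOLUTIONS:
        errors.append("resolution must be 720P or 1080P")

    return errors


if __name__ == "__main__":
    body = {
        "model": "wan2.6",
        "prompt": "A cat surfing at sunset",
        "first_frame_image": "https://example.com/frame.png",  # placeholder URL
        "duration": 10,
        "resolution": "1080P",
    }
    print(validate_request(body))  # → []
```

Returning a list rather than raising on the first failure lets a caller report every problem at once; the server remains the authority on what it actually accepts.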