🎬 Jimeng Seedance 2.0 Guide (New Multimodal Creation Experience)

Ever since the days when the only way to “tell a story” was with text and first/last frames, we have wanted to build a video model that truly understands what you want to express. Today, it has finally arrived!

Jimeng Seedance 2.0 now supports four input modalities: image, video, audio, and text. This makes expression richer and generation more controllable.

You can use an image to set the visual style, a video to specify character actions and camera movements, and a few seconds of audio to build the rhythm and atmosphere… Combined with prompts, the creation process becomes more natural, more efficient, and much more like being a real “director”.

In this upgrade, the “reference capability” is the biggest highlight:

📷 Reference images accurately reproduce composition and character details
🎥 Reference videos let you replicate camera language, complex action rhythms, and creative effects
⏱ Videos can be smoothly extended and joined, generating continuous shots from your prompts: not just “generate,” but “keep shooting”
✂️ Editing is enhanced at the same time, supporting character replacement, deletion, and addition in existing videos

We know video creation is never just about “generation”; it is about control over expression. 2.0 is not just multimodal; it is a truly controllable way to create.

Seedance 2.0: multimodal creation starts here. Imagine boldly, and leave the rest to it.

Parameter Preview

Core Dimension	Seedance 2.0
Image Input	≤ 9 images; formats: jpeg, png, webp, bmp, tiff, gif; size: less than 30 MB
Video Input	≤ 3 videos; formats: mp4, mov; total duration: 2-15 s; size: less than 50 MB; total video pixel range: 409600 (640×640, 480p) to 927408 (834×1112, 720p)
Audio Input	≤ 3 audio files; formats: mp3, wav; total duration: at most 15 s; size: less than 15 MB
Text Input	Natural language
Generation Duration	4-15 s, freely selectable
Sound Output	Includes sound effects/background music
Interaction limit: mixed input is currently capped at 12 files in total. We recommend prioritizing the assets that have the greatest impact on the visuals or rhythm, and balancing the number of files across modalities.
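
To make the limits above easier to check before uploading, here is a minimal Python sketch that validates a set of assets against them. It is only an illustration, not part of Jimeng or any official API: the `Asset` class and `validate_assets` function are hypothetical names. It also makes two assumptions the table does not spell out: the size limits apply per file, and the pixel range applies to each video's width × height.

```python
from dataclasses import dataclass
from typing import List

# Limits transcribed from the Parameter Preview above.
MAX_TOTAL_FILES = 12
MAX_COUNT = {"image": 9, "video": 3, "audio": 3}
MAX_SIZE_MB = {"image": 30, "video": 50, "audio": 15}  # "less than"; assumed to be per file
FORMATS = {
    "image": {"jpeg", "png", "webp", "bmp", "tiff", "gif"},
    "video": {"mp4", "mov"},
    "audio": {"mp3", "wav"},
}
VIDEO_TOTAL_SECONDS = (2, 15)
AUDIO_MAX_TOTAL_SECONDS = 15
VIDEO_PIXELS = (640 * 640, 834 * 1112)  # 409600 to 927408; assumed width x height per video


@dataclass
class Asset:
    """Hypothetical description of one uploaded file (not a Jimeng type)."""
    kind: str            # "image", "video", or "audio"
    fmt: str             # file format, e.g. "png"
    size_mb: float
    duration_s: float = 0.0
    width: int = 0
    height: int = 0


def validate_assets(assets: List[Asset]) -> List[str]:
    """Return a list of problems; an empty list means the set fits the limits."""
    problems = []
    # Assets with an unrecognized kind only count toward the overall total.
    if len(assets) > MAX_TOTAL_FILES:
        problems.append(f"{len(assets)} files in total, limit is {MAX_TOTAL_FILES}")

    for kind, limit in MAX_COUNT.items():
        items = [a for a in assets if a.kind == kind]
        if len(items) > limit:
            problems.append(f"{len(items)} {kind} files, limit is {limit}")
        for a in items:
            if a.fmt.lower() not in FORMATS[kind]:
                problems.append(f"{kind} format '{a.fmt}' is not listed")
            if a.size_mb >= MAX_SIZE_MB[kind]:
                problems.append(f"{kind} file of {a.size_mb} MB is not under {MAX_SIZE_MB[kind]} MB")

    videos = [a for a in assets if a.kind == "video"]
    if videos:
        total = sum(v.duration_s for v in videos)
        lo, hi = VIDEO_TOTAL_SECONDS
        if not lo <= total <= hi:
            problems.append(f"total video duration {total} s is outside [{lo}, {hi}] s")
        p_lo, p_hi = VIDEO_PIXELS
        for v in videos:
            pixels = v.width * v.height
            if not p_lo <= pixels <= p_hi:
                problems.append(f"video pixel count {pixels} is outside [{p_lo}, {p_hi}]")

    audio_total = sum(a.duration_s for a in assets if a.kind == "audio")
    if audio_total > AUDIO_MAX_TOTAL_SECONDS:
        problems.append(f"total audio duration {audio_total} s exceeds {AUDIO_MAX_TOTAL_SECONDS} s")
    return problems
```

Note that the per-modality maximums add up to 9 + 3 + 3 = 15 files, which is more than the mixed-input cap of 12, so you cannot max out every modality in one request. Under the assumptions above, a 1280×720 reference video (921600 pixels) fits the pixel range, while a 1920×1080 one (2073600 pixels) does not.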

⚠️ Gentle Reminder: Regarding the Upload of Realistic Human Face Assets

Due to platform compliance requirements, uploading assets that contain realistic human faces (images or videos) is currently not supported. To protect user rights and keep generation safe, the system automatically blocks such assets, and no video can be generated from them. In practice, if you upload photos of real people (especially clear, identifiable faces), the model will not be able to process the generation. We understand this is limiting, but the measure is needed to ensure content safety and compliant platform operation. If this changes in the future, we will update the documentation promptly. Thank you for your understanding and cooperation!

🏁 Final Words

The multimodal capabilities of Seedance 2.0 are still evolving, and we will keep improving them and adding support for more input combinations. We hope this guide helps you unleash your creativity more freely!

If you run into a bug, or have suggestions or use cases you would like supported, feel free to leave a message, send a private message, or tell us loud and clear! We will keep optimizing, and work with you to make Jimeng a truly enjoyable and convenient productivity tool ❤️