WithAI.Design


Asset Freedom Achieved! A One-Click Audio-Video Workflow


Preface

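Among today's open-source video generation options are LTX-Video, CogVideoX, and the recently popular HunyuanVideo, all heavyweights of the open-source scene. They do not yet match closed-source video models such as Kling, OpenAI Sora, Luma, and Pika, but the results are still a pleasant surprise.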

I have written about video asset freedom before, in "[AI-Assisted Design] Is Video Asset Freedom Here? A One-Click AI Audio-Video Workflow".

Half a year has gone by since then, so you can compare the two and judge which produces better results.

Results

The video below is unedited. The footage, soundtrack, and sound effects were all generated automatically.

Workflow

Overview

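This workflow makes full use of current open-source audio and video models, chained together in ComfyUI so that the final clip is composited automatically. The components are:

  1. LLM node. Writes the script and directs the music and sound effects.
  2. HunyuanVideo. The text-to-video node.
  3. MMAudio. Generates the sound effects.
  4. Stable Audio. Generates the background music.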

Generating the video script with an LLM

Once again we ask an LLM to write the script for us, covering the shot descriptions, background music, and sound effects.

Prompt:

# Role  
You are a director and screenwriter with an exceptional sense of aesthetics, skilled at using visual storytelling, music, and sound to convey the narrative.  

# Task  
Based on all of my inputs, design the camera shots, accompanying background music, and sound effects.  Output in English.

# Output Format:  

**Camera Shots**: <Description of the camera angles and visuals>  
|  
**Music**: <Corresponding background music. **The music only needs to describe the style and genre according to the screen**>  
|  
**Sound Effects**: <Sound effects accompanying the visuals>  

# sample
input:a Chinese girl
output:
**Camera Shots**:  
The frame opens with a distant view, gradually zooming into a lush bamboo forest. The emerald sea of bamboo sways rhythmically as the wind passes through, sunlight filtering through the leaves, leaving golden patches on the trunks below. The camera glides deeper into the forest, revealing a small stone platform where a dancer stands poised with flowing sleeves. Their movements are graceful and serene, each spin sending their long sleeves cascading like streams of water tracing elegant arcs. The camera transitions to an overhead shot, revealing the dancer encircled by layers of dense bamboo, like a secluded haven. Following the dancer's leaps and twirls, close-up shots capture the flowing fabric mid-motion, fluttering like the wings of a bird. As the sequence ends, the camera pulls back to a wide shot, framing the dancer against the backdrop of a rising sun. The translucent white drapes of their outfit ripple softly in the breeze, bringing the scene to a tranquil close.

|

**Music**:  
heaven church electronic dance music

|

**Sound Effects**:  
The sound of the wind brushing through the bamboo forest sets a gentle yet constant backdrop, layered with occasional birdsong, a sharp, clear contrast to the stillness. As the dancer spins, the faint sound of flowing fabric cutting through the air adds texture to the auditory scene, as if echoing the natural elegance of their movements. The soft tapping of their feet upon the stone platform is intentionally crisp, complementing the music. The interaction between the dancer’s long sleeves and air creates a subtle whooshing sound, amplifying the visual poetry of their movements. As the performance concludes, the soundscape reduces to a faint breeze and the steady rustle of bamboo, bringing the audience back to the serene and timeless ambiance of the forest.

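You can write the theme by hand or derive it from an image by reverse prompting, then feed it to the LLM and let it produce the script.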

Splitting the script

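The script is split so that each part can be fed into its own branch: text-to-video, text-to-sound-effects, and text-to-music.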
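The split can be sketched in plain Python, assuming the LLM follows the output format from the prompt (bold labels, with sections separated by a line containing only `|`). The function name `split_script` and the sample text are hypothetical; inside ComfyUI the same result would come from string-processing nodes rather than this code.

```python
import re


def split_script(llm_output: str) -> dict[str, str]:
    """Split the LLM script into one prompt per downstream branch."""
    prompts = {}
    # Sections are separated by a line that contains only "|".
    for section in re.split(r"^\s*\|\s*$", llm_output, flags=re.MULTILINE):
        # Each section starts with a bold label such as "**Music**:".
        match = re.match(r"\s*\*\*(.+?)\*\*\s*:\s*(.*)", section, flags=re.DOTALL)
        if match:
            label, body = match.groups()
            # Collapse line breaks so each prompt is a single line of text.
            prompts[label.strip().lower()] = " ".join(body.split())
    return prompts


# Hypothetical, shortened LLM output used only for illustration.
sample = (
    "**Camera Shots**:  \nA wide shot of a bamboo forest.\n\n|\n\n"
    "**Music**:  \nambient piano\n\n|\n\n"
    "**Sound Effects**:  \nWind and birdsong.\n"
)
prompts = split_script(sample)
print(prompts["camera shots"])   # goes to the text-to-video branch
print(prompts["sound effects"])  # goes to the sound-effects branch
print(prompts["music"])          # goes to the background-music branch
```

Keying the result by label rather than by position means a missing section shows up as a missing key instead of silently shifting the other prompts.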

HunyuanVideo text-to-video

This is the standard HunyuanVideo workflow.

I have to say that HunyuanVideo's output quality is very good, although I still did not dare raise the resolution to 720p here.


If you are short on VRAM, you can lower the tile size.

Also set this option to fp8.

Sound effects

Pass in the sound-effects prompt produced above.

For the duration, I calculate it in advance from the video's frame count and frame rate.

duration = frame count / frame rate
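As a quick worked example of that formula, here is a minimal sketch. The function name and the numbers are illustrative assumptions, not values from the workflow; substitute the frame count and frame rate configured in your own video node.

```python
def clip_duration_seconds(frame_count: int, frame_rate: float) -> float:
    """Return the clip length in seconds: frame count divided by frame rate."""
    if frame_rate <= 0:
        raise ValueError("frame_rate must be positive")
    return frame_count / frame_rate


# Illustrative values only: 72 frames played back at 24 frames per second.
print(clip_duration_seconds(72, 24))  # 3.0
```

Passing this value to the audio nodes keeps the generated sound the same length as the video, so nothing has to be trimmed afterwards.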

Background music

This is the standard Stable Audio workflow.

The generated music is not yet very high fidelity; I look forward to further improvements from the developers.

Compositing the output

Use a video combine node to merge the audio and video into a single file.
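For readers who want to reproduce this step outside ComfyUI, the sketch below builds an ffmpeg command that mixes the two audio tracks and attaches them to the video. This is a swapped-in approach, not the node used in the workflow. The function name, file names, and the `music_volume` default are hypothetical, and running the command requires ffmpeg to be installed.

```python
import shlex


def build_mux_command(video, sfx, music, output, music_volume=0.5):
    """Build an ffmpeg command that mixes two audio tracks onto one video."""
    filter_graph = (
        # Lower the background music so it does not mask the sound effects.
        f"[2:a]volume={music_volume}[bgm];"
        # Mix sound effects and music; the mix ends when the first input ends.
        "[1:a][bgm]amix=inputs=2:duration=first[mix]"
    )
    return [
        "ffmpeg", "-y",
        "-i", video, "-i", sfx, "-i", music,
        "-filter_complex", filter_graph,
        "-map", "0:v", "-map", "[mix]",  # video from input 0, mixed audio
        "-c:v", "copy", "-c:a", "aac",   # keep the video stream, encode audio
        "-shortest",                     # stop at the shortest stream
        output,
    ]


# Hypothetical file names, for illustration only.
cmd = build_mux_command("clip.mp4", "sfx.wav", "music.wav", "final.mp4")
print(shlex.join(cmd))
```

To run it, pass `cmd` to `subprocess.run(cmd, check=True)`.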

Closing thoughts

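HunyuanVideo is arguably the best open-source text-to-video model available today. If image-to-video is added later, results should become more controllable across use cases. Music generation is still underwhelming, mainly in terms of audio clarity. The pleasant surprise is MMAudio, whose results are quite good.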

If things keep developing at this pace, producing video assets with open-source tools is just around the corner 💪.

For more on AI-assisted design and design inspiration trends, follow the WeChat official account 设计小站 (sjxz 00).
