Overview
The Script Editor is the main interface for creating LipsyncFlow video generation scripts. It allows you to define characters, create script entries for AI-generated lipsync videos, and manage the entire workflow from script creation to service submission.
What is LipsyncFlow?
LipsyncFlow is a service that generates AI-powered lipsync videos where characters speak dialogue with realistic lip movements. The Script Editor helps you create the scripts that tell the service what characters to use, what they should say, and how the videos should be rendered.
Figure: LipsyncFlow Script Editor Main Form
This is the primary interface for managing your AI video projects. It is divided into several key sections:
- Characters Panel (Top-Left): Define and manage your AI actors, including their names, default images, and voice print audio files. Use the Add, Edit, Copy, and Remove buttons to manage your character roster.
- Script Entries Panel (Top-Right): Create and edit individual dialogue lines or actions for your characters. Each entry represents a single video clip to be rendered. The grid shows character, text, TTS parameters, image source, and sequence assignment.
- Sequences Panel (Bottom-Left): Organize multiple script entries into logical sequences. FFmpeg filters can be applied to entire sequences or individual clips within them. Use Add, Edit, Copy, and Remove to manage sequences.
- Output/Status Panel (Bottom-Right): Displays application logs, status messages, validation results, and JSON output. The status bar shows current operation status and validation information.
- Menu Bar (Top): Provides access to file operations (import/export scripts), edit operations (copy/paste JSON, AI prompt builder), tools (Media Processor Library, validation), service configuration, and help documentation.
- Action Buttons: Quick access to common operations like generating JSON, sending to service, and clearing selections.
Use this form to navigate between character, script entry, and sequence management, and to access various tools and help resources via the top menu bar.
Core Components
The Script Editor consists of several key areas that work together to create complete video generation scripts:
Character Management
Define characters with default images and voice print audio files. Characters serve as the foundation for all script entries.
Script Entries
Create individual video clips with dialogue, TTS parameters, and image controls. Each entry represents one video segment.
Sequences
Group related script entries together and apply FFmpeg filters to individual videos or the combined sequence.
Media Processors
Manage reusable FFmpeg filter chains for video and audio processing across your projects.
Service Integration
Configure and communicate with the LipsyncFlow service for video generation and job submission.
File Management
Save, load, export, and import scripts in various formats including native .lipsync and JSON.
AI-Assisted Editing
Generate comprehensive AI prompts to edit scripts using external AI chat interfaces, then import the results back.
Getting Started Workflow
Follow this step-by-step process to create your first LipsyncFlow script:
Step 1: Add Characters
Click "Add Character" to define your video actors. Each character needs a name, default image file, and voice print audio file for voice cloning.
Step 2: Create Script Entries
Add dialogue entries by clicking "Add Script Entry". Each entry specifies which character speaks, what they say (must be less than 30 seconds when spoken), and how the video should be rendered. Use the text splitting feature for longer dialogue.
Step 3: Configure TTS Parameters
Adjust exaggeration (0.0-1.0) for emotional intensity and CFG Weight (0.5-5.0) for speech pacing control in each script entry.
Step 4: Set Image Sources
Choose how each video starts: use character default, previous video frame, or a custom override image.
Step 5: Apply Video Effects (Optional)
Add FFmpeg filters for video enhancement, color correction, or special effects using the Media Processor Library.
Step 6: AI-Assisted Editing (Optional)
Use "Edit → Copy AI Edit Prompt..." to generate a comprehensive prompt for AI-assisted script editing, then use "Edit → Paste JSON" to import AI-generated improvements.
Step 7: Validate and Submit
Use "Tools → Validate Script" to check for errors, then "Send to Service" to submit your script for video generation.
Character Management
Characters are the foundation of your script. Each character represents an AI actor that can speak dialogue in your videos.
Creating Characters
1. Click the "Add Character" button in the Characters section
2. Enter a descriptive name for your character
3. Browse and select a default image file (JPG, PNG, etc.)
4. Browse and select a voice print audio file (WAV, MP3, etc.)
Pro Tip: Use high-quality images and clear audio recordings for best results. The voice print audio should be clean speech without background noise for optimal voice cloning.
Image Framing Hint: Use a character image framed widely enough that the AI does not need to invent a background. This improves consistency between clips of the same character in the same environment. If you only have a tight-shot image, render one video using that image, capture a single frame from that video, and set the captured frame as the character's starting image for all clips. Every clip then starts from the same wider-framed image, which reduces visual inconsistencies across clips.
Voice Print Audio Hint: Use a reference audio clip shorter than 10 seconds with 1 second of silence at the end, recorded at a healthy volume and without background noise, for the best voice cloning and lipsync results.
Video Prompt Hint: Keep the video prompt very short and focused on specific expression or movement cues for the character (e.g., "soft smile, slight head nod"). Avoid guidance about how the character speaks, their motivation, or other narrative details. You can leave the "Video Control Prompt" empty; it will default to a basic "Character Speaks" prompt.
Video Prompt Writing Tips:
- Keep it concise: the model has a token limit and pads or truncates prompts to its text_len, so overly long prompts are cut off.
- Use comma-separated short concepts (e.g., "blurry, low quality, distorted") and avoid redundant synonyms.
- Avoid contradictions: do not include conflicting instructions unless explicitly intended.
Voice Print Audio Requirements:
For optimal voice cloning results, your voice print audio should be:
- Less than 10 seconds long - Keep the recording concise
- Include 1 second of silence at the end - This helps the AI model understand speech boundaries
- Clean, clear speech - No background noise, music, or other voices
- Consistent tone and pace - Speak naturally as you would in your videos
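The length and trailing-silence requirements above can be checked before a file is assigned to a character. The following is a minimal sketch, not the editor's own validation: the function name, the `silence_peak` threshold, and the restriction to 16-bit PCM WAV files are assumptions made for illustration.

```python
import array
import sys
import wave


def check_voice_print(path, max_seconds=10.0, tail_seconds=1.0, silence_peak=500):
    """Check a 16-bit PCM WAV file against the length and trailing-silence guidance.

    silence_peak is an assumed amplitude threshold (out of 32767) below which
    the tail is treated as silent; tune it for your recordings.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        rate = wav.getframerate()
        total_frames = wav.getnframes()
        wanted_tail = int(rate * tail_seconds)
        tail_frames = min(total_frames, wanted_tail)
        # Jump to the start of the final second and read only that part.
        wav.setpos(total_frames - tail_frames)
        tail = array.array("h", wav.readframes(tail_frames))
    if sys.byteorder == "big":
        tail.byteswap()  # WAV samples are stored little-endian
    duration = total_frames / rate
    tail_peak = max((abs(sample) for sample in tail), default=0)
    return {
        "duration_seconds": duration,
        "short_enough": duration < max_seconds,
        "silent_tail": tail_frames == wanted_tail and tail_peak <= silence_peak,
    }
```

A result with `short_enough` or `silent_tail` set to False suggests trimming the clip or appending silence before using it as a voice print. MP3 and other formats would need a different reader.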
Character Properties
- Name: Unique identifier for the character
- Default Image: Starting image used when no override is specified
- Voice Print Audio: Reference audio for voice cloning and TTS generation (should be less than 10 seconds with 1 second of silence at the end)
- Validation Status: Real-time check for file existence and validity
Script Entry Creation
Script entries define individual video clips with dialogue, TTS parameters, and rendering options.
Basic Script Entry Setup
1. Click "Add Script Entry" in the Script Entries section
2. Select a character from the dropdown
3. Enter the dialogue text the character should speak
4. Optionally assign the entry to a sequence for group processing
Figure: Edit Script Entry Dialog
This dialog is where you configure all the details for an individual script entry, which represents a single AI-generated video clip:
- Character: Select the character who will speak this line. The character's default image and voice print audio will be used unless overridden.
- Text to Speak: Enter the dialogue for the character. Remember the 30-second spoken limit per entry.
- Text Splitting Options: Check "Split text by estimated speaking time" to automatically break long text into ~15-second segments, helping to manage video length and quality.
- TTS Parameters: Adjust "Exaggeration" (emotional intensity) and "CFG Weight" (TTS quality/pacing). Provide a "Video Control Prompt" for emotional guidance and ensure the "Reference Audio" (voice print) is correctly linked.
- Image Controls:
  - Use Previous Last/First Frame: Start this clip with the last or first frame of the previously rendered video in the job, ensuring visual continuity.
  - Image Override: Provide a specific image file to use as the starting image for this clip, overriding the character's default.
- Sequence: Assign this script entry to an existing sequence for grouped processing.
- Post-Processing: Access buttons to "Edit per_clip_processors" (FFmpeg filters applied to this individual clip) and "Edit final_frame_processors" (FFmpeg filters applied to the final frame of this clip).
- Update/Cancel: Save your changes or discard them.
This dialog provides granular control over each segment of your AI-generated video.
Text Length Requirements
Script Entry Text Limit:
Each script entry's text must be less than 30 seconds when spoken. This ensures optimal video generation quality and processing efficiency.
Video Rendering Performance:
- Longer videos = more artifacts: As video length increases, visual artifacts become more noticeable
- Rendering time grows faster than video length: Each additional second takes progressively longer to render
- Performance baseline: On a 4090 GPU, rendering takes approximately 1.5 minutes per second of video, so a 5-second video takes about 7.5 minutes
- Quality vs. Speed: Shorter videos render faster and with better quality
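For planning, the baseline figure can be turned into a rough time budget for a whole script. This sketch simply multiplies clip length by the quoted 1.5 minutes per second; because render time per second increases with clip length, treat the result as a lower bound for longer clips. The function name and the list-of-durations input are illustrative assumptions.

```python
def baseline_render_minutes(clip_seconds, minutes_per_second=1.5):
    """Lower-bound render time for a list of clip lengths, in minutes.

    Uses the 4090 baseline of 1.5 minutes per second of video as a flat rate;
    real render times for longer clips are expected to be higher.
    """
    return sum(seconds * minutes_per_second for seconds in clip_seconds)


# One 5-second clip at the baseline rate.
single_clip = baseline_render_minutes([5])

# A script with three entries of 5, 10, and 15 seconds.
whole_script = baseline_render_minutes([5, 10, 15])
```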
GPU Requirements:
- Recommended minimum: GPU with 24 GB of VRAM for optimal performance
- Testing specifications: All performance data and recommendations are based on testing with a 4090 GPU on videos of 30 seconds or less at 480×832 resolution
- VRAM scaling: Longer videos and higher resolutions require significantly more VRAM - plan accordingly
- Lower VRAM impact: GPUs with less VRAM may experience slower rendering, memory errors, or quality degradation
- Memory management: Monitor your GPU memory usage and consider reducing video length or resolution if you encounter memory errors
Advanced Text Processing
Text Splitting Options:
- Split by Speaking Time: Automatically breaks long text into chunks of approximately 15 seconds based on estimated speaking time (125 words per minute). This helps keep each script entry video short enough to meet the 30-second requirement and reduces rendering time and artifacts.
- Acronym Spacing: Adds spaces between letters in acronyms for better pronunciation (e.g., 'NASA' becomes 'N A S A')
- Preview Split: See how your text will be divided before saving
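The splitting and acronym options above can be approximated outside the editor, for example to pre-check dialogue before pasting it in. This is a sketch under stated assumptions: it uses the 125 words-per-minute estimate from this guide, but the sentence-boundary rule, the word-level fallback, and all function names are illustrative and may differ from what the editor actually does.

```python
import re

WORDS_PER_MINUTE = 125  # speaking rate quoted in this guide


def estimate_seconds(text, wpm=WORDS_PER_MINUTE):
    """Estimate how long the text takes to speak, in seconds."""
    return len(text.split()) * 60.0 / wpm


def split_by_speaking_time(text, target_seconds=15.0, wpm=WORDS_PER_MINUTE):
    """Split text into chunks of roughly target_seconds, preferring sentence ends."""
    max_words = max(1, int(target_seconds * wpm / 60))
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], []
    for sentence in sentences:
        words = sentence.split()
        # Start a new chunk if this sentence would push the current one over budget.
        if current and len(current) + len(words) > max_words:
            chunks.append(" ".join(current))
            current = []
        # A single sentence longer than the budget is cut at word boundaries.
        while len(words) > max_words:
            chunks.append(" ".join(words[:max_words]))
            words = words[max_words:]
        current.extend(words)
    if current:
        chunks.append(" ".join(current))
    return chunks


def space_acronyms(text):
    """Insert spaces between the letters of all-caps words of two or more letters."""
    return re.sub(r"\b[A-Z]{2,}\b", lambda match: " ".join(match.group()), text)
```

At 125 words per minute, a 15-second target works out to 31 words per chunk, and the 30-second limit to about 62 words per entry.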
TTS Parameters
Exaggeration (0.0 - 1.0)
Controls emotional intensity and lip movement expressiveness
- 0.0 = Calm, subtle movements
- 0.5 = Moderate expression
- 1.0 = Very expressive, exaggerated
CFG Weight (0.5 - 5.0)
Influences TTS generation quality and pacing
- 0.5 = Fast generation, natural pacing
- 2.0 = Balanced quality and speed
- 5.0 = High quality, controlled pacing
Reference Audio
Required audio file for voice cloning and TTS generation
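If you generate or edit script entries outside the editor, the two ranges above are easy to check up front. The sketch below is illustrative only; the function name and message wording are assumptions, and the editor's own validation may report problems differently.

```python
EXAGGERATION_RANGE = (0.0, 1.0)  # range documented for Exaggeration
CFG_WEIGHT_RANGE = (0.5, 5.0)    # range documented for CFG Weight


def check_tts_parameters(exaggeration, cfg_weight):
    """Return a list of problems; an empty list means both values are in range."""
    problems = []
    low, high = EXAGGERATION_RANGE
    if not low <= exaggeration <= high:
        problems.append(f"exaggeration {exaggeration} is outside {low}-{high}")
    low, high = CFG_WEIGHT_RANGE
    if not low <= cfg_weight <= high:
        problems.append(f"cfg_weight {cfg_weight} is outside {low}-{high}")
    return problems
```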
Image Source Control
Control how each video starts by choosing from these options (in priority order):
- Image Override: Specific image file for this entry
- Use Previous Last Frame: Last frame from previous video
- Use Previous First Frame: First frame from previous video
- Character Default: Character's default image (fallback)
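The priority order above amounts to a simple fallback chain. The sketch below shows one way to express it; the dictionary keys (`image_override`, `use_previous_last_frame`, `use_previous_first_frame`) are hypothetical names chosen for illustration, not confirmed field names from the script format.

```python
def resolve_start_image(entry, character_default,
                        previous_last_frame=None, previous_first_frame=None):
    """Return (source, path) for the image a clip starts from.

    Options are tried in the documented priority order. A "use previous frame"
    option is skipped when no previous frame is available (e.g. the first clip).
    """
    if entry.get("image_override"):
        return "image_override", entry["image_override"]
    if entry.get("use_previous_last_frame") and previous_last_frame:
        return "previous_last_frame", previous_last_frame
    if entry.get("use_previous_first_frame") and previous_first_frame:
        return "previous_first_frame", previous_first_frame
    return "character_default", character_default
```

Note that an override wins even when a "use previous frame" option is also enabled on the same entry.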
Sequence Management
Sequences allow you to group related script entries and apply FFmpeg filters to individual videos or the combined sequence.
Creating Sequences
1. Click "Add" in the Sequences section
2. Enter a unique ID and descriptive name
3. Add a description explaining the sequence's purpose
4. Configure FFmpeg filters for per-clip and joined video processing
Sequence Features
- Per-Clip Processors: FFmpeg filters applied to individual videos before joining
- Joined Video Processors: FFmpeg filters applied to the final combined video
- Script Entry Assignment: Assign script entries to sequences via the dropdown
- Group Processing: All videos in a sequence are processed together
Media Processor Library
The Media Processor Library is your toolkit for managing reusable FFmpeg filter chains across all your projects.
Accessing the Library
1. Open the Media Processor Library from the Tools menu
2. Create, edit, and organize your FFmpeg filter definitions
3. Import/export processor libraries for sharing and backup
Pro Tip: Build a comprehensive library of your most-used FFmpeg effects. This saves time and ensures consistency across all your video projects.
Advanced Video Render Options
The "Additional Video Properties" button in the script entry dialog allows you to modify AI video model settings. These settings directly affect video quality, rendering speed, and success rate.
Important: Improper settings can result in failed video render jobs. If you're unsure about any setting, leave it at the default value. Start with minimal changes and test before making major adjustments.
VRAM Considerations: All testing was done on videos of 30 seconds or less at 480×832 resolution. Longer videos and higher resolutions require significantly more VRAM. Monitor your GPU memory usage and consider reducing video length or resolution if you encounter memory errors.
Core Video Settings
size
What: Video resolution in pixels (height×width)
Why: Higher = sharper but slower/more VRAM
Typical: 480×832, 720×1280, 1280×720, 832×480, 1024×1024
Tip: Start with 480×832 for speed and lower VRAM usage
sample_steps
What: Number of sampling steps (quality vs. speed trade-off)
Why: More steps = better detail, slower render
Typical: 10–20 (testing), 30–40+ (final quality)
Tip: Keep 10 for quick tuning
fps
What: Frames per second
Why: Higher = smoother motion, larger files
Typical: 25 for natural motion and manageable size
Tip: 25 is usually optimal
resolution
What: Human-friendly label for size (e.g., "480p")
Why: Keeps settings organized and clear
Typical: "480p", "720p", "1080p"
Tip: Keep in sync with size setting
Lip-Sync and Guidance Settings
sample_audio_guide_scale
What: Lip-sync strength from audio
Why: Higher = tighter mouth timing; too high may reduce image quality
Typical: 4.0 (baseline), try 5.0–6.0 for stronger sync
Tip: Best first tweak for better lipsync
sample_text_guide_scale
What: How strongly the text prompt steers visuals
Why: Too high can fight lipsync timing
Typical: 1.0–2.0 for lipsync work
Tip: Keep low (1) for lipsync
guidance2_scale
What: Secondary guidance strength
Why: Fine-tunes overall guidance balance
Typical: 5.0 baseline; adjust slightly (4–6) if needed
Tip: Usually leave alone
embedded_guidance_scale
What: Extra internal guidance strength
Why: Higher follows internal cues more strongly
Typical: 6 (leave as is)
Tip: Usually leave alone
Timing and Motion Settings
sample_shift
What: Temporal schedule "shift"
Why: Changes motion timing/consistency slightly
Typical: 4–7. Default 5 is good
Tip: Default 5 works well
flow_shift
What: Additional flow/temporal shift control
Why: Fine-tunes temporal behavior
Typical: 4–7; default 5 matches sample_shift
Tip: Keep equal to sample_shift unless you know you need different behavior
switch_threshold
What: Advanced guidance switch sensitivity
Why: Subtle effect; leave at 0 unless experimenting
Typical: 0; try 100 or 200 for small changes
Tip: Usually leave alone
skip_steps_multiplier
What: Skips a portion of early steps to speed up
Why: Faster but reduces detail
Typical: 0 (off). Try 0.2–0.5 only if you need speed
Tip: Only use if you need faster rendering
Color and Quality Settings
color_correction_strength
What: Global color stabilization strength (single-clip)
Why: Prevents color drift over time
Range: 0.0–1.0
Tip: For short clips, 0 is fine; for color drift, raise toward 1.0
data_type
What: Numeric precision used by the model
Why: Affects quality and compatibility
Typical: BF16 (good default); FP16 if needed for compatibility
Tip: Leave as BF16
attention_mode
What: Attention implementation
Options: auto (standard), sage (alternative)
Typical: auto
Tip: Leave auto
Long Video Settings (Sliding Window)
sliding_window_size
What: Chunk length for long videos
Why: Larger = fewer cuts but more VRAM/time
Typical: 129 (good default)
Tip: Only change for very long videos
sliding_window_overlap
What: Frames shared between chunks
Why: Higher overlap = smoother transitions, more compute
Typical: 5
Tip: Keep default for most cases
sliding_window_overlap_noise
What: Extra noise in overlaps to reduce seams/blur
Why: Prevents visible chunk boundaries
Typical: 20
Tip: Keep default
sliding_window_color_correction_strength
What: Color matching across chunks
Why: Maintains color consistency between video segments
Range: 0.0–1.0. Typical 1.0 for consistency
Tip: Keep 1.0 for long videos
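To get a feel for how the window size and overlap interact, you can estimate how many chunks a clip would be split into. This is generic overlapping-window arithmetic using the defaults listed above (129 frames, 5 frames of overlap); it is an assumption that the video model schedules its chunks this way, so treat the result as an approximation.

```python
def count_windows(total_frames, window_size=129, overlap=5):
    """Estimate how many overlapping windows are needed to cover total_frames."""
    if overlap >= window_size:
        raise ValueError("overlap must be smaller than window_size")
    if total_frames <= window_size:
        return 1
    stride = window_size - overlap  # new frames contributed by each extra window
    remaining = total_frames - overlap
    return -(-remaining // stride)  # ceiling division on integers


# A 30-second clip at 25 fps is 750 frames.
windows_for_30s = count_windows(30 * 25)
```

Raising the overlap shrinks the stride, so the same clip needs more windows and more compute, which matches the trade-off described for sliding_window_overlap.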
Advanced and Style Settings
base_seed
What: Random seed for repeatability
Why: -1 = new random each run; fixed number = reproducible results
Typical: -1 (random) or fixed number for consistency
Tip: Use a fixed number when you like a result and want to re-render it
sampler_solver
What: Sampling algorithm
Options: unipc (balanced/fast), dpm++ (smooth), euler (different feel)
Typical: unipc
Tip: Start with unipc
Text Prompt and Advanced Settings
prompt
What: Text description that nudges look/feel
Why: Guides the visual style and mood
Typical: Simple, neutral descriptions
Tip: For lipsync, keep it simple and neutral (e.g., "Character speaks calmly")
sliding_window_discard_last_frames
What: Trim tail frames of each chunk to avoid blend artifacts
Why: Prevents artifacts at chunk boundaries
Typical: 0 (keep all). Raise slightly if you see end-of-chunk artifacts
Tip: Only adjust if you see specific artifacts
Quick Recommendations
Best First Tweaks:
- sample_audio_guide_scale: 4 → 5 or 6 for tighter lipsync
- sample_steps: 10 → 14–20 for more detail
Settings to Usually Leave Alone:
switch_threshold, attention_mode, data_type, embedded_guidance_scale, sliding_window settings (unless working with very long videos)
For Long Videos:
Adjust the sliding window settings; otherwise keep defaults. Long videos require special consideration for memory management and quality consistency.
Service Configuration
Configure the LipsyncFlow service connection and manage job submissions.
Service Setup
1. Open the service configuration from the menu bar
2. Enter the service URL (default: http://localhost:5000/api/v1/jobs)
3. Test the connection to ensure the service is accessible
Job Submission
Sending Scripts to Service:
- Send Selected: Send only the selected script entries (if any are selected)
- Send All: Send all script entries regardless of selection
- Validation: Scripts are automatically validated before submission
- Job Tracking: Receive job IDs and status updates from the service
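If you want to submit an exported script from your own tooling rather than from the editor, the request could be prepared along these lines. Only the default URL comes from this guide; the use of HTTP POST, the JSON content type, and the shape of the payload are assumptions, so check them against your service before relying on this sketch.

```python
import json
import urllib.request

DEFAULT_SERVICE_URL = "http://localhost:5000/api/v1/jobs"  # default from this guide


def build_job_request(script, url=DEFAULT_SERVICE_URL):
    """Prepare (but do not send) a request carrying the script as JSON.

    Assumes the service accepts a JSON body via POST; the payload schema
    is whatever your exported script JSON contains.
    """
    body = json.dumps(script).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen(request, timeout=30)` would send it. Keeping the build step separate makes it easy to inspect the payload first, much like the Preview API JSON tool does inside the editor.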
AI-Assisted Script Editing
Use AI to help edit and enhance your LipsyncFlow scripts with the built-in AI prompt builder interface.
Accessing the AI Edit Prompt
1. Go to "Edit → Copy AI Edit Prompt..." in the menu bar
2. The AI Edit Prompt dialog opens with a comprehensive prompt template
Figure: AI Edit Prompt Dialog
This dialog provides a powerful interface for creating AI prompts to edit your LipsyncFlow scripts:
- Large Text Area: Contains a comprehensive prompt template with your current script data, ready for customization
- Character/Line/Token Counts: Real-time statistics showing the size of your prompt (useful for AI model token limits)
- Copy Button: Instantly copies the current prompt text to your clipboard
- OK Button: Saves any edits you've made and closes the dialog
- Cancel Button: Discards changes and closes the dialog
The prompt includes your current script's JSON structure, available file paths, FFmpeg examples, and clear instructions for the AI.
Understanding the AI Prompt Structure
Prompt Components:
- Custom Instructions Placeholder: Replace this section with your specific editing requirements
- File Path Guidance: Instructions for including full file paths with character names
- AI Prompt Start Marker: Clear indication of where to start copying for the AI
- Script JSON Structure Guide: Detailed explanation of object types and properties
- FFmpeg Filter Examples: Video and audio filter syntax examples
- Available File Paths: List of all media files referenced in your current script
- Example Script JSON: Your current script's complete JSON representation
- Editing Instructions: Step-by-step guidance for the AI
Complete AI Editing Workflow
Step 1: Generate the AI Prompt
Click "Edit → Copy AI Edit Prompt..." to open the dialog. The prompt will be pre-populated with your current script data and comprehensive editing instructions.
Step 2: Customize Your Instructions
Edit the "CUSTOM INSTRUCTIONS PLACEHOLDER" section with your specific requirements. For example:
=== CUSTOM INSTRUCTIONS PLACEHOLDER ===
Add a new character named "Sarah" with a professional appearance.
Make all dialogue more dramatic and emotional.
Add background music to sequence 1.
Change the video resolution to 1280x720 for all entries.
Include these file paths for the AI to use:
- C:\MyProject\Characters\Sarah\sarah_portrait.png
- C:\MyProject\Characters\Sarah\sarah_voice.wav
- C:\MyProject\Assets\Backgrounds\office_scene.jpg
=== END CUSTOM INSTRUCTIONS ===
Step 3: Copy the Complete Prompt
Click the "Copy" button to copy the entire prompt (including your custom instructions) to your clipboard. The prompt includes everything the AI needs to understand your script structure and requirements.
Step 4: Send to Your AI Chat Interface
Paste the prompt into your preferred AI chat interface (ChatGPT, Claude, Gemini, etc.). The AI will analyze your current script and generate an updated JSON based on your instructions.
Step 5: Copy the AI's Response
Copy the AI's generated JSON response from the chat interface. The AI should return only the modified JSON without any explanations or markdown formatting.
Step 6: Paste Back into LipsyncFlow
Use "Edit → Paste JSON" to import the AI-generated script. Choose whether to replace your current script or merge the changes, and configure conflict resolution options as needed.
Pro Tips for AI Editing:
- Be Specific: Provide clear, detailed instructions for the changes you want
- Include File Paths: List all available media files so the AI can reference them
- Test Incrementally: Start with small changes and test before making major modifications
- Validate Results: Always use "Tools → Validate Script" after importing AI-generated content
- Backup First: Save your work before making AI-assisted changes
AI-Generated Content Considerations:
- Review Carefully: Always review AI-generated content before using it in production
- Validate File Paths: Ensure all referenced files exist and are accessible
- Check Character References: Verify that all character names in script entries match existing characters
- Test FFmpeg Filters: Validate any FFmpeg filter syntax before submitting to the service
- Backup Original: Keep a copy of your original script before applying AI changes
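The character-reference and file-path checks above can also be run on the AI's JSON before you paste it in. The sketch below assumes a script shaped like a dictionary with `characters` and `script_entries` lists and the keys `name`, `character`, `default_image`, and `voice_print_audio`. These key names are hypothetical, so adapt them to the structure shown in your own exported JSON.

```python
import os


def find_reference_problems(script, check_files=True):
    """List unknown character references and missing character media files."""
    characters = script.get("characters", [])
    known_names = {character.get("name") for character in characters}
    problems = []
    for index, entry in enumerate(script.get("script_entries", [])):
        name = entry.get("character")
        if name not in known_names:
            problems.append(f"entry {index}: unknown character {name!r}")
    if check_files:
        for character in characters:
            for key in ("default_image", "voice_print_audio"):
                path = character.get(key)
                if path and not os.path.isfile(path):
                    problems.append(
                        f"character {character.get('name')!r}: missing file {path}"
                    )
    return problems
```

An empty list means every entry refers to a defined character and every listed media file exists. It does not replace "Tools → Validate Script", which remains the authoritative check.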
Advanced AI Prompt Customization
Character Modifications
Request specific character changes:
Add a new character named "Dr. Smith" with:
- Professional medical appearance
- Calm, authoritative voice
- Use the provided medical office background
- Set exaggeration to 0.3 for professional demeanor
Dialogue Enhancements
Improve existing dialogue:
Make all dialogue more engaging by:
- Adding emotional expressions to video prompts
- Increasing exaggeration to 0.7 for more expressiveness
- Adding pauses and emphasis in longer sentences
- Using more dynamic language
Technical Improvements
Optimize technical settings:
Optimize the script for better quality:
- Set sample_steps to 25 for higher quality
- Increase sample_audio_guide_scale to 5.5 for better lipsync
- Add color correction filters to sequences
- Set consistent video resolution to 720x1280
Troubleshooting AI-Generated Scripts
Common Issues and Solutions:
- Invalid JSON: Ask the AI to return only valid JSON without explanations
- Missing Characters: Ensure all character references exist in the characters section
- File Path Errors: Verify all file paths are correct and files exist
- FFmpeg Syntax: Check that all filter syntax is valid FFmpeg format
- Validation Errors: Use the built-in validation to identify and fix issues
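When the AI wraps its JSON in explanations or a markdown code fence, you can often recover the JSON instead of re-prompting. The helper below is a hedged sketch with an illustrative name: it pulls out the first fenced block if there is one, otherwise the text between the first `{` and the last `}`, and lets `json.loads` raise an error if the result is still not valid JSON.

```python
import json
import re

FENCE = "`" * 3  # markdown code fence marker
FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"(?:json)?\s*(.*?)" + re.escape(FENCE), re.DOTALL
)


def parse_ai_json(response_text):
    """Extract and parse the JSON object from an AI chat response."""
    text = response_text.strip()
    fenced = FENCED_BLOCK.search(text)
    if fenced:
        text = fenced.group(1).strip()
    else:
        # Fall back to the outermost braces, dropping any surrounding prose.
        start, end = text.find("{"), text.rfind("}")
        if start != -1 and end > start:
            text = text[start:end + 1]
    return json.loads(text)  # raises json.JSONDecodeError if still invalid
```

After parsing, `json.dumps(result, indent=2)` gives clean text to copy for "Edit → Paste JSON".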
File Operations
Manage your scripts with comprehensive file operations and format support.
Native .lipsync Format
Rich Metadata
Version tracking, creation date, application info, and validation results
Enhanced Data
File existence checks, size information, and duration estimates
Backward Compatibility
Automatic detection of legacy JSON format files
Validation Status
Real-time validation results and detailed error reporting
Import/Export Features
- Script Packages: Export complete scripts with all media files as ZIP archives
- JSON Export: Export to JSON format for external use and compatibility
- Drag & Drop: Load files by dragging them onto the application window
- Recent Files: Quick access to recently opened scripts
- File Info: View detailed information about loaded scripts
Validation and Debugging
Ensure your scripts are ready for video generation with comprehensive validation tools.
Script Validation
1. Use "Tools → Validate Script" to check for errors
2. Review validation results and fix any issues
3. Check the status bar for real-time validation updates
Debug Tools
- Debug Script Data: Log complete script structure to console
- Test JSON Generation: Verify API JSON generation step by step
- Preview API JSON: View the exact payload that will be sent to the service
- Service Connection Test: Verify service accessibility before submission
- Logs Folder: Access detailed logs for troubleshooting
Best Practices
Follow these guidelines for optimal script creation and video generation:
Character Setup
- Use high-resolution images (at least 512x512 pixels) for best results
- Ensure voice print audio is clear and free of background noise
- Voice print audio should be less than 10 seconds with 1 second of silence at the end
- Test character validation before creating script entries
Script Entry Creation
- Keep individual entries under 30 seconds when spoken for optimal processing and quality
- Use the text splitting feature for longer dialogue to create approximately 15-second segments
- Shorter videos render faster with fewer artifacts - aim for 5-15 second segments when possible
- Preview text splits before saving to ensure proper division
- Set appropriate exaggeration and CFG weight values for your content
Sequence Organization
- Group related script entries into sequences for better organization
- Use descriptive sequence names and IDs
- Apply consistent FFmpeg filters across related videos
File Management
- Save your work frequently using Ctrl+S
- Use the native .lipsync format for full feature support
- Export script packages when sharing projects with others
- Keep backup copies of important scripts
AI-Assisted Editing
- Always backup your script before using AI editing features
- Be specific in your AI instructions and include relevant file paths
- Validate AI-generated content before using it in production
- Test changes incrementally rather than making large modifications at once
- Review and verify all character references and file paths in AI-generated content
Related Resources
For more detailed information about specific features:
FFmpeg Editing Guide
Learn how to create and manage FFmpeg filter chains for video processing.
Access via Help → Learn how to use me → FFmpeg Editing Guide
AI Video Generation
Understanding AI video generation concepts and best practices.
Multitalk Project
Voice Cloning & TTS
Resources for voice cloning and text-to-speech technologies.
Chatterbox Project