- Fully Python CLI-based interaction
- Enhanced image processing/enhancement and customization
- Gender-aware image generation/prompting
- Estimating similar pose styles based on facial landmark data
- Looping mechanism that randomizes the IdentityNet strength, adapter strength, guidance scale, and inference-step settings, seeded with some best-sample defaults.
- A generation CSV log that records which random values were chosen, so you can lock in your preferred data points above and fine-tune further, because "beauty is in the eye of the beholder"...
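To make the looping-plus-logging idea concrete, here is a minimal sketch in Python. The parameter names mirror the settings listed above, but the value ranges are illustrative guesses, not the engine's actual defaults (those live in `configs.py`):

```python
import csv
import random

# Illustrative ranges -- the real defaults live in configs.py.
PARAM_RANGES = {
    "identitynet_strength_ratio": (0.6, 1.2),
    "adapter_strength_ratio": (0.5, 1.0),
    "guidance_scale": (3.0, 8.0),
}

def random_settings(rng=random):
    """Pick one random value per tunable parameter."""
    settings = {k: round(rng.uniform(lo, hi), 2) for k, (lo, hi) in PARAM_RANGES.items()}
    settings["num_inference_steps"] = rng.randint(20, 50)
    return settings

def log_run(path, settings, success=True):
    """Append the chosen values to a CSV so good combos can be locked in later."""
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=[*settings, "success"])
        if f.tell() == 0:  # new file: write the header once
            writer.writeheader()
        writer.writerow({**settings, "success": success})
```

Each run appends one row, so after a long batch you can sort the CSV by your favorite outputs and read off the settings that produced them.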
As for the name, Kumori (ku-mo-ri / 曇り) means 'cloudy' in Japanese. It's a bit of wordplay: instead of the grey and gloom of clouds, the name is meant to embody the sunshine and positivity that come after the clouds and rain!
Kumori CLI Engine is rooted strongly in the InstantID GitHub code, as it offers an excellent, tuning-free approach to identity-preserving generation, allowing the swift creation of stylized portraits from a single input image. The original InstantID implementation leverages state-of-the-art AI to achieve remarkable fidelity in generated images, supporting a wide range of artistic renditions while preserving the unique identities captured in source images.
InstantID is accessible through user-friendly interfaces like Gradio, facilitating easy experimentation for users without deep expertise in command-line tools or programming. While this approach demystifies AI's complexities, it caters to one-off generation tasks rather than the batch processing or integrated workflows that many developers and researchers require. I wanted to be able to "set it and forget it" when generating images.
Since InstantID rolled out, I had been testing ways to automate it in conjunction with various working Hugging Face models (a handful of models that work are listed in `configs.py` to choose from), aiming for a more versatile and automation-friendly solution. This approach is designed for those seeking deeper customization, speed in batch image processing, and a more scriptable interaction with InstantID's capabilities, bypassing the GUI constraints.
The motivation behind developing a CLI-based interface for InstantID stems mostly from the fact that I have an RTX 3060, which works alright but leaves a bit to be desired on generation wait times. Waiting for a single image just to try another batch of settings was too time-consuming; I'd rather generate several at once and decide for myself what felt best, based on some of the ideal settings at differing strengths. It's worked out pretty well so far, so I thought I'd share.
This guide provides both an automated setup script for Windows users and detailed manual installation instructions.
For Windows users, there's an automated batch script, `auto_install_kumori_cli.bat`, available in the root of the project directory. This script handles the complete setup process, including environment setup, repository cloning, dependency installation, and more.
To Use the Automated Script:
- Double-click the `auto_install_kumori_cli.bat` file or run it from the command line:
  `./auto_install_kumori_cli.bat`
Follow the on-screen instructions. The script will:
- Check for Python installation.
- Create and activate a Python virtual environment.
- Clone the `kumori_cli_engine` repository.
- Install required Python libraries.
- Install the appropriate version of PyTorch with CUDA support.
- Download and extract additional files for facial detection and analysis.
NOTE for Non-Windows Users: While this script is designed for Windows, you can refer to it for the required steps and adapt the commands for macOS or Linux environments.
Then follow these steps to start using Kumori CLI Engine:
- Activate the virtual environment with: `.\kumori_venv\Scripts\activate`
- Navigate to the `kumori_cli_engine` directory: `cd kumori_cli_engine`
- Run the Kumori CLI Engine: `python .\kumori_cli.py`
- See the images you've created in the `generated_images` folder!
For those who prefer or need to install everything manually, here are detailed step-by-step instructions for setting up the Kumori CLI Engine.
- Check Python Installation:
Ensure you have Python 3.8 or newer installed on your system. Verify by running python --version or python3 --version in your terminal or command prompt. If Python is not installed, download it from the official Python website and follow the installation instructions for your operating system.
- Install Git:
Git is essential for cloning the project repository to your local machine. If you do not have Git installed, follow these instructions to install it:
- Windows:
  - Download the latest Git for Windows installer from Git's official website.
  - Run the installer and follow the installation prompts. Accepting the default options is recommended for most users.
- macOS:
  - Git might already be installed on your machine. You can check by running `git --version` in the Terminal.
  - If you don't have Git installed, the easiest way is via the standalone installer:
    - Download the latest macOS Git installer from Git's official website.
    - Follow the on-screen instructions to complete the installation.
  - Alternatively, you can install Git using Homebrew (if installed):
    - Run `brew install git` in the Terminal.
- Linux (Ubuntu/Debian):
  - Open the Terminal.
  - Update your package lists with `sudo apt update`.
  - Install Git by running `sudo apt install git`.
  - Verify the installation with `git --version`.
- Clone the Kumori CLI Engine Repository:
  - With Git installed, clone the repository and enter it:
    `git clone https://github.com/tillo13/kumori_cli_engine.git`
    `cd kumori_cli_engine`
- Set Up a Python Virtual Environment:
Ensure you have Python 3.8 or later installed on your system. Using a virtual environment is highly recommended to prevent conflicts with other Python projects or system-wide packages.
A Python virtual environment (venv) offers a self-contained directory within a project, encapsulating its own installation directories for software and packages. This isolation prevents conflicts between project dependencies and system-wide Python installations.
- Creating a Virtual Environment:
  - Open a terminal (Command Prompt or PowerShell on Windows, Terminal on macOS and Linux).
  - Navigate to the project directory: `cd kumori_cli_engine`
  - Create the virtual environment:
    - Windows: `python -m venv kumori_venv`
    - macOS/Linux: `python3 -m venv kumori_venv`
  - `kumori_venv` is the name of your virtual environment, and you can rename it to anything you like.
- Activating the Virtual Environment: Before installing packages and running your project, activate the virtual environment. Once activated, any Python or pip commands will use the versions in the virtual environment, not the global Python installation.
  - Windows: `.\kumori_venv\Scripts\activate`
  - macOS/Linux: `source kumori_venv/bin/activate`
  - You'll know the virtual environment is activated because the command prompt will show the name of your virtual environment, e.g., `(kumori_venv) user@hostname:~/kumori$`.
- Install Required Libraries:
  - With the virtual environment activated, install the project requirements:
    `pip install -r requirements.txt -v`
  - This will ensure all necessary Python libraries, such as `diffusers`, `opencv-python`, `insightface`, `dlib`, and others, are installed in your environment. Be patient as this might take several minutes.
- Install PyTorch:
  - The `requirements.txt` file installs a standard PyTorch version. If you have an NVIDIA GPU, you will need a specific build of PyTorch for CUDA compatibility.
  - First, uninstall any existing PyTorch packages: `pip uninstall torch torchvision torchaudio`
  - Visit PyTorch's Get Started page to find the appropriate installation command for your system. Here is an example for CUDA 11.8 used with an NVIDIA RTX 3060:
    `pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu118`
  - For systems without GPU support, a standard PyTorch installation suffices: `pip install torch torchvision torchaudio`
  - An example output for installing PyTorch might be:
    `(kumori_venv) PS D:\test> pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118`
    `Looking in indexes: https://download.pytorch.org/whl/cu118`
    `Collecting torch...`
  - Note: The CUDA version (`cu118` in the commands above) should match your system's CUDA installation if using GPU acceleration.
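After installation, you can verify from Python which build you ended up with and whether the GPU is visible (run inside the activated environment):

```python
import torch

# The version string includes the CUDA tag for GPU builds, e.g. "2.1.0+cu118".
print("torch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

if torch.cuda.is_available():
    # Name of the first GPU, e.g. "NVIDIA GeForce RTX 3060".
    print("device:", torch.cuda.get_device_name(0))
else:
    print("Running on CPU -- generation will be much slower.")
```

If `torch.cuda.is_available()` prints `False` on a GPU machine, the CPU wheel was likely installed; repeat the uninstall/reinstall steps above with the correct index URL.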
- Download Additional Required Files:
  - For facial detection, landmark analysis, and AntelopeV2, ensure you have the required models in your `facial_landmarks_model` and `models` directories in the root of the project. Download them from this Google Drive link, as they are too large for GitHub.
  - Download and unzip the following files into the `kumori_cli_engine` directory: `/models` and `/facial_landmark_model`
  - These folders are crucial for the `gender_detect.py` and `estimate_similar_faces.py` scripts to function correctly, allowing for accurate gender detection and pose estimation.
  - At the end, you should have two folders: `kumori_cli_engine/models` and `kumori_cli_engine/facial_landmarks_model`.
- Run the Kumori CLI Engine:
  - Ensure you are in the `kumori_cli_engine` directory and run: `python ./kumori_cli.py`
  - If everything is set up correctly, you should start seeing output in the `generated_images` folder!
Tada! Now enjoy using the Kumori CLI Engine to create amazing images!
`kumori_cli.py` stands as the operational core of the project, based on the app.py file from InstantID (minus the Gradio UI). This script is the entry point for the generation process, combining the InstantID models with custom enhancements for an enriched experience. All configurations and updates are changed in `configs.py`.
The `configs.py` script serves as a catch-all for setting changes -- it holds all your fun settings as well as the core knobs of the Kumori CLI Engine. It introduces advanced capabilities focused on optimizing and customizing the input image set before `kumori_cli.py` processes it. Leveraging insights from facial analysis and pose estimation, it streamlines the generation process to produce stylized, identity-preserved images en masse; it is, in a sense, the central nervous system for the configurations you prefer.
- Automated Image Looping: One of the main features of `configs.py` is its automated looping mechanism. This functionality allows users to process a batch of images stored in a specified directory without requiring manual intervention for each image. This batch processing significantly accelerates the generation workflow, particularly beneficial for projects with large datasets.
- Dynamic Style Application: `configs.py` enables dynamic application of stylistic elements based on a comprehensive style template. Users can specify styles or allow the script to randomly apply them to each image, ensuring a range of visually diverse outputs. This flexibility opens the door for experimentation with various artistic renditions, including those not originally contemplated by the InstantID model.
- Pose Estimation Integration: Through advanced facial landmark detection, this script estimates the pose of subjects within input images and aligns them with predefined or detected styles for more contextually appropriate results. This feature ensures that the generated images maintain a natural and coherent aesthetic, respecting the original posture and orientation of the subjects.
- Gender-aware Prompt Adjustment: Building on gender detection capabilities, `configs.py` intelligently adjusts prompts and style applications to align with the detected gender of subjects in the images. This sensitivity adds a layer of personalization, ensuring that the stylistic outputs are not just visually appealing but also contextually relevant.
- Custom Configurations for Enhanced Outputs: Users have the ability to tweak various parameters and configurations, adjusting the balance between identity preservation and artistic expression. These configurations are designed to offer a broad spectrum of control over the output characteristics, from fidelity to the source image to the incorporation of fantastical elements.
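The batch loop in particular boils down to walking an input folder and feeding each image through the pipeline. A stripped-down sketch follows; the folder layout and extension set are illustrative, and `generate` stands in for the real pipeline call:

```python
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}

def find_input_images(folder):
    """Collect every image in `folder`, sorted for a reproducible run order."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.suffix.lower() in IMAGE_EXTENSIONS
    )

def run_batch(folder, generate):
    """Call `generate(image_path)` for each image, continuing past failures."""
    results = {}
    for path in find_input_images(folder):
        try:
            results[path.name] = generate(path)
        except Exception as exc:  # recorded as error_message in the real CSV log
            results[path.name] = f"error: {exc}"
    return results
```

Catching per-image failures (rather than aborting the whole run) is what makes the overnight "set it and forget it" workflow practical.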
At its essence, configs.py acts as the intermediary between raw input data and the generative AI models. By processing images in large batches -- adjusting dimensions, applying filters, or selecting poses -- and specifying generation parameters, it enriches the input data to better suit the model's needs and user expectations. This preprocessing layer ensures higher-quality, more customized output, distinguishing the CLI tool's capabilities from standard GUI image-generation implementations.
The generation_log.csv is an innovative addition to the Kumori CLI Engine framework that markedly enhances the usability and analytical capabilities of the tool. Diverging from the standard InstantID implementation, which primarily focuses on the generation outcomes without comprehensive tracking of the generative parameters and results, this logging mechanism provides a detailed record of each generation attempt, encompassing a wide array of parameters and outcomes.
- Comprehensive Tracking: It logs every crucial parameter involved in the image generation process, including `image_name`, `new_file_name`, model parameters (`identitynet_strength_ratio`, `adapter_strength_ratio`, `num_inference_steps`, `guidance_scale`), `seed`, and the completion status (`success`).
- Error Logging: In instances where the generation process encounters issues, `error_message` captures the specifics of what went wrong, aiding in troubleshooting and refining the generative approach.
- Style Application Insights: Detailed logging of `style_name`, along with the `prompt` and `negative_prompt` used for generation, offers insights into how different styles and textual cues influence the generation outcomes, providing a window into effective prompt engineering.
- Performance Metrics: Each entry records the `time_taken` for generation and the `current_timestamp`, facilitating analysis of the system's performance over time and under different configurations.
- Demographic Parameters: Unique to this tool, parameters like `gender`, `gender_confidence`, `age`, and `age_confidence` are logged, demonstrating the tool's capability to analyze and adapt to the demographic attributes of subjects in the images.
- Pose Analysis: The log provides references to the `chosen_pose` and `most_alike_pose` along with distances (`most_alike_eye_distance`, `most_alike_nose_mouth_distance`, `most_alike_distance_difference`), indicating the effectiveness and accuracy of pose estimation and matching in the generation process.
This rich logging capability provides an unparalleled advantage for users and researchers aiming to deeply understand the generative process and its dependencies on various parameters and conditions. It allows for:
- Iterative Improvement: By analyzing logs, users can iteratively adjust parameters to achieve desired outcomes, empirically identifying optimal configurations.
- Troubleshooting and Optimization: Identification of recurrent issues or bottlenecks becomes easier, guiding users in troubleshooting or further optimizing the generation pipeline.
- Data-Driven Insights: The accumulated data can be analyzed for patterns or insights, potentially revealing new knowledge about generative AI behaviors, style applicability, or demographic representation effectiveness.
- Enhanced Experimentation: With detailed records, users can more systematically experiment with different configurations, styles, or demographics, making informed decisions to refine their creative or research endeavors.
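As a taste of the kind of analysis the log enables, here is a hedged sketch that averages one numeric parameter over successful runs, grouped by style. The column names match the log fields described above, but the grouping choice (and the assumption that `success` is stored as the string `True`) is just an example, not the project's own analysis code:

```python
import csv
from collections import defaultdict

def best_settings(log_path, metric="guidance_scale"):
    """Average a numeric log column over successful runs, grouped by style.

    Column names follow generation_log.csv; the assumption that `success`
    is stored as the string "True" is illustrative.
    """
    totals = defaultdict(lambda: [0.0, 0])
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            if row.get("success", "").lower() != "true":
                continue  # only learn from runs that completed
            style = row.get("style_name", "unknown")
            try:
                value = float(row[metric])
            except (KeyError, ValueError):
                continue  # skip rows with missing or malformed data
            totals[style][0] += value
            totals[style][1] += 1
    return {style: total / n for style, (total, n) in totals.items()}
```

Running this after a few batches tells you, per style, which guidance scale (or any other numeric column) tended to accompany your successful generations.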
Additionally, a utility script summarize_and_merge_generation_log.py is provided to help you analyze and summarize your generation results:
The summarize_and_merge_generation_log.py script is a powerful tool designed to help users determine their favorite outcomes from the generated images. This script analyzes the logs generated during the image generation process, providing detailed statistics and insights into the effectiveness of different models and settings.
- Merges Multiple Log Files: Combines multiple `generation_log.csv` files into a single merged log if multiple runs were executed.
- Tracks and Calculates Metrics: Keeps track of various metrics such as `identitynet_strength_ratio`, `adapter_strength_ratio`, `num_inference_steps`, `guidance_scale`, and more for each model used.
- Cleans Up Log Files: Removes rows with missing or insufficient data to ensure the accuracy of analysis.
- Provides Detailed Stats: Outputs detailed statistics for individual models as well as combined averages across all models. This includes usage counts, average values of key metrics, and more.
- Efficiency Indexes: Calculates and displays special indexes like the Preference Efficiency Index (PEI) and Model Efficiency Index (MEI) to rank models based on performance and effectiveness.
- Navigate to the `generation_images` Folder: Ensure you are in the `generation_images` folder where your generated images and log files are stored: `cd kumori_cli_engine/generation_images`
- Run the Script: Execute the `summarize_and_merge_generation_log.py` script using Python: `python summarize_and_merge_generation_log.py`
Detailed stats and efficiency indexes will be printed to the console. Individual and combined stats give insights into which models and settings were most effective. Use these insights to fine-tune your configurations and prompts for future runs. By using the summarize_and_merge_generation_log.py script, users can gain valuable insights into their image generation processes, allowing for more informed decisions and better optimization of future generation tasks.
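The merge step itself is conceptually simple. The sketch below is a simplified stand-in for what `summarize_and_merge_generation_log.py` does (the real script also cleans rows and computes the stats and indexes described above); the file-name pattern and merged-file name are assumptions:

```python
import csv
import glob
import os

def merge_logs(folder, merged_name="merged_generation_log.csv"):
    """Concatenate every generation_log*.csv in `folder` into one file.

    Illustrative stand-in; the file-name pattern is an assumption.
    """
    paths = sorted(glob.glob(os.path.join(folder, "generation_log*.csv")))
    rows, fieldnames = [], []
    for path in paths:
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            # Build the union of all columns, preserving first-seen order.
            for name in reader.fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)
            rows.extend(reader)
    merged = os.path.join(folder, merged_name)
    with open(merged, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return merged
```

Taking the union of column names means logs from different runs (or older versions with fewer columns) still merge cleanly, with missing values left blank.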
In essence, the generation_log.csv and the summarize_and_merge_generation_log.py script serve not only as vital record-keeping tools but also as catalysts for exploration, experimentation, and optimization within this CLI-based image generation tool, setting it apart from the traditional InstantID usage scenario. Used in concert, they will help you fine-tune your likes!
gender_detect.py significantly enhances the CLI-based image generation process by introducing robust gender and age detection capabilities. This module empowers the generation tool with the ability to automatically adjust image generation parameters based on the detected gender and age of the subjects within the input images. This facilitates a deeper level of personalization and relevance in the generated artwork, harmoniously aligning with the overall vision of creating highly customized and identity-preserving images for both male and female prompts in the same automation.
- Seamless Workflow Integration: The gender and age data extracted by `gender_detect.py` are pivotal in `configs.py` for deciding on appropriate prompts and styles. Based on the demographic attributes of the subjects in the input images, `configs.py` can dynamically adjust its configurations to match, enhancing the relevancy and precision of the generated images.
- Custom Prompt Adjustment: Armed with the knowledge of a subject's gender and approximate age range, the script tailors the generative prompts to better fit the subject's identity, steering the AI towards producing images that not only captivate in style but also remain true to the subject's essence.
- Style Selection Enhancement: Understanding the gender and age facilitates refined control over style application. For instance, specific styles might be more fitting for a female subject in a given age range, or vice versa, allowing `configs.py` to make informed decisions that greatly influence the stylistic outcome.
- Increases Relevancy: By leveraging gender and age information, the system ensures that the generated images are not just visually stunning but also contextually appropriate, increasing the relevancy and personalization of the output.
- Enables Gender-aware Prompting: The detected gender and age information enable `configs.py` to apply gender-aware and age-appropriate prompts. This adds a layer of sophistication to the generation process, ensuring that the final images are reflective of the subjects' characteristics.
- Fosters Experimentation: With `gender_detect.py`, users have the opportunity to experiment with how different genders and age groups are represented in various artistic styles. This can lead to fascinating insights into the interaction between identity attributes and artistic expression.
In summary, gender_detect.py stands as a cornerstone module that significantly enriches the image generation process by ensuring that gender and age are taken into account. This careful consideration ensures that each generated image is not only a work of art but a personalized and identity-preserving creation that respects and reflects the subject’s characteristics. The seamless integration with configs.py empowers users to delve into unparalleled levels of customization, setting the foundation for generating images that are truly one-of-a-kind.
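The OpenCV DNN pattern this kind of detection relies on looks roughly like the sketch below. The model file names match the ones listed in the manual-install section, but the preprocessing values and the exact flow are illustrative assumptions, not a copy of `gender_detect.py`:

```python
import numpy as np

# Output labels for the Caffe models shipped with the project; the age
# buckets follow the widely used AgeNet convention (an assumption here).
GENDER_LABELS = ["Male", "Female"]
AGE_BUCKETS = ["(0-2)", "(4-6)", "(8-12)", "(15-20)",
               "(25-32)", "(38-43)", "(48-53)", "(60-100)"]

def interpret(preds, labels):
    """Turn a softmax vector from the net into a (label, confidence) pair."""
    preds = np.asarray(preds).ravel()
    idx = int(preds.argmax())
    return labels[idx], float(preds[idx])

# Running the nets requires opencv-python and the downloaded model files,
# e.g. (hypothetical preprocessing values, shown for shape only):
#
#   import cv2
#   gender_net = cv2.dnn.readNet("gender_net.caffemodel", "gender_deploy.prototxt")
#   blob = cv2.dnn.blobFromImage(face_crop, 1.0, (227, 227),
#                                (78.4, 87.8, 114.9), swapRB=False)
#   gender_net.setInput(blob)
#   gender, confidence = interpret(gender_net.forward(), GENDER_LABELS)
```

The confidence value is what ends up in the log's `gender_confidence` and `age_confidence` columns, letting you filter out low-confidence detections later.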
The estimate_similar_faces.py script enriches the CLI-based image generation project by incorporating an advanced pose estimation functionality. This module is specifically designed to analyze input images for facial landmarks, estimating poses that closely match those from a predefined repository or dynamically identified styles. This tailored approach significantly refines the preparatory phase of the image generation process, executed by configs.py, ensuring that generated images bear a striking resemblance to the source images not only in style but also in posture and expressiveness.
- Enhanced Pose Alignment: By accurately estimating the facial pose of subjects in input images, `estimate_similar_faces.py` allows `configs.py` to select or adjust prompts and styles that harmonize with the subject's pose. This critical integration ensures that the generative models are primed with contextually cohesive inputs, enhancing the naturalness and authenticity of the generated images.
- Customization of Image Generation: Empowered with precise pose information, `configs.py` can delve into an expanded realm of customization. Depending on the pose features determined by `estimate_similar_faces.py`, various stylistic nuances and environmental contexts can be dynamically introduced or modified within the generation prompts, offering an enriched canvas for creative expression.
- Optimization of Style and Pose Dynamics: The script plays a pivotal role in optimizing the interplay between style and pose dynamics. With the detailed pose data sourced from `estimate_similar_faces.py`, `configs.py` is adept at fine-tuning the generation process to ensure that the stylistic attributes and the physical posture of the subjects are in perfect symbiosis. This synergy is crucial for achieving high-fidelity outcomes that resonate with the original identity and setting of the source images.
- Elevates Image Realism: Accurate pose estimation directly contributes to the realism of the generated images. By ensuring that the subjects' poses are correctly identified and matched, the final images not only capture the essence of the subject's identity but also reflect their demeanor and spatial context, thus elevating the overall realism.
- Facilitates Context-aware Generation: The ability to estimate and align poses ensures that the generated images are not just static portraits but vibrant representations that capture subjects in contextually rich settings. This depth of context-awareness adds layers to the storytelling aspect of the images, enriching the narrative woven through visual imagery.
- Promotes Creative Experimentation: With detailed insights into pose dynamics, users are encouraged to experiment with how different poses interact with various styles and contexts. This experimentation can lead to innovative artistic explorations, uncovering new aesthetic territories within the realm of AI-assisted image generation.
In essence, estimate_similar_faces.py serves as a critical component that seamlessly bridges the gap between raw input imagery and the nuanced requirements of artistic image generation. Through its integration with configs.py, it lays the groundwork for producing images that are not only visually captivating but also deeply personalized, reflecting a true convergence of art, technology, and individual identity.
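A sketch of the distance metrics the log refers to, assuming 68-point dlib-style landmarks (the indices follow the standard 68-point layout; which exact points the real script compares is an assumption made for illustration):

```python
import math

def euclidean(p, q):
    """Straight-line distance between two (x, y) points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def pose_metrics(landmarks):
    """Compute eye and nose-mouth distances from 68 (x, y) landmarks.

    In the standard dlib ordering, 36/45 are the outer eye corners,
    33 is the nose tip, and 51 the top of the upper lip. Which points
    the real script uses is an assumption.
    """
    eye_distance = euclidean(landmarks[36], landmarks[45])
    nose_mouth_distance = euclidean(landmarks[33], landmarks[51])
    return eye_distance, nose_mouth_distance

def most_alike(target, candidates):
    """Return the candidate landmark set whose metrics best match the target's."""
    te, tn = pose_metrics(target)
    def difference(lm):
        e, n = pose_metrics(lm)
        return abs(e - te) + abs(n - tn)
    return min(candidates, key=difference)
```

In practice the landmarks would come from dlib's `shape_predictor_68_face_landmarks.dat` model mentioned below; the matching step then fills the `most_alike_*` columns in the generation log.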
- `diffusers==0.25.0`: Utilized for accessing and interacting with pre-trained diffusion models, crucial for generating high-quality images.
- `transformers==4.36.2`: Provides easy access to pre-trained models and utilities from Hugging Face, enhancing model interaction capabilities.
- `opencv-python`: Offers a wide range of image processing and vision functions, instrumental in handling image manipulations for preprocessing and enhancements.
- `numpy`: Essential for numerical computations, especially for handling arrays and matrices used throughout the image processing phases.
- `Pillow` (PIL fork): A powerful library for opening, manipulating, and saving many different image file formats, fundamental for image I/O operations.
- `insightface`: Facilitates facial analysis tasks, including detection, recognition, and landmark detection, key to the gender and pose estimation functionalities.
- `dlib`: Another comprehensive toolkit for facial landmark detection, used alongside or as an alternative to InsightFace for detailed facial feature analysis.
- `torch`: The primary library for all neural network functionality, so its installation is crucial. Users are advised to follow the specific installation guidelines on PyTorch's official website to ensure compatibility with their hardware, especially for utilizing GPU acceleration.
- Additional Utilities:
  - `accelerate`: Simplifies running PyTorch applications on multi-GPU setups.
  - `safetensors`: Provides a safe and efficient way to serialize and deserialize tensor data.
  - `einops`: Enhances tensor operations with more readable and expressive transformations.
  - `onnxruntime`: Enables fast inference for models exported in the ONNX format, augmenting deployment efficiency.
  - `omegaconf`: Offers flexible and hierarchical configurations, useful for managing complex setup parameters.
  - `peft`: Hugging Face's Parameter-Efficient Fine-Tuning library, pulled in as a dependency of the diffusion pipeline; its specific usage here wasn't detailed.
To fully harness the capabilities of gender_detect.py and estimate_similar_faces.py, specific models and directories are referenced:
- Model Files for Gender and Age Detection:
  - Located under the `gender_detect` directory, these models (`opencv_face_detector.pbtxt`, `opencv_face_detector_uint8.pb`, `age_deploy.prototxt`, `age_net.caffemodel`, `gender_deploy.prototxt`, and `gender_net.caffemodel`) are critical for executing the gender and age estimation tasks.
- Facial Landmarks Model for Pose Estimation:
  - `estimate_similar_faces.py` requires a landmarks model (`shape_predictor_68_face_landmarks.dat`), typically stored within a `facial_landmarks_model` directory, to accurately determine facial features and estimate poses.
Looking ahead, there are several exciting avenues I've kicked around for enhancing the Kumori CLI Engine, mostly thoughts around building on the existing infrastructure to enable more refined outputs, greater interactivity, and expanded functionality. Hit me up if any of these interest you or you'd like to collaborate:
- Description: Incorporate more HuggingFace models to expand the range of styles, themes, and generation capabilities. This will empower users with a broader set of tools and creative options.
- Implementation: Regularly update `configs.py` with new model entries and ensure compatibility with existing preprocessing and generation workflows.
- Description: Implement proactive facial detection to automatically suggest poses for the user. This feature can analyze input images and recommend optimal poses to achieve the best image generation results.
- Implementation: Enhance `estimate_similar_faces.py` to not only estimate but also rank and suggest poses based on facial landmark analysis.
- Description: Develop advanced post-processing algorithms to enhance the generated images further. Techniques may include sharpening, color correction, and noise reduction.
- Implementation: Integrate libraries like `Pillow` and `OpenCV` to apply a series of enhancements after the initial generation, driven by user-configurable settings in `configs.py`.
- Description: Update `summarize_and_merge_generation_log.py` to automatically set values for successive runs based on previous user preferences and successful outcomes.
- Implementation: Implement a feedback mechanism that analyzes `generation_log.csv` and adjusts the parameters dynamically for future generations, learning user preferences over time.
- Description: After a certain number of runs, provide users with suggested best settings based on analysis of previous outputs. This helps users quickly identify optimal configurations for their needs.
- Implementation: Include an analytical module in `summarize_and_merge_generation_log.py` that suggests the best settings after aggregating data from multiple runs.
- Description: Implement a notification system to update users on the progress of their image generation tasks via email, Slack, or text.
- Implementation: Utilize libraries like `smtplib` for email, `slack_sdk` for Slack, and `twilio` for text messages to send progress notifications during and after the generation process.
- Description: Track and display detailed performance metrics of different models, helping users understand the trade-offs between speed and quality for various settings.
- Implementation: Modify `generation_log.csv` to include metrics such as generation time, GPU/CPU usage, and quality scores. Summarize these metrics in `summarize_and_merge_generation_log.py`.
- Description: Allow users to manipulate specific facial attributes such as expressions or age while preserving identity. This adds a layer of personalization and creativity.
- Implementation: Integrate with facial attribute GANs to enable controlled modifications of facial features. Include corresponding commands in `configs.py`.
- Description: Provide functionality to backup and restore user settings and configurations. This ensures users can easily revert to previous setups or transfer configurations between different environments.
- Implementation: Create backup and restore commands in the CLI that save and load settings from a specified file, ensuring portability and ease of configuration management.