RVC AI: Retrieval-based Voice Conversion

[vc_row][vc_column][vc_headings linewidth=”0″ borderwidth=”1″ borderclr=”#000000″ title=”RVC AI” google_fonts=”font_family:Comfortaa%3A300%2Cregular%2C700|font_style:700%20bold%20regular%3A700%3Anormal” titlesize=”60″ titleclr=”#000000″]

Retrieval-based-Voice-Conversion-WebUI

[/vc_headings][vc_single_image image=”3056″ alignment=”center” onclick=”custom_link” img_link_target=”_blank”][vc_column_text]RVC AI – Retrieval-based Voice Conversion is a technique that uses a deep neural network to transform the voice of a speaker into another voice. It is based on the VITS model, which is a state-of-the-art end-to-end text-to-speech system. RVC can be used to create realistic and expressive voice conversions with minimal data and computational resources.[/vc_column_text][vc_btn title=”Visit Project” color=”warning” align=”center” i_align=”right” i_icon_fontawesome=”fas fa-external-link-alt” add_icon=”true” link=”url:https%3A%2F%2Fgithub.com%2FRVC-Project%2FRetrieval-based-Voice-Conversion-WebUI%2Fblob%2Fmain%2Fdocs%2FREADME.en.md”][vc_separator][/vc_column][/vc_row][vc_row][vc_column][vc_headings style=”theme4″ borderclr=”#000000″ style2=”image” title=”Features” google_fonts=”font_family:Comfortaa%3A300%2Cregular%2C700|font_style:700%20bold%20regular%3A700%3Anormal” lineheight=”3″ titlesize=”40″ titleclr=”#000000″ image_id=”2871″][/vc_headings][vc_separator color=”sandy_brown” border_width=”3″][vc_column_text]✅Minimize tone leakage by substituting source feature with training-set feature from top1 retrieval;

✅Train easily and quickly, even with low-end graphics cards;

✅Achieve decent results with little data (>=10min low noise speech recommended);

✅Support model fusion to alter timbres (use ckpt processing tab->ckpt merge);

✅User-friendly WebUI;

✅Use the UVR5 model to quickly separate vocals from instruments.[/vc_column_text][/vc_column][/vc_row][vc_row][vc_column][vc_separator][vc_headings style=”theme4″ borderclr=”#000000″ style2=”image” title=”Preparing Environment” google_fonts=”font_family:Comfortaa%3A300%2Cregular%2C700|font_style:700%20bold%20regular%3A700%3Anormal” lineheight=”3″ titlesize=”40″ titleclr=”#000000″ image_id=”2854″][/vc_headings][vc_column_text]To begin, install PyTorch and its core dependencies. If they are already installed, you can skip this step. See the following link for details:

https://pytorch.org/get-started/locally/

Use the following command to install the required packages:

pip install torch torchvision torchaudio

On Windows with an Nvidia Ampere GPU (RTX 30xx), you need to install the PyTorch build that matches your CUDA version. See the experience shared in this GitHub issue:

https://github.com/liujing04/Retrieval-based-Voice-Conversion-WebUI/issues/21

On Windows with an Nvidia Ampere GPU, use the following command to install the CUDA 11.7 (cu117) build of PyTorch:

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
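
After installing, it can help to confirm which PyTorch build is active and whether it can see the GPU. The following is a minimal sketch, not part of the RVC project: the helper name `describe_torch` is hypothetical, and it relies only on `torch.__version__`, `torch.version.cuda`, and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch():
    """Report the installed PyTorch build without failing if it is missing."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    return {
        "installed": True,
        # Wheels from the CUDA index carry a suffix, for example "+cu117".
        "version": torch.__version__,
        # CUDA version the wheel was built against; None for CPU-only builds.
        "built_for_cuda": torch.version.cuda,
        # True only if a usable GPU and driver are present at runtime.
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    for key, value in describe_torch().items():
        print(f"{key}: {value}")
```

If `built_for_cuda` is `None` or `cuda_available` is `False` on an RTX 30xx machine, the CPU-only wheel was probably installed and the command with `--index-url` is worth trying.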

Next, you will need to install the Poetry dependency management tool. If you already have it installed, you can skip this step. Please follow the instructions provided in the following link: https://python-poetry.org/docs/#installation

Use the following command to install Poetry:

curl -sSL https://install.python-poetry.org | python3 -

Finally, you can install the dependencies required for the project. Use the following command:

poetry install

[/vc_column_text][vc_message]Alternatively, you can install the dependencies with pip:

pip install -r requirements.txt

Note: faiss 1.7.2 raises Segmentation Fault: 11 on macOS. If you install faiss manually with pip, use pip install faiss-cpu==1.7.0 instead.
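
The macOS note above can be turned into a quick check. This sketch reads the installed version from package metadata rather than importing faiss, so it does not trigger the crash it is looking for. The function names are hypothetical, and the distribution names `faiss-cpu` and `faiss` are assumptions about how the package was installed.

```python
import platform
from importlib import metadata


def installed_faiss_version():
    """Return the installed faiss version string, or None if not found."""
    for dist in ("faiss-cpu", "faiss"):  # assumed distribution names
        try:
            return metadata.version(dist)
        except metadata.PackageNotFoundError:
            continue
    return None


def faiss_warning(system, version):
    """Return a warning for the macOS + faiss 1.7.2 combination, else None."""
    if system == "Darwin" and version is not None and version.startswith("1.7.2"):
        return "faiss 1.7.2 may segfault on macOS; try: pip install faiss-cpu==1.7.0"
    return None


if __name__ == "__main__":
    message = faiss_warning(platform.system(), installed_faiss_version())
    print(message or "No known faiss issue detected.")
```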

[/vc_message][vc_separator][/vc_column][/vc_row][vc_row][vc_column][vc_headings style=”theme4″ borderclr=”#000000″ style2=”image” title=”Other Pre-Models Preparation” google_fonts=”font_family:Comfortaa%3A300%2Cregular%2C700|font_style:700%20bold%20regular%3A700%3Anormal” lineheight=”3″ titlesize=”40″ titleclr=”#000000″ image_id=”2854″][/vc_headings][vc_column_text]RVC AI depends on some pre-trained models for inference and training.

You can download them from the project's Hugging Face space.

These are the pre-trained models and other files that RVC uses:

hubert_base.pt

./pretrained 

./uvr5_weights

The v2 model takes the 768-dimensional features of the 12-layer Hubert as input, instead of the 256-dimensional features of the 9-layer Hubert+final_proj, and adds 3 period discriminators. To use it, you need to download these additional files:

./pretrained_v2

If you are using Windows, you may also need this file (skip it if FFmpeg is already installed):
ffmpeg.exe
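
To confirm the downloads landed where the list above expects them, a small check like the following may help. It is a minimal sketch that assumes the files sit directly under the project root; `missing_assets` is a hypothetical helper and not part of RVC.

```python
import shutil
from pathlib import Path

REQUIRED = ["hubert_base.pt", "pretrained", "uvr5_weights"]
OPTIONAL_V2 = ["pretrained_v2"]  # only needed for v2 models


def missing_assets(root=".", want_v2=False):
    """Return the expected files or folders that are absent under root."""
    root = Path(root)
    names = REQUIRED + (OPTIONAL_V2 if want_v2 else [])
    missing = [name for name in names if not (root / name).exists()]
    # FFmpeg counts as present if it is on PATH or a local ffmpeg.exe exists.
    if shutil.which("ffmpeg") is None and not (root / "ffmpeg.exe").exists():
        missing.append("ffmpeg")
    return missing


if __name__ == "__main__":
    print(missing_assets(want_v2=True))
```

Run from the project root, this prints an empty list when every expected item is in place. It only checks that the paths exist, not that the folders contain the right model files.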

Then use this command to start the WebUI: