Making the Voice of the AI
Today I'm here to share with you the script I wrote to create the voice of our AI character!
This script goes through a text file line by line, converting the words of each line into hiragana and then into romaji. The romaji is split into syllables, each syllable is checked against a set of existing wav files named after Japanese syllables (ru.wav, tsu.wav, a.wav), and the matching files are joined into a single .wav file that voices the line. For example, the line "Wake up, Master!" becomes "wake upu masuteru", and the script combines wa.wav, ke.wav, u.wav, pu.wav, ma.wav, su.wav, te.wav, and ru.wav to create 1.wav, which plays the audio "wa ke u pu ma su te ru" when opened.
Let's break the code down into chunks. First things first, let's cover the libraries I use and why I chose them:
- romajitable - used to turn English text into similar Hiragana. "My name is Mikhail" -> むゆ・なめ・いす・みくはいる
- pykakasi - since romajitable doesn't have a way to access the romaji generated by it to create the hiragana text above (afaik at least), this turns Japanese text like the above hiragana into romaji
- re - used in the lambda function to parse words into valid romaji syllables
- pydub - used to build wav files. Requires installation of ffmpeg
- os.path - used to check whether wav files exist that match the syllables validated by the lambda function
If you've never worked with Python before, you need to import libraries like so:
import romajitable
import pykakasi
import re
from pydub import AudioSegment
import os.path
Next, I initialize a few variables, but the only one I believe needs discussion is the lambda function:
L = lambda x:re.sub('[bghkmnpr]~([auoei]|y[auo])|[sz]~[auoe]|[dt]~[aeo]|w~[ao]|([fv]~|ts)u|(j~|[cs]h)(i|y[auo])|y~[auo]|[auoien]'.replace('~','{1,2}'),'',x)==''
I found this function here. It strips every valid romaji syllable out of a string and returns True if nothing is left over, so it tells you whether a given string is made up entirely of valid romaji syllables. It's not a perfect check of valid romaji, but it works for the text fed through this program.
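To get a feel for what the check accepts, here is a small sketch that runs the same lambda on a handful of strings. The sample inputs are picked for illustration and are not from the original script. Because the function only tests for an empty remainder, it also returns True for multi-syllable strings like 'wake' and for the empty string, which is worth keeping in mind when slicing near the end of a line.

```python
import re

# Same pattern as above; '~' is a placeholder for '{1,2}' (one or two consonants).
PATTERN = ('[bghkmnpr]~([auoei]|y[auo])|[sz]~[auoe]|[dt]~[aeo]|w~[ao]|'
           '([fv]~|ts)u|(j~|[cs]h)(i|y[auo])|y~[auo]|[auoien]').replace('~', '{1,2}')

# Strip every valid syllable; the string passes if nothing is left over.
L = lambda x: re.sub(PATTERN, '', x) == ''

# Sample inputs chosen for illustration only.
for text in ['wa', 'pu', 'tsu', 'shi', 'up', 'si', 'wake', '']:
    print(repr(text), L(text))
```

Of these samples, only 'up' and 'si' fail: stripping the valid syllable 'u' leaves a stray 'p', and stripping 'i' leaves a stray 's'.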
From there, the main meat of the program begins:
with open('test.txt', 'r') as f:
    lines = f.readlines()
for line in lines:
    romaji = romajitable.to_kana(line)
    hira = romaji.hiragana.replace("・", "")
    weeb = kks.convert(hira)
    romajiLines = []
    removedDuplicates = []
I open up my text file, read it line by line, and then turn each line into hiragana and from there into romaji. romajitable places a dot (・) between words, so I remove all of them from the hiragana line before turning it into romaji.
Next, I store the romaji lines in an array and remove the duplicated lines that were created by pykakasi. It's possible my own implementation duplicated the lines, but I don't currently see where, as I reset the arrays on each iteration of the for line in lines loop.
    for item in weeb:
        romajiLines.append(format(item['hepburn'])) #turn line of text into romaji
        for v in romajiLines:
            if v not in removedDuplicates:
                print(v)
                removedDuplicates.append(v) #remove duplicate lines made by pykakasi
So now we've got an array with romaji lines and no duplicates! Next, I needed to get the syllables of each word in that line and store them in an array. I begin by setting up my temporary variables:
    for item in removedDuplicates:
        split_strings = []
        n = 2
        export_sounds = []
        combined_sounds = AudioSegment.empty()
        combined = AudioSegment.empty()
combined_sounds and combined each start out as an empty pydub AudioSegment, a zero-length clip that sounds can be appended to. To build the files correctly, I need to make sure that the proper wav files are called, and since my wav files are named to match romaji syllables, the thing to check is the text!
        for index in range(0, len(item), n):
            test = item[index : index + n] #step through 2 characters at a time
            print(index, test, L(test))
            if (L(test) != True): #check if pair of chars is valid romaji, if not...
                test = item[index : index + n - 1] #try the first char
                if (L(test)): #if first char is valid romaji
                    print(index, test, L(test))
                    split_strings.append(test)
                    test = item[index + 1: index + n + 1]
                    if (L(test)): #check if second char is start of new syllable, if it is...
                        print(index, test, L(test))
                        split_strings.append(test)
                    else: #check if initial 2 chars + next char make a valid syllable
                        test = item[index: index + n + 1]
                        if (L(test)):
                            print(index, test, L(test))
                            split_strings.append(test)
                        else:
                            continue
                else: #first char is not valid romaji; keep it anyway (dropped later if no wav matches)
                    split_strings.append(test)
                    print(index, test, L(test))
            else: #if pair of chars is valid romaji...
                print(test)
                split_strings.append(test)
The majority of romaji syllables are two characters long, so this loop steps through each romaji string two characters at a time. If the pair is valid romaji, it is added to an array. If it's not, the loop checks whether the first character is valid by itself, then whether the second character plus the character after it is valid, and failing that, whether the pair plus the next character makes a valid three-character syllable.
On the third check of 'wake upu' the program will find 'up,' check that 'u' is valid romaji, check that 'pu' is valid romaji and then add them to the split_strings array that houses each syllable.
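An alternative worth considering, and not what the script above does: since the pattern already describes every syllable, `re.finditer` can walk the string once and hand back each match in order, without the two-character stepping. This is only a sketch. `split_syllables` is a hypothetical helper name, and any character that does not start a valid syllable (spaces, punctuation, a stray consonant) is skipped silently.

```python
import re

# Same syllable pattern as the lambda, compiled once for reuse.
PATTERN = re.compile(
    ('[bghkmnpr]~([auoei]|y[auo])|[sz]~[auoe]|[dt]~[aeo]|w~[ao]|'
     '([fv]~|ts)u|(j~|[cs]h)(i|y[auo])|y~[auo]|[auoien]').replace('~', '{1,2}')
)

def split_syllables(romaji):
    """Return the valid romaji syllables found in `romaji`, left to right.

    Hypothetical helper: characters that do not begin a valid syllable are skipped.
    """
    # group(0) is the whole match; the pattern's inner groups are ignored.
    return [m.group(0) for m in PATTERN.finditer(romaji)]

print(split_syllables('wakeupumasuteru'))
```

For the example line from the top of this post, this yields wa, ke, u, pu, ma, su, te, ru, which are the eight wav files listed there.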
The rest of the program is fairly simple. I check whether each syllable matches the name of a wav file and, if it does, add that sound to an array holding the wav files in the order they should play.
        for syllable in split_strings:
            syllable = 'C:/Users/User/Desktop/test/voice/' + syllable + '.wav'
            if (os.path.isfile(syllable)):
                sound = AudioSegment.from_wav(syllable)
                export_sounds.append(sound)
Next, the sounds are joined into one clip, which is exported only on every second iteration. This is because of the way the text comes through when going line by line: an empty line ends up in between each line. Skipping every other iteration lets each text line be exported with the correct line number!
        for fname in export_sounds:
            combined += fname
        if (i % 2) == 0:
            print(j)
            generatedFile = 'test/' + str(j) + '.wav'
            combined.export(generatedFile, format='wav')
            j += 1
        i += 1
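One thing the lookup above does quietly is drop any syllable that has no wav file. A minimal sketch of the same check that also reports what was missing might look like this. `find_wavs` and its `voice_dir` parameter are hypothetical names, and it only resolves paths; loading the audio with pydub is left out so the sketch runs with the standard library alone.

```python
import os.path

def find_wavs(syllables, voice_dir):
    """Split syllables into (found_paths, missing_syllables).

    Hypothetical helper: a syllable counts as found when
    '<voice_dir>/<syllable>.wav' exists on disk.
    """
    found, missing = [], []
    for syllable in syllables:
        path = os.path.join(voice_dir, syllable + '.wav')
        if os.path.isfile(path):
            found.append(path)        # keep playback order
        else:
            missing.append(syllable)  # would otherwise be skipped silently
    return found, missing
```

Printing the missing list while testing makes it easier to see which syllables still need to be recorded.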
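The every-second-iteration counter works, but it depends on the blank entries always alternating with the real ones. A sketch of another way to keep the numbering right is to skip blank entries outright and number only what remains. This assumes the extra entries really are empty or whitespace-only, which the post suggests but does not confirm. `numbered_entries` is a hypothetical helper, and the sample strings are illustrative.

```python
def numbered_entries(entries, start=1):
    """Yield (number, text) for each non-blank entry, numbering from `start`."""
    number = start
    for entry in entries:
        text = entry.strip()
        if not text:
            continue  # blank or whitespace-only entries do not use up a number
        yield number, text
        number += 1

# Illustrative input: two romaji lines with newline-only entries between them.
for number, text in numbered_entries(['wakeupumasuteru', '\n', 'ohayou', '\n']):
    print('test/' + str(number) + '.wav', text)
```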
Now, this program has some areas for improvement, but currently does a satisfactory job. The two most important areas for improvement I see are:
- Creating 'Engrish' versions of the lines requires knowing how a word is pronounced, which the spelling alone doesn't tell you. For example, this program turns 'master' into 'masuteru', but the ideal program would create 'masutaa' instead.
- Optimization. I believe this can be cleaned up to look a lot more readable. It currently uses nested for loops, but where I call the lambda regex function I see room for a recursive function. The below code also has some for loops nested in others despite not needing to be. Oops! I also left my print statements in there.
The full code is below:
import romajitable
import pykakasi
import re
from pydub import AudioSegment
import os.path

kks = pykakasi.kakasi()
L = lambda x:re.sub('[bghkmnpr]~([auoei]|y[auo])|[sz]~[auoe]|[dt]~[aeo]|w~[ao]|([fv]~|ts)u|(j~|[cs]h)(i|y[auo])|y~[auo]|[auoien]'.replace('~','{1,2}'),'',x)==''
i = 0
j = 1

with open('test.txt', 'r') as f:
    lines = f.readlines()
for line in lines:
    romaji = romajitable.to_kana(line)
    hira = romaji.hiragana.replace("・", "")
    weeb = kks.convert(hira)
    romajiLines = []
    removedDuplicates = []
    for item in weeb:
        romajiLines.append(format(item['hepburn'])) #turn line of text into romaji
        for v in romajiLines:
            if v not in removedDuplicates:
                removedDuplicates.append(v) #remove duplicate lines made by kks
    for item in removedDuplicates:
        split_strings = []
        n = 2
        export_sounds = []
        combined_sounds = AudioSegment.empty()
        combined = AudioSegment.empty()
        for index in range(0, len(item), n):
            test = item[index : index + n] #step through 2 characters at a time
            if (L(test) != True): #check if pair of chars is valid romaji, if not...
                test = item[index : index + n - 1] #try the first char
                if (L(test)): #if first char is valid romaji
                    split_strings.append(test)
                    test = item[index + 1: index + n + 1]
                    if (L(test)): #check if second char is start of new syllable, if it is...
                        split_strings.append(test)
                    else: #check if initial 2 chars + next char make a valid syllable
                        test = item[index: index + n + 1]
                        if (L(test)):
                            split_strings.append(test)
                        else:
                            continue
                else: #first char is not valid romaji; keep it anyway (dropped later if no wav matches)
                    split_strings.append(test)
                    print(index, test, L(test))
            else: #if pair of chars is valid romaji...
                print(test)
                split_strings.append(test)
        for syllable in split_strings:
            syllable = 'C:/Users/User/Desktop/test/voice/' + syllable + '.wav'
            if (os.path.isfile(syllable)):
                sound = AudioSegment.from_wav(syllable)
                export_sounds.append(sound)
        for fname in export_sounds:
            combined += fname
        if (i % 2) == 0:
            print(j)
            generatedFile = 'test/' + str(j) + '.wav'
            combined.export(generatedFile, format='wav')
            j += 1
        i += 1
Once More
Write the ending you always deserved
Status | In development |
Author | Malheur Games |
Genre | Visual Novel |
Tags | Anime, Dating Sim, Meaningful Choices, Multiple Endings, Story Rich |