Nikkie began his career as a software engineer in 2016. He started Python as a hobby in 2017 and fell in love with it. Since 2019, he has worked on Natural Language Processing as a data scientist at Uzabase, Inc. in Tokyo, Japan.

He contributes to the Python community in Japan as a staff member of the following event:

- [PyCon Japan](https://www.pycon.jp/organizer/index.html): the largest PyCon in Japan
  - staff in 2019 and 2020 (Program committee, lead in 2020)
  - [chair](https://pyconjp.blogspot.com/2020/10/pyconjp-2021-chair.html) in 2021

He has given talks (and lightning talks) at many PyCons in Japan and abroad.

- EuroPython 2020, [PyCon APAC 2020](https://youtu.be/JiXnEA7pM7U) (English)

He loves anime (Japanese animation) as much as Python, and implements anime-related ideas in Python. In 2022, he has been writing code related to "Sing a Bit of Harmony" (e.g. a Twitter bot, a prototype AI character, etc.).
Abstract
How can we create a program that can speak (not write) with a human?

I love anime and fell in love with the movie "Sing a Bit of Harmony" (讓我聽見愛的歌聲). Its character Shion, an AI (robot), is very attractive from an engineer's point of view, and I wanted to implement at least some of her functions.

I implemented shion.py, which lets a human enter text by voice and responds by voice. In short, it is like a smart speaker that parrots. In other words, the program reads the spoken text aloud.

I started with an easy implementation (using a Web API and an OS command) to check the idea, and then reworked it with pre-trained machine learning models to get closer to Shion. I will share those implementations with you. I would be happy to provide a little inspiration for your Maker project.

Keywords, hashtag style: #TTS, #ASR, #subprocess, #SpeechRecognition, #ttslearn, #ESPnet, #soundfile, #HuggingFace
Description
The movie: Sing a Bit of Harmony
In my opinion, this is an awesome film.
It has won recognition at film festivals:
https://en.wikipedia.org/wiki/Sing_a_Bit_of_Harmony#Reception
In October 2021, Sing a Bit of Harmony won the Audience Award at the Scotland Loves Animation film festival.
In Japan, fans support their favorite animated films by drawing illustrations.
I cannot draw illustrations, but I wanted to support this movie somehow.
In this film, an AI (robot, android) named Shion plays a key role.
Since Shion is an AI, some parts of it can be reproduced by writing a program.
So, instead of illustrations, I decided to support this film by implementing some of Shion's features.
I started small with the implementation of Shion.
I implement Shion as software. (The hardware is future work.)
I defined Shion v0.0.1 as a program that enables the following:
Text processing also deserves attention, but this time the focus is on handling speech.
[1]: call the say command (macOS), as in https://docs.python.org/3/howto/logging-cookbook.html#speaking-logging-messages
[2]: https://cloud.google.com/speech-to-text
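
As a rough illustration of footnote [1], here is a minimal sketch of calling the macOS say command from Python with subprocess. It is not the actual shion.py: the function names (`build_say_command`, `speak`, `parrot`) are hypothetical, and the `listen` callable is a placeholder for whatever speech recognition step is used (for example, a Web API such as [2]).

```python
import shutil
import subprocess
from typing import Callable, List, Optional


def build_say_command(text: str, voice: Optional[str] = None) -> List[str]:
    """Build the argument list for the macOS `say` command.

    `voice` is passed through `say -v`; the voice must be installed
    on the machine (hypothetical helper, not part of shion.py).
    """
    command = ["say"]
    if voice is not None:
        command += ["-v", voice]
    command.append(text)
    return command


def speak(text: str, voice: Optional[str] = None) -> bool:
    """Read `text` aloud. Return False when `say` is not available."""
    if shutil.which("say") is None:
        # Not on macOS (or `say` is not on PATH): skip instead of failing.
        return False
    subprocess.run(build_say_command(text, voice), check=True)
    return True


def parrot(
    listen: Callable[[], str],
    speak_fn: Callable[[str], object] = speak,
) -> str:
    """One parrot turn: get recognized text from `listen`, then read it aloud.

    `listen` is an assumption: any callable that returns the recognized text.
    """
    text = listen()
    speak_fn(text)
    return text
```

On macOS, `speak("Hello")` should read the text aloud with the default voice. Passing the arguments as a list (rather than a single shell string) avoids shell quoting issues with the spoken text.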
⚠️ This talk mainly deals with ASR and TTS in Japanese. ASR and TTS in English are covered on a best-effort basis.
⚠️ I am a beginner in ASR and TTS, so the focus will be on what implementations are possible. (I will not deal with the theory.)
Video