The Alphsistant Project

Context

Welcome to the Alphsistant Project! After spending the year making game prototypes, I felt rather lost without a specific purpose. When I realised the end of my PhD was getting closer, I thought it would be nice to start a broader project that would bring together all of my interests and skills. The aim of the Alphsistant Project is to create a voice assistant with a face, powered by deep learning. Ideally, I want it to be able to generate text, speak it, and synchronise facial animations with the speech. Additional features should be added over time.

Workflow

To keep myself motivated, I plan to first build the full pipeline, and then refine and improve each part. Here are the things that need to be done:

  • Sculpt the face and retopologize it
  • Set up a text-to-speech model
  • Set up text generation
  • Synchronise the face with the spoken text
  • Chain everything into a pipeline
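
To make the "full pipeline first" idea concrete, here is a minimal sketch of how the three stages could be chained. Everything in it is hypothetical: the names AssistantPipeline, AssistantOutput, echo_text, silent_speech and closed_mouth are placeholders I made up for illustration, and the stand-in stages do nothing useful. The point is only that each stage can be swapped for a real model later without touching the rest.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class AssistantOutput:
    text: str            # generated reply
    audio: List[float]   # waveform samples (placeholder representation)
    visemes: List[str]   # one mouth shape per audio sample (placeholder)


@dataclass
class AssistantPipeline:
    # Each stage is a plain callable, so it can be replaced independently.
    generate_text: Callable[[str], str]
    synthesize_speech: Callable[[str], List[float]]
    animate_face: Callable[[str, List[float]], List[str]]

    def run(self, prompt: str) -> AssistantOutput:
        text = self.generate_text(prompt)
        audio = self.synthesize_speech(text)
        visemes = self.animate_face(text, audio)
        return AssistantOutput(text, audio, visemes)


# Hypothetical stand-ins, to be replaced by the real models later.
def echo_text(prompt: str) -> str:
    return f"You said: {prompt}"


def silent_speech(text: str) -> List[float]:
    return [0.0] * len(text)  # one silent sample per character


def closed_mouth(text: str, audio: List[float]) -> List[str]:
    return ["closed"] * len(audio)


pipeline = AssistantPipeline(echo_text, silent_speech, closed_mouth)
result = pipeline.run("hello")
print(result.text)
```

Once a stage is ready (for example the text generator), it can be passed in place of its stand-in while the other two stay as stubs.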

The Face

I created a first version of the face, which will probably be adapted over time. A video with my very first shape keys is available here. For the face retopology, I followed this tutorial.

Text to Speech

This is my first contact with text-to-speech (TTS) approaches. Not having much expertise in this field, I followed sendtex’s advice and went with the DC TTS model (Convolutional model with Guided Attention). It is a two-stage model: the first part transforms the text input into a spectrogram, and the second part upscales that spectrogram. Sendtex’s video has a link to the GitHub page hosting the model.
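
To give an idea of what the "Guided Attention" part means, here is a small sketch in plain Python. As far as I understand the DC TTS paper, the attention between characters and spectrogram frames is pushed towards the diagonal by penalising attention mass far from it, with weights W[n][t] = 1 - exp(-(n/N - t/T)^2 / (2 g^2)) and g = 0.2. The function names and the toy 4x4 matrices below are my own, and this is not the code from the repository mentioned above.

```python
import math
from typing import List

Matrix = List[List[float]]


def guided_attention_weights(n_chars: int, n_frames: int, g: float = 0.2) -> Matrix:
    """Penalty matrix: close to 0 near the diagonal, close to 1 far from it."""
    return [
        [1.0 - math.exp(-((n / n_chars - t / n_frames) ** 2) / (2 * g * g))
         for t in range(n_frames)]
        for n in range(n_chars)
    ]


def guided_attention_loss(attention: Matrix, weights: Matrix) -> float:
    """Mean of the element-wise product of attention and penalty weights."""
    total = sum(
        a * w
        for row_a, row_w in zip(attention, weights)
        for a, w in zip(row_a, row_w)
    )
    return total / (len(attention) * len(attention[0]))


weights = guided_attention_weights(4, 4)

# Toy attention matrices (rows: characters, columns: frames).
diagonal = [[1.0 if n == t else 0.0 for t in range(4)] for n in range(4)]
anti_diagonal = [[1.0 if n + t == 3 else 0.0 for t in range(4)] for n in range(4)]

loss_diagonal = guided_attention_loss(diagonal, weights)
loss_anti_diagonal = guided_attention_loss(anti_diagonal, weights)
print(loss_diagonal, loss_anti_diagonal)
```

An attention that reads the text in order (the diagonal one) is not penalised at all, while one that reads it backwards gets a positive loss, which is the behaviour the extra loss term is meant to encourage.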

Text Generation

For now, I am going with the well-known GPT-2 from OpenAI. Check the Hugging Face GitHub page for the model and relevant instructions.
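
As a rough sketch of what using GPT-2 could look like, the snippet below relies on the text-generation pipeline from the Hugging Face transformers library with the "gpt2" checkpoint. The helper names build_gpt2_generator and generate_reply are my own, the default of 40 new tokens is an arbitrary choice, and the model weights are downloaded the first time the generator is built.

```python
from typing import Callable, Dict, List

# Any callable with the same shape as a transformers text-generation pipeline.
Generator = Callable[..., List[Dict[str, str]]]


def build_gpt2_generator() -> Generator:
    """Load GPT-2 through the transformers text-generation pipeline."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import pipeline
    return pipeline("text-generation", model="gpt2")


def generate_reply(generator: Generator, prompt: str, max_new_tokens: int = 40) -> str:
    """Return only the continuation produced for the prompt."""
    outputs = generator(prompt, max_new_tokens=max_new_tokens, num_return_sequences=1)
    full_text = outputs[0]["generated_text"]
    # By default the pipeline returns prompt + continuation; drop the prompt.
    if full_text.startswith(prompt):
        full_text = full_text[len(prompt):]
    return full_text.strip()
```

Calling generate_reply(build_gpt2_generator(), "Hello, I am your assistant and") should then return a short continuation of the prompt. Keeping the generator as a parameter also makes it easy to plug in a different model later.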

Conclusion

The Alphsistant Project is just getting started and a lot of work lies ahead! I hope that keeping track of my progress through blog posts will keep my motivation as high as possible! Stay tuned :)