Earkind
AI-generated podcasts that don't bore you to death.
Listen to our first show, "GPT Reviews": a daily mix of commentary on Artificial Intelligence news, joke attempts, and research paper dives, hosted by Giovani Pete Tizzano (an overly hyped AI follower), with collaborators Robert (an often unimpressed analyst), Olivia (an overly online internet surfer), and Belinda (an all-around witty research expert).
Contact me at sergi@earkind.com — all feedback is welcome!
How does it work?
The idea was to combine language models with neural expressive text-to-speech and programmatic audio editing into a pipeline that creates a full podcast episode, plus its description, from a list of news items and research papers. It should be both useful and fun, so I've tried to make it non-serious and a bit over-the-top with the transitions, characters, and music.
• It all starts with a .txt file containing a few news items I select in plain text plus a list of 3 recent arXiv paper URLs. Title and abstract are crawled from arXiv, and other info is extracted from the raw (noisy) PDF text using the ChatGPT API (the first sketch after this list shows the arXiv-fetching idea).
• I've created and tuned system and user prompts for each of the sections (intro, outro, transitions, sponsor) and subsections (each news item/paper). It's all zero- or one-shot depending on the intricacy of the prompt (providing an example makes it easier to get what you want, but for creative outputs, examples often narrow down too much what the model will produce). The content discussion is organized as a conversation between 2 characters, and one of the most fun parts has been defining them: the host Giovani Pete Tizzano (an overly hyped, annoying but earnest tech bro), Robert (a sarcastic, unimpressed analyst), and Belinda (a witty research expert who reads all the papers). A sketch of one such prompt is shown after this list.
• After scripts are created for every section of the program and parsed accordingly (e.g. for speaker markers), the "recording phase" begins. I'm using Azure's TTS because Microsoft's research here is top-notch (e.g. arxiv.org/abs/2304.09116) and they deploy their stuff there; see the synthesis sketch after this list.
• When all recordings are done, it's time for editing! I've created a bunch of jingles, sound effects, and background music. Using Pydub, narrations and other audio are combined, volumes are adjusted automatically, sections are overlaid and looped, etc. (a sketch of these operations follows the list).
• Finally, a podcast description with timestamps is generated automatically (you can get the timestamps from Pydub while editing the episode), plus an overall description and titles generated with ChatGPT as well (see the last sketch below).
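
Here's a minimal sketch of the input step: pulling title and abstract for one paper from the public arXiv API. The `fetch_arxiv_metadata` helper and its exact parsing are illustrative, not the actual Earkind code.

```python
import re
import requests
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

def fetch_arxiv_metadata(paper_url: str) -> dict:
    """Fetch title and abstract for an arXiv paper via the public arXiv API."""
    # Pull a new-style arXiv id out of a URL like https://arxiv.org/abs/2304.09116
    arxiv_id = re.search(r"(\d{4}\.\d{4,5})(v\d+)?", paper_url).group(1)
    resp = requests.get(
        "http://export.arxiv.org/api/query",
        params={"id_list": arxiv_id, "max_results": 1},
        timeout=30,
    )
    resp.raise_for_status()
    entry = ET.fromstring(resp.text).find(f"{ATOM}entry")
    return {
        "id": arxiv_id,
        "title": " ".join(entry.find(f"{ATOM}title").text.split()),
        "abstract": " ".join(entry.find(f"{ATOM}summary").text.split()),
    }

print(fetch_arxiv_metadata("https://arxiv.org/abs/2304.09116")["title"])
```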
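For the script generation, a sketch of how one subsection could be prompted as a dialogue between two of the characters using the ChatGPT API. The system prompt wording, model choice, and temperature here are my guesses, not the tuned prompts described above.

```python
from openai import OpenAI  # assumes OPENAI_API_KEY is set in the environment

client = OpenAI()

SYSTEM_PROMPT = """You write segments for an AI-news podcast as a dialogue between:
- Giovani Pete Tizzano: the host, an overly hyped, annoying but earnest tech bro.
- Belinda: a witty research expert who reads all the papers.
Prefix every line with the speaker name, e.g. "GIOVANI:" or "BELINDA:",
so the script can be parsed into per-speaker narration later."""

def write_paper_segment(title: str, abstract: str) -> str:
    """Generate one paper-discussion segment of the episode script."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # model choice is an assumption
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Discuss this paper.\nTitle: {title}\nAbstract: {abstract}"},
        ],
        temperature=0.9,  # keep it creative rather than dry
    )
    return resp.choices[0].message.content
```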
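For the recording phase, a sketch of synthesizing one parsed script line with Azure's Speech SDK. The per-character voice names are placeholders; the actual voice and style choices aren't part of this post.

```python
import azure.cognitiveservices.speech as speechsdk

# Placeholder mapping from script speaker marker to an Azure neural voice.
VOICES = {"GIOVANI": "en-US-GuyNeural", "BELINDA": "en-US-JennyNeural"}

def record_line(speaker: str, text: str, out_path: str, key: str, region: str) -> None:
    """Synthesize one line of the script to a WAV file with Azure neural TTS."""
    config = speechsdk.SpeechConfig(subscription=key, region=region)
    config.speech_synthesis_voice_name = VOICES[speaker]
    audio_out = speechsdk.audio.AudioOutputConfig(filename=out_path)
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=config, audio_config=audio_out)
    result = synthesizer.speak_text_async(text).get()
    if result.reason != speechsdk.ResultReason.SynthesizingAudioCompleted:
        raise RuntimeError(f"TTS failed: {result.reason}")
```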
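For the editing phase, a sketch of the kind of Pydub operations involved: concatenating narrations, ducking and looping background music under the voices, and crossfading in a jingle. File names and gain values are illustrative.

```python
from pydub import AudioSegment

def mix_section(narration_files: list[str], music_file: str, jingle_file: str) -> AudioSegment:
    """Combine narrations, lower the music under the voices, and loop it to fit."""
    narration = AudioSegment.empty()
    for path in narration_files:
        narration += AudioSegment.from_file(path)      # concatenate the recordings
    music = AudioSegment.from_file(music_file) - 18    # duck the music by 18 dB
    section = narration.overlay(music, loop=True)      # loop music under the narration
    jingle = AudioSegment.from_file(jingle_file)
    return jingle.append(section, crossfade=500)       # 500 ms crossfade out of the jingle

episode = mix_section(["intro_1.wav", "intro_2.wav"], "bg_music.mp3", "jingle.mp3")
episode.export("episode.mp3", format="mp3")
```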
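And for the last step, a sketch of how section timestamps can be accumulated from Pydub segment lengths during editing and formatted into the chapter list that goes into the episode description. Section names here are just examples.

```python
from pydub import AudioSegment

def build_chapter_list(sections: list[tuple[str, AudioSegment]]) -> str:
    """Turn (name, audio) pairs into 'mm:ss Title' lines for the episode description."""
    lines, elapsed_ms = [], 0
    for name, segment in sections:
        minutes, seconds = divmod(elapsed_ms // 1000, 60)
        lines.append(f"{minutes:02d}:{seconds:02d} {name}")
        elapsed_ms += len(segment)  # len() of an AudioSegment is its duration in ms
    return "\n".join(lines)

# Example: build_chapter_list([("Intro", intro), ("News", news), ("Papers", papers)])
# The chapter list is then combined with the ChatGPT-generated episode description.
```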
I'm super keen to hear thoughts and feedback on whether people would be interested in this. While this is very rough, I see a lot of potential in making personalized audio content! Also, I plan on making the code public and publishing a more in-depth explanation of the design and thinking behind it if there's some interest.