Once my book, Generative AI in Action, was complete, the one thing I wanted to do was create a summary in natural speech, possibly using TTS (text-to-speech) to produce an audio version. Think of it as a podcast that is easier for people to consume and gives a quick sense of what the book is about.
TTS (Text to Speech) or not to TTS?
Initially, I was inclined towards using TTS for the audio summary, as it seemed a convenient way to produce something podcast-like. My journey began there: I used GPT-4o to create a summary after ingesting the book, and then used that summary as input to the Azure AI Speech stack. However, I stumbled across something intriguing. Instead of TTS, I opted for NotebookLM from Google Labs to generate the audio overview, that is, the podcast. This decision marked a significant shift in my approach, and I created two podcasts using the content from the book: one from multiple sources and another from a single source (the book).
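For anyone curious about the TTS route I started with, here is a minimal sketch of the hand-off step: wrapping a plain-text summary in SSML, the markup that speech services such as Azure AI Speech accept. This is not the exact code I ran. The function name `summary_to_ssml` is hypothetical, the voice name `en-US-JennyNeural` is just an assumed default, and the summary text is a placeholder.

```python
from xml.sax.saxutils import escape, quoteattr


def summary_to_ssml(summary: str,
                    voice: str = "en-US-JennyNeural",  # assumed voice; pick your own
                    lang: str = "en-US") -> str:
    """Wrap a plain-text summary in SSML, one <p> element per paragraph.

    Hypothetical helper for illustration only.
    """
    # Treat blank lines as paragraph breaks and drop empty chunks.
    paragraphs = [p.strip() for p in summary.split("\n\n") if p.strip()]
    # Escape &, < and > so the summary text cannot break the XML.
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>{body}</voice>"
        "</speak>"
    )


if __name__ == "__main__":
    # Placeholder text standing in for the LLM-generated summary.
    demo = "Generative AI & the enterprise.\n\nA second paragraph."
    print(summary_to_ssml(demo))
```

The resulting string can then be passed to a speech synthesizer, for example through the Azure Speech SDK's `speak_ssml_async` method, assuming you have a Speech resource key and region configured.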
NotebookLM is an experimental AI-first notebook from Google Labs designed to help users gain insights faster by grounding the language model in their own documents. It aims to assist with synthesizing facts and ideas from multiple sources, making connections quicker and easier, and it can help users understand, summarize, and generate new ideas based on their content. What is fascinating is that it can generate audio in the form of a natural dialog between two people, with wit, humor, and a natural flow. It is like listening to someone who has read the book and is summarizing it for you. If I hadn’t told you it was AI-generated, you would find it hard to tell that it was not a real conversation.
The “Podcasts”
For the first audio generation, the multiple sources I used were:
- The book, and my blog post announcing the book
- A real podcast I did with Miko on his podcast Hockeystick
- And another real podcast with Jamie on his podcast The Modern .NET Show.
For the second audio generation, I used only a single source: the book.
The results of the AI-generated audio are truly impressive. In each instance, the audio was produced in a natural voice, simulating a genuine conversation between two people. The quality of the audio, the conversation, and the flow are nothing short of mind-blowing. The second audio does mispronounce some acronyms, but that is a minor issue considering the audio was generated in just a few minutes with the press of a button. I was genuinely surprised at the level of realism and natural flow in the conversation. 🤯
Have a listen and let me know what you think.
Podcast Summary 1 - using multiple sources
The sources provide a comprehensive overview of generative AI and its application within enterprises. The first source, a YouTube video transcript, features an interview with Amit Bahree, a technical program manager at Microsoft, who discusses the rise of large language models (LLMs) and their potential impact on society. The second source, a book excerpt, delves into the technical aspects of generative AI, covering foundational models, large language models, retrieval-augmented generation, and the architectural principles for building generative AI applications. The book also explores various use cases, including image generation, code generation, and the ethical considerations surrounding the use of generative AI.
Podcast Summary 2 - using a single source (the book)
This book is a comprehensive guide to Generative AI, focusing on how this transformative technology can be leveraged within an enterprise. It explains core concepts like foundational models and large language models (LLMs) as well as practical applications for generating various content, including text, images, code, audio, and video. The book also explores responsible AI practices, highlighting the importance of prompt engineering, ethical considerations, and security measures for implementing these technologies. The author emphasizes the need for careful evaluation, monitoring, and scalability when deploying Generative AI models in a production environment.
Conclusion
AI-generated podcasts showcase the potential of AI to revolutionize content creation. Generating natural-sounding audio summaries from text is a game-changer for authors, creators, and educators. As AI continues to advance, I expect more opportunities to improve how we access information, and I am excited to explore the new possibilities in AI-generated content.