Build with GenAI: Personal News Podcast with Local Llama-3 & xTTS

If you read Medium a lot, chances are you like to keep up with the latest trends in tech or generative AI, and you probably follow newsletters or podcasts covering these topics. But have you ever thought of having your very own personal podcast, one that delivers curated news updates on the topics you care about, like generative AI?

Now it can be a reality thanks to the latest developments in generative AI, namely large language models (LLMs) and text-to-speech (TTS) models. It's like having an AI news anchor who covers only the topics you care about and delivers them as audio summaries. In this article, let's build your own personalized news podcast with local Llama-3, the latest open-source LLM from Meta, and xTTS, a natural-sounding TTS engine. You may also find new inspiration for your own products, since this setup is easily adapted to audio stories, language-learning aids, audio tour guides, and more.

Don’t worry if you’re new to coding. I’ll provide detailed code with step-by-step explanations. Let’s dive right in!

The need for a personal podcast

As a product manager, I'm curious about the new technologies and products created every day, now at unprecedented speed since generative AI went mainstream. I signed up for tons of newsletters and podcasts, but I don't have the time to browse through all of them, and many of them repeat the same news. A quick search online showed that I'm not the only one with this problem.

Distill content into a podcast.

It occurred to me that large language models and TTS make a perfect combination to solve this problem. And with the newly released Llama-3, you can easily run a quantized 8B model on a consumer-grade computer at a pretty good speed. We can use the LLM to summarize the daily news, then generate easy-to-consume audio with TTS to listen to on the go or while cooking.

Understanding the flow

The architecture is pretty simple and consists of three main components:

1. News API: We use a free News API to fetch the most relevant news articles based on your topics of interest. (You can swap this component out for other content sources, such as an RSS feed for a podcast or a generative search engine.)
2. Local LLM (Llama-3): Once we have the news articles, a local LLM summarizes the content and rewrites it into a podcast script. Thanks to quantization, the model squeezes into just under 6GB of VRAM (or even less with more aggressive quantization).
3. TTS: We then generate audio from the podcast script with a TTS engine. I'm choosing xTTS because, at the time of writing, it strikes the best balance between speed and natural speech quality.

Flow for creating podcasts from news.

Building it step by step

Setting up the text-to-speech engine

Our first step is to set up the text-to-speech part of this application. I'm choosing xTTS as the TTS engine because it produces natural-sounding speech while supporting fairly long text input. If you're not satisfied with its premade voices, you can even clone your own voice, or someone else's, from an audio clip as short as 6 seconds. Other TTS models could also do the job, such as GPT-SoVITS, OpenVoice, and Parler TTS, but I found xTTS the easiest to work with, with pretty good quality.

(Huggingface provides a leaderboard/arena of the available TTS models, which is worth checking out to compare them.)

Here we install the TTS library and define a function that turns text into audio, which will be used to generate podcast audio based on the script later.

!pip install TTS

import torch
from TTS.api import TTS

def text_to_audio(input: str) -> str:
    # Load xTTS on GPU if available
    device = "cuda" if torch.cuda.is_available() else "cpu"
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2").to(device)

    file_path = "output.wav"
    tts.tts_to_file(text=input, language="en", file_path=file_path, speaker="Ana Florence")
    # You can replace the speaker with various other options or even use your own voice

    # Unload the model and free VRAM so the LLM can fit later
    del tts
    if device == "cuda":
        torch.cuda.empty_cache()

    return file_path
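xTTS handles fairly long inputs (tts_to_file splits text into sentences internally), but each synthesis pass works on sentence-sized pieces, so it can be handy to chunk a long script yourself, for example to show progress or retry failed chunks. Below is a minimal, pure-Python sketch; the 250-character default is an assumption based on xTTS's English per-sentence limit and worth checking against your version.

```python
import re

def chunk_script(script: str, max_chars: int = 250) -> list[str]:
    """Split a script into chunks of whole sentences, each under max_chars."""
    # Naive sentence split on ., ! or ? followed by whitespace
    sentences = re.split(r'(?<=[.!?])\s+', script.strip())
    chunks, current = [], ""
    for sentence in sentences:
        if not sentence:
            continue
        # Start a new chunk if adding this sentence would exceed the limit
        if current and len(current) + 1 + len(sentence) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = f"{current} {sentence}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be passed to text_to_audio separately and the resulting WAV files concatenated.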

Setting up local Llama-3

With the TTS engine in place, we can set up the large language model part. To keep things private, we run the LLM locally, and Meta's open-source Llama-3 8B is a great model for its size.

We can easily download a quantized version from Huggingface (you can think of quantization as compressing the model).
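To build a little intuition for that "compression", here is a toy illustration of quantization. This is a deliberately simplified sketch, not the actual GGUF/llama.cpp scheme: weights stored as 32-bit floats are mapped to small integers plus a shared scale factor, cutting memory at the cost of a little precision.

```python
def quantize(weights: list[float], bits: int = 8):
    """Map floats to signed integers in [-(2**(bits-1)-1), 2**(bits-1)-1] plus a scale."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / qmax
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integers and the scale."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.97, -0.08]
q, scale = quantize(weights)
restored = dequantize(q, scale)
# restored is close to weights, but each value now fits in one byte instead of four
```

Real quantization schemes (like the Q5_K_M format we download below) are block-wise and more sophisticated, but the trade-off is the same: smaller weights, slightly lower fidelity.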

# Install llama-cpp-python with CUDA support
!CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir

%cd /content
!apt-get update -qq && apt-get install -y -qq aria2

# Download a local large language model, here we're using Llama-3-8B-Instruct-32k-v0.1-GGUF
# If you want to use other local models that can easily run on consumer hardware, check out this repo:
# Replace <model_download_URL> with the download link of the GGUF file
!aria2c --console-log-level=error -c -x 16 -s 16 -k 1M -d /content/model/ -o Llama-3-8B-Instruct-32k-v0.1.Q5_K_M.gguf <model_download_URL>

With llama-cpp-python, which provides a good abstraction for running quantized LLMs locally, we can easily interact with the model.

# Setting up a local LLM for summarization or chat
from llama_cpp import Llama

def load_llama():
    llm = Llama(
        model_path="/content/model/Llama-3-8B-Instruct-32k-v0.1.Q5_K_M.gguf",  # If you're using another model, change the name
        chat_format="chatml",  # Use the chat_format that matches the model
        n_gpu_layers=-1,  # Use -1 for all layers on GPU
        n_ctx=16384  # Set context size
    )
    return llm

def call_llama(input: str, llm) -> str:
    output = llm.create_chat_completion(
        messages=[
            {
                "role": "system",
                "content": "You're a helpful assistant.",
            },  # Feel free to modify the prompt to suit your own formatting needs
            {"role": "user", "content": input},
        ]
    )
    output_text = output['choices'][0]['message']['content']
    return output_text

Here the load_llama function loads the model only when we need it. (We load the LLM after unloading the TTS model; otherwise a consumer-grade graphics card won't have enough VRAM for both. If your graphics card is powerful enough, you can load both models at the start.) The call_llama function works much like calling OpenAI's GPT API: it takes in a prompt and returns the LLM's response.
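The swap-one-model-at-a-time pattern described above can be wrapped in a small context manager so cleanup always happens, even if generation fails partway. This is a generic sketch of my own, not part of llama-cpp-python: loader is any function that returns a model, and the cleanup step assumes PyTorch's torch.cuda.empty_cache (guarded so the sketch also runs without a GPU).

```python
from contextlib import contextmanager

@contextmanager
def vram_scope(loader):
    """Load a model on entry; drop it and clear the CUDA cache (if any) on exit."""
    model = loader()
    try:
        yield model
    finally:
        del model
        try:
            import torch
            if torch.cuda.is_available():
                torch.cuda.empty_cache()
        except ImportError:
            pass  # no PyTorch installed; nothing to clean up

# Usage: each model only occupies VRAM inside its own block
# with vram_scope(load_llama) as llm:
#     podcast_script = write_podcast_script(news_string, llm)
```

This keeps the "load, use, unload" discipline in one place instead of scattering del calls through the flow.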

Helper functions for getting news and podcast script

Before we assemble everything, we need two more simple functions to get the daily news and let Llama-3 write our podcast script.

We can use a free News API (here, the one from NewsAPI.org) to get news on the topics of your choice. You only need to sign up to get your free API key.

import requests
from datetime import datetime, timedelta

# For demonstration purposes, we use the free News API from NewsAPI.org, which requires registration for an API key
def get_news_by_keyword(keyword):
    date_string = (datetime.now() - timedelta(days=2)).strftime('%Y-%m-%d')  # articles from the past 2 days
    apiKey = "<your_API_key>"  # replace with your own API key
    url = f'https://newsapi.org/v2/everything?q={keyword}&from={date_string}&sortBy=popularity&apiKey={apiKey}&language=en&pageSize=15'
    response = requests.get(url)
    response_json = response.json()

    response_string = ""
    if response_json['status'] == 'ok':
        articles = response_json['articles']
        for article in articles:
            title = article['title']
            content = article['content']
            response_string += f"Title:\n{title}\nContent:\n{content}\n---\n"
    else:
        print(f"Error: {response_json['message']}")

    return response_json, response_string

The get_news_by_keyword function takes in a topic keyword, fetches the most relevant news articles from the past two days (you can change the time frame) from the News API, and returns the API response as a JSON and a formatted string (for passing on to Llama-3).
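Since overlapping sources often report the same story (the duplicate-news problem mentioned earlier), it can help to drop near-duplicate articles before summarization so the script doesn't repeat itself. Here is a minimal sketch of my own that deduplicates on normalized titles; the article dicts follow the title/content shape returned by the News API above.

```python
def dedupe_articles(articles: list[dict]) -> list[dict]:
    """Keep only the first article for each normalized title."""
    seen = set()
    unique = []
    for article in articles:
        # Normalize: lowercase and collapse whitespace
        key = " ".join((article.get("title") or "").lower().split())
        if key and key not in seen:
            seen.add(key)
            unique.append(article)
    return unique
```

You could call this on response_json['articles'] before building response_string; a fancier version might compare titles fuzzily or even embed them, but exact-match on normalized titles already catches most syndicated repeats.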

To summarize the news and write a podcast script, you can do this with a simple prompt and a call to the Llama-3 model. You can change the prompt so that it will write in a tone or style that you prefer.

def write_podcast_script(headlines_string, llm):
    prompt = f"Please write a podcast script based on the headlines content. Your script should only contain spoken parts from the host, without transitions, sound effects or speaker descriptions. Podcast name is Daily Wrapup. Host name is Ana Florence. Headlines:\n{headlines_string}"
    podcast_script = call_llama(prompt, llm)
    return podcast_script

Putting it all together

We now have all the components we need for our personal podcast generator. We can put them together in the simple flow we described before: get a topic keyword, get news from a keyword, write a podcast script with Llama-3, then generate podcast audio using xTTS.

def generate_podcast_from_keyword():
    # Get a topic from the user
    keyword = input("What should be the topic of your news podcast?\n")

    # Get news
    news_json, news_string = get_news_by_keyword(keyword)

    # Write a podcast script based on the news
    llm = load_llama()
    podcast_script = write_podcast_script(news_string, llm)
    del llm

    # Generate audio based on podcast script
    audio_file = text_to_audio(podcast_script)
    return audio_file

If you’re in Google Colab, you can also play the audio online.

from IPython.display import Audio

# Use the function to generate a podcast
audio_file = generate_podcast_from_keyword()
print(f"Audio file saved at {audio_file}")

# Play the audio file in Google Colab
Audio(audio_file)
Now you can simply enter a keyword for the topic you're interested in and get a podcast tailor-made for you. With xTTS's voice-cloning capabilities, you can have anyone be your podcast host. Enjoy your very own AI-made podcast!

Potential improvements and adaptations

With a similar architecture, you can make various improvements or adaptations to enhance your podcast or style it as you wish.

For the content, you can swap out the News API with other content like RSS links to actual podcasts, use generative AI search for gathering info, or transcribe YouTube channels as your source of content.
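As an example of swapping the content source, RSS feeds can be parsed with nothing but the standard library. This sketch pulls titles and descriptions out of an RSS 2.0 XML string and emits the same "Title/Content" format we fed to Llama-3 above; for real feeds you would fetch the XML with requests first, and the field names assume the common RSS item layout.

```python
import xml.etree.ElementTree as ET

def parse_rss(rss_xml: str) -> str:
    """Extract item titles/descriptions from RSS 2.0 XML into a
    Title/Content string ready for the podcast-script prompt."""
    root = ET.fromstring(rss_xml)
    response_string = ""
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        description = item.findtext("description", default="")
        response_string += f"Title:\n{title}\nContent:\n{description}\n---\n"
    return response_string
```

The returned string can be passed straight to write_podcast_script in place of the News API output.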

For the local LLM, you can replace Llama-3 with other LLMs that offer additional capabilities. For example, you can use an uncensored local LLM if you want a spicier podcast, a long-context LLM if you want to squeeze in as much information as possible, or an LLM that excels at foreign languages to make a podcast in a different language.

If you find the podcast too bland with only spoken voices, you can also add on a music generation model to add some background music.


We just explored how to combine the text and audio modalities of generative AI with local Llama-3, xTTS, and a news API. As you can see, generative AI models make it easy to build applications that were previously difficult to create. You can also quickly adapt this project to other applications that take in a piece of text and output narrated audio — bedtime stories, virtual tour guides, audiobooks, etc. You're bound only by your imagination.

I believe generative AI is the democratizing technology that enables anyone to build the applications and solutions they want in their lives. Just give it a try and let’s continue exploring what’s possible with multi-modal generative AI. Thanks for reading and enjoy building!


GitHub repo with full code:
Build with GenAI: Generative AI Search with Local LLM:
TTS model arena:
List of local LLMs that can be run on consumer hardware:

Build with GenAI: personal news podcast with local Llama-3 & xTTS was originally published in Level Up Coding on Medium, where people are continuing the conversation by highlighting and responding to this story.