How to turn your favorite tech blogs into personal podcasts

by SkillAiNest

Keeping up with tech news these days feels almost impossible. I step away for three days, and suddenly there’s a new AI model, a new framework, and a new tool that everyone says I should learn. Reading everything doesn’t scale anymore, but I still want to stay informed.

So instead of giving up, I decided to change the format. I took some tech blogs that I already enjoy reading, selected the best articles, converted them to audio using my own voice, and turned the result into a private podcast. Now I can stay up-to-date while walking, running, or driving.

In this tutorial, you’ll learn how to build a simplified version of this pipeline step by step.


What are you going to make?

You’ll write a Node.js script that does the following:

  • Fetches articles from RSS feeds.

  • Extracts clean, readable text from every article.

  • Filters out content you don’t want to hear.

  • Cleans up the text so it sounds natural when spoken.

  • Converts text to natural-sounding audio using your voice.

  • Uploads the audio to Cloudflare R2.

  • Generates a podcast RSS feed.

  • Runs automatically on a schedule.

Finally, you’ll have a real podcast feed you can subscribe to on your phone.

The generated podcast, with converted blog posts as episodes.

If you want to skip the tutorial and jump right into using the ready-made tool, you can find the full version and instructions on GitHub.

Prerequisites

To follow this, you need basic knowledge of JavaScript.

You also need:

  • Node.js 22 or newer.

  • A place to store audio files (Cloudflare R2 in this tutorial).

  • A text-to-speech API (OrangeClone in this tutorial).

Project overview

Before writing the code, it helps to understand the idea clearly.

This project is a pipeline:

Fetch content -> Filter content -> Clean content -> Convert to audio -> Publish

Each step consumes the output of the previous one. Keeping the flow linear makes the project easy to reason about, debug, and automate.
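The linear-pipeline idea can be sketched as a list of steps applied in order. The step functions below are simplified stand-ins for illustration, not the real implementations we build later:

```javascript
// Each step consumes the previous step's output; the pipeline is just a loop.
async function runSteps(input, steps) {
  let value = input;
  for (const step of steps) {
    value = await step(value);
  }
  return value;
}

// Hypothetical stand-in steps: fetch, filter, clean.
const demoSteps = [
  async (feeds) => feeds.map((url) => ({ url, text: "raw article text" })), // fetch
  async (articles) => articles.filter((a) => a.text.length > 0),            // filter
  async (articles) => articles.map((a) => ({ ...a, text: a.text.trim() })), // clean
];
```

Adding, removing, or reordering a step is then a one-line change, which is what makes a linear pipeline easy to maintain.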

All of the code in this tutorial resides in a file called index.js.

Getting started

Create a new project folder and your main file.

mkdir podcast-pipeline
cd podcast-pipeline
touch index.js

Start the project and install the dependencies.

npm init -y
npm install rss-parser @mozilla/readability jsdom node-fetch uuid xmlbuilder @aws-sdk/client-s3

Enable ESM import syntax, which Node 22 supports natively.

npm pkg set type=module

Here is what is used for each dependency:

  • rss-parser Reads the RSS feed.

  • @mozilla/readability Extracts readable article text.

  • jsdom Provides a DOM for readability.

  • node-fetch Fetches remote content.

  • uuid Generates unique filenames.

  • xmlbuilder Produces a podcast RSS feed.

  • @aws-sdk/client-s3 Uploads audio to Cloudflare R2.

How to get content

The first decision is where your content comes from.

Avoid scraping websites directly. Scraped HTML is noisy and inconsistent. RSS feeds are structured and reliable, and most serious blogs provide one.

Open index.js and define your sources.

import Parser from "rss-parser";
import fetch from "node-fetch";
import { JSDOM } from "jsdom";
import { Readability } from "@mozilla/readability";

const parser = new Parser();

const NUMBER_OF_ARTICLES_TO_FETCH = 15;

const SOURCES = [
  "https://www.freecodecamp.org/news/rss/",
  "https://hnrss.org/frontpage",
];

Now fetch the articles and extract the readable content.

async function fetchArticles() {
  const articles = [];

  for (const source of SOURCES) {
    const feed = await parser.parseURL(source);

    for (const item of feed.items.slice(0, NUMBER_OF_ARTICLES_TO_FETCH)) {
      if (!item.link) continue;

      const response = await fetch(item.link);
      const html = await response.text();

      const dom = new JSDOM(html, { url: item.link });
      const reader = new Readability(dom.window.document);
      const content = reader.parse();

      if (!content) continue;

      articles.push({
        title: item.title,
        link: item.link,
        content: content.content,
        text: content.textContent,
      });
    }
  }

  return articles.slice(0, NUMBER_OF_ARTICLES_TO_FETCH);
}

This function loops over every source feed, downloads each linked article, extracts the readable content with Readability, and returns up to NUMBER_OF_ARTICLES_TO_FETCH articles.

How to filter content

Not every article deserves your attention. Start by filtering out the topics you don’t want to hear about.

const BLOCKED_KEYWORDS = ["crypto", "nft", "giveaway"];

function filterByKeywords(articles) {
  return articles.filter(
    (article) =>
      !BLOCKED_KEYWORDS.some((keyword) =>
        article.text.toLowerCase().includes(keyword)
      )
  );
}

Next, remove the promotional content.

function removePromotionalContent(articles) {
  return articles.filter(
    (article) => !article.text.toLowerCase().includes("sponsored")
  );
}

Finally, remove articles that are too short.

function filterByWordCount(articles, minWords = 700) {
  return articles.filter(
    (article) => article.text.split(/\s+/).length >= minWords
  );
}

After these steps, you’re left with articles you really want to hear about.
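The three filters compose naturally, since each one takes and returns an array of articles. Here is a self-contained sketch with the filters inlined and hypothetical sample data:

```javascript
// The three filters from this section, chained on made-up sample articles.
const BLOCKED_KEYWORDS = ["crypto", "nft", "giveaway"];

const filterByKeywords = (articles) =>
  articles.filter(
    (a) => !BLOCKED_KEYWORDS.some((k) => a.text.toLowerCase().includes(k))
  );

const removePromotionalContent = (articles) =>
  articles.filter((a) => !a.text.toLowerCase().includes("sponsored"));

const filterByWordCount = (articles, minWords = 700) =>
  articles.filter((a) => a.text.split(/\s+/).length >= minWords);

// Hypothetical sample data: one long on-topic article, one blocked topic, one too short.
const sample = [
  { title: "Deep dive into streams", text: "word ".repeat(800) },
  { title: "Big NFT giveaway", text: "nft ".repeat(800) },
  { title: "Quick note", text: "too short to keep" },
];

const kept = filterByWordCount(removePromotionalContent(filterByKeywords(sample)));
// Only the long, on-topic article survives all three filters.
```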

How to clean content

Raw article text needs cleaning to sound good when spoken. First, replace images with spoken placeholders.

function replaceImages(html) {
  return html.replace(/<img[^>]*alt="([^"]*)"[^>]*>/gi, (_, alt) => {
    return alt ? `(Image: ${alt})` : `(Image omitted)`;
  });
}

Next, remove the code blocks.

function replaceCodeBlocks(html) {
  return html.replace(
    /<pre[^>]*><code[^>]*>[\s\S]*?<\/code><\/pre>/gi,
    "(code example omitted)"
  );
}

Strip URLs and replace them with spoken text.

function replaceUrls(text) {
  return text.replace(/https?:\/\/\S+/gi, "link removed");
}

Normalize common symbols.

function normalizeSymbols(text) {
  return text
    .replace(/&/g, "and")
    .replace(/%/g, "percent")
    .replace(/\$/g, "dollar");
}

Convert HTML to text so TTS doesn’t read tags.

function stripHtml(html) {
  return html.replace(/<[^>]+>/g, " ");
}

Combine everything in one cleaning step.

function cleanArticle(article) {
  let cleaned = replaceImages(article.content);
  cleaned = replaceCodeBlocks(cleaned);
  cleaned = stripHtml(cleaned);
  cleaned = replaceUrls(cleaned);
  cleaned = normalizeSymbols(cleaned);

  return {
    ...article,
    cleanedText: cleaned,
  };
}

At this point, the text is ready for audio generation.
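To sanity-check the cleaning chain, here is a self-contained sketch that runs it on a small made-up article. The helper bodies mirror the functions above; the sample HTML is invented for illustration:

```javascript
// Inlined cleaning helpers, applied in the same order as cleanArticle.
const replaceImages = (html) =>
  html.replace(/<img[^>]*alt="([^"]*)"[^>]*>/gi, (_, alt) =>
    alt ? `(Image: ${alt})` : `(Image omitted)`
  );
const replaceCodeBlocks = (html) =>
  html.replace(/<pre[^>]*><code[^>]*>[\s\S]*?<\/code><\/pre>/gi, "(code example omitted)");
const stripHtml = (html) => html.replace(/<[^>]+>/g, " ");
const replaceUrls = (text) => text.replace(/https?:\/\/\S+/gi, "link removed");
const normalizeSymbols = (text) =>
  text.replace(/&/g, "and").replace(/%/g, "percent").replace(/\$/g, "dollar");

// Hypothetical sample article HTML.
const raw =
  '<p>Save 50% today <img alt="a pricing chart"> via https://example.com</p>';

let cleaned = replaceImages(raw);
cleaned = replaceCodeBlocks(cleaned);
cleaned = stripHtml(cleaned);
cleaned = replaceUrls(cleaned);
cleaned = normalizeSymbols(cleaned);
// cleaned now has no tags, no URLs, and spoken-friendly symbols.
```

The order matters: images and code blocks must be handled while the HTML structure still exists, and only then can the remaining tags be stripped.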

How to convert content to audio

Browser speech APIs sound robotic. I wanted something that sounded human and familiar. After trying several tools, I settled on OrangeClone. It was the only option that actually sounded like me.

Create a free account and copy your API key from the dashboard.

The OrangeClone dashboard with the API key visible.

Record 10 to 15 seconds of clean audio and save it as SAMPLE_VOICE.wav at the root of the project. Then create a voice character (one-time setup).

import fs from "node:fs/promises";

const ORANGECLONE_API_KEY = process.env.ORANGECLONE_API_KEY;
const ORANGECLONE_BASE_URL =
  process.env.ORANGECLONE_BASE_URL || "https://orangeclone.com/api";

async function createVoiceCharacter({ name, avatarStyle, voiceSamplePath }) {
  const audioBuffer = await fs.readFile(voiceSamplePath);
  const audioBase64 = audioBuffer.toString("base64");

  const response = await fetch(
    `${ORANGECLONE_BASE_URL}/characters/create`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${ORANGECLONE_API_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        name,
        avatarStyle,
        voiceSample: {
          format: "wav",
          data: audioBase64,
        },
      }),
    }
  );

  if (!response.ok) {
    const errorText = await response.text();
    throw new Error(`Failed to create character: ${errorText}`);
  }

  const data = await response.json();

  return (
    data.data?.id ||
    data.data?.characterId ||
    data.id ||
    data.characterId
  );
}

Generate audio from text.

async function generateAudio(characterId, text) {
  const response = await fetch(`${ORANGECLONE_BASE_URL}/voices_clone`, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${ORANGECLONE_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      characterId,
      text,
    }),
  });

  return response.json();
}

Wait for the job to complete.

async function waitForAudio(jobId) {
  while (true) {
    const response = await fetch(`${ORANGECLONE_BASE_URL}/voices/${jobId}`);
    const data = await response.json();

    if (data.status === "completed") {
      return data.audioUrl;
    }

    await new Promise((r) => setTimeout(r, 5000));
  }
}
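Note that the loop above polls forever, so a stuck job would hang the whole run. A more defensive pattern is a generic polling helper with a retry cap; here is a sketch (the helper name and shape of checkFn are my own, not part of the OrangeClone API):

```javascript
// Generic polling helper with a timeout. checkFn is any async function
// that returns { done, value }; the helper retries until done or gives up.
async function pollUntilDone(checkFn, { intervalMs = 5000, maxAttempts = 60 } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const { done, value } = await checkFn();
    if (done) return value;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
  throw new Error(`Job did not complete after ${maxAttempts} attempts`);
}
```

waitForAudio could then call pollUntilDone with a checkFn that fetches the job status and returns `{ done: data.status === "completed", value: data.audioUrl }`.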

How to Upload Audio to Cloudflare R2

OrangeClone returns an audio URL, but podcast apps need a stable, public file that won’t expire.
That’s where Cloudflare R2 comes in.

R2 is S3-compatible storage, which means we can upload files using the AWS SDK and serve them publicly for podcast apps.

How to set credentials

Create an R2 bucket in your Cloudflare dashboard and set the following environment variables:

  • R2_ACCOUNT_ID

  • R2_ACCESS_KEY_ID

  • R2_SECRET_ACCESS_KEY

  • R2_BUCKET_NAME

  • R2_PUBLIC_URL

These values allow the script to upload files and generate public URLs for them.

How to start the R2 client

import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const r2 = new S3Client({
  region: "auto",
  endpoint: `https://${process.env.R2_ACCOUNT_ID}.r2.cloudflarestorage.com`,
  credentials: {
    accessKeyId: process.env.R2_ACCESS_KEY_ID,
    secretAccessKey: process.env.R2_SECRET_ACCESS_KEY,
  },
});

This creates an S3-compatible client that connects directly to your Cloudflare R2 account instead of AWS.

How to download audio

async function downloadAudio(audioUrl) {
  const response = await fetch(audioUrl);
  const buffer = await response.arrayBuffer();
  return Buffer.from(buffer);
}

OrangeClone gives us a URL, not a file.
This function downloads the audio into a Node.js Buffer so it can be uploaded to R2.

How to Upload to R2

import { v4 as uuid } from "uuid";

async function uploadToR2(audioBuffer) {
  const fileName = `${uuid()}.mp3`;

  const command = new PutObjectCommand({
    Bucket: process.env.R2_BUCKET_NAME,
    Key: fileName,
    Body: audioBuffer,
    ContentType: "audio/mpeg",
  });

  await r2.send(command);

  return `${process.env.R2_PUBLIC_URL}/${fileName}`;
}

This function uploads the audio buffer to R2 using a unique filename and returns a public URL that podcast apps can access.

Putting it together

const audioUrl = await waitForAudio(jobId);
const audioBuffer = await downloadAudio(audioUrl);
const publicAudioUrl = await uploadToR2(audioBuffer);

At the end of this step, publicAudioUrl points to the final audio file that will be referenced in the podcast RSS feed.

How to Create the Podcast Feed

With a public audio URL, you can now generate an RSS feed.

import xmlbuilder from "xmlbuilder";

function generatePodcastFeed(episodes) {
  const feed = xmlbuilder
    .create("rss", { version: "1.0" })
    .att("version", "2.0")
    .ele("channel");

  feed.ele("title", "My Tech Podcast");
  feed.ele("description", "Tech articles converted to audio");
  feed.ele("link", "https://your-site.com");

  episodes.forEach((ep) => {
    const item = feed.ele("item");
    item.ele("title", ep.title);
    item.ele("enclosure", {
      url: ep.audioUrl,
      type: "audio/mpeg",
    });
  });

  return feed.end({ pretty: true });
}
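The xmlbuilder version above covers the minimum. Many podcast apps also expect a guid and pubDate on each item, per RSS 2.0. Here is a dependency-free sketch of the same feed with those fields added; the channel values and the optional publishedAt field are placeholders:

```javascript
// Minimal XML escaping so titles like "Tips & Tricks" stay well-formed.
function escapeXml(s) {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Builds an RSS 2.0 feed string with guid and pubDate per item.
function buildFeed(episodes) {
  const items = episodes
    .map(
      (ep) => `    <item>
      <title>${escapeXml(ep.title)}</title>
      <guid isPermaLink="false">${escapeXml(ep.audioUrl)}</guid>
      <pubDate>${new Date(ep.publishedAt ?? Date.now()).toUTCString()}</pubDate>
      <enclosure url="${escapeXml(ep.audioUrl)}" type="audio/mpeg" />
    </item>`
    )
    .join("\n");

  return `<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>My Tech Podcast</title>
    <description>Tech articles converted to audio</description>
    <link>https://your-site.com</link>
${items}
  </channel>
</rss>`;
}
```

Using the audio URL as the guid means podcast apps treat each uploaded file as a distinct episode, even if two articles share a title.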

How to automate a pipeline

Automation in this project happens in two phases. First, the code itself must be able to process multiple articles in a single run. Second, the script should run automatically on a schedule. We’ll start with code-level automation.

Automated within code

Previously, we fetched up to fifteen articles. Now we need to make sure that every article that passes the filters goes through the full pipeline.

Add the following function near the bottom of index.js.

async function runPipeline() {
  const rawArticles = await fetchArticles();

  const filteredArticles = filterByWordCount(
    removePromotionalContent(filterByKeywords(rawArticles))
  );

  if (filteredArticles.length === 0) {
    console.log("No articles passed the filters");
    return [];
  }

  const characterId = await createVoiceCharacter({
    name: "My Voice",
    avatarStyle: "realistic",
    voiceSamplePath: "./SAMPLE_VOICE.wav",
  });

  const episodes = [];

  for (const article of filteredArticles) {
    console.log(`Processing: ${article.title}`);

    const cleaned = cleanArticle(article);

    const job = await generateAudio(characterId, cleaned.cleanedText);

    const audioUrl = await waitForAudio(job.id);
    const audioBuffer = await downloadAudio(audioUrl);
    const publicAudioUrl = await uploadToR2(audioBuffer);

    episodes.push({
      title: article.title,
      audioUrl: publicAudioUrl,
    });
  }

  return episodes;
}

This function does all the heavy lifting:

  • Fetches articles

  • Applies all filters

  • Creates a voice character once

  • Loops through each valid article

  • Converts every article to audio

  • Uploads the audio to Cloudflare R2

  • Collects podcast episode data

At this point, a single script run can generate multiple podcast episodes.

Running the pipeline and generating feed

Now we need a single entry point that runs the pipeline and writes the podcast feed. Add it below the pipeline function. Note that fs is already imported at the top of index.js, so don’t import it again.

async function main() {
  const episodes = await runPipeline();

  if (episodes.length === 0) {
    console.log("No episodes generated");
    return;
  }

  const rss = generatePodcastFeed(episodes);

  await fs.mkdir("./public", { recursive: true });
  await fs.writeFile("./public/feed.xml", rss);

  console.log("Podcast feed generated at public/feed.xml");
}

main().catch(console.error);

Now when you run node index.js, it:

  • Processes all selected articles

  • Creates multiple audio files

  • Generates a valid podcast RSS feed

This is basic automation.

Scheduling the pipeline with GitHub Actions

The last step is to run this script automatically. Create a GitHub Actions workflow file at .github/workflows/podcast.yml.

name: Podcast Pipeline

on:
  schedule:
    - cron: "0 6 * * *"

jobs:
  run:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: 22
      - run: npm install
      - run: node index.js
        env:
          ORANGECLONE_API_KEY: ${{ secrets.ORANGECLONE_API_KEY }}
          R2_ACCOUNT_ID: ${{ secrets.R2_ACCOUNT_ID }}
          R2_ACCESS_KEY_ID: ${{ secrets.R2_ACCESS_KEY_ID }}
          R2_SECRET_ACCESS_KEY: ${{ secrets.R2_SECRET_ACCESS_KEY }}
          R2_BUCKET_NAME: ${{ secrets.R2_BUCKET_NAME }}
          R2_PUBLIC_URL: ${{ secrets.R2_PUBLIC_URL }}

This workflow runs the pipeline every morning at 6 AM (UTC, since GitHub Actions cron schedules use UTC).

Each run:

  • Fetches new articles

  • Produces fresh audio

  • Updates the podcast feed

Once it’s set up, your podcast updates itself without manual work.

The result

This is a simplified version of my full production pipeline, Postcast, but the core idea is the same.

Now you know how to turn blogs into personal podcasts. Be aware of copyright and only use content that you have permission to use.

If you have questions, reach me at X @sprucekhalifa. I regularly write practical tech articles like this one.
