How to Convert Websites into LLM-Ready Data Using Firecrawl

If you’ve ever tried to feed web pages to an AI model, you know the pain.

Websites come with ads, navigation bars, and messy HTML. Before your Large Language Model (LLM) can understand the content, you have to clean and format it.

That’s where Firecrawl makes life easier. It’s an open-source API tool that turns any website into clean, structured, LLM-ready data in seconds.

In this tutorial, we will look at two ways to use Firecrawl: through the Firecrawl API (a paid API with a free tier) and through a self-hosted version.

What is Firecrawl?

Firecrawl is a web crawling and scraping service that helps developers collect clean data from websites. You give it a URL, and it returns the content in formats like Markdown, HTML, JSON, or even screenshots.

Firecrawl: open-source and cloud versions

Unlike basic scrapers, Firecrawl handles complex websites that load content with JavaScript. It can crawl links, follow pages, and automatically handle the heavy lifting, such as proxies and anti-bot systems.

In short, it does the hard part of collecting web data, so you can focus on using that data for your AI or automation projects.

Why LLMs Need Clean Data

LLMs learn and respond based on the text you give them. If that text contains clutter like HTML tags, scripts, or unrelated sections, the model gets confused.

Clean, well-structured data helps the model stay focused on real content like article bodies, product details, or documents.

Firecrawl makes this process easy. Instead of spending hours scraping and cleaning up text, you can get ready-to-use content in a single API call.
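To see why clutter matters, here is a deliberately naive sketch of what cleaning involves, using only Python’s standard-library html.parser. It drops the contents of a few tags assumed to be clutter (script, style, nav, header, footer, aside) and keeps the remaining visible text. This illustrates the problem; it is not how Firecrawl works internally, and real pages need far more than a fixed tag list.

```python
from html.parser import HTMLParser

# Tags whose contents this sketch treats as clutter (an assumption, not Firecrawl's rules)
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes that are not nested inside a clutter tag."""

    def __init__(self) -> None:
        super().__init__()
        self.skip_depth = 0  # greater than 0 while inside a clutter tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def html_to_text(html: str) -> str:
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


page = """
<html><head><style>body { color: red; }</style></head>
<body>
  <nav>Home | About | Login</nav>
  <article><h1>Hello</h1><p>The real content.</p></article>
  <script>trackUser();</script>
  <footer>Copyright notice</footer>
</body></html>
"""
print(html_to_text(page))
```

Only the article heading and paragraph survive; the navigation, styles, tracking script, and footer are dropped.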

Setting up Firecrawl

To get started, create an account at firecrawl.dev and grab your API key. Running Firecrawl on your own machine involves setting up the server, a Redis cache, and other components, so we will use the API key from firecrawl.dev to test the API.

We can also quickly test its capabilities in the website’s UI.

Let’s give it a domain and see whether Firecrawl returns some results.

Crawling freeCodeCamp

And yes, we can see many URLs being scraped by Firecrawl.

Firecrawl results

Now let’s access Firecrawl from code. The free plan lets you scrape 500 pages, which is enough to understand how it works.

You can use the Python SDK, the Node.js SDK, or direct API requests with curl.

Here’s how to install the SDK:

Python:

pip install firecrawl-py

Node.js:

npm install @mendable/firecrawl-js

Once installed, all you need to do is set up your API key and you’re ready to crawl.
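If you’d rather not hard-code the key in your scripts, a small helper can read it from an environment variable. The variable name FIRECRAWL_API_KEY and the helper load_api_key are assumptions for this sketch; the fc- prefix check only mirrors the placeholder key format shown in this tutorial.

```python
import os


def load_api_key(env_var: str = "FIRECRAWL_API_KEY") -> str:
    """Return the API key stored in env_var (the variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    # The fc- prefix mirrors the placeholder key format used in this tutorial
    if not key.startswith("fc-"):
        raise ValueError(f"{env_var} does not start with 'fc-'.")
    return key
```

You could then create the client with Firecrawl(api_key=load_api_key()) and keep the key out of your source files.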

Scraping a single page

Let’s say you want to extract the main content from Firecrawl’s home page. You can do this in just a few lines.

Python example:

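from firecrawl import Firecrawl

# Authenticate with your Firecrawl API key
firecrawl = Firecrawl(api_key="fc-YOUR_API_KEY")

# Scrape one page and request both Markdown and HTML output
doc = firecrawl.scrape(
    "https://firecrawl.dev",
    formats=["markdown", "html"]
)

# Print the cleaned Markdown version of the page
print(doc.markdown)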

This script returns a clean Markdown version of the page, perfect for reading or for analysis by an LLM.

With this one call, you get the core text, free of HTML clutter.
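Under the hood, the SDK call is an HTTP request, so you can also build it yourself. The sketch below uses only urllib and constructs, but does not send, a POST to a /v2/scrape endpoint with the same url and formats fields as the Python example. The endpoint path is an assumption modeled on the /v2/extract path used for the self-hosted request in this tutorial; check Firecrawl’s API reference before relying on it.

```python
import json
import urllib.request

API_BASE = "https://api.firecrawl.dev"  # hosted API; a self-hosted URL also works here


def build_scrape_request(url, api_key, formats=("markdown",)):
    """Build, but do not send, a POST request for a single-page scrape.

    The /v2/scrape path is an assumption modeled on the /v2/extract path.
    """
    body = json.dumps({"url": url, "formats": list(formats)}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/v2/scrape",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


req = build_scrape_request("https://firecrawl.dev", "fc-YOUR_API_KEY", ("markdown", "html"))
print(req.get_method(), req.full_url)
# Sending it would look like: json.load(urllib.request.urlopen(req))
```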

Crawl an entire website

If you need data from multiple pages, such as a full documentation site, you can crawl the entire domain. Firecrawl finds all the links and scrapes them automatically.
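Conceptually, crawling is a breadth-first walk over links that stay on the same site, stopped at a page limit (like the limit field in the request below). Here is a toy version to make that concrete. get_links is a hypothetical callback standing in for fetching and parsing a page, and the in-memory site dictionary is made up; Firecrawl’s real crawler also deals with JavaScript rendering, proxies, and anti-bot systems, which this sketch ignores.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl_order(start_url, get_links, limit=10):
    """Return the same-domain pages a breadth-first crawl would visit, up to limit."""
    domain = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < limit:
        url = queue.popleft()
        visited.append(url)  # a real crawler would scrape the page here
        for href in get_links(url):
            # Resolve relative links and drop #fragments so duplicates collapse
            absolute = urljoin(url, href).split("#")[0]
            if urlparse(absolute).netloc == domain and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited


# Made-up site: each page maps to the hrefs found on it
site = {
    "https://docs.example.com/": ["/intro", "/api", "https://other.example.org/x"],
    "https://docs.example.com/intro": ["/api", "/intro#setup"],
    "https://docs.example.com/api": ["/"],
}
print(crawl_order("https://docs.example.com/", lambda u: site.get(u, [])))
```

The external link is skipped, and the #setup fragment and the link back to the home page do not cause repeat visits.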

Example API call:

curl -X POST https://api.firecrawl.dev/v2/crawl \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "YOUR_TARGET_URL",
    "limit": 10,
    "scrapeOptions": {
      "formats": ["markdown", "html"]
    }
  }'

This starts a crawl job and returns a job ID. Once the job finishes, you can download all the scraped pages in clean, LLM-ready formats.
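Since the crawl runs as a background job, your code needs to wait for it to finish. Here is a minimal polling loop. fetch_status is a hypothetical callback that, against the hosted API, would request the job’s status (for example with a GET to /v2/crawl/{job_id}) and return the parsed JSON. The status values ("completed", "failed") and the data field are assumptions for this sketch, and large results may arrive split across several pages, which it does not handle.

```python
import time
from typing import Callable, Dict, List


def wait_for_crawl(job_id: str,
                   fetch_status: Callable[[str], Dict],
                   interval: float = 2.0,
                   timeout: float = 300.0) -> List:
    """Poll a crawl job until it finishes and return its scraped pages.

    fetch_status is a hypothetical callback returning the job's status JSON;
    the "status" and "data" field names are assumptions for this sketch.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_status(job_id)
        state = job.get("status")
        if state == "completed":
            return job.get("data", [])
        if state == "failed":
            raise RuntimeError(f"Crawl {job_id} failed")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Crawl {job_id} still {state!r} after {timeout}s")
        time.sleep(interval)  # wait before asking again


# Usage with a fake status source standing in for the real API
fake_responses = iter([
    {"status": "scraping"},
    {"status": "completed", "data": [{"markdown": "# Example page"}]},
])
pages = wait_for_crawl("job-123", lambda _id: next(fake_responses), interval=0)
print(len(pages))
```

Injecting fetch_status keeps the loop testable without network access; in a real script it would wrap an authenticated HTTP call.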

One of Firecrawl’s best features is AI-powered extraction. You can ask Firecrawl to read a page and return structured data, such as a product’s price, description, or reviews, in JSON format.

Example:

curl -X POST https://api.firecrawl.dev/v2/extract \
  -H 'Authorization: Bearer fc-YOUR_API_KEY' \
  -H 'Content-Type: application/json' \
  -d '{
    "urls": ["YOUR_TARGET_URL"],
    "prompt": "Extract the company mission and whether it is open source.",
    "schema": {
      "type": "object",
      "properties": {
        "company_mission": { "type": "string" },
        "is_open_source": { "type": "boolean" }
      }
    }
  }'

Firecrawl uses a built-in LLM to read the page content and automatically fill in the schema. You can even skip the schema and just provide a natural-language prompt, like:

“Extract all pricing details and feature names from this page.”

It’s ideal for AI pipelines, RAG (retrieval-augmented generation) systems, or dashboards that rely on clean, structured data.
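Because a model fills in the schema, it is worth checking its output before it flows into a pipeline. This stdlib-only sketch checks an extracted dictionary against the flat properties block of the schema used above, covering only the basic JSON Schema type names. check_extracted is a hypothetical helper, not part of Firecrawl; a full validator such as the jsonschema package handles far more cases.

```python
from typing import Dict, List

# Python types for the basic JSON Schema type names
JSON_TYPES = {
    "string": str,
    "boolean": bool,
    "integer": int,
    "number": (int, float),
    "object": dict,
    "array": list,
}


def check_extracted(data: Dict, schema: Dict) -> List[str]:
    """Return a list of problems; an empty list means the data matches."""
    problems = []
    for name, spec in schema.get("properties", {}).items():
        if name not in data:
            problems.append(f"missing field: {name}")
            continue
        type_name = spec.get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # unknown or absent type: not checked in this sketch
        value = data[name]
        # bool is a subclass of int in Python, so reject it for numeric fields
        is_bool_as_number = isinstance(value, bool) and type_name in ("integer", "number")
        if is_bool_as_number or not isinstance(value, expected):
            problems.append(f"{name}: expected {type_name}, got {type(value).__name__}")
    return problems


schema = {
    "type": "object",
    "properties": {
        "company_mission": {"type": "string"},
        "is_open_source": {"type": "boolean"},
    },
}
# Made-up extraction result, used only to exercise the checker
print(check_extracted({"company_mission": "Example mission", "is_open_source": "yes"}, schema))
```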

Self-hosting Firecrawl using Sevalla

Firecrawl is open source, which means you don’t need to pay for the API if you prefer full control. You can deploy it on your own server and customize it.

You can install Firecrawl on your local machine by configuring the database, cache, and other required components. But this setup only works for local projects; it won’t let you build or deploy applications that use Firecrawl.

To deploy Firecrawl, you can choose any cloud provider, such as AWS, Heroku, or others. But I will use Sevalla.

Sevalla is a modern, usability-focused platform-as-a-service provider. It offers application hosting, databases, object storage, and static site hosting for your projects.

I’m using Sevalla for hosting for two reasons:

Every platform charges you to create cloud resources. Sevalla comes with $50 in credits, so we won’t incur any costs for this example.
Sevalla has a template for Firecrawl, which removes the manual installation and setup of every resource Firecrawl needs.

Sevalla templates

Click “Deploy Now”, select the server in the popup, and click “Deploy”. Sevalla will start provisioning the resources needed to run Firecrawl.

Firecrawl resources

Once the setup is complete, you will see three resources deployed.

After the deployment is complete, open the firecrawl-api application under the Deployments section and click “View App”.

Firecrawl deployment

Now you can use your private endpoint in your applications. My deployment’s URL is temporary, so don’t reuse it; in your own requests, replace api.firecrawl.dev with your deployment’s URL.

curl -X POST https://YOUR_SELF_HOSTED_URL/v2/extract \
  -H 'Content-Type: application/json' \
  -d '{
    "urls": ["YOUR_TARGET_URL"],
    "prompt": "Extract the company mission and whether it is open source.",
    "schema": {
      "type": "object",
      "properties": {
        "company_mission": { "type": "string" },
        "is_open_source": { "type": "boolean" }
      }
    }
  }'
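To switch between the hosted API and your own deployment without editing every request, you can keep the base URL in one place. In this sketch, the FIRECRAWL_API_URL environment variable and the endpoint helper are assumptions for illustration; only the api.firecrawl.dev host and the /v2/extract path come from this tutorial.

```python
import os
from typing import Optional
from urllib.parse import urljoin

HOSTED_API = "https://api.firecrawl.dev"


def endpoint(path: str, base_url: Optional[str] = None) -> str:
    """Join an API path onto the chosen base URL.

    Precedence: explicit base_url, then the FIRECRAWL_API_URL environment
    variable (a name assumed for this sketch), then the hosted API.
    """
    base = base_url or os.environ.get("FIRECRAWL_API_URL") or HOSTED_API
    # A trailing slash on the base and none on the path keep urljoin from dropping segments
    return urljoin(base.rstrip("/") + "/", path.lstrip("/"))


print(endpoint("/v2/extract"))
print(endpoint("/v2/extract", "https://my-firecrawl.example.app"))
```

Keeping the base URL configurable means the same code can target the hosted API during development and your Sevalla deployment in production.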

If you want to run the project locally by installing dependencies like Redis, PostgreSQL, and Playwright, here is a detailed guide.

Use cases

Developers and data scientists use Firecrawl for a wide range of tasks. Many crawl documentation sites and turn them into training data for large language models, ensuring that their models learn from accurate, well-structured sources.

Others use it to collect blog posts or news articles for sentiment analysis, helping them understand trends, opinions, or public reactions across the web.

Firecrawl is also valuable for monitoring changes to web content, which is essential for research projects or compliance tracking where up-to-date information matters.

Teams can also use it to create “chat with your website” AI assistants that can answer questions based on the site’s latest content.

In each of these cases, Firecrawl ensures that your model receives clean, structured, and consistent data, making it easier to build reliable and intelligent AI systems.
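For the RAG and chat-with-your-website use cases, the Markdown that Firecrawl returns usually has to be split into smaller chunks before embedding. Here is a simple chunker that splits at Markdown headings and caps each chunk’s length. The function name and the 1,000-character default are assumptions, and it does not account for heading-like lines inside fenced code blocks.

```python
import re
from typing import List


def chunk_markdown(markdown: str, max_chars: int = 1000) -> List[str]:
    """Split Markdown at headings, then cut any long section into pieces <= max_chars."""
    # Zero-width split just before each line that starts with 1-6 '#' and a space
    sections = re.split(r"(?m)^(?=#{1,6} )", markdown)
    chunks = []
    for section in sections:
        section = section.strip()
        while len(section) > max_chars:
            # Prefer to cut at the last newline inside the limit, else hard-cut
            cut = section.rfind("\n", 0, max_chars)
            if cut <= 0:
                cut = max_chars
            chunks.append(section[:cut].strip())
            section = section[cut:].strip()
        if section:
            chunks.append(section)
    return chunks


doc = "# Intro\nFirecrawl returns Markdown.\n## Setup\nInstall the SDK.\n"
for chunk in chunk_markdown(doc):
    print(repr(chunk))
```

Splitting on headings first keeps each chunk about one topic, which tends to make retrieval results easier to interpret.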

Conclusion

Converting messy websites into readable text used to be one of the hardest parts of building an AI system. Firecrawl changes that. With one API call, you can scrape, crawl, and extract high-quality data that your LLM can use right away.

If you’re building anything related to AI, RAG, or data pipelines, Firecrawl is one of those tools you’ll wish you had discovered earlier.

