Extract Markdown from a webpage

The /markdown endpoint retrieves a webpage's content and converts it into Markdown. You can specify either a URL or raw HTML, along with optional parameters to refine the extraction.

Basic usage

Using a URL

This example fetches the Markdown representation of a webpage, first with curl and then with the TypeScript SDK.

curl -X 'POST' 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/markdown' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <apiToken>' \
  -d '{
    "url": "https://example.com"
  }'

{
  "success": true,
  "result": "# Example Domain\n\nThis domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\n\n[More information...](https://www.iana.org/domains/example)"
}

import Cloudflare from "cloudflare";

const client = new Cloudflare({
  apiEmail: process.env["CLOUDFLARE_EMAIL"], // This is the default and can be omitted
  apiKey: process.env["CLOUDFLARE_API_KEY"], // This is the default and can be omitted
});

const markdown = await client.browserRendering.markdown.create({
  account_id: "account_id",
  url: "https://example.com", // page to convert, matching the curl example
});

console.log(markdown);

Use raw HTML

Instead of fetching the content by specifying the URL, you can provide raw HTML content directly.

curl -X 'POST' 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/markdown' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <apiToken>' \
  -d '{
    "html": "<div>Hello World</div>"
  }'

{
  "success": true,
  "result": "Hello World"
}
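Both request shapes shown so far (a url or raw html) can also be sent without the SDK. The following is a minimal sketch, assuming a runtime with a global fetch (for example Node 18 or later). The helper names buildMarkdownRequest, extractMarkdown, and fetchMarkdown are hypothetical, and the response type only models the success and result fields that appear in the examples above.

```typescript
// Request bodies documented on this page: a URL (optionally with
// rejectRequestPattern) or raw HTML.
type MarkdownRequestBody =
  | { url: string; rejectRequestPattern?: string[] }
  | { html: string };

// Only the fields visible in the sample responses are modeled here.
interface MarkdownResponse {
  success: boolean;
  result?: string;
}

// Build the endpoint and fetch options without sending anything.
function buildMarkdownRequest(
  accountId: string,
  apiToken: string,
  body: MarkdownRequestBody,
) {
  return {
    endpoint: `https://api.cloudflare.com/client/v4/accounts/${accountId}/browser-rendering/markdown`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiToken}`,
      },
      body: JSON.stringify(body),
    },
  };
}

// Return the Markdown string, or throw if the response does not report success.
function extractMarkdown(payload: MarkdownResponse): string {
  if (!payload.success || typeof payload.result !== "string") {
    throw new Error("Markdown extraction did not succeed");
  }
  return payload.result;
}

// Send the request and return the extracted Markdown.
async function fetchMarkdown(
  accountId: string,
  apiToken: string,
  body: MarkdownRequestBody,
): Promise<string> {
  const { endpoint, init } = buildMarkdownRequest(accountId, apiToken, body);
  const response = await fetch(endpoint, init);
  return extractMarkdown((await response.json()) as MarkdownResponse);
}
```

A call such as fetchMarkdown(accountId, apiToken, { html: "<div>Hello World</div>" }) would then correspond to the raw HTML request above. Keeping request construction separate from the network call makes the request easy to inspect or test without contacting the API.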

Advanced usage

You can refine the Markdown extraction with the rejectRequestPattern parameter, which takes an array of regex patterns. In this example, requests that match the pattern (here, CSS files) are blocked while the page loads.

curl -X 'POST' 'https://api.cloudflare.com/client/v4/accounts/<accountId>/browser-rendering/markdown' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <apiToken>' \
  -d '{
    "url": "https://example.com",
    "rejectRequestPattern": ["/^.*\\.(css)/"]
  }'

{
  "success": true,
  "result": "# Example Domain\n\nThis domain is for use in illustrative examples in documents. You may use this domain in literature without prior coordination or asking for permission.\n\n[More information...](https://www.iana.org/domains/example)"
}
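To check which request URLs a pattern would catch before sending it, you can try it locally. This is a rough sketch that assumes each pattern string is interpreted like a JavaScript regex literal (slashes delimiting the expression); the service's actual matching may differ, and the helpers toRegExp and wouldReject are hypothetical names used only for illustration.

```typescript
// Convert a "/body/flags" style string into a RegExp.
// Assumption: patterns follow JavaScript regex-literal syntax.
function toRegExp(pattern: string): RegExp {
  const literal = pattern.match(/^\/(.+)\/([a-z]*)$/);
  return literal ? new RegExp(literal[1], literal[2]) : new RegExp(pattern);
}

// True if any pattern matches the request URL.
function wouldReject(requestUrl: string, patterns: string[]): boolean {
  return patterns.some((pattern) => toRegExp(pattern).test(requestUrl));
}

// Same pattern as in the curl example above.
const patterns = ["/^.*\\.(css)/"];

const sampleUrls = [
  "https://example.com/styles.css",
  "https://example.com/styles.css?v=2",
  "https://example.com/app.js",
];

for (const url of sampleUrls) {
  console.log(url, wouldReject(url, patterns));
}
```

Under this assumption the pattern is not anchored at the end, so it matches any URL containing ".css", including ones with a query string.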

Potential use-cases

Content extraction: Convert a blog post or article into Markdown format for storage or further processing.
Static site generation: Retrieve structured Markdown content for use in static site generators like Jekyll or Hugo.
Automated summarization: Extract key content from web pages while ignoring CSS, scripts, or unnecessary elements.
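As an illustration of the static site generation use case, the returned Markdown can be wrapped in front matter before it is written to a content directory. This is a sketch only: the toContentFile helper, the slug rule, and the title and source front matter fields are assumptions chosen for the example, not something Jekyll, Hugo, or the endpoint requires.

```typescript
interface ContentFile {
  filename: string;
  contents: string;
}

// Wrap extracted Markdown in YAML front matter and derive a filename
// from the first level-one heading (illustrative conventions only).
function toContentFile(markdown: string, sourceUrl: string): ContentFile {
  const heading = markdown.match(/^#\s+(.+)$/m);
  const title = heading ? heading[1].trim() : "Untitled";

  // Lowercase, replace runs of non-alphanumerics with "-", trim stray dashes.
  const slug =
    title
      .toLowerCase()
      .replace(/[^a-z0-9]+/g, "-")
      .replace(/^-+|-+$/g, "") || "untitled";

  const frontMatter = [
    "---",
    `title: ${JSON.stringify(title)}`,
    `source: ${JSON.stringify(sourceUrl)}`,
    "---",
    "",
  ].join("\n");

  return { filename: `${slug}.md`, contents: `${frontMatter}\n${markdown}\n` };
}

// Example input: the start of the "result" value from the responses above.
const file = toContentFile(
  "# Example Domain\n\nThis domain is for use in illustrative examples in documents.",
  "https://example.com",
);
console.log(file.filename);
console.log(file.contents);
```

The title and source values are serialized with JSON.stringify so that quotes or colons in a page title do not break the front matter.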
