What is HTML to Markdown Conversion?

HTML to Markdown conversion transforms HTML markup into Markdown, a lightweight plain-text formatting syntax created by John Gruber in 2004. Markdown was designed to be readable without rendering. Where HTML requires opening and closing tags like <strong> and <a href="">, Markdown uses shorthand characters: **bold**, [links](url), and # headings. Converting HTML to Markdown produces files that are easier to read, edit, and version-control than raw HTML.

The conversion process maps HTML elements to their Markdown equivalents. An <h2> becomes ##, each <li> inside a <ul> becomes - item, and an <a> tag becomes [text](url). Some HTML constructs have no direct Markdown equivalent, such as <div>, <span>, or custom data attributes. Converters typically strip these tags and keep their inner text, or pass them through as raw HTML, depending on configuration.
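The mapping described above can be sketched with Python's standard-library HTML parser. This is a toy illustration, not how this tool or any named library is implemented: the class `MiniMarkdown` and the function `to_markdown` are hypothetical names, only a handful of tags are handled, and unknown tags are dropped while their text is kept.

```python
from html.parser import HTMLParser


class MiniMarkdown(HTMLParser):
    """Toy converter: handles h1-h6, p, strong/b, em/i, a, and ul/li only."""

    INLINE = {"strong": "**", "b": "**", "em": "*", "i": "*"}

    def __init__(self):
        super().__init__()
        self.out = []      # output fragments, joined at the end
        self.href = None   # href of the <a> currently open, if any

    def handle_starttag(self, tag, attrs):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            # Heading level matches the digit in the tag name.
            self.out.append("\n\n" + "#" * int(tag[1]) + " ")
        elif tag == "p":
            self.out.append("\n\n")
        elif tag == "ul":
            self.out.append("\n")
        elif tag == "li":
            self.out.append("\n- ")
        elif tag == "a":
            self.href = dict(attrs).get("href") or ""
            self.out.append("[")
        # Any other tag (<div>, <span>, ...) is dropped; its text is kept.

    def handle_endtag(self, tag):
        if tag in self.INLINE:
            self.out.append(self.INLINE[tag])
        elif tag == "a" and self.href is not None:
            self.out.append(f"]({self.href})")
            self.href = None

    def handle_data(self, data):
        self.out.append(data)


def to_markdown(html: str) -> str:
    parser = MiniMarkdown()
    parser.feed(html)
    parser.close()
    return "".join(parser.out).strip()


sample = '<h2>Title</h2><div><ul><li>One</li><li>See <a href="https://example.com">docs</a></li></ul></div>'
print(to_markdown(sample))
```

Here the <div> wrapper disappears from the output while the heading, the list items, and the link survive. A production converter would also need to handle nesting, whitespace normalization, and character escaping.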

Markdown has become the standard writing format for developer documentation (GitHub, GitLab, Bitbucket), static site generators (Hugo, Jekyll, Astro), note-taking apps (Obsidian, Notion), and technical blogs. Converting existing HTML content to Markdown is a common step when migrating websites, importing CMS content, or archiving web pages in a portable format. Unlike HTML, Markdown files produce clean diffs in version control, making code review of documentation changes practical.

Why Use an HTML to Markdown Converter?

Manually rewriting HTML as Markdown is slow and error-prone, especially for pages with nested lists, tables, or dozens of links. An automated converter handles the structural mapping instantly and consistently.

Convert instantly in your browser

Paste HTML and get Markdown output in milliseconds. No server round-trip, no waiting for processing queues. The conversion runs entirely in your browser using JavaScript.

Keep your data private

Your HTML never leaves your machine. All processing happens client-side, so there is no upload, no logging, and no third-party access to your content.

Preserve document structure

Headings, lists, links, images, code blocks, and tables are mapped to their correct Markdown equivalents. Nested structures and inline formatting are handled recursively.

No account or installation required

Open the tool and start converting. There is nothing to install, no API key to configure, and no sign-up form. Works on any device with a modern browser.

HTML to Markdown Use Cases

Frontend Developer: CMS Migration

Export blog posts or pages from WordPress, Drupal, or a headless CMS as HTML, then convert them to Markdown for use with static site generators and frameworks such as Hugo, Astro, or Next.js.

Backend Engineer: API Documentation

Convert auto-generated HTML API docs into Markdown files that live alongside your source code. Markdown docs integrate with GitHub rendering and can be versioned with the code they describe.

DevOps: Runbook Conversion

Turn internal wiki pages (Confluence, SharePoint) exported as HTML into Markdown runbooks stored in your Git repository alongside the infrastructure code they describe.

QA Engineer: Test Case Documentation

Convert HTML test reports or manual test plans from web-based tools into Markdown files that can be reviewed in pull requests alongside the code changes they verify.

Data Engineer: Web Scraping Cleanup

Strip HTML boilerplate from scraped web pages and produce clean Markdown text. This removes navigation, ads, and layout markup while preserving the article content and structure.

Student: Research Notes

Copy content from web resources and convert the HTML to Markdown for import into Obsidian, Notion, or any Markdown-based note-taking system. Preserves headings, links, and formatting.

HTML to Markdown Element Reference

The table below shows how common HTML elements map to their Markdown equivalents. This mapping follows GitHub-Flavored Markdown (GFM) conventions, which extend the CommonMark spec with tables, strikethrough, and task lists. Elements not listed here (such as <div>, <form>, or custom web components) have no Markdown equivalent and are either stripped or passed through as raw HTML.

HTML Element	Markdown Syntax	Notes
<h1>...<h6>	# ... ######	ATX headings, level matches tag number
<p>	Blank line separation	Double newline between paragraphs
<strong>, <b>	**text**	Bold / strong emphasis
<em>, <i>	*text*	Italic / emphasis
<a href="url">	[text](url)	Inline link with optional title
<img src="url">	![alt](url)	Image with alt text
<ul><li>	- item	Unordered list with dash or asterisk
<ol><li>	1. item	Ordered list, numbers restart per block
<blockquote>	> text	Block quote, nestable with >>
<code>	`code`	Inline code span
<pre><code>	```lang\ncode\n```	Fenced code block with optional language
<hr>	---	Horizontal rule (three dashes)
<table>	| col | col |	GFM table syntax with alignment
<del>, <s>	~~text~~	Strikethrough (GFM extension)
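
The list rows in the table show a single level only. As a rough sketch of how nesting might be handled, the function below walks a list recursively and indents each level by four spaces. `render_list` is a hypothetical helper, not part of any converter named on this page. It assumes the input parses as well-formed XML (which real-world HTML often does not) and it ignores inline markup inside list items.

```python
import xml.etree.ElementTree as ET


def render_list(node, depth=0):
    """Return Markdown lines for a <ul> or <ol> element, recursing into nested lists."""
    lines = []
    ordered = node.tag == "ol"
    for index, item in enumerate(node.findall("li"), start=1):
        marker = f"{index}." if ordered else "-"
        text = (item.text or "").strip()   # text before any child element
        lines.append("    " * depth + f"{marker} {text}")
        for child in item:
            if child.tag in ("ul", "ol"):
                # A nested list is rendered one indentation level deeper.
                lines.extend(render_list(child, depth + 1))
    return lines


html = "<ul><li>Install<ol><li>Download</li><li>Unpack</li></ol></li><li>Run</li></ul>"
print("\n".join(render_list(ET.fromstring(html))))
```

The ordered sublist comes out indented under "- Install", numbered from 1, and "- Run" returns to the top level.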

Markdown Flavors: GFM vs CommonMark vs Original

Not all Markdown is the same. The output format matters because different platforms parse Markdown differently. The three most common flavors are GitHub-Flavored Markdown (GFM), CommonMark, and Gruber's original Markdown.

GitHub-Flavored Markdown (GFM)

The most widely used flavor. Adds tables (pipe syntax), strikethrough (~~text~~), task lists (- [x]), and auto-linked URLs. Used by GitHub, GitLab, and most developer tools. This converter outputs GFM-compatible Markdown by default.

CommonMark

A strict specification that resolves ambiguities in the original Markdown syntax. Defines exact rules for list continuation, emphasis parsing, and block-level nesting. Used as the base for GFM and many static site generators.

Original Markdown

John Gruber's 2004 specification. Does not support tables, fenced code blocks, or strikethrough. Most modern tools treat it as a subset of CommonMark. Rarely used as a target format today.

Code Examples

Working examples in JavaScript (Turndown), Python (markdownify and html2text), Go, and Pandoc on the command line.

JavaScript (Turndown)

import TurndownService from 'turndown'

const turndown = new TurndownService({ headingStyle: 'atx' })
const html = '<h1>Title</h1><p>A <strong>bold</strong> paragraph.</p>'
const md = turndown.turndown(html)
console.log(md)
// → "# Title\n\nA **bold** paragraph."

Python (markdownify)

from markdownify import markdownify

html = '<h2>Section</h2><ul><li>First</li><li>Second</li></ul>'
# bullets='-' overrides markdownify's default '*' list marker
md = markdownify(html, heading_style='ATX', bullets='-')
print(md.strip())
# → "## Section\n\n- First\n- Second"

Python (html2text)

import html2text

converter = html2text.HTML2Text()
converter.body_width = 0  # disable line wrapping

html = '<p>Visit <a href="https://example.com">Example</a> for details.</p>'
md = converter.handle(html)
print(md)
# → "Visit [Example](https://example.com) for details."

Go (html-to-markdown)

package main

import (
	"fmt"
	"log"

	md "github.com/JohannesKaufmann/html-to-markdown"
)

func main() {
	// Arguments: no base domain, CommonMark rules enabled, default options.
	converter := md.NewConverter("", true, nil)
	html := `<h3>Go Example</h3><p>Code: <code>fmt.Println()</code></p>`
	markdown, err := converter.ConvertString(html)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(markdown)
	// → "### Go Example\n\nCode: `fmt.Println()`"
}

CLI (Pandoc)

# Convert an HTML file to Markdown
pandoc input.html -f html -t markdown -o output.md

# Pipe HTML from stdin
echo '<p>Hello <em>world</em></p>' | pandoc -f html -t markdown
# → Hello *world*

# Use GitHub-Flavored Markdown output
pandoc input.html -f html -t gfm -o output.md

Frequently Asked Questions

What HTML elements cannot be converted to Markdown?

Markdown has no equivalent for <div>, <span>, <form>, <input>, <iframe>, or any element with custom CSS classes and styles. Most converters strip these tags and keep only the inner text. Some converters can pass unsupported HTML through unchanged, which is valid since the Markdown specification explicitly allows inline HTML. If you need to preserve those elements, configure your converter to keep raw HTML rather than strip it.

How are HTML tables converted to Markdown?

HTML tables map to GFM pipe-table syntax: | Header | Header | with a separator row | --- | --- |. GFM tables do not support colspan, rowspan, or cell-level styling. Complex tables with merged cells are flattened, which may lose structural information. For simple data tables, the conversion is lossless.
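
A minimal sketch of that table mapping, using only Python's standard library. `TableReader` and `table_to_gfm` are hypothetical names, and several assumptions are baked in: the first row is treated as the header, every cell is explicitly closed, and colspan and rowspan are ignored, so merged cells are flattened.

```python
from html.parser import HTMLParser


class TableReader(HTMLParser):
    """Collects the rows of a simple table as lists of cell text."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self.cell = None    # text fragments of the open cell, if any

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            if not self.rows:          # cell without an enclosing <tr>
                self.rows.append([])
            self.cell = []

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self.cell is not None:
            text = " ".join("".join(self.cell).split())
            # A literal pipe would end the cell early, so escape it.
            self.rows[-1].append(text.replace("|", "\\|"))
            self.cell = None

    def handle_data(self, data):
        if self.cell is not None:
            self.cell.append(data)


def table_to_gfm(html: str) -> str:
    reader = TableReader()
    reader.feed(html)
    reader.close()
    rows = [r for r in reader.rows if r]
    if not rows:
        return ""
    width = max(len(r) for r in rows)
    # Pad short rows so every line has the same number of columns.
    rows = [r + [""] * (width - len(r)) for r in rows]
    lines = ["| " + " | ".join(r) + " |" for r in rows]
    # GFM requires a delimiter row directly after the header row.
    lines.insert(1, "| " + " | ".join(["---"] * width) + " |")
    return "\n".join(lines)


print(table_to_gfm(
    "<table><tr><th>Name</th><th>Role</th></tr>"
    "<tr><td>Ada</td><td>Dev | Ops</td></tr></table>"
))
```

Escaping the pipe inside "Dev | Ops" matters because an unescaped pipe would be read as a column boundary.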

Is the conversion from HTML to Markdown lossless?

No. Markdown can express only a subset of what HTML can represent, so some information is lost whenever the source uses features outside that subset. CSS classes, inline styles, data attributes, form elements, and semantic tags like <article> or <section> have no Markdown equivalent. The text content and basic structure (headings, lists, links, emphasis) are preserved accurately. For most documentation and content migration workflows, the preserved elements are the ones that matter.

What is the difference between HTML to Markdown and HTML to plain text?

HTML to plain text strips all formatting and produces raw text with no structure. HTML to Markdown preserves the document structure: headings remain headings, links keep their URLs, lists stay as lists, and emphasis is retained. The Markdown output can be rendered back to HTML with the same logical structure.

Can I convert Markdown back to HTML?

Yes. Every Markdown processor (marked, markdown-it, Python-Markdown, goldmark) converts Markdown to HTML. This round-trip is one reason Markdown is popular: you write in a readable format and render to HTML for the web. The round-trip is not perfectly symmetrical because HTML-to-Markdown conversion drops unsupported elements.

How does the converter handle HTML with inline CSS or JavaScript?

Inline CSS (style attributes) and <style> blocks are stripped during conversion since Markdown has no styling syntax. JavaScript (<script> tags and event handlers like onclick) is also removed. The converter extracts only the document content and structure. This makes HTML-to-Markdown conversion a useful first cleanup step when importing untrusted HTML content into your documentation, but not a replacement for a dedicated sanitizer: Markdown permits raw HTML, and link URLs are carried over as written, so the rendered output should still be sanitized.
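
The filtering step on its own can be sketched as follows, before any Markdown mapping is applied. `ContentOnly` and `visible_text` are hypothetical names, and this shows one possible approach rather than this tool's implementation: text inside <script> and <style> is skipped, and attributes such as style and onclick are never read, so they cannot reach the output.

```python
from html.parser import HTMLParser


class ContentOnly(HTMLParser):
    """Keeps text content and skips everything inside <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0   # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        # attrs (style, onclick, ...) are deliberately ignored.
        if tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if not self.skip_depth:
            self.parts.append(data)


def visible_text(html: str) -> str:
    parser = ContentOnly()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace left behind by removed elements.
    return " ".join("".join(parser.parts).split())


page = (
    "<style>p { color: red }</style>"
    '<p onclick="track()" style="color: red">Hello <b>world</b></p>'
    "<script>alert(1)</script>"
)
print(visible_text(page))
```

Only the paragraph text survives; the stylesheet, the script body, and both attributes are gone.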

Which Markdown flavor should I use for my project?

Use GitHub-Flavored Markdown (GFM) if your content will be viewed on GitHub, GitLab, or most documentation platforms. Use CommonMark if you need strict spec compliance and predictable parsing across different renderers. Original Markdown is only relevant for legacy systems. GFM is the safest default for most projects.
