Duplicate Line Remover
Remove duplicate lines from text, keeping only unique lines
What Is Duplicate Line Removal?
Duplicate line removal is the process of scanning a block of text line by line and keeping only the first occurrence of each unique line. When you remove duplicate lines online, the tool splits your input on newline characters, tracks which lines have already appeared using a hash-based data structure (like a Set), and outputs only the lines that were not seen before. The original line order is preserved.
Two lines are considered duplicates when they match exactly, character for character. However, real-world data rarely cooperates with exact matching. Leading or trailing whitespace, inconsistent capitalization, and invisible characters like tabs or carriage returns can all cause lines that look identical to be treated as unique. That is why most deduplication tools offer options for case-insensitive comparison and whitespace trimming before comparison.
Deduplication is a distinct operation from sorting. The Unix command sort -u both sorts and deduplicates, which changes line order. If you need to preserve the original order of lines, you need a seen-set approach: iterate through lines sequentially, add each line's normalized form to a set, and skip any line whose key already exists. This tool uses the seen-set method, so your first occurrences stay in their original positions.
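The difference between the two approaches can be sketched in a few lines of Python (illustrative only, not this tool's actual source):

```python
lines = ["banana", "apple", "banana", "apple", "cherry"]

# Sort-based dedup (what `sort -u` does): duplicates go away, but order is lost
print(sorted(set(lines)))  # → ['apple', 'banana', 'cherry']

# Seen-set dedup: first occurrences keep their original positions
seen = set()
ordered = []
for line in lines:
    if line not in seen:
        seen.add(line)
        ordered.append(line)
print(ordered)  # → ['banana', 'apple', 'cherry']
```

Both outputs contain the same three unique lines; only the seen-set version keeps "banana" first, matching the input.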
Why Use This Duplicate Remover?
Paste your text, choose your comparison options, and see the deduplicated result immediately. No command-line setup, no regex writing, no file uploads.
Deduplication Methods Compared
There are several approaches to removing duplicate lines, each with different trade-offs for order preservation, memory usage, and accuracy.
| Method | How It Works | Output Order | Where Used |
|---|---|---|---|
| Hash set | Hash-based, O(1) lookup | Unordered | Python set(), Go map keys |
| Sorted + scan | Sort then skip adjacent | Sorted output | Unix sort -u, C++ std::unique |
| Seen-set + list | Track seen, preserve order | Original order | This tool, Python dict.fromkeys(), JS Set |
| Bloom filter | Probabilistic membership | Original order; may drop unique lines | Large-scale pipelines, Redis |
| SQL DISTINCT | Database-level dedup | Query-dependent | SELECT DISTINCT col FROM table |
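The sorted + scan row can be illustrated in a few lines of Python with itertools.groupby, which collapses each run of equal adjacent lines after sorting (a sketch of the idea, assuming the input fits in memory):

```python
from itertools import groupby

lines = ["b", "a", "b", "c", "a"]

# Sort first, then keep one line per run of equal neighbors,
# mirroring `sort file | uniq` or C++ sort + std::unique
deduped = [key for key, _ in groupby(sorted(lines))]
print(deduped)  # → ['a', 'b', 'c']
```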
Case Sensitivity and Whitespace Handling
Two options control how this tool decides whether two lines are duplicates. Understanding when to use each option prevents both false positives (treating different lines as duplicates) and false negatives (missing lines that should match).
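One way to think about these options is as a normalization step applied to each line before the seen-set lookup. A minimal Python sketch (the option names here are illustrative, not this tool's actual parameters):

```python
def dedupe(lines, ignore_case=False, trim=False):
    """Keep first occurrences, comparing lines by a normalized key."""
    seen = set()
    out = []
    for line in lines:
        key = line.strip() if trim else line
        if ignore_case:
            key = key.casefold()
        if key not in seen:
            seen.add(key)
            out.append(line)  # output keeps the original, un-normalized line
    return out

rows = ["Alice", "alice ", "Bob"]
print(dedupe(rows))                               # → ['Alice', 'alice ', 'Bob']
print(dedupe(rows, ignore_case=True, trim=True))  # → ['Alice', 'Bob']
```

Note that only the comparison key is normalized; the line that reaches the output is always the original, so enabling these options never alters your text, only which lines survive.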
Code Examples
Remove duplicate lines programmatically in JavaScript, Python, Go, and the command line. Every example preserves the original line order; the JavaScript, Python, and shell examples also show case-insensitive and whitespace-trimming variants.
const text = `apple
banana
apple
Cherry
banana
cherry`
// Remove exact duplicates, preserve order (a JS Set iterates in insertion order)
const unique = [...new Set(text.split('\n'))].join('\n')
// → "apple\nbanana\nCherry\ncherry"
// Case-insensitive deduplication
const seen = new Set()
const ciUnique = text.split('\n').filter(line => {
  const key = line.toLowerCase()
  if (seen.has(key)) return false
  seen.add(key)
  return true
}).join('\n')
// → "apple\nbanana\nCherry"
// Trim whitespace before comparing (use a fresh set, not the one above)
const seenTrim = new Set()
const trimDedup = text.split('\n').filter(line => {
  const key = line.trim().toLowerCase()
  if (seenTrim.has(key)) return false
  seenTrim.add(key)
  return true
}).join('\n')
// → "apple\nbanana\nCherry"

text = """apple
banana
apple
Cherry
banana
cherry"""
lines = text.splitlines()
# Remove duplicates, preserve order (Python 3.7+)
unique = list(dict.fromkeys(lines))
# → ['apple', 'banana', 'Cherry', 'cherry']
# Case-insensitive deduplication
seen = set()
ci_unique = []
for line in lines:
    key = line.lower()
    if key not in seen:
        seen.add(key)
        ci_unique.append(line)
# → ['apple', 'banana', 'Cherry']

# With whitespace trimming
seen = set()
trimmed = []
for line in lines:
    key = line.strip().lower()
    if key not in seen:
        seen.add(key)
        trimmed.append(line)
# → ['apple', 'banana', 'Cherry']

package main
import (
	"fmt"
	"strings"
)

// removeDuplicates keeps the first occurrence of each line, preserving order.
func removeDuplicates(text string) string {
	lines := strings.Split(text, "\n")
	seen := make(map[string]bool)
	result := make([]string, 0, len(lines))
	for _, line := range lines {
		if !seen[line] {
			seen[line] = true
			result = append(result, line)
		}
	}
	return strings.Join(result, "\n")
}

func main() {
	text := "apple\nbanana\napple\ncherry\nbanana"
	fmt.Println(removeDuplicates(text))
	// → apple\nbanana\ncherry
}

# Remove duplicates (sorts the output; original order is lost)
sort -u file.txt
# Remove duplicates while preserving original order
awk '!seen[$0]++' file.txt
# Case-insensitive dedup, preserve order (tolower normalizes the key;
# IGNORECASE is a gawk-only regex flag and is not needed here)
awk '!seen[tolower($0)]++' file.txt
# Trim whitespace then dedup
sed 's/^[[:space:]]*//;s/[[:space:]]*$//' file.txt | awk '!seen[$0]++'
# Count duplicates before removing
sort file.txt | uniq -c | sort -rn