JSON to CSV in Python: DictWriter + pandas Examples
Use the free online JSON to CSV converter directly in your browser, no install required.
Almost every data pipeline eventually hits the same step: an API returns JSON, but the next consumer (a spreadsheet, an import script, a Redshift COPY command) needs CSV. Converting JSON to CSV in Python sounds trivial until you hit nested objects, inconsistent keys, or datetime values that need special handling. Python gives you two solid paths: the built-in json + csv modules for zero-dependency scripts, and pandas for nested flattening and larger datasets, or the online JSON to CSV converter for quick one-off conversions without any code. This guide covers both approaches end to end, with runnable Python 3.8+ examples.
- csv.DictWriter converts a list of dicts to CSV with zero dependencies: use json.load() to parse, then writeheader() + writerows().
- Always open CSV files with newline="" to prevent blank rows between data rows on Windows.
- pd.json_normalize() flattens nested JSON into a flat DataFrame before calling to_csv(), and handles multi-level nesting automatically.
- Pass index=False to DataFrame.to_csv(); without it, pandas writes an unwanted row-number column.
- For files over 500 MB, use ijson for streaming JSON parsing combined with csv.DictWriter for constant memory usage.
What is JSON to CSV Conversion?
JSON to CSV conversion transforms an array of JSON objects into a tabular format where each object becomes a row and each key becomes a column header. JSON is hierarchical: objects can nest arbitrarily deep. CSV is flat: every value sits in a row-column grid. The conversion works cleanly when every object shares the same set of top-level keys. Nested objects, arrays, and inconsistent keys are where things get interesting. The raw data stays identical; only the structure changes.
[{"order_id":"ord_91a3","total":149.99,"status":"shipped"},
{"order_id":"ord_b7f2","total":34.50,"status":"pending"}]
order_id,total,status
ord_91a3,149.99,shipped
ord_b7f2,34.50,pending
csv.DictWriter: Convert JSON to CSV Without Pandas
The csv module ships with every Python installation. No pip install, no virtual environment gymnastics. csv.DictWriter takes a list of dictionaries and writes each one as a CSV row, mapping dict keys to column headers. The fieldnames parameter controls both the column order and which keys get included.
import json
import csv
# Sample JSON data: an array of order objects
json_string = """
[
{"order_id": "ord_91a3", "product": "Wireless Keyboard", "quantity": 2, "unit_price": 74.99},
{"order_id": "ord_b7f2", "product": "USB-C Hub", "quantity": 1, "unit_price": 34.50},
{"order_id": "ord_c4e8", "product": "Monitor Stand", "quantity": 3, "unit_price": 29.95}
]
"""
records = json.loads(json_string)
with open("orders.csv", "w", newline="", encoding="utf-8") as csvfile:
writer = csv.DictWriter(csvfile, fieldnames=records[0].keys())
writer.writeheader()
writer.writerows(records)
# orders.csv:
# order_id,product,quantity,unit_price
# ord_91a3,Wireless Keyboard,2,74.99
# ord_b7f2,USB-C Hub,1,34.50
# ord_c4e8,Monitor Stand,3,29.95
That newline="" argument on open() is not optional on Windows. Without it, you get double carriage returns, which show up as blank rows between every data row in Excel. On macOS and Linux it is harmless, so just always include it.
The code above uses json.loads() for a string. Use json.load() (no trailing s) when reading from a file handle. This trips people up constantly: one reads a string, the other reads a file object.
import json
import csv
with open("server_metrics.json", encoding="utf-8") as jf:
metrics = json.load(jf) # json.load() for file objects
# Explicit fieldnames control column order
columns = ["timestamp", "hostname", "cpu_percent", "memory_mb", "disk_io_ops"]
with open("server_metrics.csv", "w", newline="", encoding="utf-8") as cf:
writer = csv.DictWriter(cf, fieldnames=columns, extrasaction="ignore")
writer.writeheader()
writer.writerows(metrics)
# Only the five specified columns appear, in exactly that order
Setting extrasaction="ignore" silently drops any keys in the dicts that are not in your fieldnames list. The default is "raise", which throws a ValueError if any dict has an unexpected key. Pick whichever matches your tolerance for surprises.
csv.DictWriter vs csv.writer: DictWriter maps dict keys to column positions automatically. csv.writer writes raw lists as rows, and you handle the column ordering yourself. DictWriter is almost always the right choice for JSON-to-CSV because JSON records are already dictionaries.
Python's csv module ships with three named dialects: excel (comma delimiter, CRLF line endings, the default), excel-tab (tab delimiter, CRLF endings), and unix (LF line endings, quotes all non-numeric fields). Pass the dialect name as the dialect argument to csv.DictWriter. You can also define a custom dialect with csv.register_dialect() when your target system has unusual quoting or delimiter rules. For most JSON-to-CSV workflows the excel dialect is correct, but switch to unix when writing files that will be processed by POSIX tools like awk or sort.
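The custom-dialect path can be sketched like this. The dialect name semicolon_quoted and its semicolon-plus-quote-everything rules are invented stand-ins for whatever your target system actually requires:

```python
import csv

# Register a custom dialect once, then refer to it by name everywhere.
# Hypothetical target: semicolon-delimited, every field quoted, LF endings.
csv.register_dialect(
    "semicolon_quoted",
    delimiter=";",
    quoting=csv.QUOTE_ALL,
    lineterminator="\n",
)

rows = [
    {"order_id": "ord_91a3", "total": "149.99"},
    {"order_id": "ord_b7f2", "total": "34.50"},
]

with open("orders_eu.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["order_id", "total"],
                            dialect="semicolon_quoted")
    writer.writeheader()
    writer.writerows(rows)
# First line of orders_eu.csv: "order_id";"total"
```

Registering the dialect keeps the formatting rules in one place, so several writers across a codebase can share them by name instead of repeating fmtparams.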
Handling Non-Standard Types: datetime, UUID, and Decimal
JSON from APIs often contains dates as ISO strings, UUIDs as hyphenated strings, and monetary values as floats. When you parse these into Python objects for processing before writing CSV, you need to convert them back to strings. The csv module calls str() on every value, so most types just work. But datetime objects produce messy default string representations, and Decimal values need explicit formatting to avoid scientific notation.
import json
import csv
from datetime import datetime, timezone
from decimal import Decimal
from uuid import UUID
# Simulating parsed API response with Python types
transactions = [
{
"txn_id": UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890"),
"created_at": datetime(2026, 3, 15, 9, 30, 0, tzinfo=timezone.utc),
"amount": Decimal("1249.99"),
"currency": "USD",
"merchant": "CloudHost Inc.",
},
{
"txn_id": UUID("b2c3d4e5-f6a7-8901-bcde-f12345678901"),
"created_at": datetime(2026, 3, 15, 14, 12, 0, tzinfo=timezone.utc),
"amount": Decimal("87.50"),
"currency": "EUR",
"merchant": "DataSync GmbH",
},
]
def prepare_row(record: dict) -> dict:
"""Convert non-string types to CSV-friendly strings."""
return {
"txn_id": str(record["txn_id"]),
"created_at": record["created_at"].isoformat(),
"amount": f"{record['amount']:.2f}",
"currency": record["currency"],
"merchant": record["merchant"],
}
with open("transactions.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.DictWriter(f, fieldnames=["txn_id", "created_at", "amount", "currency", "merchant"])
writer.writeheader()
for txn in transactions:
writer.writerow(prepare_row(txn))
# transactions.csv:
# txn_id,created_at,amount,currency,merchant
# a1b2c3d4-e5f6-7890-abcd-ef1234567890,2026-03-15T09:30:00+00:00,1249.99,USD,CloudHost Inc.
# b2c3d4e5-f6a7-8901-bcde-f12345678901,2026-03-15T14:12:00+00:00,87.50,EUR,DataSync GmbH
The prepare_row() function is the right approach here. Rather than trying to teach csv.DictWriter about custom types, you normalize each record to strings before writing. I prefer calling .isoformat() explicitly on datetime objects rather than relying on str(): the output format is more predictable, and downstream parsers handle ISO 8601 reliably.
If Decimal values pass through without formatting, very small or very large numbers may render in scientific notation (e.g., 1.5E+7). Always format Decimal with an explicit f-string like f"{value:.2f}" when writing financial data to CSV.
An alternative pattern for pipelines with many custom types is to extend json.JSONEncoder. Subclass it, override the default() method to return a JSON-serializable value for each custom type, then pass the subclass as the cls argument to json.dumps(). Re-encoding through the custom encoder before writing to CSV normalizes all types in one step without a per-row prepare_row() call. The prepare_row() pattern shown above is simpler for one-off scripts; the JSONEncoder subclass approach scales better when the same domain model with custom types is shared across many pipeline stages or microservices.
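A minimal sketch of that encoder pattern, reusing the transaction types from the example above (the class name PipelineEncoder is an invented placeholder):

```python
import json
from datetime import datetime, timezone
from decimal import Decimal
from uuid import UUID

class PipelineEncoder(json.JSONEncoder):
    """Serialize datetime, UUID, and Decimal values as strings."""
    def default(self, o):
        if isinstance(o, datetime):
            return o.isoformat()
        if isinstance(o, (UUID, Decimal)):
            return str(o)
        return super().default(o)  # raises TypeError for anything else

record = {
    "txn_id": UUID("a1b2c3d4-e5f6-7890-abcd-ef1234567890"),
    "created_at": datetime(2026, 3, 15, 9, 30, tzinfo=timezone.utc),
    "amount": Decimal("1249.99"),
}

# Round-trip through the encoder: every value comes back as a plain string,
# ready for csv.DictWriter with no per-row prepare_row() call.
normalized = json.loads(json.dumps(record, cls=PipelineEncoder))
print(normalized["created_at"])  # 2026-03-15T09:30:00+00:00
```

One caveat of the round-trip: Decimal becomes str(value), so apply any fixed-point formatting inside default() if downstream consumers need exactly two decimal places.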
csv.DictWriter Parameters Reference
The full constructor signature is csv.DictWriter(f, fieldnames, restval="", extrasaction="raise", dialect="excel", **fmtparams). Most of these have sensible defaults. The ones you will actually change are fieldnames, delimiter, and extrasaction.
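A short sketch exercising those parameters together; the task records and the "unknown" restval are invented for illustration:

```python
import csv

# Records with inconsistent keys: "priority" is missing from the second dict,
# and "internal_note" is not in the fieldnames list at all.
records = [
    {"task_id": "t1", "owner": "ops", "priority": "high", "internal_note": "skip"},
    {"task_id": "t2", "owner": "dev"},
]

with open("tasks.tsv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(
        f,
        fieldnames=["task_id", "owner", "priority"],
        restval="unknown",      # fills in missing keys instead of writing ""
        extrasaction="ignore",  # drops keys not listed in fieldnames
        delimiter="\t",         # fmtparams pass straight through to the writer
    )
    writer.writeheader()
    writer.writerows(records)
# tasks.tsv row for t2 becomes: t2 <tab> dev <tab> unknown
```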
pandas: Convert JSON to CSV with DataFrames
If you are already working in a pandas-heavy codebase, or your JSON has nested objects that you need to flatten, the pandas approach is significantly less code than the stdlib version. The tradeoff: pandas is a ~30 MB dependency. For a throwaway script, that is fine. For a Docker image you ship to production, the stdlib approach keeps things lighter.
import pandas as pd
# Read JSON array directly into a DataFrame
df = pd.read_json("warehouse_inventory.json")
# Write to CSV: index=False prevents the auto-generated row numbers
df.to_csv("warehouse_inventory.csv", index=False)
# That's it. Two lines. pandas infers column types automatically.
The index=False flag is one of those things you look up every single time. Without it, pandas writes a 0, 1, 2, ... column as the first column of your CSV. Nobody wants that.
Flattening Nested JSON with json_normalize
Real API responses are rarely flat. Orders contain shipping addresses, users contain nested preferences, telemetry events contain nested metadata. pd.json_normalize() walks nested dictionaries and flattens them into columns with dot-separated names.
import json
import pandas as pd
api_response = """
[
{
"order_id": "ord_91a3",
"placed_at": "2026-03-15T09:30:00Z",
"customer": {
"name": "Sarah Chen",
"email": "s.chen@example.com",
"tier": "premium"
},
"shipping": {
"method": "express",
"address": {
"city": "Portland",
"state": "OR",
"zip": "97201"
}
},
"total": 299.95
},
{
"order_id": "ord_b7f2",
"placed_at": "2026-03-15T14:12:00Z",
"customer": {
"name": "James Park",
"email": "j.park@example.com",
"tier": "standard"
},
"shipping": {
"method": "standard",
"address": {
"city": "Austin",
"state": "TX",
"zip": "73301"
}
},
"total": 87.50
}
]
"""
orders = json.loads(api_response)
# json_normalize flattens nested dicts; sep controls the delimiter
df = pd.json_normalize(orders, sep="_")
df.to_csv("flat_orders.csv", index=False)
# Resulting columns:
# order_id, placed_at, customer_name, customer_email, customer_tier,
# shipping_method, shipping_address_city, shipping_address_state,
# shipping_address_zip, total
The sep="_" parameter controls how nested key names are joined. The default is ".", which produces columns like customer.name. I prefer underscores because dots in column names cause trouble with SQL imports and some spreadsheet formulas.
For API responses that wrap the records array under a nested key, use the record_path parameter. If the response looks like {"data": {"orders": [...]}}, pass record_path=["data", "orders"] to navigate to the right list. The optional meta parameter lets you pull parent-level fields alongside the nested records, which is useful when the response includes top-level pagination info (page number, total count) that you want as a column in every row. Together, record_path and meta handle most real-world nested API response shapes without custom preprocessing.
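A minimal sketch of record_path and meta working together; the wrapped response shape below is invented for illustration:

```python
import pandas as pd

# A wrapped API response: records live under data.orders,
# pagination fields sit at the top level.
response = {
    "page": 3,
    "total_count": 250,
    "data": {
        "orders": [
            {"order_id": "ord_91a3", "total": 299.95},
            {"order_id": "ord_b7f2", "total": 87.50},
        ]
    },
}

df = pd.json_normalize(
    response,
    record_path=["data", "orders"],  # walk down to the nested list
    meta=["page", "total_count"],    # repeat parent fields on every row
)
df.to_csv("orders_page3.csv", index=False)
# Columns: order_id, total, page, total_count
```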
DataFrame.to_csv() Parameters Reference
DataFrame.to_csv() has over 20 parameters. These are the ones that matter for JSON-to-CSV workflows.
import pandas as pd
df = pd.read_json("telemetry_events.json")
# TSV output with explicit encoding and missing value handling
df.to_csv(
"telemetry_events.tsv",
sep="\t",
index=False,
encoding="utf-8",
na_rep="NULL",
columns=["event_id", "timestamp", "source", "severity", "message"],
)
# Write to stdout for piping in shell scripts
print(df.to_csv(index=False))
# Return as string (no file written)
csv_string = df.to_csv(index=False)
print(len(csv_string), "characters")
Convert JSON to CSV from a File and API Response
The two most common real-world scenarios: reading JSON from a file on disk and converting it, or fetching JSON from an HTTP API and saving the result as CSV. In development you can get away with no error handling. In production, that choice becomes a 2 a.m. alert. Files might not exist, APIs might return 4xx or 5xx status codes instead of JSON, the response body might be an error object rather than an array, or the JSON could be truncated due to a network timeout. The patterns below handle all of these cases explicitly, log errors to stderr, and return a row count so callers can detect zero-row outputs and alert accordingly.
File on Disk: Read, Convert, Save
import json
import csv
import sys
def json_file_to_csv(input_path: str, output_path: str) -> int:
"""Convert a JSON file containing an array of objects to CSV.
Returns the number of rows written.
"""
try:
with open(input_path, encoding="utf-8") as jf:
data = json.load(jf)
except FileNotFoundError:
print(f"Error: {input_path} not found", file=sys.stderr)
return 0
except json.JSONDecodeError as exc:
print(f"Error: invalid JSON in {input_path}: {exc.msg} at line {exc.lineno}", file=sys.stderr)
return 0
if not isinstance(data, list) or not data:
print(f"Error: expected a non-empty JSON array in {input_path}", file=sys.stderr)
return 0
# Collect all unique keys across all records to handle inconsistent schemas
all_keys: list[str] = []
seen: set[str] = set()
for record in data:
for key in record:
if key not in seen:
all_keys.append(key)
seen.add(key)
with open(output_path, "w", newline="", encoding="utf-8") as cf:
writer = csv.DictWriter(cf, fieldnames=all_keys, restval="", extrasaction="ignore")
writer.writeheader()
writer.writerows(data)
return len(data)
rows = json_file_to_csv("deploy_logs.json", "deploy_logs.csv")
print(f"Wrote {rows} rows to deploy_logs.csv")HTTP API Response β Fetch and Convert
import json
import csv
import urllib.request
import urllib.error
def api_response_to_csv(url: str, output_path: str) -> int:
"""Fetch JSON from a REST API endpoint and write it as CSV."""
try:
req = urllib.request.Request(url, headers={"Accept": "application/json"})
with urllib.request.urlopen(req, timeout=30) as resp:
if resp.status != 200:
print(f"Error: API returned status {resp.status}")
return 0
body = resp.read().decode("utf-8")
except urllib.error.URLError as exc:
print(f"Error: could not reach {url}: {exc.reason}")
return 0
try:
records = json.loads(body)
except json.JSONDecodeError as exc:
print(f"Error: API returned invalid JSON: {exc.msg}")
return 0
if not isinstance(records, list) or not records:
print("Error: expected a non-empty JSON array from the API")
return 0
with open(output_path, "w", newline="", encoding="utf-8") as cf:
writer = csv.DictWriter(cf, fieldnames=records[0].keys())
writer.writeheader()
writer.writerows(records)
return len(records)
rows = api_response_to_csv(
"https://api.internal.example.com/v2/deployments?status=completed",
"completed_deployments.csv",
)
print(f"Exported {rows} deployments to CSV")
This version uses urllib from the standard library to keep the script dependency-free. If you have requests installed, replace the urllib section with resp = requests.get(url, timeout=30); records = resp.json(), and the rest of the CSV writing code stays identical.
Command-Line JSON to CSV Conversion
Sometimes you just need a one-liner in the terminal. Python's -c flag lets you run a quick conversion without creating a script file. For more complex transformations, pipe through jq first to reshape the data, then convert.
# Python one-liner: reads JSON from stdin, writes CSV to stdout
cat orders.json | python3 -c "
import json, csv, sys
data = json.load(sys.stdin)
w = csv.DictWriter(sys.stdout, fieldnames=data[0].keys())
w.writeheader()
w.writerows(data)
"

# Save output to a file
cat orders.json | python3 -c "
import json, csv, sys
data = json.load(sys.stdin)
w = csv.DictWriter(sys.stdout, fieldnames=data[0].keys())
w.writeheader()
w.writerows(data)
" > orders.csv
# A fuller CLI with argparse: save this as a shell script (e.g. json2csv.sh)
# and run it as: ./json2csv.sh input.json -o output.csv
python3 -c "
import json, csv, argparse, sys
parser = argparse.ArgumentParser(description='Convert JSON array to CSV')
parser.add_argument('input', help='Path to JSON file')
parser.add_argument('-o', '--output', default=None, help='Output CSV path (default: stdout)')
parser.add_argument('-d', '--delimiter', default=',', help='CSV delimiter')
args = parser.parse_args()
with open(args.input) as f:
data = json.load(f)
out = open(args.output, 'w', newline='') if args.output else sys.stdout
writer = csv.DictWriter(out, fieldnames=data[0].keys(), delimiter=args.delimiter)
writer.writeheader()
writer.writerows(data)
if args.output:
out.close()
print(f'Wrote {len(data)} rows to {args.output}', file=sys.stderr)
" "$@"
# Install csvkit: pip install csvkit
# jq flattens and selects fields, in2csv handles the CSV formatting
cat api_response.json | jq '[.[] | {id: .order_id, customer: .customer.name, total}]' | in2csv -f json > orders.csv
# Miller (mlr) is another option for JSON-to-CSV
mlr --json2csv cat orders.json > orders.csv
Miller (mlr) is a standalone binary that treats JSON, CSV, and TSV as first-class formats with no Python runtime required. The --json2csv flag converts JSON input to CSV in a single pass, and you can chain Miller verbs to filter, sort, or rename columns in the same command before writing output. Install via Homebrew on macOS (brew install miller) or your Linux package manager. It is particularly useful in CI pipelines where you want fast JSON-to-CSV conversion without spinning up a Python environment.
High-Performance Alternative: pandas with pyarrow
For datasets in the tens-of-millions-of-rows range, pandas with the pyarrow backend reads and writes significantly faster than the default. The C-backed Arrow engine processes columnar data more efficiently than Python's row-by-row csv module. The API stays the same: you just set the engine parameter.
pip install pyarrow
import pandas as pd
# Read JSON with pyarrow engine (faster parsing for large files)
df = pd.read_json("sensor_readings.json", engine="pyarrow")
# to_csv doesn't have an engine parameter, but the DataFrame operations
# between read and write benefit from pyarrow's columnar layout
df.to_csv("sensor_readings.csv", index=False)
# For truly large exports, consider writing to Parquet instead of CSV:
# binary format, 5-10x smaller, preserves types
df.to_parquet("sensor_readings.parquet", engine="pyarrow")
If you are processing more than a few hundred MB of JSON and the final consumer accepts Parquet, skip CSV entirely. Parquet is smaller, preserves column types, and both Redshift and BigQuery load it natively. CSV is a lossy format: every value becomes a string.
Terminal Output with Syntax Highlighting
The rich library renders tables with borders, alignment, and color in the terminal, which is useful for previewing a conversion during development without opening the output file.
pip install rich
import json
from rich.console import Console
from rich.table import Table
json_string = """
[
{"hostname": "web-prod-1", "cpu_percent": 72.3, "memory_mb": 3840, "uptime_hours": 720},
{"hostname": "web-prod-2", "cpu_percent": 45.1, "memory_mb": 2560, "uptime_hours": 168},
{"hostname": "db-replica-1", "cpu_percent": 91.7, "memory_mb": 7680, "uptime_hours": 2160}
]
"""
records = json.loads(json_string)
console = Console()
table = Table(title="Server Metrics Preview", show_lines=True)
for key in records[0]:
table.add_column(key, style="cyan" if key == "hostname" else "white")
for row in records:
table.add_row(*[str(v) for v in row.values()])
console.print(table)
# Renders a color-highlighted table with borders in the terminal
For the actual export, write with csv.DictWriter or DataFrame.to_csv(), and use rich only for previewing.
Working with Large JSON Files
json.load() reads the entire file into memory. For a 200 MB JSON file, that means ~200 MB of raw text plus the Python object overhead, easily 500 MB+ of heap usage. For files over 100 MB, stream the input with ijson and write CSV rows as you go.
pip install ijson
Streaming JSON Array to CSV with ijson
import ijson
import csv
def stream_json_to_csv(json_path: str, csv_path: str) -> int:
"""Convert a large JSON array to CSV without loading it all into memory."""
with open(json_path, "rb") as jf, open(csv_path, "w", newline="", encoding="utf-8") as cf:
# ijson.items yields each element of the top-level array one at a time
records = ijson.items(jf, "item")
first_record = next(records)
fieldnames = list(first_record.keys())
writer = csv.DictWriter(cf, fieldnames=fieldnames)
writer.writeheader()
writer.writerow(first_record)
count = 1
for record in records:
writer.writerow(record)
count += 1
return count
rows = stream_json_to_csv("clickstream_2026_03.json", "clickstream_2026_03.csv")
print(f"Streamed {rows} records to CSV")
NDJSON / JSON Lines: One Object Per Line
NDJSON (Newline-Delimited JSON), also called JSON Lines or .jsonl, stores one valid JSON object per line with no wrapping array. This format is common in log pipelines, event streams (Kafka, Kinesis), and bulk exports from services like Elasticsearch and BigQuery. Because each line is a self-contained JSON object, you can process an NDJSON file with a plain Python for loop over the file handle β no need for the ijson library. Memory stays constant regardless of file size, making this the simplest streaming approach when your source data is already in JSON Lines format.
import json
import csv
def ndjson_to_csv(ndjson_path: str, csv_path: str) -> int:
"""Convert a newline-delimited JSON file to CSV, one line at a time."""
with open(ndjson_path, encoding="utf-8") as nf:
first_line = nf.readline()
first_record = json.loads(first_line)
fieldnames = list(first_record.keys())
with open(csv_path, "w", newline="", encoding="utf-8") as cf:
writer = csv.DictWriter(cf, fieldnames=fieldnames)
writer.writeheader()
writer.writerow(first_record)
count = 1
for line in nf:
line = line.strip()
if not line:
continue
try:
record = json.loads(line)
writer.writerow(record)
count += 1
except json.JSONDecodeError:
continue # skip malformed lines
return count
rows = ndjson_to_csv("access_log.ndjson", "access_log.csv")
print(f"Converted {rows} log entries to CSV")
Parsing a file in the gigabyte range with json.load() can consume 3-5 GB of RAM due to Python object overhead. With ijson, memory stays flat regardless of file size. If you just need a quick conversion of a small file, paste it into the JSON to CSV converter instead.
Common Mistakes
Problem: The csv module writes its own \r\n line endings. Without newline='', Python's text mode adds another \r on Windows, producing double-spaced output.
Fix: Always pass newline='' when opening a file for CSV writing. It is harmless on macOS/Linux.
with open("output.csv", "w") as f:
writer = csv.DictWriter(f, fieldnames=columns)
writer.writeheader()
writer.writerows(data)
# Blank rows between every data row on Windows
with open("output.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.DictWriter(f, fieldnames=columns)
writer.writeheader()
writer.writerows(data)
# Clean output on all platforms
Problem: Without index=False, pandas prepends an auto-incrementing row number column (0, 1, 2, ...) that pollutes the CSV with data that was never in the original JSON.
Fix: Pass index=False to to_csv(). If you actually need an index column, name it explicitly with df.index.name = 'row_num'.
df = pd.read_json("events.json")
df.to_csv("events.csv")
# CSV gets an extra unnamed column: ,event_id,timestamp,...
# The leading comma breaks many CSV parsers
df = pd.read_json("events.json")
df.to_csv("events.csv", index=False)
# Clean CSV: event_id,timestamp,...
Problem: If JSON objects have different keys (some records have optional fields), using the first record's keys as fieldnames silently drops columns that only appear in later records.
Fix: Collect all unique keys across all records before creating the DictWriter.
records = json.load(f)
writer = csv.DictWriter(out, fieldnames=records[0].keys())
# Misses "discount" field that only appears in records[2]
records = json.load(f)
all_keys = list(dict.fromkeys(k for r in records for k in r))
writer = csv.DictWriter(out, fieldnames=all_keys, restval="")
# Every key from every record is included as a column
Problem: csv.DictWriter calls str() on nested dicts, producing columns with values like "{'city': 'Portland'}", raw Python repr rather than actual data.
Fix: Flatten nested objects first using pd.json_normalize() or a custom flattening function.
records = [{"id": "evt_1", "meta": {"source": "web", "region": "us-west"}}]
writer = csv.DictWriter(f, fieldnames=["id", "meta"])
writer.writerows(records)
# meta column contains: {'source': 'web', 'region': 'us-west'}
import pandas as pd
records = [{"id": "evt_1", "meta": {"source": "web", "region": "us-west"}}]
df = pd.json_normalize(records, sep="_")
df.to_csv("events.csv", index=False)
# Columns: id, meta_source, meta_region
csv.DictWriter vs pandas: Quick Comparison
Use csv.DictWriter when you need zero dependencies, your JSON is flat, and the script runs in a restricted environment (CI containers, Lambda functions, embedded Python). Use pd.json_normalize() + to_csv() when the JSON is nested, you need to transform or filter data before export, or you are already in a pandas workflow. For files that do not fit in memory, combine ijson with csv.DictWriter for constant-memory streaming.
For quick, no-code conversions, the JSON to CSV converter on ToolDeck handles it without any Python setup.
Frequently Asked Questions
How do I convert JSON to CSV in Python without pandas?
Use the built-in json and csv modules. Call json.load() to parse the JSON file into a list of dicts, extract fieldnames from the first dict's keys, create a csv.DictWriter, call writeheader(), then writerows(). This approach has zero external dependencies and works in any Python 3.x environment. It also runs faster than pandas for small files since there is no DataFrame allocation overhead. If your JSON objects have inconsistent keys across records, collect all unique keys first with dict.fromkeys(k for r in records for k in r) before passing them as fieldnames to avoid missing columns.
import json
import csv
with open("orders.json") as f:
records = json.load(f)
with open("orders.csv", "w", newline="", encoding="utf-8") as f:
writer = csv.DictWriter(f, fieldnames=records[0].keys())
writer.writeheader()
writer.writerows(records)
How do I handle nested JSON when converting to CSV?
Flat JSON arrays map directly to CSV rows, but nested objects need flattening first. With pandas, pd.json_normalize() handles this automatically: it joins nested keys with a dot separator (e.g., "address.city"). Without pandas, write a recursive function that walks the dict and concatenates keys with a delimiter. For deeply nested structures with multiple levels, json_normalize handles them all in one pass. The sep parameter controls the joining character between key segments; underscore is usually safer than the default dot for SQL imports and spreadsheet formula compatibility.
import pandas as pd
nested_data = [
{"id": "ord_91a3", "customer": {"name": "Sarah Chen", "email": "s.chen@example.com"}},
]
df = pd.json_normalize(nested_data, sep="_")
# Columns: id, customer_name, customer_email
df.to_csv("flat_orders.csv", index=False)
Why does my CSV have blank rows between data rows on Windows?
The csv module writes \r\n line endings by default. On Windows, opening the file in text mode adds another \r, producing \r\r\n, which displays as a blank row. The fix is to always pass newline="" to open(). This tells Python not to translate line endings, letting the csv module handle them. This pattern is correct regardless of operating system: it is harmless on macOS and Linux, and critical on Windows. The Python documentation explicitly calls this out in the csv module section as the correct way to open files for CSV writing.
# Wrong: blank rows on Windows
with open("output.csv", "w") as f:
writer = csv.writer(f)
# Correct: newline="" prevents the double \r
with open("output.csv", "w", newline="") as f:
writer = csv.writer(f)
How do I append JSON records to an existing CSV file?
Open the file in append mode ("a") and create a DictWriter with the same fieldnames. Skip writeheader() since the header row already exists. With pandas, use to_csv(mode="a", header=False). Make sure the column order matches the existing file, or the data will land in the wrong columns. If you are unsure about the column order in the existing file, open it first with csv.DictReader and read fieldnames from its fieldnames attribute before creating the writer for appending.
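Reading the header back with csv.DictReader before appending might look like this; the starting file contents are invented for the sketch:

```python
import csv

# Simulate an existing CSV that already has a header row.
with open("orders.csv", "w", newline="", encoding="utf-8") as f:
    w = csv.DictWriter(f, fieldnames=["order_id", "total", "status"])
    w.writeheader()
    w.writerow({"order_id": "ord_91a3", "total": 149.99, "status": "shipped"})

# Read the existing header so appended rows line up with the right columns.
with open("orders.csv", newline="", encoding="utf-8") as f:
    existing_fields = csv.DictReader(f).fieldnames

# Append in "a" mode, reusing the discovered fieldnames; no writeheader().
with open("orders.csv", "a", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=existing_fields, restval="")
    writer.writerows([{"order_id": "ord_a9d2", "total": 12.00,
                       "status": "pending"}])
```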
import csv
new_records = [
{"order_id": "ord_f4c1", "total": 89.50, "status": "shipped"},
]
with open("orders.csv", "a", newline="", encoding="utf-8") as f:
writer = csv.DictWriter(f, fieldnames=["order_id", "total", "status"])
writer.writerows(new_records)
What is the fastest way to convert a large JSON file to CSV in Python?
For files under 500 MB, pd.read_json() followed by to_csv() is the fastest single-call approach, since pandas uses optimized C code internally. For files above 500 MB, use ijson to stream JSON records and write them to CSV with csv.DictWriter row by row. This keeps memory usage constant regardless of file size. For NDJSON files (one JSON object per line), you do not need ijson at all: a plain Python for loop over the file handle processes each line independently and achieves constant memory without any third-party library.
# Fast for files that fit in memory
import pandas as pd
df = pd.read_json("large_dataset.json")
df.to_csv("large_dataset.csv", index=False)
# Streaming for files that don't fit in memory
import ijson, csv
with open("huge.json", "rb") as jf, open("huge.csv", "w", newline="") as cf:
records = ijson.items(jf, "item")
first = next(records)
writer = csv.DictWriter(cf, fieldnames=first.keys())
writer.writeheader()
writer.writerow(first)
for record in records:
writer.writerow(record)
Can I write CSV output to stdout instead of a file in Python?
Yes. Pass sys.stdout as the file object to csv.writer() or csv.DictWriter(). This is useful for piping output in shell scripts or quick debugging. With pandas, call to_csv(sys.stdout, index=False) or to_csv(None) to get a string you can print. No temporary file needed. When writing to stdout on Windows, call sys.stdout.reconfigure(newline="") first to avoid the double carriage-return issue, since stdout opens in text mode by default.
import csv
import sys
import json
data = json.loads('[{"host":"web-1","cpu":72.3},{"host":"web-2","cpu":45.1}]')
writer = csv.DictWriter(sys.stdout, fieldnames=data[0].keys())
writer.writeheader()
writer.writerows(data)
# host,cpu
# web-1,72.3
# web-2,45.1
Maria is a backend developer specialising in Python and API integration. She has broad experience with data pipelines, serialisation formats, and building reliable server-side services. She is an active member of the Python community and enjoys writing practical, example-driven guides that help developers solve real problems without unnecessary theory.
Priya is a data scientist and machine learning engineer who has worked across the full Python data stack β from raw data ingestion and cleaning to model deployment and monitoring. She is passionate about reproducible research, Jupyter-based workflows, and the practical engineering side of ML. She writes about NumPy, Pandas, data serialisation, and the Python patterns that make data pipelines reliable at scale.