What Is Encoding?
Encoding is the process of converting data from one representation to another. In web development, encoding is used to safely transmit data through channels that were designed for a limited character set — for example, sending binary data through a text-based protocol, or including special characters in a URL.
Character encoding defines how text characters map to bytes. Base64 encoding converts binary data to ASCII text. URL encoding makes arbitrary text safe for use in URLs. Understanding these three layers of encoding prevents subtle bugs and security vulnerabilities.
Character Encoding History
Character encoding is the mapping between characters (letters, symbols) and their binary representations. The evolution from ASCII to Unicode reflects the web going global:
ASCII: 7-bit American Standard Code for Information Interchange. 128 characters: English letters, digits, punctuation, and control codes. The foundation that most later encodings extend.
ISO-8859-1 (Latin-1): Extended ASCII for Western European languages. Uses the eighth bit to add 128 more code points (accented letters, symbols). Not suitable for non-Latin scripts.
Unicode: Universal character set covering all of the world's writing systems. Defines code points for over 149,000 characters across 161 scripts. Does not itself define a byte encoding; that is the job of UTF-8, UTF-16, and UTF-32.
UTF-8: Variable-width encoding of Unicode using 1–4 bytes per character. ASCII-compatible (the first 128 code points are single bytes). The dominant encoding on the web, used by over 98% of websites.
UTF-16: Variable-width encoding using 2 or 4 bytes per character. Used internally by Windows APIs and by Java and JavaScript strings. Not ASCII-compatible.
UTF-32: Fixed-width encoding: always 4 bytes per character. Simple but wastes space. Used in some database internals. Rarely seen on the web.
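To make the size differences concrete, the sketch below measures how many bytes a few sample characters need in each Unicode encoding. It assumes a Node.js runtime (for the global `Buffer`); the helper name `byteLengths` and the sample characters are illustrative choices, not part of any standard API.

```javascript
// Byte counts for one string in the three Unicode encodings.
// Assumes Node.js, where Buffer is available as a global.
function byteLengths(text) {
  return {
    utf8: Buffer.byteLength(text, "utf8"),
    utf16: Buffer.byteLength(text, "utf16le"),
    // UTF-32 always uses 4 bytes per code point. Spreading a string
    // iterates by code point, not by UTF-16 code unit.
    utf32: [...text].length * 4,
  };
}

// A (U+0041), é (U+00E9), € (U+20AC), 😀 (U+1F600)
const sampleChars = ["A", "\u00e9", "\u20ac", "\u{1F600}"];
for (const ch of sampleChars) {
  console.log(JSON.stringify(ch), byteLengths(ch));
}
// "A"  -> utf8: 1, utf16: 2, utf32: 4
// "é"  -> utf8: 2, utf16: 2, utf32: 4
// "€"  -> utf8: 3, utf16: 2, utf32: 4
// "😀" -> utf8: 4, utf16: 4, utf32: 4
```

The emoji row shows why UTF-16 counts as variable-width: code points above U+FFFF need a surrogate pair, so they take 4 bytes rather than 2.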
Why UTF-8 Won
UTF-8 became dominant because it is backward-compatible with ASCII (the first 128 characters encode identically), is self-synchronizing (you can find character boundaries by scanning), and is space-efficient for Latin text. Any ASCII document is a valid UTF-8 document. This made migration seamless.
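Both properties can be checked in a few lines. The sketch below is illustrative only: `characterStartOffsets` is a hypothetical helper, and it relies on the fact that UTF-8 continuation bytes always have the bit pattern `10xxxxxx`.

```javascript
// In UTF-8, continuation bytes always match the bit pattern 10xxxxxx,
// so any byte that does NOT match it starts a new character.
function characterStartOffsets(bytes) {
  const offsets = [];
  for (let i = 0; i < bytes.length; i++) {
    if ((bytes[i] & 0xc0) !== 0x80) offsets.push(i);
  }
  return offsets;
}

const utf8Encoder = new TextEncoder(); // TextEncoder always produces UTF-8

// "aé€" encodes as 61 | C3 A9 | E2 82 AC
const mixedBytes = utf8Encoder.encode("a\u00e9\u20ac");
console.log(characterStartOffsets(mixedBytes)); // [ 0, 1, 3 ]

// ASCII text produces the same byte values in ASCII and in UTF-8.
console.log(Array.from(utf8Encoder.encode("Hi"))); // [ 72, 105 ]
```

Because a decoder can recognize a character boundary from any single byte, a corrupted or truncated stream only damages the characters it touches rather than everything after them.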
Base64 Encoding
Base64 converts binary data to a text representation using only 64 printable ASCII characters: A-Z, a-z, 0-9, +, and /. This is necessary when binary data must travel through channels that only handle text. Email attachments, data URIs, JWTs, and HTTP Basic Auth all use Base64.
How It Works
Base64 groups the input bytes into 3-byte (24-bit) chunks and encodes each chunk as 4 Base64 characters (6 bits each). If the input is not a multiple of 3 bytes, = padding characters are added to complete the last group:
| Input | Hex bytes | Base64 |
|---|---|---|
| "Man" | 4D 61 6E | TWFu |
| "Ma" | 4D 61 | TWE= |
| "M" | 4D | TQ== |
The = padding characters at the end indicate how many bytes were missing to complete the last 3-byte group. One = means one byte of padding was needed; == means two bytes were needed. Standard Base64 always produces output whose length is a multiple of 4.
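The grouping and padding rules above can be sketched as a small hand-written encoder. This is a teaching sketch, not a replacement for built-in encoders; `base64Encode` and `BASE64_ALPHABET` are names chosen here for illustration.

```javascript
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encode an array (or Uint8Array) of byte values as standard, padded Base64.
function base64Encode(bytes) {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const remaining = bytes.length - i; // bytes left, including this group
    const b0 = bytes[i];
    const b1 = remaining > 1 ? bytes[i + 1] : 0; // missing bytes count as zero
    const b2 = remaining > 2 ? bytes[i + 2] : 0;
    const chunk = (b0 << 16) | (b1 << 8) | b2; // 24 bits

    // Split the 24 bits into four 6-bit indexes into the alphabet.
    out += BASE64_ALPHABET[(chunk >> 18) & 63];
    out += BASE64_ALPHABET[(chunk >> 12) & 63];
    out += remaining > 1 ? BASE64_ALPHABET[(chunk >> 6) & 63] : "=";
    out += remaining > 2 ? BASE64_ALPHABET[chunk & 63] : "=";
  }
  return out;
}

const textToBytes = new TextEncoder();
for (const text of ["Man", "Ma", "M"]) {
  console.log(text, "->", base64Encode(textToBytes.encode(text)));
}
// Man -> TWFu
// Ma -> TWE=
// M -> TQ==
```

In practice a runtime's built-in encoder is preferable, for example `Buffer.from(bytes).toString("base64")` in Node.js, which produces the same output for these inputs.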
URL Encoding
URLs can only contain a limited set of safe ASCII characters. Any character outside that set, including spaces and non-ASCII characters, must be percent-encoded (URL-encoded) before being placed in a URL. Characters with special meaning in URLs, such as &, =, and #, must also be encoded whenever they are meant as literal data rather than as URL syntax.
Percent encoding replaces each unsafe byte with a % followed by two hexadecimal digits representing that byte's value. A space becomes %20, an ampersand becomes %26, and so on.
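As a rough illustration of that byte-by-byte rule, the sketch below percent-encodes every UTF-8 byte that is not in the RFC 3986 "unreserved" set (letters, digits, `-`, `.`, `_`, `~`). The function name `percentEncode` is hypothetical; real code would normally call `encodeURIComponent`.

```javascript
// Percent-encode every UTF-8 byte that is not an RFC 3986 unreserved character.
function percentEncode(text) {
  const bytes = new TextEncoder().encode(text); // text -> UTF-8 bytes
  let out = "";
  for (const byte of bytes) {
    const ch = String.fromCharCode(byte);
    if (/^[A-Za-z0-9\-._~]$/.test(ch)) {
      out += ch; // safe character: copy through unchanged
    } else {
      // unsafe byte: "%" plus two uppercase hex digits
      out += "%" + byte.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncode("a b&c")); // a%20b%26c
console.log(percentEncode("\u00e9")); // %C3%A9 (é is two bytes in UTF-8)
```

For these inputs the result matches `encodeURIComponent`. The two differ slightly elsewhere, because `encodeURIComponent` also leaves `!`, `*`, `'`, `(`, and `)` unencoded.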
Commonly Encoded Characters
| Character | Encoded | Notes |
|---|---|---|
| Space | %20 | Most commonly encoded character; form submissions (application/x-www-form-urlencoded) encode it as + instead |
| & | %26 | Query string separator; must be encoded when used as a literal value |
| = | %3D | Key-value separator in query strings; encode when used as data |
| + | %2B | Interpreted as a space in application/x-www-form-urlencoded; encode to preserve literal + |
| # | %23 | Fragment identifier; encode when used as literal data in a path or query |
| / | %2F | Path segment separator; encode when used as literal data, not a path delimiter |
| : | %3A | Scheme and port separator; encode when used as literal data |
| @ | %40 | Used in mailto: and auth; encode when used as literal data |
encodeURI vs encodeURIComponent
JavaScript provides two encoding functions with different scopes. encodeURI encodes a complete URL: it leaves characters that have meaning in URLs (such as :, /, ?, #, @, &, and =) unencoded. encodeURIComponent encodes a URL component (a single query parameter value or path segment): it encodes all characters except A-Z, a-z, 0-9, and - _ . ! ~ * ' ( ). Always use encodeURIComponent for individual values and encodeURI for complete URLs.
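A short comparison shows the difference. The URL and the values below are made-up examples; `example.com` is only a placeholder host.

```javascript
// encodeURI: for a whole URL. Structural characters (: / ? & =) survive.
const wholeUrl = "https://example.com/search?q=caf\u00e9 au lait&lang=en";
console.log(encodeURI(wholeUrl));
// https://example.com/search?q=caf%C3%A9%20au%20lait&lang=en

// encodeURIComponent: for one value. Structural characters are escaped too.
const rawValue = "rock & roll = fun";
console.log(encodeURIComponent(rawValue));
// rock%20%26%20roll%20%3D%20fun

// Using encodeURI on a single value leaves & and = in place,
// which would split the value when the query string is parsed.
console.log(encodeURI(rawValue));
// rock%20&%20roll%20=%20fun

// Typical usage: build the URL, encoding each value on its own.
const searchUrl = "https://example.com/search?q=" + encodeURIComponent(rawValue);
console.log(searchUrl);
// https://example.com/search?q=rock%20%26%20roll%20%3D%20fun
```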
Where Encoding Appears in Web Development
The Authorization: Basic header encodes credentials as Base64(username:password). This is encoding for transport convenience, NOT security — Base64 is trivially reversible. Always use HTTPS with Basic Auth.
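The sketch below builds such a header and then decodes it again, to show how little protection the encoding offers. It assumes Node.js (`Buffer`); `basicAuthHeader` is a hypothetical helper, and the credentials are the well-known example pair from the HTTP Basic Auth specification.

```javascript
// Build the value of an "Authorization: Basic ..." header.
function basicAuthHeader(username, password) {
  const credentials = `${username}:${password}`;
  return "Basic " + Buffer.from(credentials, "utf8").toString("base64");
}

const authHeader = basicAuthHeader("Aladdin", "open sesame");
console.log(authHeader); // Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

// Anyone who can read the header can reverse it without any key.
const recovered = Buffer.from(authHeader.slice("Basic ".length), "base64").toString("utf8");
console.log(recovered); // Aladdin:open sesame
```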
Data URIs embed file content directly in HTML or CSS: data:image/png;base64,.... Base64-encoding images and fonts inline eliminates HTTP requests at the cost of increased document size (~33% overhead).
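A minimal sketch of building a data URI and measuring the overhead, assuming Node.js (`Buffer`). The helper `toDataUri` and the 300-byte zero-filled buffer are stand-ins for a real file's contents.

```javascript
// Wrap raw bytes in a Base64 data URI with the given MIME type.
function toDataUri(mimeType, bytes) {
  return `data:${mimeType};base64,${Buffer.from(bytes).toString("base64")}`;
}

console.log(toDataUri("text/plain", Buffer.from("Man")));
// data:text/plain;base64,TWFu

// Every 3 input bytes become 4 output characters, hence the ~33% overhead.
const fakeFile = Buffer.alloc(300); // placeholder for real file contents
const encodedChars = fakeFile.toString("base64").length;
console.log(fakeFile.length, "bytes ->", encodedChars, "characters");
// 300 bytes -> 400 characters
```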
Email was designed for 7-bit ASCII. Binary attachments (images, PDFs) are Base64-encoded by MIME before transmission. Your email client decodes them transparently when displaying the email.
JWTs use Base64url encoding (a variant that replaces + with - and / with _ and omits padding) for all three parts (header, payload, signature). This makes tokens URL-safe without additional percent-encoding.
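The character substitutions can be sketched directly on top of standard Base64. This assumes Node.js (`Buffer`); `toBase64Url` and `fromBase64Url` are illustrative helper names, and the example only encodes a header object, without creating or verifying any signature.

```javascript
// Standard Base64 -> Base64url: swap the two unsafe characters, drop padding.
function toBase64Url(bytes) {
  return Buffer.from(bytes)
    .toString("base64")
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// Base64url -> bytes: undo the swaps and restore the padding first.
function fromBase64Url(text) {
  const base64 = text.replace(/-/g, "+").replace(/_/g, "/");
  const padded = base64 + "=".repeat((4 - (base64.length % 4)) % 4);
  return Buffer.from(padded, "base64");
}

console.log(toBase64Url([0xfb, 0xff])); // -_8  (standard Base64 would be +/8=)

const jwtHeaderJson = JSON.stringify({ alg: "HS256", typ: "JWT" });
const jwtHeaderPart = toBase64Url(Buffer.from(jwtHeaderJson, "utf8"));
console.log(jwtHeaderPart); // eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9
console.log(fromBase64Url(jwtHeaderPart).toString("utf8")); // {"alg":"HS256","typ":"JWT"}
```

Recent Node.js versions also accept `"base64url"` as a `Buffer` encoding name, which makes the manual replacements unnecessary there.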
Any user-provided data in URL query strings must be percent-encoded. Failing to encode & or = in a value will silently corrupt the query string parsing. Always use encodeURIComponent on individual values.
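The corruption is easy to reproduce. In the sketch below the parameter names `q`, `b`, and `page` and the input value are made up for illustration; `URLSearchParams` is the standard query-string API in browsers and Node.js.

```javascript
const userInput = "a&b=c"; // literal text the user typed

// Naive concatenation: the & and = inside the value are read as syntax.
const naiveQuery = "q=" + userInput;
const naiveParsed = new URLSearchParams(naiveQuery);
console.log(naiveParsed.get("q")); // a
console.log(naiveParsed.get("b")); // c  (a parameter nobody intended)

// Encoding the value first keeps it intact.
const safeQuery = "q=" + encodeURIComponent(userInput);
console.log(safeQuery); // q=a%26b%3Dc
console.log(new URLSearchParams(safeQuery).get("q")); // a&b=c

// URLSearchParams can also do the encoding when building a query string.
const builtParams = new URLSearchParams({ q: userInput, page: "2" });
console.log(builtParams.toString()); // q=a%26b%3Dc&page=2
```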
Non-ASCII domain names (e.g., münchen.de) are encoded using Punycode (xn-- prefix) for compatibility with DNS. The browser displays the Unicode form but sends the Punycode form to DNS resolvers.
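The standard `URL` class performs this conversion when it parses a hostname, so the ASCII form is easy to inspect. A minimal sketch, assuming a runtime with full internationalization support (recent Node.js versions or a browser):

```javascript
// Parsing a URL converts an internationalized hostname to its ASCII form.
const idnUrl = new URL("https://m\u00fcnchen.de/"); // https://münchen.de/
console.log(idnUrl.hostname); // xn--mnchen-3ya.de
```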
Frequently Asked Questions
Is Base64 encryption?
No. Base64 is encoding, not encryption. It is a reversible transformation with no secret key. Anyone who sees a Base64 string can decode it instantly. Never use Base64 as a security measure.
Why does Base64 output sometimes end with =?
Base64 processes input in 3-byte groups. If the input is not a multiple of 3 bytes, = padding is added to complete the last group. One = means one byte of padding; == means two. Some implementations omit padding (Base64url for JWTs).
What is Base64url?
Base64url is a URL-safe variant of Base64 that replaces + with - and / with _, and typically omits = padding. This makes it safe to use in URLs and HTTP headers without percent-encoding. JWTs use Base64url for all three parts.
When should I use encodeURI and when encodeURIComponent?
Use encodeURIComponent for individual values (query parameter values, path segment values). Use encodeURI for a complete URL string where you want to preserve the URL structure characters (/, :, ?, #). When in doubt, use encodeURIComponent.
Why is UTF-8 preferred over UTF-16 on the web?
UTF-8 is ASCII-compatible and space-efficient for Latin text (most URLs, HTML tags, and code are ASCII). UTF-16 needs at least 2 bytes for every ASCII character and is not backward-compatible with ASCII. The HTML standard requires UTF-8 for new documents.
What is percent encoding?
Percent encoding (URL encoding) represents characters as % followed by their two-digit hexadecimal byte value. For example, a space is %20 (decimal 32, hex 20). Multi-byte UTF-8 characters encode each byte separately: é is %C3%A9.
Does encoding affect performance?
Base64 increases data size by approximately 33% and adds encoding/decoding CPU overhead. For high-throughput systems, prefer binary protocols. URL encoding adds minimal overhead but can make URLs significantly longer when they contain many special characters.
What is Punycode?
Punycode is an encoding for representing Unicode characters in the ASCII-only DNS. Internationalized domain names like münchen.de are encoded as xn--mnchen-3ya.de in DNS queries. Browsers display the Unicode form but use Punycode internally.