
Base Encoding Guide


Base encoding converts binary data into text using a limited set of characters. This guide covers the most common encoding formats, their algorithms, and practical applications.


What is Base Encoding?

Base encoding represents binary data as text using printable ASCII characters. It’s not encryption—it provides no security. Use it for data transmission and compatibility, not confidentiality.

When to Use Base Encoding

  • Data transmission - Send binary data through text-only channels (email, JSON, XML)
  • URL compatibility - Encode data safely in URLs and filenames
  • Human readability - Create identifiers that avoid ambiguous characters
  • Legacy systems - Interface with systems that only support ASCII

Security Warning

Base encoding is not encryption. Anyone can decode it instantly. For security:

  • Passwords - Use proper hashing (bcrypt, Argon2, scrypt)
  • Sensitive data - Use encryption (AES, RSA, ChaCha20)
  • Data integrity - Use HMAC or digital signatures

Format Comparison

Format | Alphabet Size | Padding | Case Sensitive | Overhead | Best For
Base64 | 64 chars | Yes (=) | Yes | ~33% | General purpose, MIME, data URIs
Base64 URL | 64 chars | Usually omitted | Yes | ~33% | URLs, filenames, JWT tokens
Base32 | 32 chars | Yes (=) | No | ~60% | Human input, TOTP secrets, Tor
Base58 | 58 chars | No | Yes | ~37% | Cryptocurrency, IPFS, short codes
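The overhead column follows from simple arithmetic on bits per character. A minimal Python sketch of the output-length calculation (the function names are illustrative, not from any library):

```python
import math

def base64_len(n: int, padded: bool = True) -> int:
    """Characters Base64 produces for n input bytes."""
    if padded:
        return 4 * ((n + 2) // 3)    # each started 3-byte group -> 4 chars
    return (8 * n + 5) // 6          # unpadded: one char per started 6-bit group

def base32_len(n: int, padded: bool = True) -> int:
    """Characters RFC 4648 Base32 produces for n input bytes."""
    if padded:
        return 8 * ((n + 4) // 5)    # each started 5-byte group -> 8 chars
    return (8 * n + 4) // 5          # unpadded: one char per started 5-bit group

def base58_len_max(n: int) -> int:
    """Upper bound for Base58: each character carries log2(58), about 5.86 bits."""
    return math.ceil(n * math.log(256) / math.log(58))

print(base64_len(3), base64_len(2), base64_len(2, padded=False))  # 4 4 3
print(base32_len(2), base58_len_max(25))                          # 8 35
```

For long inputs the ratios approach 4/3, 8/5, and about 1.37, which is where the ~33%, ~60%, and ~37% figures come from.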

Base64 (Standard)

Overview

Base64 is the most widely used encoding format. It converts every 3 bytes (24 bits) into 4 Base64 characters (6 bits each).

Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/

Padding: Uses = when input length isn’t divisible by 3

How it Works

  1. Convert input text to binary (UTF-8)
  2. Split binary into 6-bit groups
  3. Map each 6-bit value (0-63) to a character
  4. Add padding if needed
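The four steps map directly onto a short Python sketch. It works on a string of '0'/'1' characters for readability rather than speed, and `b64_encode` is an illustrative name; in practice the standard library's `base64.b64encode` does this job, and the sketch cross-checks itself against it.

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def b64_encode(text: str) -> str:
    data = text.encode("utf-8")                        # 1. text -> UTF-8 bytes
    bits = "".join(f"{byte:08b}" for byte in data)     # bytes as a string of 0s and 1s
    bits += "0" * (-len(bits) % 6)                     # zero-fill the final 6-bit group
    encoded = "".join(ALPHABET[int(bits[i:i + 6], 2)]  # 2-3. 6-bit values (0-63) -> chars
                      for i in range(0, len(bits), 6))
    return encoded + "=" * (-len(encoded) % 4)         # 4. pad to a multiple of 4

print(b64_encode("Hi!"))  # SGkh
print(b64_encode("Hi"))   # SGk=

# Cross-check against the standard library
for sample in ["", "H", "Hi", "Hi!", "héllo"]:
    assert b64_encode(sample) == base64.b64encode(sample.encode("utf-8")).decode("ascii")
```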

Example Encoding

Text:     "Hi!"
UTF-8:    0x48 0x69 0x21
Binary:   01001000 01101001 00100001
Grouped:  010010 000110 100100 100001
Decimal:  18     6      36     33
Base64:   S      G      k      h
Result:   "SGkh"

Example with Padding

Text:     "Hi"
UTF-8:    0x48 0x69
Binary:   01001000 01101001
Grouped:  010010 000110 1001[00] (2 bits short, pad with 0s)
Decimal:  18     6      36
Base64:   S      G      k
Padding:  Need 1 more char to make multiple of 4
Result:   "SGk="

Use Cases

Application | Why Base64
Email attachments (MIME) | SMTP is text-only, can’t send binary
Data URIs (data:image/png;base64,...) | Embed images/fonts in HTML/CSS
HTTP Basic Auth | Authorization: Basic base64(user:pass)
JSON/XML binary data | Both formats are text-based
Certificates (PEM format) | -----BEGIN CERTIFICATE-----
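Two of these uses are easy to reproduce with Python's standard `base64` module. The helper names below are hypothetical, and UTF-8 is assumed for the credentials.

```python
import base64

def basic_auth_header(user: str, password: str) -> str:
    # Base64 of "user:password"; this is encoding, not encryption
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"

def data_uri(payload: bytes, mime_type: str) -> str:
    # data:<mediatype>;base64,<data>, as embedded in HTML/CSS
    return f"data:{mime_type};base64,{base64.b64encode(payload).decode('ascii')}"

print(basic_auth_header("Aladdin", "open sesame"))  # Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
print(data_uri(b"Hi!", "text/plain"))               # data:text/plain;base64,SGkh
```

Anyone who sees the header can reverse it with `base64.b64decode`, which is why Basic Auth is only acceptable over HTTPS.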

Advantages & Disadvantages

Advantages:

  • Universal support across platforms
  • Simple algorithm
  • Lowest overhead of the formats covered here (~33%)

Disadvantages:

  • Not URL-safe (contains + and /)
  • Case-sensitive
  • Padding complicates parsing

Base64 URL-safe

Overview

A Base64 variant designed for URLs and filenames. It replaces the two characters that are unsafe there and usually omits padding.

Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_

Padding: Usually omitted (JWT, for example, always strips it)

Changes from Standard Base64:

  • + → - (minus)
  • / → _ (underscore)
  • No = padding (in most uses)

How it Works

  1. Encode using standard Base64
  2. Replace + with -
  3. Replace / with _
  4. Remove trailing = padding
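A sketch of these steps in Python, plus the reverse direction. The function names are illustrative; the standard library also offers `base64.urlsafe_b64encode`, which performs steps 1-3 but keeps the padding.

```python
import base64

def b64url_encode(data: bytes) -> str:
    std = base64.b64encode(data).decode("ascii")  # 1. standard Base64
    return (std.replace("+", "-")                 # 2. + -> -
               .replace("/", "_")                 # 3. / -> _
               .rstrip("="))                      # 4. strip trailing padding

def b64url_decode(text: str) -> bytes:
    # The stdlib decoder rejects missing padding, so restore it first
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))

print(b64url_encode(b">>?"))  # Pj4_
print(b64url_decode("SGk"))   # b'Hi'
```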

Example

Text:           "Hi!"
Standard Base64: "SGkh"
URL-safe:       "SGkh" (no changes needed in this case)

Text:           ">>?"
Standard Base64: "Pj4/"
URL-safe:       "Pj4_" (/ becomes _)

Use Cases

Application | Why Base64 URL
JWT tokens | Tokens often passed in URL parameters
URL parameters | No encoding needed for - and _
Filenames | Only filename-safe characters (watch for collisions on case-insensitive filesystems)
URL shorteners | Compact and clean URLs
OAuth state parameters | Passed as URL query params

Advantages

  • Safe in URLs without percent-encoding
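  • Uses only filename-safe characters, though case-insensitive filesystems can treat names that differ only in case as the same file
  • Slightly more compact when padding is dropped
  • Often double-click selectable (some applications treat - as a word break)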

Base32

Overview

Base32 uses uppercase letters and digits 2-7, making it case-insensitive and human-friendly. It converts every 5 bytes (40 bits) into 8 Base32 characters (5 bits each).

Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ234567

Padding: Uses = when input length isn’t divisible by 5

Case: Insensitive (decoding accepts lowercase)

How it Works

  1. Convert input to binary (UTF-8)
  2. Split binary into 5-bit groups
  3. Map each 5-bit value (0-31) to a character
  4. Add padding to make length a multiple of 8
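The same bit-grouping idea, now with 5-bit groups, gives a minimal Base32 sketch. `b32_encode` is an illustrative name; the `alphabet` parameter is an assumption added so the function can also produce the variants described later, and the result for the standard alphabet is checked against `base64.b32encode`.

```python
import base64

B32_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"

def b32_encode(data: bytes, alphabet: str = B32_ALPHABET) -> str:
    bits = "".join(f"{byte:08b}" for byte in data)    # 1. bytes -> bit string
    bits += "0" * (-len(bits) % 5)                    # zero-fill the last 5-bit group
    out = "".join(alphabet[int(bits[i:i + 5], 2)]     # 2-3. 5-bit values (0-31) -> chars
                  for i in range(0, len(bits), 5))
    return out + "=" * (-len(out) % 8)                # 4. pad to a multiple of 8

print(b32_encode(b"Hi"))  # JBUQ====
assert b32_encode(b"Hi") == base64.b32encode(b"Hi").decode("ascii")
```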

Example Encoding

Text:     "Hi"
UTF-8:    0x48 0x69
Binary:   01001000 01101001
Grouped:  01001 00001 10100 1[0000] (pad to 5 bits)
Decimal:  9     1     20    16
Base32:   J     B     U     Q
Padding:  Need 4 more chars to make 8
Result:   "JBUQ===="

Character Choice

Why A-Z and 2-7?

Excluded | Reason
0 (zero) | Looks like O (letter)
1 (one) | Looks like I or l
8, 9 | Not needed: A-Z plus 2-7 already gives 32 symbols (8 can also be mistaken for B)

This makes Base32 ideal for:

  • Verbal communication
  • Manual entry
  • Case-insensitive systems
  • OCR and handwriting

Use Cases

Application | Why Base32
TOTP/HOTP secrets (2FA) | Users often type secrets by hand; case-insensitivity helps
Tor hidden services | .onion addresses use Base32
Recovery codes | Easy to read and type correctly
License keys | Unambiguous characters reduce support requests
DNS labels | Case-insensitive like DNS names (DNSSEC NSEC3 uses the Extended Hex variant)
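Because people type these secrets by hand, a decoder may want to tolerate lowercase, spaces, and missing padding. A sketch using `base64.b32decode` with `casefold=True`; the helper name and the exact normalization rules are assumptions for illustration, not part of any TOTP specification.

```python
import base64

def parse_base32_secret(user_input: str) -> bytes:
    """Decode a Base32 secret as a person might type it (hypothetical helper)."""
    cleaned = "".join(user_input.split()).rstrip("=")  # drop spaces and any padding
    cleaned += "=" * (-len(cleaned) % 8)               # re-pad to a multiple of 8
    return base64.b32decode(cleaned, casefold=True)    # casefold accepts lowercase

print(parse_base32_secret("jbuq"))        # b'Hi'
print(parse_base32_secret("JB UQ ===="))  # b'Hi'
```

Inputs with an impossible length (for example 1 or 3 significant characters) still raise `binascii.Error`, which is the right outcome for a mistyped secret.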

Advantages & Disadvantages

Advantages:

  • Case-insensitive
  • No ambiguous characters
  • Good for human input

Disadvantages:

  • Higher overhead (~60%)
  • Longer output than Base64
  • Less widely supported than Base64

Base32 Alphabet Variants

RFC 4648 defines the standard Base32 alphabet, but several variants exist for specific use cases:

RFC 4648 Standard

Alphabet: ABCDEFGHIJKLMNOPQRSTUVWXYZ234567

The most common variant, used in:

  • TOTP/HOTP (Google Authenticator, Authy)
  • Most RFC-compliant implementations
  • General-purpose applications

RFC 4648 Extended Hex

Alphabet: 0123456789ABCDEFGHIJKLMNOPQRSTUV

Uses digits first, then letters:

  • Encoded values sort in the same order as the original data (RFC 4648 notes the standard Base32 alphabet lacks this property)
  • Useful when transitioning from hex encoding
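The sort-order property can be checked directly. This sketch relabels standard Base32 output into the Extended Hex alphabet (same 5-bit values, different symbols) and compares sorting for equal-length inputs; `b32hex` and the sample values are illustrative. Python 3.10+ also ships `base64.b32hexencode`.

```python
import base64

STD = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567"
HEX = b"0123456789ABCDEFGHIJKLMNOPQRSTUV"
TO_HEX = bytes.maketrans(STD, HEX)  # maps each standard symbol to the same value's hex symbol

def b32hex(data: bytes) -> str:
    # Encode with the standard alphabet, then relabel; '=' padding is left unchanged
    return base64.b32encode(data).translate(TO_HEX).decode("ascii")

values = [b"\x00\x01", b"\x7f\xff", b"\xc8\x00", b"\xff\xff"]   # already sorted
print(sorted(values, key=b32hex) == values)            # True
print(sorted(values, key=base64.b32encode) == values)  # False: "777Q====" sorts first
```

Standard Base32 fails because its digits 2-7 sort before the letters in ASCII even though they represent the largest values.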

z-base-32

Alphabet: ybndrfg8ejkmcpqxot1uwisza345h769

Designed for human usability:

  • Lowercase characters chosen to be easy to read, write, and say aloud
  • Optimized for verbal communication
  • Used in the Tahoe-LAFS distributed filesystem

Crockford’s Base32

Alphabet: 0123456789ABCDEFGHJKMNPQRSTVWXYZ

Created by Douglas Crockford for human-readable IDs:

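  • Excludes I, L, and O (easily confused with 1 and 0) and U (to reduce accidental obscenities)
  • Defines an optional check symbol (the value mod 37, using the extra symbols *, ~, $, =, U) for error detection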
  • Case-insensitive (decoding accepts lowercase)
  • Used for short identifiers and database keys

Special features:

  • Letters I, i, L, l decode as 1
  • Letter O, o decodes as 0
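Crockford's scheme is usually applied to integers such as database keys. A minimal sketch of encoding with the optional check symbol and of lenient decoding; the function names are illustrative, and ignoring hyphens follows Crockford's published description rather than anything stated above. Check-symbol verification on decode is left out for brevity.

```python
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"
CHECK_SYMBOLS = CROCKFORD + "*~$=U"    # check values 32-36 use five extra symbols
ALIASES = str.maketrans({"I": "1", "L": "1", "O": "0"})

def crockford_encode(number: int, with_check: bool = False) -> str:
    """Encode a non-negative integer, optionally appending the mod-37 check symbol."""
    n, out = number, ""
    while True:
        n, rem = divmod(n, 32)
        out = CROCKFORD[rem] + out     # most significant symbol ends up first
        if n == 0:
            break
    if with_check:
        out += CHECK_SYMBOLS[number % 37]
    return out

def crockford_decode(text: str) -> int:
    # Case-insensitive; I and L read as 1, O as 0; hyphens are ignored
    cleaned = text.upper().replace("-", "").translate(ALIASES)
    value = 0
    for ch in cleaned:
        value = value * 32 + CROCKFORD.index(ch)  # ValueError on symbols like U
    return value

print(crockford_encode(1234, with_check=True))  # 16JD
print(crockford_decode("16-j"))                 # 1234
print(crockford_decode("lO"))                   # 32
```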

Bech32

Alphabet: qpzry9x8gf2tvdw0s3jn54khce6mua7l

Designed for Bitcoin SegWit addresses:

  • Optimized for error detection
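  • Case-insensitive, but an address must be all lowercase or all uppercase (mixed case is rejected)
  • The all-uppercase form fits the compact alphanumeric mode of QR codes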
  • Note: The tool implements only the alphabet, not the full Bech32 checksum

Key design choices:

  • No b, i, o (visually ambiguous)
  • No 1 (used as separator in full Bech32)
  • Checksum is a BCH code that guarantees detection of any error affecting up to 4 characters
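For readers who need the checksum as well as the alphabet, here is a sketch of Bech32 checksum verification adapted from the reference Python code in BIP 173. It checks only the original Bech32 constant (Bech32m, defined in BIP 350, uses a different one) and omits BIP 173's length and character-range checks; `bech32_verify` is an illustrative name.

```python
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"
GENERATOR = [0x3B6A57B2, 0x26508E6D, 0x1EA119FA, 0x3D4233DD, 0x2A1462B3]

def bech32_polymod(values):
    """BCH checksum over 5-bit values, as in the BIP 173 reference code."""
    chk = 1
    for value in values:
        top = chk >> 25
        chk = ((chk & 0x1FFFFFF) << 5) ^ value
        for i in range(5):
            if (top >> i) & 1:
                chk ^= GENERATOR[i]
    return chk

def hrp_expand(hrp):
    # Human-readable part: high bits of each char, a zero, then the low 5 bits
    return [ord(c) >> 5 for c in hrp] + [0] + [ord(c) & 31 for c in hrp]

def bech32_verify(text: str) -> bool:
    if text.lower() != text and text.upper() != text:
        return False                      # mixed case is invalid
    text = text.lower()
    sep = text.rfind("1")                 # the last '1' separates HRP from data
    if sep < 1 or sep + 7 > len(text):
        return False                      # need a non-empty HRP and 6 checksum chars
    hrp, data = text[:sep], text[sep + 1:]
    if any(c not in CHARSET for c in data):
        return False
    return bech32_polymod(hrp_expand(hrp) + [CHARSET.index(c) for c in data]) == 1

print(bech32_verify("bc1qw508d6qejxtdg4y5r3zarvary0c5xw7kv8f3t4"))  # True
```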

Choosing a Base32 Variant

Variant | Best For | Key Feature
RFC 4648 | TOTP, general use | Standard, widest support
Extended Hex | Hex-compatible systems | Preserves sort order
z-base-32 | Verbal communication | Maximum clarity
Crockford | Short IDs, human input | Error tolerance
Bech32 | Crypto addresses | Error detection

Base58

Overview

Base58 excludes visually ambiguous characters, making it ideal for manual entry and copy-paste. No padding needed.

Alphabet: 123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz

Padding: None

Case: Sensitive

Character Choice

Excluded Characters:

Character | Reason
0 (zero) | Looks like O (capital o)
O (capital o) | Looks like 0 (zero)
I (capital i) | Looks like l (lowercase L) or 1
l (lowercase L) | Looks like I or 1

Advantages of this alphabet:

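  • Double-click selects the entire string
  • Avoids characters that look alike in many fonts
  • Easier to read aloud than Base64 (no + or /), though case still matters
  • Usable in QR codes, though mixed case rules out the compact alphanumeric mode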

How it Works

Unlike Base64/Base32, Base58 treats input as a large number:

  1. Convert input bytes to a big integer
  2. Repeatedly divide by 58
  3. Map each remainder to an alphabet character, reading them in reverse order
  4. Encode each leading zero byte as '1' (the integer conversion would otherwise drop it)
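The four steps translate into a compact Python sketch using the Bitcoin alphabet above. `b58encode` and `b58decode` are illustrative names; real projects would normally rely on a maintained library.

```python
ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    num = int.from_bytes(data, "big")        # 1. bytes -> big integer
    out = ""
    while num > 0:
        num, rem = divmod(num, 58)           # 2. divide by 58
        out = ALPHABET[rem] + out            # 3. prepend, so remainders come out reversed
    zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * zeros + out                 # 4. one '1' per leading zero byte

def b58decode(text: str) -> bytes:
    num = 0
    for ch in text:
        num = num * 58 + ALPHABET.index(ch)  # ValueError for 0, O, I, l, etc.
    zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * zeros + num.to_bytes((num.bit_length() + 7) // 8, "big")

print(b58encode(b"Hi"))          # 6Wc
print(b58encode(b"\x00\x00Hi"))  # 116Wc
print(b58decode("6Wc"))          # b'Hi'
```

Each division touches the whole number, so encoding takes quadratic time in the input length; that is the cost behind the "big integer math" disadvantage listed for Base58.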

Example Encoding

Text:     "Hi"
UTF-8:    0x48 0x69
Integer:  0x4869 = 18537 (decimal)

18537 ÷ 58 = 319 remainder 35 → 'c'
319 ÷ 58 = 5 remainder 29     → 'W'
5 ÷ 58 = 0 remainder 5        → '6'

Result: "6Wc" (remainders read in reverse; index 0 is '1', so remainder 5 maps to '6')

Base58Check (Bitcoin variant)

Bitcoin uses Base58Check, which prepends a version byte (identifying the address type) and appends a checksum (the first 4 bytes of SHA256(SHA256(version + payload))) for error detection.

Example Bitcoin Address Structure

Version + Payload + Checksum
   1 byte   20 bytes   4 bytes
     ↓         ↓         ↓
    [00] [pubkey hash] [checksum]

    Encoded as Base58Check

    1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa
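A sketch of Base58Check following the structure above, using only `hashlib` from the standard library. The helper names are illustrative, and the Base58 functions are repeated so the block runs on its own.

```python
import hashlib

ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

def b58encode(data: bytes) -> str:
    num, out = int.from_bytes(data, "big"), ""
    while num:
        num, rem = divmod(num, 58)
        out = ALPHABET[rem] + out
    return "1" * (len(data) - len(data.lstrip(b"\x00"))) + out

def b58decode(text: str) -> bytes:
    num = 0
    for ch in text:
        num = num * 58 + ALPHABET.index(ch)
    zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * zeros + num.to_bytes((num.bit_length() + 7) // 8, "big")

def checksum(data: bytes) -> bytes:
    # First 4 bytes of SHA256(SHA256(version + payload))
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()[:4]

def b58check_encode(version: int, payload: bytes) -> str:
    data = bytes([version]) + payload
    return b58encode(data + checksum(data))

def b58check_decode(text: str):
    raw = b58decode(text)
    data, check = raw[:-4], raw[-4:]
    if len(raw) < 5 or checksum(data) != check:
        raise ValueError("invalid Base58Check checksum")
    return data[0], data[1:]               # (version byte, payload)

version, payload = b58check_decode("1A1zP1eP5QGefi2DMPTfTL5SLmv7DivfNa")
print(version, len(payload))  # 0 20
```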

Use Cases

Application | Why Base58
Bitcoin/crypto addresses | Unambiguous, includes checksum variant
IPFS content IDs (v0) | Human-readable content addressing
Blockchain explorers | Short, URL-safe identifiers
Vanity addresses | Users can “mine” custom prefixes
Short codes | More compact than Base32

Advantages & Disadvantages

Advantages:

  • No ambiguous characters
  • No special characters (URL-safe)
  • Double-click selectable
  • More compact than Base32

Disadvantages:

  • Not as compact as Base64
  • Limited library support
  • Complex algorithm (big integer math)
  • Variable-length encoding

Custom Alphabets

When to Use Custom Alphabets

Create a custom alphabet when:

  • Domain-specific constraints - Your system only accepts certain characters (e.g., alphanumeric-only filenames)
  • Human readability - Remove confusing characters for your audience (e.g., no vowels to avoid profanity)
  • Legacy compatibility - Match an existing encoding scheme
  • Aesthetic requirements - Use characters appropriate for your context

Not for security. Custom alphabets provide obscurity, not cryptographic protection.

How to Design a Custom Alphabet

1. Choose your character set

Consider what constraints you have:

  • URL-safe? Avoid special characters
  • Case-sensitive system? Can use upper and lowercase
  • Human input? Avoid ambiguous characters (0/O, 1/I/l)
  • Voice communication? Use characters that sound distinct when spoken

2. Pick the alphabet size

Larger alphabets = more compact encoding:

  • Base16 (Hex) - 2 characters per byte, easy to read
  • Base32 - Good balance, case-insensitive options
  • Base58 - Compact, unambiguous
  • Base64 - Most compact, but has special chars

3. Order matters for sorting

If you need encoded values to sort the same as the original data, list the characters in ascending ASCII order:

  • 0123456789ABCDEF (hex order)
  • 0123456789ABCDEFGHIJKLMNOPQRSTUV (RFC 4648 Extended Hex Base32; the standard A-Z, 2-7 alphabet does not preserve order)

Design Checklist

  • All characters are unique (no duplicates)
  • 2-256 characters in your alphabet
  • Works in your target environment (URLs, databases, filesystems)
  • Documented for future maintainers
  • Tested with edge cases (empty input, all zeros, large inputs)
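The checklist can be partly enforced in code. A minimal sketch of a generic codec for any alphabet, using the same big-integer method as Base58 and treating the first character as the zero digit; `make_codec` is a hypothetical helper, not a library function.

```python
def make_codec(alphabet: str):
    """Return (encode, decode) functions for an arbitrary alphabet."""
    if len(set(alphabet)) != len(alphabet):
        raise ValueError("alphabet contains duplicate characters")
    if not 2 <= len(alphabet) <= 256:
        raise ValueError("alphabet must have 2-256 characters")
    base, zero = len(alphabet), alphabet[0]

    def encode(data: bytes) -> str:
        num, digits = int.from_bytes(data, "big"), []
        while num:
            num, rem = divmod(num, base)
            digits.append(alphabet[rem])
        leading = len(data) - len(data.lstrip(b"\x00"))  # keep leading zero bytes
        return zero * leading + "".join(reversed(digits))

    def decode(text: str) -> bytes:
        num = 0
        for ch in text:
            num = num * base + alphabet.index(ch)         # ValueError on unknown chars
        leading = len(text) - len(text.lstrip(zero))
        return b"\x00" * leading + num.to_bytes((num.bit_length() + 7) // 8, "big")

    return encode, decode

enc36, dec36 = make_codec("0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(enc36(b"Hi"), dec36("EAX"))  # EAX b'Hi'
```

With a power-of-two alphabet this approach differs from RFC 4648 bit grouping: the DNA alphabet turns 0x1B into CGT rather than ACGT, because the leading zero digit is absorbed by the integer. For fixed-width output, use the bit-grouping method described in the Base64 and Base32 sections.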

Common Examples

Alphanumeric (Base36) - Case-insensitive, no special chars:

0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ

No vowels (Base29) - Avoid accidental words:

23456789BCDFGHJKLMNPQRSTVWXYZ

DNA encoding (Base4):

ACGT

Lowercase hex (Base16):

0123456789abcdef

Learn More

Try the interactive Base Encoding Tool →

For specifications, see RFC 4648 (Base64, Base32, Base16) and Bitcoin Base58Check.