Alphabet Presets

The string_gen.alphabets module provides alphabet presets for generating non-ASCII strings.

Usage

from string_gen import StringGen
from string_gen.alphabets import CYRILLIC, GREEK, ASCII

# Cyrillic word characters
gen = StringGen(r'\w{10}', alphabet=CYRILLIC)
gen.render()  # e.g. 'ёЩкРблнЫйМ'

# Mixed alphabets (combine with +)
gen = StringGen(r'\w{10}', alphabet=CYRILLIC + ASCII)
gen.render()  # mix of Cyrillic and Latin letters

# Custom alphabet — any string of letters
gen = StringGen(r'\w{5}', alphabet='αβγδε')
gen.render()  # e.g. 'γα3δ_'

Available Presets

Preset          Description
ASCII           string.ascii_letters (default)
CYRILLIC        Russian alphabet (а-я, А-Я, ё, Ё)
GREEK           Greek alphabet (α-ω, Α-Ω)
LATIN_EXTENDED  ASCII + accented Latin characters
HIRAGANA        Japanese Hiragana
KATAKANA        Japanese Katakana
CJK             CJK Unified Ideographs
HANGUL          Korean Hangul syllables
ARABIC          Arabic script
DEVANAGARI      Devanagari script (Hindi, Marathi, Nepali)
THAI            Thai script
HEBREW          Hebrew script
BENGALI         Bengali/Bangla script
TAMIL           Tamil script
TELUGU          Telugu script
GEORGIAN        Georgian script
ARMENIAN        Armenian script
ETHIOPIC        Ethiopic/Ge'ez script (Amharic)
MYANMAR         Myanmar/Burmese script
SINHALA         Sinhala script
GUJARATI        Gujarati script
PUNJABI         Punjabi/Gurmukhi script
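
For a script that has no preset, a custom alphabet string can be assembled from Unicode code-point ranges. The sketch below is illustrative only: the helper chars is hypothetical and not part of string_gen, and the resulting string is built from the table's description of the Russian alphabet rather than taken from the library's own CYRILLIC definition, which may differ.

```python
import string


def chars(first: str, last: str) -> str:
    """Return every character from first to last inclusive, by code point.

    Hypothetical helper for illustration; not part of string_gen.
    """
    return "".join(chr(cp) for cp in range(ord(first), ord(last) + 1))


# Russian alphabet as the table describes it: а-я and А-Я are contiguous
# ranges, while ё and Ё sit outside them and are added explicitly.
russian = chars("а", "я") + chars("А", "Я") + "ёЁ"

# Plain strings concatenate with +, mirroring the mixed-alphabet usage.
mixed = russian + string.ascii_letters
```

A string built this way can be passed as alphabet=, in the same way as the 'αβγδε' literal in the usage example.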

How Alphabets Affect Categories

When alphabet is set, it replaces string.ascii_letters in character category resolution:

Category  Behavior
\w        alphabet + digits + "_"
\W        everything in the printable set that is not in \w
\d        always 0-9 (unchanged)
\D        everything in the printable set that is not 0-9
\s        always standard whitespace (unchanged)
\S        everything in the printable set that is not whitespace
.         any character from the printable set except \n
[^...]    negated class drawn from the printable set

The printable set is derived from the alphabet: alphabet + digits + "_" + punctuation + whitespace.
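
The rules above can be modeled with the standard library alone. This is a minimal sketch of the described behavior, not string_gen's actual implementation: the function names printable_set, categories, and negated_class are hypothetical, and removing duplicate characters is an assumption made here for tidiness.

```python
import string


def printable_set(alphabet: str = string.ascii_letters) -> str:
    """alphabet + digits + "_" + punctuation + whitespace, duplicates dropped."""
    raw = alphabet + string.digits + "_" + string.punctuation + string.whitespace
    # dict.fromkeys keeps first-seen order and removes repeats
    # ("_" also appears in string.punctuation).
    return "".join(dict.fromkeys(raw))


def categories(alphabet: str = string.ascii_letters) -> dict[str, str]:
    """Model each category in the table as a string of candidate characters."""
    printable = printable_set(alphabet)
    word = "".join(dict.fromkeys(alphabet + string.digits + "_"))

    def without(excluded: str) -> str:
        # Printable set minus the excluded characters, order preserved.
        return "".join(ch for ch in printable if ch not in excluded)

    return {
        r"\w": word,
        r"\W": without(word),
        r"\d": string.digits,  # unaffected by the alphabet
        r"\D": without(string.digits),
        r"\s": string.whitespace,  # unaffected by the alphabet
        r"\S": without(string.whitespace),
        ".": without("\n"),
    }


def negated_class(excluded: str, alphabet: str = string.ascii_letters) -> str:
    """Candidates for [^...]: the printable set minus the listed characters."""
    return "".join(ch for ch in printable_set(alphabet) if ch not in excluded)


greek = categories("αβγδε")
```

Under this model, a custom alphabet such as 'αβγδε' removes Latin letters from every category, including . and \W, because the printable set is rebuilt from the alphabet rather than from string.printable.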