Announcing Double-X-Encoding - Encode any UTF-8 string with [0-9a-zA-Z_]
We're happy to announce the release of Double-X-Encoding, a simple and effective way to encode any Unicode string using only the characters `[0-9a-zA-Z_]`.
Double-X-Encoding is similar to URL percent-encoding, but it is specifically designed to use only alphanumeric ASCII characters and the underscore. This makes it particularly useful for generating unique IDs from arbitrary strings, especially GraphQL IDs.
At Airsequel, it allows users to pick any Unicode string as a column name while still keeping a direct mapping to our automatically generated GraphQL API.
The encoding scheme is based on a set of straightforward rules:
- Characters in the range `[0-9A-Za-z_]` are encoded as is, except for the sequence `XX`, which is encoded as `XXXXXX`.
- All other printable characters inside the ASCII range are encoded as `XX[0-9A-W]`.
- All other Unicode code points up to `U+FFFFF` (e.g. emoji) are encoded as a sequence of 7 characters: `XX[a-p]{5}`, where the 5 characters are the code point's hexadecimal representation, left-padded with zeros to five digits and written with an alternative hex alphabet ranging from `a` to `p` instead of `0` to `f`. For example, `ü` (U+00FC) becomes `XXaaapm`.
- All Unicode code points in the Supplementary Private Use Area-B (`U+100000` to `U+10FFFF`) are encoded as a sequence of 9 characters: `XXY[a-p]{6}`.
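The rules above are compact enough to sketch in a few lines of Python. This is not the reference implementation from the repository: the name `encode` is hypothetical, and the table for printable ASCII characters is inferred from the examples in this post. The assumption is that the 33 non-alphanumeric printable ASCII characters (space through `~`, including `_`) are numbered in code-point order with the 33 symbols `0-9A-W`, which reproduces space → `XX0`, `-` → `XXD`, `:` → `XXG` and `_` → `XXR`. Runs such as `XXX` are assumed to be paired greedily from the left.

```python
# Hypothetical sketch of the core Double-X-Encoding rules (no GraphQL options).
# ASSUMPTION: printable, non-alphanumeric ASCII characters (including space
# and "_") in code-point order map to the 33 symbols 0-9, A-W.
ASCII_SPECIALS = [chr(c) for c in range(0x20, 0x7F) if not chr(c).isalnum()]
ASCII_TO_CODE = dict(zip(ASCII_SPECIALS, "0123456789ABCDEFGHIJKLMNOPQRSTUVW"))

# Alternative hex alphabet: 0 -> a, 1 -> b, ..., f -> p.
TO_ALT_HEX = str.maketrans("0123456789abcdef", "abcdefghijklmnop")


def encode(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        if text.startswith("XX", i):
            # The escape marker itself is escaped (pairs taken greedily).
            out.append("XXXXXX")
            i += 2
            continue
        ch = text[i]
        code_point = ord(ch)
        if ch.isascii() and (ch.isalnum() or ch == "_"):
            out.append(ch)  # [0-9A-Za-z_] passes through unchanged
        elif ch in ASCII_TO_CODE:
            out.append("XX" + ASCII_TO_CODE[ch])  # XX[0-9A-W]
        elif code_point <= 0xFFFFF:
            # XX[a-p]{5}: zero-padded hex in the a-p alphabet
            out.append("XX" + format(code_point, "05x").translate(TO_ALT_HEX))
        else:
            # U+100000..U+10FFFF (Supplementary Private Use Area-B): XXY[a-p]{6}
            out.append("XXY" + format(code_point, "06x").translate(TO_ALT_HEX))
        i += 1
    return "".join(out)


print(encode("id-with.special$chars!"))  # idXXDwithXXEspecialXX4charsXX1
```

Under these assumptions, the sketch reproduces every row of the first example table in this post, including the three escapes produced by the multi-code-point emoji.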
Additionally, the Double-X-Encoding scheme offers optional support for encoding leading digits and double underscores to fulfill GraphQL's constraints regarding IDs.
- A leading digit can be encoded as `XXZ[0-9]`.
- Double underscores (`__`) can be encoded as `XXRXXR`.
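The core encoding only ever emits `[0-9A-Za-z_]`, its escapes always begin with `XX`, and none of them contain `_`. So one way to sketch the two options is as a second pass over already-encoded output: any leading digit or `__` in that output must already have been in the input. The function names below are hypothetical. Whether the reference implementation works as a separate pass is an assumption, and so is the handling of three or more consecutive underscores (here, non-overlapping pairs from the left). The validity check follows the GraphQL specification: a name must match `[_A-Za-z][_0-9A-Za-z]*`, and names starting with `__` are reserved for introspection.

```python
import re

# GraphQL Name grammar: a letter or "_", then letters, digits or "_".
GRAPHQL_NAME = re.compile(r"[_A-Za-z][_0-9A-Za-z]*")


def is_valid_graphql_name(name: str) -> bool:
    """Check the Name grammar plus the reserved "__" prefix."""
    return GRAPHQL_NAME.fullmatch(name) is not None and not name.startswith("__")


def apply_graphql_options(encoded: str) -> str:
    """Hypothetical second pass over core Double-X-Encoding output.

    ASSUMPTION: `encoded` already went through the core rules, so it only
    contains [0-9A-Za-z_]; "__" and a leading digit come from the input.
    """
    # Each non-overlapping "__" (left to right) becomes "XXRXXR",
    # i.e. both underscores use their printable-ASCII escape "XXR".
    result = encoded.replace("__", "XXRXXR")
    # A leading digit gets the "XXZ" prefix so the name starts with a letter.
    if result[:1].isdigit():
        result = "XXZ" + result
    return result


# Both inputs contain only [0-9A-Za-z_], so they are their own core encoding.
for raw in ["1FileFormat", "__index__"]:
    encoded = apply_graphql_options(raw)
    print(raw, "->", encoded, is_valid_graphql_name(encoded))
```

Keeping the options in a separate pass leaves the core encoding unchanged for callers that do not need GraphQL-safe names.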
Here are a few examples to illustrate how Double-X-Encoding works:
Input | Encoded |
---|---|
`camelCaseId` | `camelCaseId` |
`snake_case_id` | `snake_case_id` |
`__Schema` | `__Schema` |
`doxxing` | `doxxing` |
`DOXXING` | `DOXXXXXXING` |
`id with spaces` | `idXX0withXX0spaces` |
`id-with.special$chars!` | `idXXDwithXXEspecialXX4charsXX1` |
`id_with_ümläutß` | `id_with_XXaaapmmlXXaaaoeutXXaaanp` |
`Emoji: 😅` | `EmojiXXGXX0XXbpgaf` |
`Multi Byte Emoji: 👨‍🦲` | `MultiXX0ByteXX0EmojiXXGXX0XXbpegiXXacaanXXbpjlc` |
With encoding of leading digits and double underscores enabled:
Input | Encoded |
---|---|
`1FileFormat` | `XXZ1FileFormat` |
`__index__` | `XXRXXRindexXXRXXR` |
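Every escape starts with `XX`, and a literal `XX` is itself escaped, so an encoded string can be decoded in a single left-to-right scan. The decoder below is a minimal sketch, not the repository's code. The name `decode` is hypothetical, and it relies on the same assumed ASCII table inferred from the examples (printable, non-alphanumeric ASCII characters, including space and `_`, numbered in code-point order with `0-9A-W`). Under these rules a single `X` that is not followed by a valid escape is copied through unchanged; for example, `X ` encodes to `XXX0`.

```python
# Hypothetical decoder sketch for Double-X-Encoding.
# ASSUMPTION: the printable-ASCII table is inferred from the published examples.
ASCII_SPECIALS = [chr(c) for c in range(0x20, 0x7F) if not chr(c).isalnum()]
CODE_TO_ASCII = dict(zip("0123456789ABCDEFGHIJKLMNOPQRSTUVW", ASCII_SPECIALS))
ALT_HEX = "abcdefghijklmnop"  # a=0 ... p=15
TO_STD_HEX = str.maketrans(ALT_HEX, "0123456789abcdef")


def _is_alt_hex(chunk: str, width: int) -> bool:
    """True if chunk is exactly `width` characters from the a-p alphabet."""
    return len(chunk) == width and all(c in ALT_HEX for c in chunk)


def decode(encoded: str) -> str:
    out = []
    i = 0
    while i < len(encoded):
        if encoded.startswith("XXXXXX", i):  # escaped "XX"
            out.append("XX")
            i += 6
            continue
        if encoded.startswith("XX", i):
            tag = encoded[i + 2 : i + 3]
            if tag in CODE_TO_ASCII:  # XX[0-9A-W]: printable ASCII
                out.append(CODE_TO_ASCII[tag])
                i += 3
                continue
            digit = encoded[i + 3 : i + 4]
            if tag == "Z" and len(digit) == 1 and digit in "0123456789":
                out.append(digit)  # XXZ[0-9]: escaped leading digit
                i += 4
                continue
            six = encoded[i + 3 : i + 9]
            if tag == "Y" and _is_alt_hex(six, 6):  # XXY[a-p]{6}
                out.append(chr(int(six.translate(TO_STD_HEX), 16)))
                i += 9
                continue
            five = encoded[i + 2 : i + 7]
            if _is_alt_hex(five, 5):  # XX[a-p]{5}
                out.append(chr(int(five.translate(TO_STD_HEX), 16)))
                i += 7
                continue
        out.append(encoded[i])  # ordinary character, or a lone "X"
        i += 1
    return "".join(out)


print(decode("idXX0withXX0spaces"))  # id with spaces
```

Since `XXR` is simply the ASCII escape for `_`, the same scan also undoes the double-underscore option without a special case.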
We believe that Double-X-Encoding is a simple and effective solution for encoding Unicode strings and we can't wait to see what you're going to use it for!
You can find the code in the following GitHub repository:
github.com/Airsequel/double-x-encoding
Give it a try and let us know what you think!