We're happy to announce the release of Double-X-Encoding, a simple and effective way to encode any Unicode string using only the characters [0-9a-zA-Z_].

Double-X-Encoding is similar to URL percent-encoding, but it is specifically designed to use only alphanumeric ASCII characters and the underscore. This makes it particularly useful for generating unique IDs in a variety of use cases, especially GraphQL IDs.

At Airsequel, it lets users choose any Unicode string as a column name while still keeping a direct mapping to our automatically generated GraphQL API.

The encoding scheme is based on a set of straightforward rules:

  1. Characters in the range [0-9A-Za-z_] are encoded as is, except for the sequence XX, which is encoded as XXXXXX

  2. All other printable characters inside the ASCII range are encoded as XX[0-9A-W]

  3. All other Unicode code points up to U+fffff (e.g. emoji) are encoded as a sequence of 7 characters: XX[a-p]{5}, where the 5 characters are the hexadecimal representation of the code point, written in an alternative hex alphabet that runs from a to p instead of 0 to f.

  4. All Unicode code points in the Supplementary Private Use Area-B (U+100000 to U+10ffff) are encoded as a sequence of 9 characters: XXY[a-p]{6}
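The four rules above can be sketched in a few lines of Python. This is only an illustration, not the code from the repository: the function name `double_x_encode` and the helper tables are hypothetical, and the assignment of symbols to the codes 0-9 and A-W is an assumption (ASCII order), chosen because it agrees with the examples in this post. Control characters are not covered by rule 2, so this sketch lets them fall through to rule 3.

```python
import string

# Rule 1: characters that pass through unchanged.
PLAIN = set(string.ascii_letters + string.digits + "_")

# Rule 2 (assumption): the 33 printable non-alphanumeric ASCII characters,
# taken in ASCII order, are paired with the codes 0-9 and A-W.
# "_" gets a slot here too, but rule 1 is checked first, so it stays as is.
SYMBOLS = [chr(c) for c in range(0x20, 0x7F) if not chr(c).isalnum()]
CODES = string.digits + string.ascii_uppercase[:23]  # "0".."9", "A".."W"
ASCII_CODE = dict(zip(SYMBOLS, CODES))

# Rules 3 and 4: the hex digits 0-f are rewritten as the letters a-p.
ALT_HEX = str.maketrans("0123456789abcdef", "abcdefghijklmnop")


def double_x_encode(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if text.startswith("XX", i):
            # Exception in rule 1: a literal XX is escaped as XXXXXX.
            out.append("XXXXXX")
            i += 2
            continue
        if ch in PLAIN:
            out.append(ch)
        elif ch in ASCII_CODE:
            out.append("XX" + ASCII_CODE[ch])
        elif ord(ch) <= 0xFFFFF:
            # Rule 3: five alternative hex digits.
            out.append("XX" + format(ord(ch), "05x").translate(ALT_HEX))
        else:
            # Rule 4: XXY followed by six alternative hex digits.
            out.append("XXY" + format(ord(ch), "06x").translate(ALT_HEX))
        i += 1
    return "".join(out)


print(double_x_encode("id with spaces"))  # idXX0withXX0spaces
print(double_x_encode("DOXXING"))         # DOXXXXXXING
```

Because every escape starts with XX and a literal XX is itself escaped, the output never contains an ambiguous XX.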

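Additionally, the Double-X-Encoding scheme offers optional support for encoding leading digits and double underscores to satisfy GraphQL's naming constraints: a name must not start with a digit, and names beginning with two underscores are reserved for GraphQL's introspection system.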

  1. A leading digit can be encoded as XXZ[0-9].

  2. Double underscores (__) can be encoded as XXRXXR.
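As a rough sketch, the two optional rules can be applied as a post-processing step on a string that has already been encoded with the basic rules. The function name `encode_graphql_extras` is hypothetical and not part of the published library, and how runs of three or more underscores are handled is an assumption (pairs are replaced from left to right).

```python
def encode_graphql_extras(encoded: str) -> str:
    """Apply the optional leading-digit and double-underscore rules.

    Assumes `encoded` only contains characters from [0-9A-Za-z_],
    i.e. it is already the output of the basic encoding.
    """
    # Optional rule 2: every pair of underscores becomes XXRXXR.
    result = encoded.replace("__", "XXRXXR")
    # Optional rule 1: a leading digit d becomes XXZd.
    if result and result[0] in "0123456789":
        result = "XXZ" + result
    return result


print(encode_graphql_extras("1FileFormat"))  # XXZ1FileFormat
print(encode_graphql_extras("__index__"))    # XXRXXRindexXXRXXR
```

Running this step after the basic encoding is safe because the basic rules pass digits and underscores through unchanged, so a leading digit or a double underscore in the output always comes from the original input.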

Here are a few examples to illustrate how Double-X-Encoding works:

| Input | Encoded |
|---|---|
| `camelCaseId` | `camelCaseId` |
| `snake_case_id` | `snake_case_id` |
| `__Schema` | `__Schema` |
| `doxxing` | `doxxing` |
| `DOXXING` | `DOXXXXXXING` |
| `id with spaces` | `idXX0withXX0spaces` |
| `id-with.special$chars!` | `idXXDwithXXEspecialXX4charsXX1` |
| `id_with_ümläutß` | `id_with_XXaaapmmlXXaaaoeutXXaaanp` |
| `Emoji: 😅` | `EmojiXXGXX0XXbpgaf` |
| `Multi Byte Emoji: 👨‍🦲` | `MultiXX0ByteXX0EmojiXXGXX0XXbpegiXXacaanXXbpjlc` |

With the encoding of leading digits and double underscores enabled:

| Input | Encoded |
|---|---|
| `1FileFormat` | `XXZ1FileFormat` |
| `__index__` | `XXRXXRindexXXRXXR` |

We believe that Double-X-Encoding is a simple and effective solution for encoding Unicode strings and we can't wait to see what you're going to use it for!

You can find the code in the following GitHub repo:
github.com/Airsequel/double-x-encoding

Give it a try and let us know what you think!