preszr is a schema-less pre-serialization JavaScript library that lets you shove arbitrary JavaScript objects through the pipes.
Here's how you use it:
- Encode any value with preszr. You get a flat, JSON-legal value that's usually an array but is sometimes a primitive.
- Serialize it with JSON.stringify or anything else that can serialize JS objects, like BSON.
- Send it over the network or save it to a file.
- At the other end (in time or space), first deserialize the data using JSON.parse or whatever you used.
- Then decode it using preszr.
- Now you have your thing back, with all its references and prototypes (if any).
preszr uses a strict, well-defined, and extensible format that can encode any JavaScript object.
🔗 Preserves references and prototypes!
🐐 Supports all built-in data types and values as of 2023!¹
🛠️ Super easy to encode custom types, with several layers of customization!
🌍 Written in vanilla JS to work in all environments.
preszr exports three things:
import { encode, decode, Preszr } from "preszr"
encode will encode your thing into a preszr message:
const yourThing = {
a: new Uint16Array([1, 2, 3]),
b: undefined,
c: []
}
// Encoding your thing into a preszr message
const encoded = encode(yourThing)
// Serializing the message
const serialized = JSON.stringify(encoded)
// Sending it
websocket.send(serialized)
decode does the opposite:
websocket.addEventListener("message", event => {
const data = event.data
// Deserializing
const deserialized = JSON.parse(data)
// Decoding
const decoded = decode(deserialized)
// Got your thing back!
expect(decoded).toEqual(yourThing)
})
The default functions will work for most objects, for example:
- Date
- ArrayBuffer
- BigInt64Array (if it exists)

Or, more generally, any platform-independent built-in object that isn't explicitly unsupported.
What about other object types, though? Let's take a real-world example.
class UhhNumber {
constructor(value) {
this._value = value
}
plus(other) {
return new UhhNumber(this._value + other._value)
}
valueOf() {
return this._value
}
get value() {
return this._value
}
}
Say you wanted preszr to encode one of those. You just create a new Preszr instance and give it a config object like this:
// 'new' is actually optional.
const prz = new Preszr({
encodes: [UhhNumber]
})
And that's it. prz can now encode an UhhNumber! Here is an example:
const recoded = prz.decode(prz.encode(new UhhNumber(5)))
expect(recoded).toBeInstanceOf(UhhNumber)
expect(recoded.value).toBe(5)
expect(recoded.plus(recoded)).toEqual(new UhhNumber(10))
The Preszr object is immutable, and its configuration can't be modified later.
npm install preszr
Or:
yarn add preszr
preszr follows semver, and changes in the format of a preszr message will always increment the major version.
To ensure it doesn't decode data incorrectly, preszr injects its major version into non-trivial preszr messages and uses that number to determine whether it can decode a message. Right now, preszr throws an error unless the number matches its own major version, but in the future, it might have some fallback.
This is one of the features that allow you to safely write preszr messages to disk.
To encode objects, preszr uses objects called encodings. Here is what they look like:
{
// The thing this encoding is for.
// A constructor or prototype.
encodes: UhhNumber,
// A name. Can usually be inferred.
name: "NumberOrSomething",
// A version. Defaults to 1. We'll talk about these later.
version: 1,
// Encoding logic.
encode(/* LATER */) { /* LATER */ },
// Decoding logic.
decoder: {
create(/* LATER */) { /* LATER */ },
init(/* LATER */) { /* LATER */ }
}
}
When encoding an object, preszr walks up its prototype chain and uses the encoding of the nearest prototype it knows, possibly ending at Object.prototype if it doesn't find anything else. You can't have two encodings for the same prototype unless they're versioned (we'll talk about versioning later). The same is true for encodings with the same name.
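Here is a minimal sketch of a nearest-prototype search that makes the lookup rule concrete. It is a hypothetical model, not preszr's actual implementation. The name findEncoding and the Map from prototypes to encodings are made up for illustration.

```javascript
// Hypothetical sketch, not preszr's source. `encodingsByProto` is an assumed
// Map from prototype objects to their encodings.
function findEncoding(value, encodingsByProto) {
  // Walk the prototype chain, starting with the value's own prototype.
  let proto = Object.getPrototypeOf(value)
  while (proto !== null) {
    const encoding = encodingsByProto.get(proto)
    if (encoding !== undefined) {
      return encoding
    }
    proto = Object.getPrototypeOf(proto)
  }
  // Only reached when Object.prototype isn't in the chain (or isn't registered).
  return undefined
}

// Usage: Circle has no encoding of its own, so Shape's is used.
class Shape {}
class Circle extends Shape {}
const knownEncodings = new Map([
  [Object.prototype, { name: "Object" }],
  [Shape.prototype, { name: "Shape" }]
])
console.log(findEncoding(new Circle(), knownEncodings).name) // prints Shape
```

Plain objects land on the Object.prototype entry because it ends every ordinary prototype chain.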
Encodings go into the encodes property of the Preszr configuration object, in any order. When you put a constructor in there, like in the Usage section, preszr generates a complete encoding behind the scenes, inferring or using defaults for everything except the encodes property (which is required).
In some cases, like if preszr fails to infer the constructor name or there is a collision, you'll need to supply a basic encoding object. It only needs to include the name and encodes properties, though:
// Taken out of an array so JS doesn't name the class after the variable.
const NamelessClass = [class {}][0];
// throws PreszrError(config/spec/proto/no-name):
// Couldn't get the prototype's name. Add a 'name' property
let preszr = Preszr({
encodes: [NamelessClass]
});
// Doesn't throw
preszr = Preszr({
encodes: [{
encodes: NamelessClass,
name: "NamelessClass"
}]
})
The default encoding logic is pretty dumb, but it will work for most normal objects. Here is what it does:
- When encoding, it copies the object key by key¹ and recursively encodes each value.
- When decoding, it creates a new object with the right prototype and then copies the input key by key¹, decoding its values.

¹ Inherited or non-enumerable keys are not copied. Symbol keys are copied, though, if they are enumerable.
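Below is a rough sketch of the key selection and the decoding side of that default logic. The names copiedKeys and defaultDecode are hypothetical. The ctx argument is assumed to behave like the ctx.decode described under custom decoding. preszr's own implementation may differ, and this sketch does not show how preszr writes symbol keys into the message.

```javascript
// Hypothetical sketch of the default logic; not preszr's source.

// The keys the default logic copies: own, enumerable keys, including
// enumerable symbol keys. Inherited and non-enumerable keys are skipped.
function copiedKeys(obj) {
  return Reflect.ownKeys(obj).filter(key =>
    Object.prototype.propertyIsEnumerable.call(obj, key)
  )
}

// Decoding side: create an object with the right prototype, then copy
// the input key by key, decoding each value with the (assumed) ctx.decode.
function defaultDecode(proto, input, ctx) {
  const target = Object.create(proto)
  for (const key of copiedKeys(input)) {
    target[key] = ctx.decode(input[key])
  }
  return target
}
```

Reflect.ownKeys returns both string and symbol keys. The enumerability filter implements the footnote's rule.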
Some objects can't be encoded like this. For example, Map and Set use hidden internal data. Some of your objects might too. They also might depend on variables bound to closures or on the phase of the moon.
For those objects, you'll need to write custom encoding logic: the encode function and the decoder object. You can leave out both and use the default behavior, but if you specify one of them, you have to specify the other too.
Encoding is done by a single function that looks like this:
function encode(uhhNumber, ctx) {
return uhhNumber.value
}
This function takes the input (here called uhhNumber) and returns a representation of it that consists only of structural data and JSON-legal values. The representation can be anything you choose: an object, a number, a string, and so on. So the following are all valid representations:
"abcd"
null
1500
{ value: 10}
[1]
[[[[[10]]]]]
The following are not:
new Uint8Array()
function () {console.log("hello world")}
100n
undefined
document.all
preszr won't check your results, though (for performance reasons), and returning anything weird can lead to undefined behavior.
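Since preszr doesn't validate encoder output, you might want your own check in tests. Below is a hypothetical helper that accepts the valid examples above and rejects the invalid ones. isJsonLegal is not part of preszr, and it assumes the value has no cycles.

```javascript
// Hypothetical test helper, not part of preszr. Assumes `value` has no cycles.
function isJsonLegal(value) {
  if (value === null) return true
  switch (typeof value) {
    case "string":
    case "boolean":
      return true
    case "number":
      // NaN and ±Infinity aren't JSON-legal (JSON.stringify turns them into null).
      return Number.isFinite(value)
    case "object": {
      if (Array.isArray(value)) return value.every(isJsonLegal)
      // Only plain objects count: typed arrays, dates, class instances, etc. don't.
      const proto = Object.getPrototypeOf(value)
      if (proto !== Object.prototype && proto !== null) return false
      return Object.values(value).every(isJsonLegal)
    }
    default:
      // undefined, bigint, symbol, function, and document.all
      // (whose typeof is "undefined") all end up here.
      return false
  }
}
```

You could call it on the result of your encode function in a unit test rather than in production code, which keeps preszr's performance tradeoff intact.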
Each encode function is supposed to only ever encode a single prototype. When it stumbles on some internal value (that of a property, an element of a collection, or something else), it can hand control back to preszr so preszr can encode that value. It does that using the ctx.encode method.
ctx.encode is different from the top-level encode functions we discussed earlier. It's only for encoding the internals of an object, and it doesn't return a preszr message or anything like that.
Instead, it always returns a JSON-legal primitive that you just need to plug in the right place: either the value itself, an encoded string, or a reference (which is also a string).
At any rate, it makes writing an encoding for something extremely simple. Even for a complex object, you just need to figure out how to represent the structure of its internal values.
Let's look at the encoding of Set as an example:
function encode(set, ctx) {
const result = []
for (const item of set) {
result.push(
// We don't need to worry about encoding the elements,
// and just let preszr handle it.
ctx.encode(item)
)
}
return result
}
It's usually that simple.
ctx.encode can recurse: if it needs to encode a value of the type you're encoding further down the line, it will call your encode function again.
Calls to encode are generally hard to predict, so it's important to make sure your encode function has no side effects and keeps no internal state.
Decoding is a two-step process:
- You first CREATE the object based on its preszr representation, without decoding any internal data.
- Then you INIT it by decoding the internal data and putting it where it belongs.
A decoder object is just an object with those functions. However, both are optional.
const decoder = {
create(input) {},
init(target, input, ctx) {}
}
// CREATE stage for Set:
function create(encodedInput) {
return new Set()
}
During this stage, you need to return an instance with the correct prototype.
However, you can't decode any internal data that was encoded using ctx.encode. Other objects might not have gone through their CREATE stage yet, so there would be nothing to return for references to them.
The result of this stage will be an empty, uninitialized object. Its fields will be undefined, its methods won't work, etc.
On the other hand, if you don't need to decode internal data (e.g. the object can't have references to other objects), you don't need the next stage at all, and this function alone will be enough. One example is ArrayBuffer, which is just encoded as a base64 string.
const decoder = {
create(base64) {
return base64ToArrayBuffer(base64)
}
}
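The example above calls base64ToArrayBuffer without defining it. Below is one possible pair of helpers. They assume the global atob and btoa functions, which browsers and Node.js 16+ provide. The bodies, and the name arrayBufferToBase64, are hypothetical, and preszr's own ArrayBuffer encoding may use a different base64 routine.

```javascript
// Hypothetical helpers; only the name base64ToArrayBuffer comes from the example above.
// Relies on the global btoa/atob functions (browsers, Node.js 16+).
function arrayBufferToBase64(buffer) {
  // btoa works on "binary strings": one character per byte.
  let binary = ""
  for (const byte of new Uint8Array(buffer)) {
    binary += String.fromCharCode(byte)
  }
  return btoa(binary)
}

function base64ToArrayBuffer(base64) {
  const binary = atob(base64)
  const bytes = new Uint8Array(binary.length)
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i)
  }
  return bytes.buffer
}
```

The base64 string holds no references to other objects, so a decoder built on it needs only create.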
If you don't have a create method, the object will be created using Object.create and left blank. This will often be enough.
// INIT stage for Set:
function init(target, encodedInput, ctx) {
// We chose our encoded input to be an array.
for (const element of encodedInput) {
target.add(
ctx.decode(element)
)
}
}
During this stage, we initialize the object that was created in the CREATE stage (which we get through the target parameter). This time, we can use the ctx we get to decode internal data, much like we did when encoding.
ctx.decode lets us decode an encoded value. Its input is the output of the ctx.encode function we used while encoding. The result will be the proper, decoded form of what you gave it. It will resolve references, decode encoded strings, or (if the value was JSON-legal in the first place) just return the value as it is.
Unlike ctx.encode, ctx.decode is not recursive. Objects you get from it will have gone through their CREATE stage, but not their INIT stage, so they might have properties that are undefined and methods that don't work.
Your init function shouldn't return anything; its return value will be ignored.
If you don't have an init function, this stage does nothing.
You can have an empty decoder object, {}. An empty decoder object will create the object using Object.create and not initialize anything. I don't know why you'd want to do that, though. Maybe if you're just testing the encoding part.
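Here is a sketch that puts the two stages together: an encoder and decoder for Map, written in the same style as the Set example. The names mapEncode, mapDecoder, and passThroughCtx are hypothetical. The pass-through context only imitates preszr's ctx, since the real one also produces and resolves references. So the sketch shows the order of calls rather than preszr's actual behavior. It isn't registered with a Preszr object, because overriding a built-in encoding involves versioning.

```javascript
// Hypothetical sketch in the style of the Set example; not preszr's built-in Map encoding.
function mapEncode(map, ctx) {
  const result = []
  for (const [key, value] of map) {
    // Each entry becomes a [key, value] pair; ctx.encode handles the contents.
    result.push([ctx.encode(key), ctx.encode(value)])
  }
  return result
}

const mapDecoder = {
  // CREATE: an empty Map with the right prototype, no internal data yet.
  create(encoded) {
    return new Map()
  },
  // INIT: decode each key and value and put them where they belong.
  init(target, encoded, ctx) {
    for (const [key, value] of encoded) {
      target.set(ctx.decode(key), ctx.decode(value))
    }
  }
}

// Stand-in for preszr's ctx that passes JSON-legal values through unchanged.
// The real ctx also turns objects into references and back.
const passThroughCtx = {
  encode: value => value,
  decode: value => value
}

// Usage: run the three steps by hand, in the order preszr would.
const original = new Map([["x", 1], ["y", 2]])
const encoded = mapEncode(original, passThroughCtx)
const decoded = mapDecoder.create(encoded)
mapDecoder.init(decoded, encoded, passThroughCtx)
console.log(decoded.get("y")) // prints 2
```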
preszr can also deal with JavaScript symbols, both as values and as object keys. Just like with prototypes, preszr knows about all the built-in symbols (though most are not relevant to data), but you'll need to tell it about your custom symbols so it can reproduce them.
To do that, you can just put them in the encodes array of the Preszr configuration:
const mySecretSymbol = Symbol("A secret symbol")
const przWithSymbols = new Preszr({
encodes: [mySecretSymbol]
})
This creates a symbol encoding, a different type of encoding than we talked about earlier (which was actually a prototype encoding). Symbol encodings don't have the features of prototype encodings. Here is what a complete symbol encoding looks like:
const symbolEncoding = {
encodes: mySecretSymbol,
name: "MySecretSymbol"
}
const przWithSymbols = new Preszr({
encodes: [symbolEncoding]
})
Like with prototype encodings, encodes is required but name can be omitted, which is what happens if you just provide the symbol. In that case, the name is inferred from the symbol's description. If the symbol doesn't have a description or there is a collision, you'll need to provide the name property after all.
Symbol encodings go into the same encodes array as other encodings.
If preszr encounters a symbol it doesn't recognize, it won't ignore it or throw an error. Instead, it will replace all of its appearances with a stand-in symbol, similar to what it does with unsupported values. The stand-in's description will be similar to:
preszr unrecognized ${description}
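Every appearance of an unknown symbol has to map to the same stand-in, which suggests a cache. Below is a minimal, hypothetical sketch of how that could work. It is not preszr's code, and createStandInFactory is a made-up name.

```javascript
// Hypothetical sketch, not preszr's code: hands out one stand-in per unknown symbol.
function createStandInFactory() {
  // Cache so that every appearance of the same unknown symbol
  // is replaced by the same stand-in.
  const standIns = new Map()
  return function standInFor(unknownSymbol) {
    let standIn = standIns.get(unknownSymbol)
    if (standIn === undefined) {
      standIn = Symbol(`preszr unrecognized ${unknownSymbol.description}`)
      standIns.set(unknownSymbol, standIn)
    }
    return standIn
  }
}
```

Because the cache is keyed by the original symbol, two different symbols with the same description still get two distinct stand-ins.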
preszr's internal versioning system for custom encodings has been designed to handle two specific use cases:
- Reading legacy data, such as data that was written to disk before a change in your objects was made.
- Overriding built-in encodings, such as those for Set and the like.
But first, let's look at how preszr manages these versions in general.
preszr identifies each encoding with an encoding key. For prototype encodings, this key is a combination of the name and version of the encoding:
`${ENCODING_NAME}.v${ENCODING_VERSION}`
The rule is that two encodings with the same key can't exist on a Preszr object.
- For built-in encodings, the version is always 0. Instead of using this system, any change in a built-in encoding causes a change in the library's major version.
- For user-defined encodings, the version defaults to 1, but can be any positive, safe integer.
Another rule is that two encodings with the same name must encode the same prototype.
When preszr receives a value to encode, it encodes it using the encoding with the highest version, noting that version down in the preszr message. It doesn't do this when decoding, though. Instead, preszr tries to find a decoder that matches the encoding key exactly. If it can't find one, it throws an error.
This follows the robustness principle:
Be conservative in what you send, be liberal in what you accept
It means that you'll be able to read legacy data, but never write it, and updating it is as easy as decoding and then re-encoding it.
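The rules above can be modeled as a small registry. This is a hypothetical sketch, not preszr's internals, and encodingKey and createRegistry are made-up names. Encoding picks the highest version registered for a name, while decoding demands an exact key match.

```javascript
// Hypothetical model of preszr's versioning rules; not its actual internals.
const encodingKey = (name, version) => `${name}.v${version}`

function createRegistry(encodings) {
  const byKey = new Map()
  const latestByName = new Map()
  for (const encoding of encodings) {
    const key = encodingKey(encoding.name, encoding.version)
    // Rule 1: two encodings can't share the same key.
    if (byKey.has(key)) {
      throw new Error(`Duplicate encoding key: ${key}`)
    }
    const latest = latestByName.get(encoding.name)
    // Rule 2: encodings with the same name must encode the same prototype.
    if (latest !== undefined && latest.encodes !== encoding.encodes) {
      throw new Error(`Encodings named ${encoding.name} encode different prototypes`)
    }
    byKey.set(key, encoding)
    if (latest === undefined || encoding.version > latest.version) {
      latestByName.set(encoding.name, encoding)
    }
  }
  return {
    // Encoding: always use the highest version registered for a name.
    forEncoding(name) {
      return latestByName.get(name)
    },
    // Decoding: require an exact key match, or fail.
    forDecoding(key) {
      const encoding = byKey.get(key)
      if (encoding === undefined) {
        throw new Error(`No decoder for encoding key: ${key}`)
      }
      return encoding
    }
  }
}
```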
When you version an encoding, there are a few more restrictions you have to follow:
- Every version of a versioned encoding must have the version property. It will never be inferred, which makes it clear that your encoding is versioned.
- You must also specify the name property, because a change in your object can cause the inferred name to change.
Versioned encodings still go into the encodes list of encodings, in any order, as separate objects. Grouping them together doesn't do anything.
When you want to make a change to how your object is encoded while still being able to read old data, you need to do the following:
- Create a new version of your encoding. Your logic can be as similar or different as you want.
- If, after the modification, your object needs new or different data, modify each previous version of the encoding you want to support so that its decoder returns an object compatible with the new version.
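As a hypothetical example of those steps, suppose UhhNumber later gains a unit field. The old version keeps its encoding only so old data can still be read, and its decoder fills in a default unit. The "unitless" default and the encoding object names are assumptions for illustration. Both encodings set name and version explicitly, as required. The Preszr call is left as a comment using the configuration shape shown earlier.

```javascript
// Hypothetical example: a newer UhhNumber with a `unit` field.
class UhhNumber {
  constructor(value, unit) {
    this._value = value
    this._unit = unit
  }
  get value() {
    return this._value
  }
  get unit() {
    return this._unit
  }
}

// Version 1: the old format, a bare number. It's never chosen for encoding
// once version 2 exists, but its decoder still reads old messages and
// now fills in the field the old format lacked.
const uhhNumberV1 = {
  encodes: UhhNumber,
  name: "UhhNumber", // required for versioned encodings
  version: 1,        // required for versioned encodings
  encode: n => n.value,
  decoder: {
    create: value => new UhhNumber(value, "unitless")
  }
}

// Version 2: the current format, a [value, unit] pair. This assumes value is
// a number and unit a string (both JSON-legal), so CREATE is all we need.
const uhhNumberV2 = {
  encodes: UhhNumber,
  name: "UhhNumber",
  version: 2,
  encode: n => [n.value, n.unit],
  decoder: {
    create: ([value, unit]) => new UhhNumber(value, unit)
  }
}

// Both go into the same list, in any order, e.g.:
// const prz = Preszr({ encodes: [uhhNumberV2, uhhNumberV1] })
```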
Versions work the same way for built-in encodings, except that you're not allowed to set their name property when you define an override. They're identified uniquely by their prototype, and a name would just be confusing.
Built-in encodings always have version 0, so you can start your versions from 1, but version is still required.