Thank you for the detailed and comprehensive explanations! > There are actually ...

kstenerud · on Jan 4, 2022

> If you'd like to eventually harden the binary implementations, you might also be interested in coverage-guided fuzz testing which feeds random garbage data to a method to try and find errors in it: https://llvm.org/docs/LibFuzzer.html

Yes, I plan to fuzz the hell out of the reference implementation once it's done. So much to do, so little time...

> I see, so the main use case of the CTE text format is rapid prototyping, and then the user should convert to the CBE binary format in production?

CTE would be for prototyping, initial data loads, debugging, auditing, logging, visualizing, possibly even for configuration (since the config would be local and not sourced from unknown origin). Basically: CBE when data passes from machine to machine, and CTE only where a human needs to get involved.

> Another approach would be to embed the grammar and parser into an existing language like Python, Rust, or Haskell, and let the user define their own custom types in that language.

I demonstrate this in the reference implementation by adding cplx() type support for go as a custom type. Then people are free to come up with their own encodings for their custom needs (one could specify in the schema how to decode them). I think there's enough there as-is to support most custom needs.

> maybe it would make sense to create a small set of core types (kind of like a standard library), and then permit extensions via user-defined types which must be whitelisted?

I thought about that, but the complexity grows fast, and then you have a constellation of "conformant" codecs that have different levels of support, which means you can now only count on the minimal set of required types and the rest are useless. The fewer optional parts, the better.