Towards a universal transcription notation for the Voynich Manuscript
These pages propose a new notation for transcribing the Voynich Manuscript which, once suitably refined in consultation, could be considered universal.
Most of the major transcriptions made to date have been presented in Eva ("Extensible Voynich Alphabet"), or have been translated without loss into Eva. The notable exception is Glen Claston's transcription into his own v101 alphabet.
V101 has been considered incompatible with Eva, in particular because it often uses larger units. For example, the v101 character 'm' corresponds to the sequence 'iin' in Eva.
Another fundamental difference of approach is that variations which are not considered "distinctive" in Eva are noted in great detail in v101. For example, two subtly different versions of the character 'f' in Eva are transcribed separately as 'f' and 'u' in v101.
The result is that while a v101 transcription can be roughly translated into Eva without great difficulty, there is, crucially, a loss of information in doing so: the nature of the compound characters and the nuances of variation of form are lost.
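The loss of information can be illustrated with a minimal Python sketch. It uses only the three correspondences quoted above; the names `V101_TO_EVA` and `v101_to_eva` are hypothetical, unknown characters are simply passed through (an assumption made for brevity), and a real conversion table would be far larger.

```python
# Partial, illustrative table: only the correspondences quoted in the text.
V101_TO_EVA = {
    "m": "iin",  # one v101 unit becomes a sequence of three Eva characters
    "f": "f",    # two subtly different forms in v101 ...
    "u": "f",    # ... collapse to the same Eva character
}


def v101_to_eva(text: str) -> str:
    """Translate v101 to Eva; characters not in the table pass through unchanged."""
    return "".join(V101_TO_EVA.get(ch, ch) for ch in text)


# Distinct v101 inputs give identical Eva output, so the step cannot be reversed.
print(v101_to_eva("f") == v101_to_eva("u"))  # True
print(v101_to_eva("m"))                      # iin
```

Once 'f' and 'u' have both become Eva 'f', no reverse translation can tell which one the transcriber originally recorded.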
If a "super alphabet" could be found which encompassed both Eva and v101, then it would have a good claim to being universal.
Ceva-RM is a working title for a proposed Eva-like representation of both Eva and v101 transcriptions. It is "lossless", in the sense that every nuance of the original transcription is represented (and can be recovered by a reverse translation if desired).
Ceva (working title) stands for "Compound Eva". It uses Eva as its basis but goes beyond the existing definition by introducing two new representations:
Variations in form within compound characters are allowed in Ceva. For example, 'F' in v101 translates to '{cfh}', while 'U' maps to '{cf1h}', by analogy to the differentiation of 'f' and 'u' as 'f' and 'f1' respectively.
Each set of variations is naturally gathered into a "family" by considering them "net" of the digit suffixes. For example, the v101 characters 's', 't', '$', and 'T', which, being variations of Eva 's', are denoted respectively 's', 's1', 's6' and 's7' in Ceva, are collected into the Ceva family 's'.
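The suffix and family conventions can be sketched in Python as follows. The table contains only the v101-to-Ceva examples given above, and the names `V101_TO_CEVA`, `CEVA_TO_V101` and `family` are hypothetical. Treating the family of a compound such as '{cf1h}' as '{cfh}' (digits removed inside the braces) is an assumption, not something the definition states explicitly.

```python
import re
from collections import defaultdict

# Partial v101 -> Ceva table, limited to the examples given in the text.
V101_TO_CEVA = {
    "f": "f", "u": "f1",
    "F": "{cfh}", "U": "{cf1h}",
    "s": "s", "t": "s1", "$": "s6", "T": "s7",
}

# Every Ceva code in the table is distinct, so the mapping can be inverted.
# This is the "lossless" property, shown here only for these eight characters.
CEVA_TO_V101 = {ceva: v101 for v101, ceva in V101_TO_CEVA.items()}
assert len(CEVA_TO_V101) == len(V101_TO_CEVA)


def family(ceva_code: str) -> str:
    """Return the family of a Ceva code by removing its digit suffixes."""
    return re.sub(r"\d+", "", ceva_code)


# Group the Ceva codes by family.
families = defaultdict(list)
for ceva in V101_TO_CEVA.values():
    families[family(ceva)].append(ceva)

print(families["s"])  # ['s', 's1', 's6', 's7']
```

Because the family is derived from the code itself, no separate lookup table is needed to move between the detailed v101-style reading and the coarser Eva-style reading.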
Thus far only the "raw" form of Ceva has been expounded. Like Eva, it is formed purely of text in 7-bit ASCII. It is ideal for machine analysis. Another more visual "presentation" form is also defined. This is the form that should naturally be presented for human-readable purposes. It is constructed from the underlying raw form as follows:
The character tables on this site associate each Ceva character element (whether simple, compound or suffixed) with a glyph from a universal Voynich font. We have used Rebecca Bettencourt's Voynich Unicode font for this purpose. We have numbered the characters 000 to 286 to match the offsets of the Unicode points used by RB, in the Unicode Private Use Area (PUA) starting at U+FF400. We have denoted this alphabet Uva-RB.
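The numbering scheme amounts to a fixed offset from the start of the PUA block, which can be sketched in Python as below. The base U+FF400 and the range 000 to 286 are taken from the description above; the names `UVA_RB_BASE`, `UVA_RB_MAX`, `uva_to_char` and `char_to_uva` are hypothetical.

```python
UVA_RB_BASE = 0xFF400  # start of the PUA block quoted in the text
UVA_RB_MAX = 286       # highest Uva-RB character number quoted in the text


def uva_to_char(number: int) -> str:
    """Return the PUA character for a Uva-RB character number."""
    if not 0 <= number <= UVA_RB_MAX:
        raise ValueError(f"Uva-RB number out of range: {number}")
    return chr(UVA_RB_BASE + number)


def char_to_uva(ch: str) -> int:
    """Return the Uva-RB character number for a PUA character."""
    number = ord(ch) - UVA_RB_BASE
    if not 0 <= number <= UVA_RB_MAX:
        raise ValueError(f"not a Uva-RB character: U+{ord(ch):X}")
    return number


print(f"U+{ord(uva_to_char(123)):X}")  # U+FF47B
```

The resulting characters only display as Voynich glyphs when a font covering this PUA range is installed.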
Ceva-RM is a work in progress, and the ambition remains to allocate each of these glyphs a Ceva code, following the "compound" and "family" principles already established. In the interim, any glyphs which do not yet have a Ceva code are represented in the transcriptions by their Uva-RB number, e.g. &uva123; for character 123 of the Uva-RB set. This rationalises both the Eva and v101 "rare character" codes, which otherwise do not match, into a universal form.
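A transcription tool might handle these interim codes along the following lines. This is a sketch only: it assumes the number is always written with exactly three digits (as in the 000 to 286 numbering), the names `UVA_ENTITY`, `find_uva_numbers` and `expand_uva_entities` are hypothetical, and the sample string is invented for illustration rather than taken from any transcription.

```python
import re

UVA_RB_BASE = 0xFF400  # start of the PUA block used for the Uva-RB set

# Assumed form of an interim code: "&uva", three digits, ";".
UVA_ENTITY = re.compile(r"&uva(\d{3});")


def find_uva_numbers(text: str) -> list[int]:
    """List the Uva-RB character numbers referenced in a transcription line."""
    return [int(digits) for digits in UVA_ENTITY.findall(text)]


def expand_uva_entities(text: str) -> str:
    """Replace each &uvaNNN; code with the corresponding PUA character."""
    return UVA_ENTITY.sub(
        lambda match: chr(UVA_RB_BASE + int(match.group(1))), text
    )


sample = "ab&uva123;cd"  # invented example, not real transcription data
print(find_uva_numbers(sample))  # [123]
```

Keeping the codes as plain ASCII in the raw form and expanding them only for display leaves the underlying text suitable for machine analysis.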
For the time being, the Ceva-RM definition, to the extent that it already exists, should be considered provisional. It is subject to revision, particularly in the allocation of characters to families.
©Robert Marson, Bristol UK, 14/06/2022