voynichtranscription

Towards a universal transcription notation for the Voynich Manuscript


Intention

These pages propose a new notation for transcribing the Voynich Manuscript which, once suitably refined in consultation, could be considered universal.

Motivation

Most of the major transcriptions made to date have been presented in Eva ("Extensible Voynich Alphabet"), or have been translated without loss into Eva. The notable exception is Glen Claston's transcription into his own v101 alphabet.

The v101 alphabet has been considered incompatible with Eva, in particular because it often uses larger units: what v101 treats as a single character, Eva may transcribe as several. For example, the v101 character 'm' corresponds to the sequence 'iin' in Eva.

Another fundamental difference of approach is that variations which are not considered "distinctive" in Eva are noted in great detail in v101. For example, two subtly different versions of the character 'f' in Eva are transcribed separately as 'f' and 'u' in v101.

The result is that a v101 transcription can be roughly translated into Eva without great difficulty but, crucially, information is lost in doing so: the nature of the compound characters and the nuances of variation in form do not survive.

If a "super alphabet" could be found which encompassed both Eva and v101, then it would have a good claim to being universal.

Introducing "Ceva-RM"

Ceva-RM is a working title for a proposed Eva-like representation of both Eva and v101 transcriptions. It is "lossless", in the sense that every nuance of the original transcription is represented (and can be recovered by a reverse translation if desired).

Ceva (working title) stands for "Compound Eva". It uses Eva as its basis but goes beyond the existing definition by introducing two new representations:

Compound characters (from an Eva perspective), such as are very prevalent in v101, are represented uniquely using brace notation. For example, 'm' is translated to '{iin}'. This convention already exists in Eva for representing ligatures by some transcribers, but it is formalised here so that '{iin}' is considered a single glyph in the universal alphabet and is distinct from the sequence 'iin' of three glyphs.
Variations in form of standard Eva characters (variations which are typically not noted by Eva transcribers but which are noted in detail in v101) are denoted by one or more digits appended to the base form. For example, while 'f' in v101 translates to 'f' in Eva and therefore also in Ceva, the variation denoted 'u' in v101 is represented by 'f1' in Ceva. There is no ambiguity in doing this, as digits do not form part of the existing Eva alphabet. The Ceva definition allows more than one digit to be appended, thus accommodating the theoretical possibility of more than 10 variations. Characters such as 'f1' are considered a single glyph in the universal alphabet and are not enclosed in braces in the Ceva representation.
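The two conventions above imply a simple rule for splitting a raw Ceva string into glyphs. The following Python sketch illustrates one way to do it; the function name `split_glyphs` and the exact token grammar are assumptions made for illustration, not part of the Ceva definition.

```python
import re

# Assumed token grammar (illustrative only, not an official Ceva grammar):
#   compound glyph : '{' ... '}'                     e.g. {iin}
#   simple glyph   : one letter + optional digits    e.g. f, f1, s12
#   anything else  : passed through as a one-character token
_GLYPH = re.compile(r"\{[^{}]*\}|[A-Za-z]\d*|.", re.DOTALL)

def split_glyphs(raw):
    """Split a raw Ceva string into a list of glyph tokens."""
    return _GLYPH.findall(raw)

print(split_glyphs("{iin}iin"))  # one compound glyph, then three simple glyphs
print(split_glyphs("f1of"))      # 'f1' stays together as a single glyph
```

Under this rule '{iin}' yields one token while 'iin' yields three, which is the distinction the brace notation is meant to preserve.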

Variations in form within compound characters are allowed in Ceva. For example, 'F' in v101 translates to '{cfh}', while 'U' maps to '{cf1h}', by analogy to the differentiation of 'f' and 'u' as 'f' and 'f1' respectively.
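The "lossless" claim can be illustrated with a round trip. The sketch below uses only the five v101-to-Ceva pairs quoted so far; the table is deliberately partial, the function names are hypothetical, and the test string is an arbitrary sequence of characters rather than a real Voynich word.

```python
import re

# Partial table: only the v101 -> Ceva pairs quoted in the text above.
V101_TO_CEVA = {
    "m": "{iin}",
    "f": "f",
    "u": "f1",
    "F": "{cfh}",
    "U": "{cf1h}",
}
# The reverse table is well defined because every Ceva code is unique.
CEVA_TO_V101 = {ceva: v101 for v101, ceva in V101_TO_CEVA.items()}

# Assumed glyph grammar: a braced compound, or a letter plus optional digits.
_GLYPH = re.compile(r"\{[^{}]*\}|[A-Za-z]\d*")

def v101_to_ceva(text):
    """Translate v101 to raw Ceva; raises KeyError for unmapped characters."""
    return "".join(V101_TO_CEVA[ch] for ch in text)

def ceva_to_v101(raw):
    """Translate raw Ceva back to v101 by splitting it into glyphs first."""
    return "".join(CEVA_TO_V101[g] for g in _GLYPH.findall(raw))

sample = "mfuFU"                      # arbitrary characters, not a real word
ceva = v101_to_ceva(sample)
print(ceva)
print(ceva_to_v101(ceva) == sample)   # the round trip recovers the input
```

A translation to plain Eva would instead map both 'f' and 'u' to 'f', at which point the reverse table could no longer be built.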

Each set of variations is naturally gathered into a "family" by considering them "net" of the digit suffixes. So for example, the v101 characters 's', 't', '$', and 'T', which being variations of Eva 's' are denoted respectively 's', 's1', 's6' and 's7' in Ceva, are collected into the Ceva family 's'.
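Taking a glyph "net" of its digit suffixes amounts to deleting the digits. A minimal Python sketch follows; the helper names are hypothetical, and applying the same rule inside compound glyphs (so that '{cf1h}' falls in the family '{cfh}') is an assumption rather than something stated above.

```python
import re
from collections import defaultdict

def family(glyph):
    """Return the family of a Ceva glyph by removing its digit suffixes."""
    return re.sub(r"\d+", "", glyph)

def group_by_family(glyphs):
    """Collect glyphs into a dict mapping each family to its members."""
    families = defaultdict(list)
    for g in glyphs:
        families[family(g)].append(g)
    return dict(families)

# The Ceva codes given above for v101 's', 't', '$' and 'T'.
print(group_by_family(["s", "s1", "s6", "s7"]))
```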

Raw and Presentation forms

Thus far only the "raw" form of Ceva has been expounded. Like Eva, it is formed purely of text in 7-bit ASCII. It is ideal for machine analysis. Another more visual "presentation" form is also defined. This is the form that should naturally be presented for human-readable purposes. It is constructed from the underlying raw form as follows:

Compound characters in braces are instead presented with an upper bracket or breve. For example, {iin} is shown as iin with a single bracket or breve drawn over all three letters.
Digit suffixes are shown as subscripts. For example, r3 is shown as r₃.
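The construction of the presentation form can be approximated in plain Unicode text. The sketch below is an illustration only: the function name `present` is hypothetical, and the use of the combining overline U+0305 on each letter is an assumed stand-in for the upper bracket or breve, which a real page would more likely draw with styling or a dedicated font.

```python
import re

# Map ASCII digits to the Unicode subscript digits U+2080..U+2089.
SUBSCRIPTS = str.maketrans("0123456789", "₀₁₂₃₄₅₆₇₈₉")
OVERLINE = "\u0305"  # ASSUMPTION: combining overline approximates the bracket

def present(raw):
    """Convert raw Ceva text to an approximate presentation form."""
    def compound(match):
        # Overline each letter of the compound; leave digit suffixes bare
        # so that they are turned into subscripts in the next step.
        return "".join(
            ch if ch.isdigit() else ch + OVERLINE for ch in match.group(1)
        )
    text = re.sub(r"\{([^{}]*)\}", compound, raw)
    return text.translate(SUBSCRIPTS)

print(present("r3"))
print(present("{cf1h}"))
```

Because every digit is subscripted, any other digits in the input would need to be handled before this step.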

Association with glyphs

The character tables on this site associate each Ceva character element (whether simple, compound or suffixed) with a glyph from a universal Voynich font. We have used Rebecca Bettencourt's Voynich Unicode font for this purpose. We have numbered the characters 000 to 286 to match the offsets of the Unicode points used by RB, in the Unicode Private Use Area (PUA) starting at U+FF400. We have denoted this alphabet Uva-RB.

Ceva-RM is a work in progress, and the ambition remains to allocate each of these glyphs a Ceva code, following the "compound" and "family" principles already established. In the interim, any glyph which does not yet have a Ceva code is represented in the transcriptions in a form such as &uva123; for character 123 of the Uva-RB set. This rationalises the Eva and v101 "rare character" codes, which otherwise do not match, into a single universal form.
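An interim code of this kind can be resolved to a font code point by adding its number to the start of the PUA range. The following Python sketch rests on one significant assumption that the text does not settle: that the three-digit number is a decimal offset from U+FF400. The function name `expand_uva` is likewise hypothetical.

```python
import re

UVA_BASE = 0xFF400  # start of the PUA range stated in the text
UVA_MAX = 286       # highest Uva-RB character number stated in the text

def expand_uva(raw):
    """Replace each &uvaNNN; code with the corresponding PUA character."""
    def to_char(match):
        number = int(match.group(1))  # ASSUMPTION: NNN is a decimal offset
        if number > UVA_MAX:
            raise ValueError(f"no Uva-RB character numbered {number}")
        return chr(UVA_BASE + number)
    return re.sub(r"&uva(\d{3});", to_char, raw)

code_point = ord(expand_uva("&uva123;"))
print(f"U+{code_point:05X}")
```

The resulting characters display as Voynich glyphs only where a font covering that Private Use Area range is installed.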

For the time being, the Ceva-RM definition, to the extent that it already exists, should be considered provisional. It is subject to revision, particularly in the allocation of characters to families.