Dante for Publishers

The Dante database was created by a team of top-class dictionary editors to provide a language resource of unprecedented richness. It holds all the linguistic data a publisher would need in order to produce a new monolingual or bilingual dictionary, update an existing dictionary, or transform a print dictionary into an extended online version.

The data

Entries in Dante look like conventional dictionary entries with the formal ‘bones’ of the entry made explicit. But Dante’s great strength is the granularity of the data. Its detailed and systematic record of every relevant linguistic feature is complemented by over half a million corpus sentences, each selected to illustrate a particular fact. The examples offer a wealth of contexts for the headword and a model for eventual dictionary examples.

Using Dante, publishers can bypass the costly initial stages in the development of new products. The data is generic in nature: the database was developed using IDM’s DPS tool, but it can be used by any dictionary writing system capable of handling XML databases. Dante is also so rich in data that any dictionary generated from it would contain only a selected subset of the database, so no two dictionaries derived from Dante need be recognizably similar.

Some possible publishing applications are sketched out below.

1. New bilingual dictionary with English as source language (SL)

Starting from Dante, a publisher could produce an entirely new corpus-based bilingual dictionary with a wordlist of 60,000 headwords, or any smaller version. In a project of this nature, the English research stage (before any target-language data is added) can represent 40% or more of the development costs. Dante offers a cost-effective shortcut. Each entry records in detail all the syntactic, semantic and lexical contexts in which the headword or phrase is found in the corpus. This gives the translating editors all the information they need to provide the target-language material. The examples are full sentences drawn directly from the corpus, so the editors need never return to raw corpus data while editing their bilingual dictionary.

Method of use
The Dante database is designed to allow the automatic insertion of translation fields that can hold target-language equivalents at the appropriate points in the Dante entry. Translators can therefore insert equivalents quickly and ‘spontaneously’, producing a translated Dante entry that editors then tailor to create their own entry. Dante is used in this way to prepare translated entries for the New English-Irish Dictionary.
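The insertion step can be pictured as a simple XML transformation. The following is a minimal sketch that assumes a simplified entry structure. The element names (`sense`, `meaning`, `example`, `colloc`, `trans`) and the function `add_translation_fields` are hypothetical and are not taken from the Dante DTD. The code places an empty translation slot after each element a translator would need to render.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified entry: element names are illustrative only,
# not the real Dante DTD.
SAMPLE_ENTRY = """<entry hw="bank">
  <sense id="s1">
    <meaning>organization that holds money</meaning>
    <example>She paid the cheque into the bank.</example>
  </sense>
  <sense id="s2">
    <meaning>side of a river</meaning>
    <colloc>river bank</colloc>
  </sense>
</entry>"""

# Element types that get a translation slot (an assumption for this sketch).
TRANSLATABLE = {"meaning", "example", "colloc"}


def add_translation_fields(xml_text: str, lang: str) -> ET.Element:
    """Insert an empty <trans> element after each translatable element."""
    root = ET.fromstring(xml_text)
    # Snapshot the tree first so newly inserted slots are not revisited.
    for parent in list(root.iter()):
        children = list(parent)
        # Walk children in reverse so earlier insertion indices stay valid.
        for idx in range(len(children) - 1, -1, -1):
            child = children[idx]
            if child.tag in TRANSLATABLE:
                slot = ET.Element("trans", {"lang": lang, "for": child.tag})
                parent.insert(idx + 1, slot)
    return root


if __name__ == "__main__":
    entry = add_translation_fields(SAMPLE_ENTRY, "ga")
    print(ET.tostring(entry, encoding="unicode"))
```

Each slot records the type of element it belongs to. An editor or a later script can therefore check that every meaning, example and collocation has received an equivalent.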

For a publisher starting from the Dante data with translations inserted, the outstanding tasks would then be:

2. New monolingual English dictionary

Dante provides enough data to form the basis of one-volume collegiate dictionaries, for either adult native speakers of English or school students, or of a pedagogical dictionary for advanced learners. Dante offers significant savings: we estimate that starting from Dante rather than from scratch could save at least 50% of the development costs.

Each entry holds all the various syntactic, semantic and lexical contexts in which the headword or phrase is found in the corpus. The examples are full sentences directly drawn from the corpus. The editors need never return to raw corpus data in the course of editing their monolingual dictionary.

Method of use
With the data supplied by Dante, the outstanding tasks would be:

3. Dictionary Updater

Updating existing assets is a regular requirement and entails significant costs. New material missing from the dictionary must (1) be identified, (2) be located in corpus data, and (3) be incorporated by re-editing the existing entries.

Method of use
The proposed method of semi-automating this process involves comparing the content of a published dictionary with the Dante data. The output of the comparison would be a file of omissions in the existing dictionary. With the exception of the meaning explanations and the full-sentence corpus examples, almost all of the Dante material is coded and machine-retrievable. This allows a commissioning publisher to specify which Dante data types are to be used in the comparison.
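In essence, the comparison is a filtered set difference over coded data. The following is a minimal sketch that assumes both dictionaries have already been reduced to (headword, data type, value) records. The type labels, `Item`, `find_omissions` and `to_tsv` are hypothetical names used for illustration. The commissioning publisher's choice of data types appears as the `wanted_types` parameter.

```python
from typing import Iterable, NamedTuple


class Item(NamedTuple):
    """One coded, machine-retrievable fact about a headword (hypothetical shape)."""
    headword: str
    data_type: str  # hypothetical labels, e.g. "phrase", "colloc", "compound"
    value: str


def _key(item: Item) -> tuple[str, str, str]:
    # Case-insensitive match only; real matching would need more
    # normalization (spelling variants, inflection, punctuation).
    return (item.headword.lower(), item.data_type, item.value.lower())


def find_omissions(dante: Iterable[Item], published: Iterable[Item],
                   wanted_types: set[str]) -> list[Item]:
    """Return Dante items of the requested types that the published dictionary lacks."""
    have = {_key(i) for i in published}
    missing = [i for i in dante
               if i.data_type in wanted_types and _key(i) not in have]
    return sorted(missing)


def to_tsv(items: Iterable[Item]) -> str:
    """Serialize omissions as tab-separated lines: headword, type, value."""
    return "\n".join("\t".join(i) for i in items)


if __name__ == "__main__":
    dante = [Item("bank", "phrase", "break the bank"),
             Item("bank", "colloc", "river bank"),
             Item("bank", "compound", "bank holiday")]
    published = [Item("bank", "colloc", "River bank")]
    print(to_tsv(find_omissions(dante, published, {"phrase", "colloc"})))
```

The meaning explanations and full-sentence examples are not coded, so they fall outside this kind of automatic comparison.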

A Dante-based (rather than simply corpus-based) comparison will produce a more focused and relevant set of data for revising and updating an existing bilingual dictionary in which English is the source language. Once the selection has been made from the data and the translation process carried out, reversing the SL-TL data will form a sound launchpad for updating a dictionary with English as the target language (TL).
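At its simplest, reversing the data means inverting the mapping from English headwords to their equivalents. The following is a minimal sketch that assumes translations are held as a plain dictionary. The function name `reverse_pairs` and the French sample data are illustrative. A real reversal would also have to carry sense distinctions and context, so the output is only the starting point that editors then refine.

```python
from collections import defaultdict


def reverse_pairs(sl_to_tl: dict[str, list[str]]) -> dict[str, list[str]]:
    """Invert English headword -> equivalents into equivalent -> English headwords."""
    index = defaultdict(set)
    for headword, equivalents in sl_to_tl.items():
        for eq in equivalents:
            index[eq].add(headword)
    # Sort keys and values so the reversed wordlist is stable and reviewable.
    return {tl: sorted(sls) for tl, sls in sorted(index.items())}


if __name__ == "__main__":
    print(reverse_pairs({"bank": ["banque", "rive"], "shore": ["rive", "côte"]}))
```

In the reversed data, an equivalent shared by several English headwords (here `rive`) gathers all of them under one candidate TL-side headword for editorial review.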

Contact us for further discussion of the use of Dante to update dictionaries.

4. E-dictionary enhancer

As electronic media become the primary delivery mode for dictionaries, there is an expectation that electronic editions – freed from the space constraints of print – should include a ‘hinterland’ of additional lexical data. This trend is most marked in the case of pedagogical (ELT) English dictionaries, where an emerging ‘freemium’ business model envisages two types of electronic edition:

With the corpus data currently available, sourcing this additional data is not difficult, but it entails significant costs for the publisher. Here again, Dante offers a shortcut, providing exactly the kind of data needed to enhance a ‘basic’ existing dictionary.

5. Generic dictionary DTD

The Dante Document Type Definition is generic and comprehensive in nature, and may be licensed as a launchpad for any new dictionary, bilingual or monolingual. It has already furnished the DTD used in IDM’s Free Online Dictionary initiative.