Moreover, even at a given level there may be different labeling schemes or even disagreement amongst annotators, such that we want to represent multiple versions. A second property of TIMIT is its balance across multiple dimensions of variation, for coverage of dialect regions and diphones. Notice also that all of the data types included in the TIMIT corpus fall into the two basic categories of lexicon and text, which we will discuss below.

It could also be a phrasal lexicon, where the key field is a phrase rather than a single word.

A thesaurus also consists of record-structured data, where we look up entries via non-key fields that correspond to topics.
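The difference between key-field and non-key-field access can be sketched in a few lines of Python. This is only an illustrative sketch: the records, the field names (`key`, `pos`, `topics`), and the helper `lookup_topic` are hypothetical and are not drawn from TIMIT or any particular lexicon.

```python
from collections import defaultdict

# Hypothetical record-structured data: each record has a key field
# (a headword or a phrase) plus some non-key fields.
records = [
    {"key": "walk", "pos": "V", "topics": ["motion"]},
    {"key": "stroll", "pos": "V", "topics": ["motion", "leisure"]},
    {"key": "kick the bucket", "pos": "V", "topics": ["death"]},
    {"key": "perish", "pos": "V", "topics": ["death"]},
]

# Lexicon-style access: look up a record via its key field.
# A phrasal entry works the same way; the key is just a longer string.
lexicon = {rec["key"]: rec for rec in records}

# Thesaurus-style access: index the same records by a non-key field (topic).
by_topic = defaultdict(list)
for rec in records:
    for topic in rec["topics"]:
        by_topic[topic].append(rec["key"])

def lookup_topic(topic):
    """Return the sorted key fields of all records filed under a topic."""
    return sorted(by_topic.get(topic, []))

print(lexicon["kick the bucket"]["pos"])
print(lookup_topic("death"))
```

The same list of records serves both purposes; only the index built over it changes, which is why a lexicon and a thesaurus can both be treated as record-structured data.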

The inclusion of speaker demographics brings in many more independent variables that may help to account for variation in the data, and that facilitate later uses of the corpus for purposes not envisaged when it was created, such as sociolinguistics.

A third property is that there is a sharp division between the original linguistic event, captured as an audio recording, and the annotations of that event.

Like the Brown Corpus, which displays a balanced selection of text genres and sources, TIMIT includes a balanced selection of dialects, speakers, and materials.

