English Language Study:
Selected Electronic Texts & Resources
Definition:
In principle, any collection of more than one text can be called a corpus (corpus being Latin for "body", hence a corpus is any body of text). In modern linguistics, however, the term usually carries more specific connotations than this simple definition. For more information on how corpora are used in language studies, see T. McEnery and A. Wilson's site on Corpus Linguistics.
Dictionaries:
- The Oxford English Dictionary (OED) gives you access to the largest historical dictionary ever published. The OED is the accepted authority on the evolution of the English language over the last millennium. It is an unsurpassed guide to the meaning, history, and pronunciation of over half a million words, both present and past. It traces the usage of words through 2.5 million quotations from a wide range of international English language sources.
- The Early Modern English Dictionaries Database (EMEDD) is a reference work for English of the Renaissance period. It is designed to make accessible the English-language content of bilingual (English and other languages) and monolingual (English-only) dictionaries, glossaries, grammars, and encyclopedias published in England from 1500 to 1660.
Linguistic Corpora:
- The British National Corpus (BNC) is a very large (over 100 million words) corpus of modern English, both spoken and written. The Corpus is designed to represent as wide a range of modern British English as possible. The written part (90%) includes, for example, extracts from regional and national newspapers, specialist periodicals and journals for all ages and interests, academic books and popular fiction, published and unpublished letters and memoranda, and school and university essays, among many other kinds of text. The spoken part (10%) includes a large amount of unscripted informal conversation, recorded by volunteers selected from different ages, regions, and social classes in a demographically balanced way, together with spoken language collected in all kinds of different contexts, ranging from formal business or government meetings to radio shows and phone-ins. While we do not presently have access to the full corpus, you can do simple searches on-line, which return up to 50 hits.
- VIEW: Variation in English Words and Phrases:
This website allows you to quickly and easily search for a wide range of words and phrases of English in the 100 million word British National Corpus.
As with some other BNC interfaces, you can search for words and phrases by exact word or phrase, wildcard or part of speech, or combinations of these. You can also search for surrounding words (collocates) within a ten-word window (e.g. all nouns somewhere near paper, all adjectives near woman, or all nouns near spin).
One unique aspect of the corpus is the ability to find the frequency of words and phrases in any combination of registers that you define (spoken, academic, poetry, medical, etc.). In addition, you can compare registers: for example, verbs that are more common in legal or medical texts, or nouns near break that are more common in fiction than in academic writing. Finally, you can easily compare synonyms and other semantically related words.
- COBUILD Bank of English - Concordance & Collocations Sampler
The Collins WordbanksOnline English corpus is composed of 56 million words of contemporary written and spoken text from the following sources: British books, ephemera, radio, newspapers,
magazines, American books, ephemera and radio, and British transcribed speech. In this sample portion of the database, you can type in some
simple queries and get a display of concordance lines from the corpus. The query syntax allows you to specify word
combinations, wildcards, part-of-speech tags, and so on.
- Michigan Corpus of Academic Spoken English (MiCASE)
An on-line, searchable part of a collection of transcripts of academic speech events recorded at the University of Michigan.
There are currently 152 transcripts (totaling 1,848,364 words) available at this site.
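The concordance displays and collocate searches mentioned in the corpus descriptions above can be illustrated with a minimal Python sketch. It is not the code behind any of these sites: the function names, the sample sentence, and the reading of "window" as a number of words on each side of the search word are all assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, node, width=4):
    """Return one keyword-in-context line per hit, with up to `width` tokens on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


def collocates(tokens, node, window=10):
    """Count the words found within `window` tokens on either side of each hit (assumed reading)."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            start = max(0, i - window)
            counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts


# Hypothetical sample text; a real corpus would be read from files.
sample = (
    "She put the paper on the desk. The paper was torn, "
    "so he bought more paper and a pen."
)
tokens = tokenize(sample)

for line in concordance(tokens, "paper"):
    print(line)

print(collocates(tokens, "paper", window=2).most_common(3))
```

The on-line interfaces add features this sketch leaves out, such as part-of-speech filtering and register selection, both of which require an annotated corpus.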
Searchable Full-Text Collections:
- American Film Scripts Online - UW restricted
Contains hundreds of American motion picture scripts. AFSO uses PhiloLogic software, developed at the University of Chicago, to enable in-depth browsing and searching of both the bibliographic and the full-text elements within the database. Search for words or combinations of words as they appear in the text.
- Black Drama 1850 to Present - UW restricted
Contains plays by playwrights from North America, English-speaking Africa, the Caribbean, and other African diaspora countries. Black Drama uses PhiloLogic software, developed at the University of Chicago, to enable in-depth browsing and searching of both the bibliographic and the full-text elements within the database. Search for words or combinations of words as they appear in the text.
- Humanities Text Initiative (U Michigan)
The University of Michigan Humanities Text Initiative (HTI) provides free access to a large range of electronic text collections.
The text of these collections may be searched for the use/instance of particular words or phrases, using a variety of searching techniques. Be sure to look at the search tips to maximize search efficiency. Some of the collections in the HTI are listed below:
- Modern English Works
The texts in this collection come from a variety of sources on the Internet, including the Oxford Text Archive, Project Gutenberg, the Online Book Initiative, and contributions from individual text encoders.
Authors include Conrad, Dickens, Forster, Melville, Poe, Wharton, and many more.
- American Verse Project
The Humanities Text Initiative is assembling an electronic archive of volumes of American poetry. Most of the archive is made up of 19th century poetry, although a few 18th century and early 20th century texts are included.
- Michigan Early Modern English Materials
The Michigan Early Modern English Materials (MEMEM) were compiled by Richard W. Bailey, Jay L. Robinson, and James W. Downer, with Patricia V. Lehman. The Materials consist of citations collected for the modal verbs and certain other English words for the Early Modern English Dictionary. Many of the slips used in the work were the original Oxford English Dictionary slips, provided to the University of Michigan by the editors of the OED.
- Bible: King James Version and Revised Standard Version
The original electronic text for this version of the Bible was provided by the Oxford Text Archive. The Revised Standard Version of the Bible is copyright © National Council of Churches of Christ in America.
- Middle English Compendium
The Middle English Compendium has been designed to offer easy access to and interconnectivity between three major Middle English electronic resources: an electronic version of the Middle English Dictionary, a HyperBibliography of Middle
English prose and verse, based on the MED bibliographies, and an associated network of electronic resources.
- Corpus of Middle English Verse and Prose
This collection of Middle English texts was assembled from works contributed by University of Michigan faculty and from texts provided by the Oxford Text Archive, as well as works created specifically for the Corpus by the HTI. At present,
forty-two texts are available; several others will be added soon.
- Old English Corpus
Originally prepared for internal use at the Dictionary of Old English, the Corpus contains all surviving OE material, excluding some variant texts.
- Lexis-Nexis provides an extensive array of full-text sources, including major U.S. and international newspapers and news transcripts from many core news television broadcasts. Search for words as they appear in the text of transcripts and newspapers. See the Lexis-Nexis help guide for commands that enable detailed searching of full-text information.
- London Times - UW restricted
Full text of this major British newspaper from 03/18/1788 to 1985.
- New York Times - UW restricted
Search the full text of this important newspaper from 1857 to 2001.
- North American Women's Letters & Diaries - UW restricted
Full-text database of letters and diaries of women who lived in North America before 1950. PhiloLogic software provides browsing and searching of both the bibliographic and full-text elements. Search for words or combinations of words as they appear in the text.
- Google Fight
Compare the use of two different terms or phrases on the web.
Useful Links: