English Language Study:
Selected Electronic Texts & Resources
Definition:
In principle, any collection of more than one text can be called a corpus (corpus is Latin for "body", so a corpus is any body of text). Corpus linguistics approaches the study of language through corpora. For more specific information on how corpora are used in language studies, see T. McEnery and A. Wilson's book Corpus Linguistics.
Uses:
Corpus linguistics can help answer the following questions:
- What are the most frequently used words and phrases in English?
- What are some of the primary differences between spoken English and written English?
- What tenses do people use most frequently?
- What prepositions tend to follow particular verbs?
- How many words does a non-native English speaker need to know in order to participate in everyday conversations?
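The first question above, which words are used most often, can be illustrated with a very small frequency count. This is only a sketch: the sample sentence is invented, the `word_frequencies` helper is a hypothetical name, and the tokenizer is a simple assumption (lowercase runs of letters, with internal apostrophes allowed) rather than the method used by any of the corpora listed on this page.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Count how often each word form occurs, ignoring case."""
    # Assumption: a "word" is a run of letters, optionally with
    # internal apostrophes (e.g. "don't"). Real corpora tokenize
    # far more carefully than this.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(tokens)

# A tiny invented sample standing in for a real corpus.
sample = "The cat sat on the mat. The dog sat by the door."
freqs = word_frequencies(sample)

# The three most frequent word forms, with their counts.
top_three = freqs.most_common(3)
print(top_three)
```

On a real corpus the same count would be run over millions of words, and results would usually be grouped by lemma (so that "sit", "sits", and "sat" count together), which this sketch does not attempt.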
Dictionaries:
- The Oxford English Dictionary (OED) gives you access to the largest historical dictionary ever published. The OED is the accepted authority on the evolution of the English language over the last millennium. It is an unsurpassed guide to the meaning, history, and pronunciation of over half a million words, both present and past. It traces the usage of words through 2.5 million quotations from a wide range of international English language sources.
Linguistic Corpora:
- The British National Corpus (BNC) is a very large (over 100 million words) corpus of modern English, both spoken and written. The Corpus is designed to represent as wide a range of modern British English as possible. The written part (90%) includes, for example, extracts from regional and national newspapers, specialist periodicals and journals for all ages and interests, academic books and popular fiction, published and unpublished letters and memoranda, and school and university essays, among many other kinds of text. The spoken part (10%) includes a large amount of unscripted informal conversation, recorded by volunteers selected from different age groups, regions, and social classes in a demographically balanced way, together with spoken language collected in all kinds of different contexts, ranging from formal business or government meetings to radio shows and phone-ins. While we do not presently have access to the full corpus, you can run simple searches online, which return up to 50 hits.
- Corpus of Contemporary American English - Mark Davies (BYU)
"The COCA is the largest publicly-available corpus of English, and the only genre-balanced corpus of American English.
The corpus contains more than 400 million words of text and is equally
divided among spoken, fiction, popular magazines, newspapers, and academic texts. It includes 20 million words each year from 1990-2009,
and the corpus is also updated every six to nine months. Because of its design, it is perhaps the only corpus of English that is suitable for looking at current, ongoing changes in the language. The interface allows you to search for
exact words or phrases, wildcards, lemmas, part of speech, or any combinations of these. You can search for surrounding words (collocates) within a ten-word window (e.g. all nouns somewhere near faint, all adjectives near woman, or all verbs near feelings), which often gives you good insight into the meaning and use of a word."
- COBUILD Bank of English - Concordance & Collocations Sampler
The Collins WordbanksOnline English corpus is composed of 56 million words of contemporary written and spoken text from the following sources: British books, ephemera, radio, newspapers,
magazines, American books, ephemera and radio, and British transcribed speech. In this sample portion of the database, you can type in some
simple queries and get a display of concordance lines from the corpus. The query syntax allows you to specify word
combinations, wildcards, part-of-speech tags, and so on.
For an excellent introduction to COBUILD, see the guide prepared by James Thomas (below).
- Michigan Corpus of Academic Spoken English (MICASE)
An on-line, searchable part of a collection of transcripts of academic speech events recorded at the University of Michigan.
There are currently 152 transcripts (totaling 1,848,364 words) available at this site.
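Two of the search types described above, collocates within a fixed window of words and concordance lines, can be sketched in a few lines of Python. This is an illustration only, not how COCA or COBUILD work internally: the sample text is invented, the function names are hypothetical, the window is assumed to mean that many tokens on each side of the search word, and no part-of-speech filtering is attempted.

```python
import re
from collections import Counter

def tokenize(text):
    """Split text into lowercase word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())

def collocates(tokens, node, window=10):
    """Count words found within `window` tokens on either side of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        counts.update(left + right)
    return counts

def concordance(tokens, node, width=3):
    """Return keyword-in-context lines with `width` tokens of context per side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}")
    return lines

# Invented sample text; note that context windows here cross
# sentence boundaries because punctuation is discarded.
sample = ("She felt faint after the long walk. "
          "A faint smile crossed his face. "
          "The signal was too faint to hear.")
tokens = tokenize(sample)

near_faint = collocates(tokens, "faint", window=2)
kwic = concordance(tokens, "faint", width=2)
for line in kwic:
    print(line)
```

Sorting the collocate counts with `near_faint.most_common()` on a large corpus is what brings out the words that habitually accompany the search word.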
Searchable Full-Text Collections:
- American Film Scripts Online - UW restricted
Contains hundreds of American motion picture scripts. AFSO uses PhiloLogic software, developed at the University of Chicago, to enable in-depth browsing and searching of both the bibliographic and the full-text elements within the database. Search for words or combinations of words as they appear in the text.
- Black Drama 1850 to Present - UW restricted
Contains plays by playwrights from North America, English-speaking Africa, the Caribbean, and other African diaspora countries. Black Drama uses PhiloLogic software, developed at the University of Chicago, to enable in-depth browsing and searching of both the bibliographic and the full-text elements within the database. Search for words or combinations of words as they appear in the text.
- Humanities Text Initiative (U Michigan)
The University of Michigan Humanities Text Initiative (HTI) provides free access to a large range of electronic text collections.
The text of these collections may be searched for occurrences of particular words or phrases, using a variety of searching techniques. Be sure to look at the search tips to maximize search efficiency. Some of the collections in the HTI are listed below:
- Modern English Works
The texts in this collection come from a variety of sources on the Internet, including the Oxford Text Archive, Project Gutenberg, the Online Book Initiative, and contributions from individual text encoders.
Authors include Conrad, Dickens, Forster, Melville, Poe, Wharton, and many more.
- American Verse Project
The Humanities Text Initiative is assembling an electronic archive of volumes of American poetry. Most of the archive is made up of 19th century poetry, although a few 18th century and early 20th century texts are included.
- Michigan Early Modern English Materials
The Michigan Early Modern English Materials (MEMEM) were compiled by Richard W. Bailey, Jay L. Robinson, and James W. Downer, with Patricia V. Lehman. The Materials consist of citations collected for the modal verbs and certain other
English words for the Early Modern English Dictionary. Many of the slips used in the work were the original Oxford English Dictionary slips, provided to the University of Michigan by the editors of the OED.
- Bible: King James Version and Revised Standard Version
The original electronic text for this version of the Bible was provided by the Oxford Text Archive. The Revised Standard Version of the Bible is copyright © National Council of Churches of Christ in America.
- Middle English Compendium
The Middle English Compendium has been designed to offer easy access to and interconnectivity between three major Middle English electronic resources: an electronic version of the Middle English Dictionary, a HyperBibliography of Middle
English prose and verse, based on the MED bibliographies, and an associated network of electronic resources.
- Corpus of Middle English Verse and Prose
This collection of Middle English texts was assembled from works contributed by University of Michigan faculty and from texts provided by the Oxford Text Archive, as well as works created specifically for the Corpus by the HTI. At present,
forty-two texts are available; several others will be added soon.
- Old English Corpus
Originally prepared for internal use at the Dictionary of Old English, the Corpus contains all surviving OE material, excluding some variant texts.
- Lexis-Nexis provides an extensive array of full-text sources, including major U.S. and international newspapers and news transcripts from many core news television broadcasts. Search for words as they appear in the text of transcripts and newspapers. See Lexis-Nexis help guide for use of commands that enable detailed searching of full-text information.
- London Times - UW restricted
Full text of this major British newspaper from 03/18/1788 to 1985.
- New York Times - UW restricted
Search the full text of this important newspaper from 1857 to 2001.
- North American Women's Letters & Diaries - UW restricted
Full-text database of letters and diaries of women who lived in North America before 1950. Browsing and searching of both the bibliographic and full-text elements provided by PhiloLogic software. Search for words or combinations of words as they appear in the text.
Useful Links: