[Milton-L] RE: Culturomics? Genome?
jamesrovira at gmail.com
Fri Dec 17 11:35:20 EST 2010
I'm guessing quite a few of those 8500 words added every year -- if that
number is correct -- are technical or field-specific words that should
not be represented in a normal English-language dictionary. Some of these are
probably slang words that don't need to be added either, at least not yet.
On Fri, Dec 17, 2010 at 11:31 AM, Gilliatt, Cynthia Ann - gilliaca <
gilliaca at jmu.edu> wrote:
> This is from today's Guardian, about two "culturomics" researchers at Harvard
> who are using Google data and money to study the English language "genome":
> "In their initial analysis of the database, the team found that around
> 8,500 new words enter the English language every year and the lexicon grew
> by 70% between 1950 and 2000. But most of these words do not appear in
> dictionaries. "We estimated that 52% of the English lexicon – the majority
> of words used in English books – consist of lexical 'dark matter'
> undocumented in standard references," they wrote in the journal Science (the
> full paper is available with free online registration)."
> So how did their computer know they were words? And what dictionaries did
> they use? Did they include proper names?
> "Let's talk a bit about terms like "culturomics" and "genome" and the
> apparent need to sound like a scientist (a wacky scientist at that) in order
> to be taken seriously by the media and govt grant dispensers these days."
> Good topic.
> "But first, let me try to cast some doubt on the notion that 52% of the
> English lexicon (as represented by 4% of the books ever published in
> English), the majority of words used in English books, do not appear in any
> dictionaries or other reference books."
> Which 4% of books printed in English? Who chose? Did they include texts
> in Early Modern English? Or were the texts all 20th/21st c?
> "This claim falls so far outside my experience as a reader and dictionary
> user that I want to say: are you kidding? Maybe their computer algorithm is
> good at searching a word database and very, very poor at using a dictionary.
> I suspect that their search algorithm (Harvard's, not Google's) fails to
> allow for any sort of conjugation and inflection, so that, for example, the
> word "indirectly" comes up as 'dark matter.'"
> Dark matter indeed. Well worth discussing. Thanks.
> Milton-L mailing list
> Milton-L at lists.richmond.edu
> Manage your list membership and access list archives at
> Milton-L web site: http://johnmilton.org/
Dr. James Rovira
Program Chair of Humanities
Assistant Professor of English
155 Miami Street
Tiffin, OH 44883
roviraj at tiffin.edu
Blake and Kierkegaard: Creation and Anxiety