[Milton-L] RE: Culturomics? Genome?

James Rovira jamesrovira at gmail.com
Fri Dec 17 11:35:20 EST 2010

I'm guessing quite a few of those 8500 words added every year -- if that
number is correct -- are technical or field-specific words that should not
be represented in a normal English language dictionary.  Some of these are
probably slang words that don't need to be added either, at least not yet.

Jim R

On Fri, Dec 17, 2010 at 11:31 AM, Gilliatt, Cynthia Ann - gilliaca <
gilliaca at jmu.edu> wrote:

> This in today's Guardian about two "culturomics" researchers at Harvard
> who are using Google data and $ to study the English language "genome":
> "In their initial analysis of the database, the team found that around
> 8,500 new words enter the English language every year and the lexicon grew
> by 70% between 1950 and 2000. But most of these words do not appear in
> dictionaries. "We estimated that 52% of the English lexicon – the majority
> of words used in English books – consist of lexical 'dark matter'
> undocumented in standard references," they wrote in the journal Science (the
> full paper is available with free online registration)."
> So how did their computer know they were words?  And what dictionaries did
> they use?  Did they include proper names?
> "Let's talk a bit about terms like "culturomics" and "genome" and the
> apparent need to sound like a scientist (a wacky scientist at that) in order
> to be taken seriously by the media and govt grant dispensers these days."
> Good topic.
> "But first, let me try to cast some doubt on the notion that 52 % of the
> English lexicon (as represented by 4 % of the books ever published in
> English), the majority of words used in English books, do not appear in any
> dictionaries or other reference books."
> Which 4% of books printed in English?  Who chose?  Did they include texts
> in Early Modern English? Or were the texts all 20th/21st c?
> "This claim falls so far outside my experience as a reader and dictionary
> user that I want to say, Are you kidding?  Maybe their computer algorithm is
> good at searching a word database and very very poor at using a dictionary.
> I suspect that their search algorithm (Harvard's, not Google's) fails to
> allow for any sort of conjugation and inflection, so, for example, the word
> "indirectly" comes up as "dark matter.""
> Dark matter indeed. Well worth discussing.   Thanks.
> C

Dr. James Rovira
Program Chair of Humanities
Assistant Professor of English
Tiffin University
155 Miami Street
Tiffin, OH 44883
(419) 448-3586
roviraj at tiffin.edu
Blake and Kierkegaard: Creation and Anxiety