[Milton-L] More on Culturomics? Genome?

Thomas H. Luxon Thomas.H.Luxon at dartmouth.edu
Fri Dec 17 12:44:14 EST 2010

Dear Fellow Scholars,

Here's more directly from the article in SCIENCE (http://www.sciencemag.org/content/330/6011/1600.full):

<blockquote>Even after excluding proper nouns, more than 50% of the words in the n-gram database do not appear in any published dictionary. Widely used words such as “deletable” and obscure ones like “slenthem” (a type of musical instrument) slipped below the radar of standard references.</blockquote>

I used the online OED just now and found the word "delete" with no trouble. Using Google to search for "deletable" turns up an entry for "delete" at dictionary.com in just seconds. Wikipedia returns a perfectly acceptable article for "slenthem." Does that not count as a reference tool? If not, how about Webster's online, which has a very fine definition at http://www.websters-online-dictionary.org/definitions/Slenthem?cx=partner-pub-0939450753529744%3Av0qd01-tdlq&cof=FORID%3A9&ie=UTF-8&q=Slenthem&sa=Search#922

So, as suspected, their algorithm does not search dictionaries, online or otherwise, with the skills mastered by most high-schoolers. And, oddly enough for them, Google does it all just fine. So much for all that "dark matter." Steven Pinker should be ashamed to be associated with this project, or at least with this version of it. He gushes:

<blockquote>The first surprise, says Pinker, is that books contain “a huge amount of lexical dark matter.” Even after excluding proper nouns, more than 50% of the words in the n-gram database do not appear in any published dictionary.</blockquote>

Then there's this in the same article:

<blockquote>Analysis of the n-gram database can also reveal patterns that have escaped the attention of historians. Aviva Presser Aiden led an analysis of the names of people that appear in German books in the first half of the 20th century. (She is a medical student at Harvard and the wife of Erez Lieberman Aiden.) A large number of artists and academics of this era are known to have been censored during the Nazi period, for being either Jewish or “degenerate,” such as the painter Pablo Picasso. Indeed, the n-gram trace of their names in the German corpus plummets during that period, while it remains steady in the English corpus.</blockquote>

The fact that information about Picasso and Freud was suppressed in Nazi Germany is widely known to historians in Germany and in the US. The claim that this pattern escaped their attention is simply uninformed by research of the simplest kind, such as this from the National Gallery of Victoria:

<blockquote>Despite the fact that Picasso was regarded by the Nazi regime as a degenerate artist and Guernica had become a symbol of defiance against Fascism he remained free from persecution. At the time the Nazis were keen not to offend the U.S.A. and it was probably Picasso's widespread fame that protected him. However, he was denied publicity and prevented from exhibiting his work, resulting in his disappearance from the world stage.</blockquote> (http://www.ngv.vic.gov.au/picasso/education/ed_JTE_TWY.html)

Finally, kudos to Geoffrey Nunberg who says the following:

<blockquote>If the available tools can be expanded beyond word frequency, “it could become extremely useful,” says Geoffrey Nunberg, a linguist at the University of California, Berkeley. “But calling it ‘culturomics’ is arrogant.” Nunberg dismisses most of the study's analyses as “almost embarrassingly crude.”</blockquote>

Thomas H. Luxon
Dartmouth College

On Dec 17, 2010, at 5:18 PM, alan horn wrote:

> Terrible journalism--they make a sensational claim ("dark matter") and
> fail to provide a single example to give some idea of what they’re
> talking about. Also, which dictionaries? Including technical
> dictionaries? Philological reference works?
> Nairba Sirrah, People sometimes use this list to discuss tangentially
> related topics of common interest. Sorry if this is not to your
> liking.
> _______________________________________________
> Milton-L mailing list
> Milton-L at lists.richmond.edu
> Manage your list membership and access list archives at http://lists.richmond.edu/mailman/listinfo/milton-l
> Milton-L web site: http://johnmilton.org/
