[Milton-L] Humanities and Technology

Shoulson, Jeffrey jshoulson at mail.as.miami.edu
Sun Dec 19 20:45:15 EST 2010

I was talking with my brother, a computer scientist who did graduate work in search algorithms and who also has a very strong interest in linguistics, about the story Tom Luxon called our attention to earlier this week.

He'd seen it as well and had done some poking around on the new Google search function on his own.  Like Tom and others, he noticed how the algorithm does not take account of different word forms or cases in its statistical analysis.  He also pointed out that because the scanned texts come from books printed as early as the 16th and 17th centuries and because typography has evolved (and was often a bit spotty), there's a lot of misrecognition in the database.  The example he gave is illuminating:

He searched for the word "clone" to see when it came into frequent use.  He was shocked to discover a high frequency of the word in the 17th century.  When he went and looked at some of the hits, he saw that what the computer thought was "clone" was really "done"--the d had a slight imperfection or space within it that made it look like a cl to the scanner.

Just imagine if Macbeth had said "If 'twere clone when 'tis clone, 'twere better 'twere clone quickly."  A whole new play might have emerged...


Jeffrey S. Shoulson, Ph. D.
Associate Professor of English and Judaic Studies
University of Miami
PO Box 248145
Coral Gables, FL 33124-4632

(o) 305-284-5596
(f) 305-284-5635

ON LEAVE, AY 2010-11
Katz Center for Advanced Judaic Studies
University of Pennsylvania
420 Walnut Street
Philadelphia, PA 19106

(o) 215-238-1290, ext. 413

jshoulson at miami.edu
