The members of the National Federation of Advanced Information Services are no strangers to complexity. Speakers at this year's Annual Conference swapped stories about how to deal with whatever happens next. Evan Sandhaus, director for search, archives, and semantics at The New York Times, said that for the past 8 years he has been focused on the venerable newspaper's archive, which comprises more than 15 million articles, most of them in the form of scanned images with no text equivalents. Kalev Leetaru, Yahoo fellow-in-residence at Georgetown University, discussed data mining of vast content archives. He reminded attendees that while there are more than 6 million ebooks in the Internet Archive, half a millennium's worth of images remains buried in them, because such archives have traditionally identified articles and books but not the images they contain.