Humanities Computing

How is computing used in humanities scholarship? How does information technology impact teaching and learning?
Topics include: Digital libraries, electronic publishing, scholarly communication, web remediation of humanities scholarship, etc.

Monday, January 31, 2005

typewriters, word processing, Google Desktop

Will we ever get away from the "word processor as typewriter"? Here's an article that ponders the effects of Google Desktop on writing and thinking...

NY Times, January 30, 2005
Tool for Thought
By STEVEN JOHNSON

One often hears from younger writers that they can't imagine how anyone managed to compose an article, much less an entire book, with a typewriter. Kerouac banging away at his Underwood portable? Hemingway perched over his Remington? They might as well be monastic scribes or cave painters.

But if the modern word processor has become a near-universal tool for today's writers, its impact has been less revolutionary than you might think. Word processors let us create sentences without the unwieldy cross-outs and erasures of paper, and despite the occasional catastrophic failure, our hard drives are better suited for storing and retrieving documents than file cabinets. But writers don't normally rely on the computer for the more subtle arts of inspiration and association. We use the computer to process words, but the ideas that animate those words originate somewhere else, away from the screen. The word processor has changed the way we write, but it hasn't yet changed the way we think.

Changing the way we think, of course, was the cardinal objective of many early computer visionaries: Vannevar Bush's seminal 1945 essay that envisioned the modern, hypertext-driven information machine was called ''As We May Think''; Howard Rheingold's wonderful account of computing's pioneers was called ''Tools for Thought.'' Most of these gurus would be disappointed to find that, decades later, the most sophisticated form of artificial intelligence in our writing tools lies in our grammar checkers.

But 2005 may be the year when tools for thought become a reality for people who manipulate words for a living, thanks to the release of nearly a dozen new programs all aiming to do for your personal information what Google has done for the Internet. These programs all work in slightly different ways, but they share two remarkable properties: the ability to interpret the meaning of text documents; and the ability to filter through thousands of documents in the time it takes to have a sip of coffee. Put those two elements together and you have a tool that will have as significant an impact on the way writers work as the original word processors did.

For the past three years, I've been using tools comparable to the new ones hitting the market, so I have extensive firsthand experience with the way the software changes the creative process. (I have used a custom-designed application, created by the programmer Maciej Ceglowski at the National Institute for Technology and Liberal Education, and now use an off-the-shelf program called DEVONthink.) The raw material the software relies on is an archive of my writings and notes, plus a few thousand choice quotes from books I have read over the past decade: an archive, in other words, of all my old ideas, and the ideas that have influenced me.

Having all this information available at my fingertips does more than help me find my notes faster. Yes, when I'm trying to track down an article I wrote many years ago, it's now much easier to retrieve. But the qualitative change lies elsewhere: in finding documents I've forgotten about altogether, documents that I didn't know I was looking for.

What does this mean in practice? Consider how I used the tool in writing my last book, which revolved around the latest developments in brain science. I would write a paragraph that addressed the human brain's remarkable facility for interpreting facial expressions. I'd then plug that paragraph into the software, and ask it to find other, similar passages in my archive. Instantly, a list of quotes would be returned: some on the neural architecture that triggers facial expressions, others on the evolutionary history of the smile, still others that dealt with the expressiveness of our near relatives, the chimpanzees. Invariably, one or two of these would trigger a new association in my head -- I'd forgotten about the chimpanzee connection -- and I'd select that quote, and ask the software to find a new batch of documents similar to it. Before long a larger idea had taken shape in my head, built out of the trail of associations the machine had assembled for me.

Compare that to the traditional way of exploring your files, where the computer is like a dutiful, but dumb, butler: ''Find me that document about the chimpanzees!'' That's searching. The other feels different, so different that we don't quite have a verb for it: it's riffing, or brainstorming, or exploring. There are false starts and red herrings, to be sure, but there are just as many happy accidents and unexpected discoveries. Indeed, the fuzziness of the results is part of what makes the software so powerful.

These tools are smart enough to get around the classic search engine failing of excessive specificity: searching for ''dog'' and missing all the articles that have only ''canine'' in them. Modern indexing software learns associations between individual words, by tracking the frequency with which words appear near each other. This can create almost lyrical connections between ideas. I'm now working on a project that involves the history of the London sewers. The other day I ran a search that included the word ''sewage'' several times. Because the software knows the word ''waste'' is often used alongside ''sewage'' it directed me to a quote that explained the way bones evolved in vertebrate bodies: by repurposing the calcium waste products created by the metabolism of cells.

That might seem like an errant result, but it sent me off on a long and fruitful tangent into the way complex systems -- whether cities or bodies -- find productive uses for the waste they create. It's still early, but I may well get an entire chapter out of that little spark of an idea.
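
A side note on the mechanism described above, where software learns that ''waste'' goes with ''sewage'' by tracking how often words appear near each other. The following is only a minimal sketch of that general idea, not the method used by DEVONthink or any other program named in the article. The function names, the five-word window, and the three sample notes are all assumptions made up for illustration, and nothing here filters out common words like ''and'' or ''the''.

```python
import re
from collections import Counter, defaultdict


def cooccurrence_counts(documents, window=5):
    """Count how often each pair of words appears within `window` words of each other.

    Hypothetical helper: the window size and tokenization are assumptions,
    not taken from any real indexing product.
    """
    counts = defaultdict(Counter)
    for text in documents:
        # Crude tokenization: lowercase runs of letters only.
        words = re.findall(r"[a-z]+", text.lower())
        for i, word in enumerate(words):
            # Look only at the words that follow, then record the pair both ways.
            for neighbor in words[i + 1 : i + 1 + window]:
                if neighbor != word:
                    counts[word][neighbor] += 1
                    counts[neighbor][word] += 1
    return counts


def related_words(counts, word, top_n=3):
    """Return the words most often seen near `word` (hypothetical helper)."""
    return [w for w, _ in counts[word].most_common(top_n)]


# Made-up sample notes, loosely echoing the sewage/waste example.
notes = [
    "sewage and waste flowed through the London sewers",
    "cells turn calcium waste into bone",
    "the sewers carried sewage and waste away",
]
counts = cooccurrence_counts(notes)
print(related_words(counts, "sewage"))
print(related_words(counts, "waste", top_n=10))
```

With counts like these, a search on ''sewage'' can be widened to words that tend to sit beside it, which is how a note about calcium waste could surface in a query about sewers. Real tools presumably weight and filter these counts far more carefully.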

Now, strictly speaking, who is responsible for that initial idea? Was it me or the software? It sounds like a facetious question, but I mean it seriously. Obviously, the computer wasn't conscious of the idea taking shape, and I supplied the conceptual glue that linked the London sewers to cell metabolism. But I'm not at all confident I would have made the initial connection without the help of the software. The idea was a true collaboration, two very different kinds of intelligence playing off each other, one carbon-based, the other silicon.

If these tools do get adopted, will they affect the kinds of books and essays people write? I suspect they might, because they are not as helpful to narratives or linear arguments; they're associative tools ultimately. They don't do cause-and-effect as well as they do ''x reminds me of y.'' So they're ideally suited for books organized around ideas rather than single narrative threads: more ''Lives of a Cell'' and ''The Tipping Point'' than ''Seabiscuit.''

No doubt some will say that these tools remind them of the way they use Google already, and the comparison is apt. (One of the new applications that came out last year was Google Desktop -- using the search engine's tools to filter through your personal files.) But there's a fundamental difference between searching a universe of documents created by strangers and searching your own personal library. When you're freewheeling through ideas that you yourself have collated -- particularly when you'd long ago forgotten about them -- there's something about the experience that seems uncannily like freewheeling through the corridors of your own memory. It feels like thinking.

Steven Johnson is the author, most recently, of ''Mind Wide Open.'' His new book, ''Everything Bad Is Good for You,'' will be published in May.

http://www.nytimes.com/2005/01/30/books/review/30JOHNSON.html?ei=5070&en=b7b8fb7c34744540&ex=1108184400&pagewanted=print&position=

Thursday, January 06, 2005

Google, searching, how much data, microsoft

Will Microsoft challenge Google in the search wars? In the article "What's Next for Google," Charles H. Ferguson discusses the possibilities: http://www.technologyreview.com/articles/05/01/issue/ferguson0105.asp?p=1

Whoever wins the standards/architecture battle will win the search war. Microsoft has deep pockets and a record of winning this type of battle. However, it doesn't always win (e.g., against Adobe, especially Photoshop), and it has become a bit of a slow-moving behemoth.

"Thus, while Google provides an excellent service for searching the public Web and has made a good start on PCs with Google Desktop (the hard-drive search tool) and Google Deskbar (which performs searches without launching a browser), the search universe as a whole remains a mess, full of unexplored territories and mutually exclusive zones that a common architecture would unify."

"“Microsoft effectively disbanded the Internet Explorer group after killing Netscape,” [an anonymous MS exec] said. “But recently, they realized that Firefox was starting to gain share and that browser enhancements would be useful in the search market.” He agreed that if Microsoft got “hard-core” about search (as Bill Gates has promised), then, yes, Google would be in for a very rough time."

"Why? Because in contrast to Microsoft, Google doesn’t yet control standards for any of the platforms on which this contest will be waged—not even for its own website. Although Google has released noncommercial APIs—which programmers may use for their own purposes, but not in commercial products—until recently, it avoided the creation of commercial APIs."

Google may feel it does not need commercial APIs; the author believes that would be a mistake. Or it may feel that APIs are not its most important concern.

"There is, however, another possibility: Google understands that an architecture war is coming, but it wants to delay the battle. One Google executive told me that the company is well aware of the possibility of an all-out platform war with Microsoft. According to this executive, Google would like to avoid such a conflict for as long as possible and is therefore hesitant to provide APIs that would open up its core search engine services, which might be interpreted as an opening salvo. The release of APIs for the Google Deskbar may awaken Microsoft’s retaliatory instincts nonetheless. For Google to challenge Microsoft on the desktop before establishing a secure position on the Web or on enterprise servers could be unwise."

"Google should first create APIs for Web search services and make sure they become the industry standard. It should do everything it can to achieve that end—including, if necessary, merging with Yahoo. Second, it should spread those standards and APIs, through some combination of technology licensing, alliances, and software products, over all of the major server software platforms, in order to cover the dark Web and the enterprise market. Third, Google should develop services, software, and standards for search functions on platforms that Microsoft does not control, such as the new consumer devices. Fourth, it must use PC software like Google Desktop to its advantage: the program should be a beachhead on the desktop, integrated with Google’s broader architecture, APIs, and services. And finally, Google shouldn’t compete with Microsoft in browsers, except for developing toolbars based upon public APIs. Remember Netscape.

When Google’s Peter Norvig was read this list—presented not as recommendations, but as things that Google would do—he did not deny any of it."

"Whether Google or Microsoft wins, the implications of a single firm’s controlling an enormous, unified search industry are troubling. First, this firm would have access to an unparalleled quantity of personal information, which could represent a major erosion of privacy. Already, one can learn a surprising amount about people simply by “googling” them. A decade from now, search providers and users (not to mention those armed with subpoenas) will be able to gather far more personal information than even financial institutions and intelligence agencies can collect today. Second, the emergence of a dominant firm in the search market would aggravate the ongoing concentration of media ownership in a global oligopoly of firms such as Time Warner, Bertelsmann, and Rupert Murdoch’s News Corporation."

"If the firm dominating the search industry turned out to be Microsoft, the implications might be more disturbing still. The company that supplies a substantial fraction of the world’s software would then become the same company that sorts and filters most of the world’s news and information, including the news about software, antitrust policy, and intellectual property. Moreover, Microsoft could reach a stage at which its grip on the market remains strong, but its productivity falls prey to complacency and internal politics. Dominant firms sometimes do more damage through incompetence than through predation."

"Indeed, as so many have noted, much of Microsoft’s software is just plain bad. In contrast, Google’s work is often beautiful. One of the best reasons to hope that Google survives is simply that quality improves more reliably when markets are competitive. If Google dominated the search industry, Microsoft would still be a disciplining presence; whereas if Microsoft dominated everything, there would be fewer checks upon its mediocrity."


And here's an interesting chart from the article that describes where data is stored:



Monday, January 03, 2005

McKiernan: Wiki bibliography

"WikiBibliography is devoted to significant articles, presentations, reports, as well as audio and video programs, Web sites, and other key 'publications' about Wikis in general and their select applications and uses."

WikiBibliography is compiled and maintained by Gerry McKiernan, Science and Technology Librarian and Bibliographer, Science and Technology Department, Iowa State University Library.

http://www.public.iastate.edu/~CYBERSTACKS/WikiBib.htm

WikiBibliography is a companion resource to SandBox(sm): Wiki Applications and Uses, a categorized registry of select applications and uses of wikis.
http://www.public.iastate.edu/~CYBERSTACKS/SandBox.htm