The Hypertext Revolution

Espen Aarseth
(2003)

http://www.educ.fc.ul.pt/hyper/resources/eaarseth.htm

What is hypertext?

Some years ago, an email list for PhD students writing dissertations on hypertext discussed how to define the concept. A number of possible definitions were put forward and debated, but it did not seem possible for the group to reach consensus. In the end, a bright student asked the question, “But why are we trying to define hypertext?” An even brighter student came up with the reply, “Because we need it for our dissertations.”

The W3 organization, a primary source and non-profit developer of web standards, merely defines hypertext in this negative way: “Text which is not constrained to be linear.”[1] Since very few texts are literally linear, since texts such as this one are at least two-dimensional, and since most texts can be accessed in a far more liberal way than a radio transmission or a TV broadcast, the definition used by W3 tells us next to nothing. Perhaps it is not important for them to come up with a precise definition?

The Microsoft Encarta World English Dictionary defines hypertext as “a system of storing images, text, and other computer files that allows direct links to related text, images, sound, and other data.” This is certainly more sophisticated than the local newspaper in my home town Bergen, which once defined hypertext simply as “information marked in blue.” Still, the Encarta definition seems to make hypertext dependent on the computer, and on “direct links,” a phrase that is not defined further.

It might be pertinent, or perhaps impertinent, to then ask, 1) does hypertext depend on computers? and 2) what is a direct link? In this essay, hypertext is not taken for granted as a material technology, but rather seen as a fluid, ill-defined concept with unclear borders towards neighboring concepts such as electronic textual communication, experimental literature, printed books, digital games and entertainment, information retrieval, and, in principle, any method of storing text. Hypertext is an ideological concept, the “next big thing,” the desirable alternative to the status quo. This also means, of course, that “true” hypertext can never be achieved, but must forever remain just out of reach, a technological utopian condition of perfect communication.

The Web Without Hypertext

These days, the prevalence of the World Wide Web is so strong that it is often confused with the Internet itself. Despite being different technologies, or at least different levels of technology, the Internet is often used as a synonym for WWW: We talk of “internet pages” when we mean web pages, and we say “internet browser” when we mean web browsers. It is easy to forget the non-web sectors of the internet, such as email, USENET, IRC, MUDs, etc., especially when “the web” can be used to access them all.

The “web,” of course, exists on at least two different and independent technological levels: an interface technology (browsers, servers and search engines, defined by http and other protocols and plugins) and a means of organizing information and content structures (html, xml, JavaScript, SQL/PHP etc). It is quite possible to use components from one level without using the other: Browsers can be used to retrieve non-html text or data from a file server or via a search engine, and html can be used by standard word processors such as MS Word. Also, http (hypertext transfer protocol) is only one of a number of protocols supported by modern browsers.

So, what is at the core of the web, if not http or html? Perhaps there is no core at all? Well, there is, in fact, a third web “technology” that is arguably more central and indigenous than the content structures or the interface protocols:

Unlike web data formats, where HTML is an important one, but not the only one, and web protocols, where HTTP has a similar status, there is only one Web naming/addressing technology: URIs.[2]

The Uniform Resource Identifier (URI), formerly known as the Uniform Resource Locator (URL), is perhaps the single most important and innovative aspect of the web, the one “technology” without which the web would not exist. URIs like www.w3.org are easily mistaken for links, but they are not. We might call them the contents of links, but they do not have to be enclosed in link codes to do their work. They identify resources, even when they appear on paper. In this regard, they are not really that different from other unambiguous identifiers, such as ISBN or telephone numbers. They point, but they don’t act. In fact, contrary to W3’s argument above, it should be pointed out that URIs are not exclusive to the web, but can be used to identify other internet resources as well.

Arguably the first type of URI was a convention eventually developed by users of ftp (file transfer protocol), a standard for copying files across the Internet developed by Vint Cerf and other Arpa/Internet pioneers around 1971. (Incidentally, adventure game inventor William Crowther, working for the consultancy BBN, was one of them). This convention consisted of a simple naming structure where the name of the server, pathname and file name would be listed as a string of text, e.g. “louie.udel.edu:portal/mh-6.6.tar.Z”. Since this kind of identifier at first existed in many variants (e.g. “[server]<path>file.ext”) it is not quite correct to call it a properly uniform resource identifier, but during the 80s a standard naming convention developed, using a single text string which contained the server name, the path name and the file name.

A URI, such as http://www.w3.org, is not a hypertext link, but could be used as the content of one. Furthermore, it is possible to imagine the web without hypertext, but not the web without URIs. When Tim Berners-Lee created the World Wide Web in 1989, he invented a more systematic and rigorous scheme for specifying the identifier, including the file- or resource-type, e.g. ftp://, http://, etc. The browser model he created allows two ways of using URIs: either by entering the URI string manually from the keyboard, or by clicking on the “links” embedded in documents.
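The structure described here (a resource type, a server name, and a path) can be made visible with a few lines of Python. This is only an illustrative sketch using the standard library's urlsplit function. The helper name describe is hypothetical, and the ftp:// address is an assumption: it is the older ftp-style example from this essay recast in the later scheme-prefixed form, not an address known to exist.

```python
from urllib.parse import urlsplit

def describe(uri: str) -> dict:
    """Split a URI string into the parts the essay names."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,   # the resource type, e.g. http or ftp
        "server": parts.netloc,   # the name of the server
        "path": parts.path,       # path name and file name
    }

# The second address is a hypothetical rewriting of the essay's ftp example.
examples = [
    "http://www.w3.org/Addressing/",
    "ftp://louie.udel.edu/portal/mh-6.6.tar.Z",
]

for uri in examples:
    print(describe(uri))
```

Nothing in this decomposition involves a link or a click: the identifier is just a string that can be taken apart and read, on screen or on paper.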

As a thought experiment, imagine a browser that would only allow the first method of entering URIs manually. The URIs would be embedded in the documents, but they would not respond to clicking. In this cumbersome way, the web could still be accessed, and would therefore still exist. Would such a system still be regarded as hypertext? Probably not, since it would be functionally identical to ftp, which was never viewed as hypertext. But the web itself would not be changed, only our method of accessing it. So the web requires more than just the URI structure to be recognized as hypertext: it also requires a browser that interprets embedded URIs as activatable links. While the URI scheme certainly lends itself to hypertext, it does not define or comprise it, any more than a page number would. Neither is it a necessary condition for hypertext, since other conventions (e.g. automatically generated links) or schemes could be used instead. The hypertext element must be located in the browser, which allows the user to exchange one position in a document for another position in the same or in another document.
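The thought experiment can be sketched in code. In the rough Python illustration below, the same document text is handled in two ways: find_uris merely reports the embedded identifiers, which a reader would then have to type in by hand, while activate_links wraps each one in an html anchor so that a browser would treat it as a link. Both function names, the simplified pattern, and the sample sentence are assumptions made for this illustration; real browsers recognize addresses with far more care.

```python
import re
from html import escape

# Simplified pattern: an http or ftp scheme followed by non-space characters.
URI_PATTERN = re.compile(r'(?:http|ftp)://[^\s<>"]+')

def find_uris(text: str) -> list:
    """The 'manual' browser: list the identifiers, nothing more."""
    return URI_PATTERN.findall(text)

def activate_links(text: str) -> str:
    """The 'hypertext' browser: turn each embedded URI into an anchor."""
    out, pos = [], 0
    for match in URI_PATTERN.finditer(text):
        out.append(escape(text[pos:match.start()]))   # plain text before the URI
        uri = escape(match.group(0))
        out.append(f'<a href="{uri}">{uri}</a>')      # the URI as link content
        pos = match.end()
    out.append(escape(text[pos:]))                    # remaining plain text
    return "".join(out)

sample = "See http://www.w3.org/Addressing/ for details."
print(find_uris(sample))
print(activate_links(sample))
```

The document and its identifiers are the same in both cases. Only the second function adds the behavior that makes the result recognizable as hypertext, which is the sense in which the hypertext element sits in the browser rather than in the URI.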

The web browser, however, especially in its modern variety, does not restrict itself to file retrieval, or even to hypertextual file retrieval and “navigation.” Browsers can also be used to chat, access USENET Newsgroups, and even play games, typically programmed in Java or JavaScript. They can also be used to engage a reverse form of URI, an act of communication known as email. An email address is a type of uniform resource identifier which is not used to pull, but rather to push information. Other than that, an email address follows the same principle of non-ambiguous pointing which standard URIs do, and this is yet another indication that the URI principle is more general than, and not contingent on, hypertext.

If there is a principle of hypertext (which we here might define as the automation of access to predefined document positions, and not as a general information retrieval interface) then it would seem that this is a relatively minor function of the world wide web. The web’s foremost function is indeed to reference and retrieve other documents, as a type of global, fully automatic library/librarian. But other technologies, like ftp, gopher, and, let us not forget, libraries, have already done this, while most hypertext systems before the web did not. Most importantly, most documents on the web are not internally hypertextual, and most are read in a traditional manner. The web is a gigantic step forward as a global, general-purpose document access system, but it has so far not turned reading and writing into radically different practices of non-sequentiality.

Is Hypertext revolutionary?

In the early stages of digital text editing (1960-1990), a number of techniques, tools and ideas were developed and explored. These “tools for the mind” projected an optimistic view on the writing process: that it could be significantly helped by novel forms of technologically imposed structure. Hypertext, as invented by Ted Nelson, was merely one among many such approaches; in addition there was the word processor, various linguistic “workbench” programs that checked syntactic structure and suggested improvements, and not least, the idea outliners, programs that would allow the easy rearrangement of text documents grouped into points and subpoints. The only eminently successful one of these technologies was the word processor, and to some extent it subsumed all the other technologies, even hypertext. The current version of Microsoft Word, for instance, has an outline view, checks grammar, and even allows users to insert cross-references, which it also calls “hyperlinks” and which can be used to access other parts of the word processor document.

Even if these formerly experimental methods are now available in standard software, however, it is a safe bet that they are seldom used. (When was the last time that you put a “hyperlinked” cross reference into an MS Word document? When did you last use the outline view?) These methods, once believed to have revolutionary effects on reading and writing, have receded into the background of standardized text production. We still produce the same type of sequential text as we did a hundred, if not a thousand years ago. The typewriter, arguably, had a stronger, more revolutionary impact on personal text production than the word processor, since it for the first time allowed personal “desk-top printing.”

So far, Ted Nelson’s revolutionary brain-child, hypertextual reading and writing, appears to be a semi-unsuccessful invention; an attempt at changing the world that in its most radical aspects has failed. Instead, Tim Berners-Lee took Nelson’s ideas in a different direction, and created a highly successful global information interface on top of the Internet’s Transmission Control Protocol (TCP), but one which falls severely short of Nelson’s original, ambitious vision.

On the web, perhaps the one innovative genre to date that makes full use of hypertext is the “web log”, also known as “blog”. A blog is usually a personal, public diary published by an individual, and updated frequently with dated journal entries containing comments and observations and links to the commented material. Many blogs allow readers to comment on the entries, and some also use a feature known as a “trackback”, which automatically lists other blogs that have linked to an entry. This linking-back feature is very much in line with Nelson’s original vision, where two-way linking was an important aspect of hypertext, but one not implemented by Berners-Lee in standard html.
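The difference between one-way and two-way linking can be made concrete with a small sketch. Standard html records only the links that leave a page. A linking-back feature in effect inverts that record, so that each entry also knows who points at it. The Python below is a minimal illustration of that inversion, not a description of how any actual blog software works; the function name backlinks and the blog addresses are hypothetical.

```python
from collections import defaultdict

def backlinks(outgoing: dict) -> dict:
    """Invert a map of outgoing links into a map of incoming links."""
    incoming = defaultdict(list)
    for source, targets in outgoing.items():
        for target in targets:
            incoming[target].append(source)   # target learns who linked to it
    return {target: sorted(sources) for target, sources in incoming.items()}

# Hypothetical blog entries and the entries they link to.
links = {
    "blog-a.example/entry-1": ["blog-b.example/entry-7"],
    "blog-c.example/entry-3": ["blog-b.example/entry-7", "blog-a.example/entry-1"],
}

for entry, linked_from in backlinks(links).items():
    print(entry, "is linked from", linked_from)
```

The inversion needs knowledge of every page's outgoing links, which no single web page has. That is one way of seeing why two-way linking has to be added on top of standard html by a cooperating service rather than falling out of the link format itself.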

Blogs constitute a fascinating sub-network on the web, where people link to each other’s comments and have passionate exchanges of opinion. Given phenomena like Usenet News and web based discussion groups, this alone is nothing new, but the fact that their writing is under their own editorial control, and the use of hypertext links to connect to the other bloggers’ (as they are called) arguments and to earlier, archived comments, make the blog phenomenon perhaps the strongest cultural implementation of Nelson’s vision yet.

It may still be too early to pass judgment on the cultural success of hypertext, since cultural changes are much slower than technical innovations. Typically, it took fifty years after Gutenberg’s innovation for books to evolve from manuscript simulations into the artifacts we know today. Perhaps Nelson’s idea of nonsequential writing will be picked up by a generation that reads most of its texts online, and for whom print on paper will seem quaint and ornamental, the way stone inscriptions seem to us. However, this generation is already actively here, and the texts they use are digital and interactive in a way Nelson did not imagine. Today, the written language of our youngest generation is shaped not by hypertext but by SMS (Short Message Service), the completely unexpected success of digital mobile telephony (GSM). Here again, we find a form of URI (the mobile phone number) that unambiguously identifies its target, and which is used in a textual torrent of messages that previously was only associated with email and chat. The linguistic codes of this medium (e.g. CUL8R for “See you later”) spill over into other text genres, much to the dismay of teachers and parents. But the sociolinguistic success of the SMS medium tells us that here is a real change in writing practice and indeed in the history of writing, which, unlike radical hypertext, and in a much shorter time, has already taken place.

Most probably the mode of writing and reading that hypertext was supposed to structure will survive, not as a dominant mode on the web and elsewhere, but as one among many forms; some older, some yet to come.


--------------------------------------------------------------------------------

[1] http://www.w3.org/Terms

[2] “Naming and Addressing: URIs, URLs, ...” http://www.w3.org/Addressing/