Text Sound Art: A Survey
Richard Kostelanetz

(1980, William Morrow)


The art is text-sound, as distinct from text-print and text-seen, which is to say that texts must be sounded and thus heard to be "read," in contrast to those that must be printed and thus be seen. The art is text-sound, rather than sound-text, to acknowledge the initial presence of a text, which is subject to aural enhancements more typical of music. To be precise, it is by non-melodic auditory structures that language or verbal sounds are poetically charged with meanings or resonances they would not otherwise have. The most appropriate generic term for the initial materials would be "vocables," which my dictionary defines as "a word regarded as a unit of sounds or letters rather than as a unit of meaning." As text-sound is an intermedium located between language arts and musical arts, its creators include artists who initially established themselves as "writers," "poets," "composers," and "painters"; in their text-sound works, they are, of course, functioning as text-sound artists. Many do word-image art (or "visual poetry") as well, out of a commitment to exploring possibilities in literary intermedia.

The term "text-sound" characterizes language whose principal means of coherence is sound, rather than syntax or semantics - where the sounds made by comprehensible words create their own coherence apart from denotative meanings. A simple example would be this "tongue-twister" familiar from childhood:

If a Hottentot taught a Hottentot tot to talk 'ere the tot could totter, ought the Hottentot to be taught to say ought or naught or what ought to be taught 'er?

The subject of this ditty is clearly neither Hottentots nor pedagogy but the related sounds of "or" and "ought," and what holds this series of words together is not the thought or the syntax but those two repeated sounds. It is those sounds that one primarily remembers after hearing this sentence read aloud. As in other text-sound art, this language is customarily recited in a voice that speaks, rather than sings. Thus, the vocal pitches are non-specific.

The first exclusionary distinction then is that words that have intentional pitches, or melodies, are not text-sound art but song. To put it differently, text-sound art may include recognizable words or phonetic fragments, but once musical pitches are introduced, or musical instruments are added (and once words are tailored to a pre-existing melody or rhythm), the results are music and are experienced as such. Secondly, text-sound art differs from "oral poetry," which is syntactically standard language written to be read aloud. These exclusions give the art a purist definition, I admit, but without these distinctions, there is no sure way of separating text-sound art, the true intermedium, from music on the one side and poetry on the other.

The firmest straddles I know are the records made by a changing group of New York blacks calling themselves "The Last Poets," whose lead voice chants incendiary lyrics to the accompaniment of pitched background voices and a rapid hand drum, which seems to influence verbal rhythm (rather than vice versa, to repeat a crucial distinction), and Philomel (1963), by Milton Babbitt and John Hollander, where the text is syntactically fragmented and aurally multiplied in ways typical of sound poetry, but the sounds in most of the work are specifically pitched, rather than unpitched.

"Text-sound" is preferable to "sound poetry," another term for this art, because I can think of work whose form and texture is closer to fiction or even essays, as traditionally defined, than poetry.

One issue separating work within the art would be whether the sounds are primarily recognizable words or phonetic units. Pieces with audible words usually have something to do with those words, which are meant to be perceived as certain words, rather than as other words. Poems without recognizable words are really closer to our experience of an unfamiliar (i.e., "foreign") language. An example is this passage from Armand Schwerner's The Tablets (1971):

min-na-ne-ne Dingir Eri-lil-ra mun-na-nib-gi-gi
uzu-mu-a-ki dur-an-ki-ge

Such words need not be "translated," because the acoustic experience of them is ideally as comprehensible to one culture as to another.

"Morse Code" is not text-sound art, even though it communicates comprehensible words to those who know its language; it is a code whose rhythm cannot be varied if communication is to be secure

In my opinion, the better work in text-sound art emphasizes identifiable words, rather than phonemes, but it would be foolish, at this point, to establish blanket rules about the viability of this or that material.

One could also distinguish pieces which are performed live from those which can exist only on electronic recording tape; those which are multi-voiced (and thus usually canonical in form) from those which are uni-voiced; those which are texts composed exclusively of words from those which add scoring instructions; those which involve improvisation from those which can be repeated with perceptible precision.

Though superficially playful, text-sound art embodies serious thinking about the possibilities of vocal expression and communication; it represents not a substitute for language but an expansion of our verbal powers.

One major factor separating present work from past is the text-sound artist's increasing consciousness of the art's singularity and its particular traditions.


Though text-sound art is, in its consciousness of its singular self, a distinctly new phenomenon, it has roots in the various arts it encompasses. On one hand, it extends back to primitive chanting which, one suspects, was probably developed for worship ceremonies. One extension of this tradition is non-melodic religious declamation in which the same words are repeated over and over again, such as Hebrew prayers which are spoken so rapidly that an observer hears not distinct words but repeated sounds. (Harris Lenowitz calls them "speed mantras.") Modern text-sound art also reflects such folk arts as the U.S. tobacco auctioneer's spiel, the evangelical practice of "speaking in tongues," and Ketjak, The Ramayana Monkey Chant, in which several score Indonesian men rapidly chant in and out of the syllable "tjak." (This last, which is available on a Nonesuch record, is a masterpiece of the art.) To Charles Morrow, a contemporary practitioner, these folk text-sound arts exemplify "special languages for special communication." However, one critical difference between these precursors and contemporary practitioners is that the former do not consider themselves "artists."

In the history of modern music, text-sound art draws upon an eccentric vocal tradition, epitomized by Arnold Schoenberg's Sprechgesang, in which the singing voice touches a note but does not sustain the pitch in the course of enunciating the word. In practice, this technique minimizes the importance of musical tone (and, thus, of melody) and, by contrast, emphasizes the word. One measure of this shift in emphasis is the sense that language in Sprechstimme is usually easier to understand than that in music. This technique also appears in Chinese and Korean opera, which may have influenced Schoenberg, and in German cabaret singing, which probably did. Survivors of the latter include Ernst Toch's Geographical Fugue (1930), which is composed of place names spoken in overlapping rhythms, and the patter-song, in which words are spoken while instruments play melody in the background (e.g., in My Fair Lady, "I've grown accustomed to her face...").

In visual arts, text-sound work draws upon the development of abstraction, or non-representational art, and the initial figures in adapting this aesthetic idea to language were Wassily Kandinsky and Kurt Schwitters. The writer Hugo Ball, himself a prominent practitioner, said in a 1917 lecture that Kandinsky, in his book Der gelbe Klang (1912), was "the first to discover and apply the most abstract expression of sound in language, consisting of harmonized vowels and consonants." Schwitters, a Dadaist like Ball, created an imaginary, nonrepresentational, aurally coherent language for his ambitious Ursonate (1922-32), which opens:

Fumms bo wo taa zaa Uu,


kwii Ee.


Dll rrrr beeeee bo

Dll rrrr beeeee bo

rrrr beeeee bo fumms bo,

rrrr beeeee bo fumms bo wo

And he was probably the first to appropriate a musical structure for a totally verbal work. Moholy-Nagy, another sometime visual artist who was also the first perceptive historian of text-sound art, describes Schwitters's masterwork, whose title Moholy translates as "primordial sonata," as "a poem of thirty-five minutes duration, containing four movements, a prelude, and a cadenza in the fourth movement. The words do not exist; rather they might exist in any language; they have no logical, only an emotional context; they affect the ear with their phonetic vibrations like music."

Within the conscious traditions of modern poetry, text-sound art has a much richer history. Contemporary work initially reflects the neologisms that Lewis Carroll incorporated into syntactically conventional sentences, as in "Jabberwocky," the invented words implicitly minimizing meaning and emphasizing sound:

'Twas brillig, and the slithy toves,

Did gyre and gimble in the wabe:

All mimsy were the borogoves,

And the mome raths outgrabe.

Historical precursors in continental literature include the German poet Paul Scheerbart, whose most notable (and untypical) poem opens, "Kikakoku!//E kora laps!" (1897), and the German poet Christian Morgenstern, whose "Das Grosse Lalula" (1905) opens:

Kroklokwafzi? Semememi!


Bifzi, bafzi hulalemi:

quasti bast; bo…

Lalu lalu lalu lalu la:

In "Zang-Tumb-Tu-Tumb" (1921), Filippo Tommaso Marinetti, initially a poet, invented onomatopoeia to portray the sound of weapons and soldiers: "flic flak zing zing sciaaack hilarious whinnies iiiiiii … pattering tinkling 3 Bulgarian battalions marching croooc-craaac…" Hugo Ball's most famous poem (1915):

gadji beri bimba

glandridi, lauli lonni cadori

gadjama bim beri glassala

glandridi glassala tuffm i zimbrabim

blassa galassasa tuffm i mimbrabim

meant to realize a universal language, exemplified the phonetic-unit poetry of such pioneer Dadaists as Raoul Hausmann and Richard Huelsenbeck.

In Russian literature just before the Revolution, Alexei Kruchenykh created a fictitious language, which he called zaum (a contraction of a longer phrase, zaumnyi jazyk, which can best be translated as "transrational"). Kruchenykh's most audacious manifesto declared, "The word is broader than its meaning." His colleague in Russian futurism, Velimir Khlebnikov, by contrast, favored recognizable words for his nonsyntactic poems, rationalizing that "the sound of the word is deeply related to its meaning." In the 1920s, the Frenchman Pierre Albert-Birot added footnotes to specify how his neologisms should be pronounced. He is also credited with the profound adage: "If anything can be said in prose, then poetry should be saved for saying nothing."

In American literature, the most prominent precursors are Vachel Lindsay, a troubadour eccentric, whose most famous poem, "The Congo" (1914), emphasizes heavy alliteration and such refrains as "Boomlay, boomlay, boomlay, boom," and e.e. cummings, whose second poem in Viva (1931) begins:

oil tel duh woil doi sez

dooyuh unners tanmih eesez pullih nizmus tash, oi

In American prose, the preeminent precursor is, of course, Gertrude Stein, who wove prose tapestries based upon repetition, rather than syntax and semantics: "In saying what she said she said all she said and she said that she did say what she said when she was saying what she said, and she said that she said what she said in saying that she said and she was saying what she said when she said what she said." ("Two: Gertrude Stein and Her Brother," written 1910-1912). One successor to Stein, in post-WWII American literature, was Jack Kerouac, not in his most famous books, to be sure, but in short prose pieces like "Old Angel Midnight," which initially appeared in the opening issue of Big Table (1959):

Spat- he mat and tried & trickered on the step and oostepped and peppered it a bit with long mouth sizzle reaching for the thirsts of Azmec Parterial alk-lips to mox & bramajambi babac up the Moon Citlapol-settle la tettle la pottle, la lune-Some kind of-Bong!

What unifies this collection of semantically unrelated words is, of course, the repetition of sounds not only in adjacent words but over the paragraph, but one quality distinguishing Kerouac from Stein is that, at least to my ears, the former sounds more literary.

In English literature, the principal progenitor of contemporary work is, of course, James Joyce's polylingual, neologistic masterpiece, Finnegans Wake (1939), which is, incidentally, like Stein's work, closer in form and tone to "prose" than "poetry."


One post-WWII development that had a radical effect on text-sound art was the common availability of both the sound amplifier and the tape recorder, and these two technologies together did more than anything else to separate "contemporary" endeavors from earlier "modern" work. That is, after 1955, a verbal artist, now equipped with sound-tuning equipment, could change the volume and texture of his microphone-assisted voice; he could eliminate his high frequencies or his lows, or accentuate them, as well as add reverberation. By varying his distance from the microphone and his angle of vocal attack, he could drastically change the timbre of his voice. With recording technology, the language artist could add present sound to past sound ("overdub"), thereby making a duet, if not a chorus, of himself. He could mix sounds, vary the speed of tape, or change the pitch of his voice. More important, he could also affix on tape a definitive audio interpretation of his own text. By expanding the range of audio experience, these new technologies also implicitly suggested ways of non-technological innovation. As Bob Cobbing judged, "Where the tape recorder leads, the human voice can follow."

Several Europeans established themselves in the 1950s, each developing a characteristic style. Henri Chopin, a Frenchman presently living in England, records his own vocal phonetic sounds, which are then subjected to several elementary tape manipulations, such as overdubbing and speed-changing, usually producing an abrasive aural experience that reminds me less of other text-sound art than of John Cage's fifties music for David Tudor. Since Chopin starts not with a verbal text but with a limited range of specified vocables, and then electronically manipulates these initially vocal sounds in ways that disguise their human origins, his work is perceived as music, rather than as text-sound art; more precisely, as a "musique concrete" that uses only natural sounds. If only to acknowledge its author's professional origins in poetry, perhaps this might better be classified as sound-text or, as Chopin himself calls it, "poesie sonore" (poetic sound), as distinct from sound poetry.

Francois Dufrene, also a Parisian, is best known for his "cri-rhythms," which is his term for his art of extreme, hysterical human sounds (rhythmic cries). As Bob Cobbing describes them, these pieces "employ the utmost variety of utterances, extended cries, shrieks, ululations, purrs, yarrs, yaups and cluckings, the apparently uncontrollable controlled into a spontaneously shaped performance." A piece like Crirhythme pour Bob Cobbing (1970) - the best of the several I have heard - sounds so extraordinary on first hearing that one can scarcely believe a single human being is producing such audio experience, even with the aid of microphones. Perhaps Dufrene's text-less art is really a species of vocal theatre, to introduce yet another categorical distinction.

Bernard Heidsieck, also a Parisian, works, by contrast, with recognizable words, either spoken emphatically by himself, or collected on the street and off the radio. These words are edited into rapidly paced, rhythmically convulsive aural collages which not only join language with non-verbal noises but also combine linguistic materials not usually found together. His term for this work is "poesie action," and several examples strike my ears as mixing a newscaster or other loud-speaker voice with a more intimate narrator (apparently Heidsieck himself) against a background of miscellaneous noises.

Though his works appear to satirize or editorialize about current events, their syntax is essentially collage, which, though once extremely fertile and also conducive to audiotape, has by now become hackneyed. Nonetheless, Heidsieck's pieces are more charming than Chopin's or Dufrene's, as well as considerably richer in audio-linguistic texture. Of those I have heard, my favorite is Carrefour de la Chaussee d'Antin (1973).

Another member of the Parisian scene, the Englishman Brion Gysin, favors linguistic permutations, as with I Am That I Am. All the possible combinations of these five words are then subjected to speeding, slowing and/or superimposition. The verbal text for this work appears in Brion Gysin Let the Mice In (1973), and the audio version, made at the BBC in 1959, is reproduced on the initial Dial-A-Poem record (1972). An intimidating audiovisual rendition of both the text and tape is included in my Camera Three-CBS television program, Poetry To See & Poetry To Hear (1974). I Am That I Am is one of the indisputable classics of text-sound art.
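
As a rough illustration of the permutational idea described above, the following Python sketch enumerates the orderings of the five words. It is not Gysin's own method (nothing here says how he generated his text), and the variable names are chosen only for this example. Because "I" and "am" each occur twice, the 120 orderings of the five word-positions collapse into 30 lines that actually read differently.

```python
from itertools import permutations

# The five words of the phrase; "I" and "am" each appear twice.
WORDS = ["I", "am", "that", "I", "am"]

# Every ordering of the five word-positions (5! = 120 of them).
all_orderings = [" ".join(p) for p in permutations(WORDS)]

# Collapse orderings that read identically, keeping first-seen order.
distinct_lines = list(dict.fromkeys(all_orderings))

if __name__ == "__main__":
    print(len(all_orderings), "orderings,", len(distinct_lines), "distinct lines")
    for line in distinct_lines[:5]:
        print(line)
```

Whether a performance should treat the two occurrences of "I" as the same word or as separate events on the tape is an artistic choice; the sketch simply shows both counts.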

Among the other notable contemporary European text-sound artists are the Englishman Bob Cobbing; the Scotsman Edwin Morgan; the Belgian Paul de Vree; the Czech Ladislav Novak; the Frenchmen Gil J. Wolman and Jean-Louis Brau; the Austrian Ernst Jandl; several Swedes associated with Stockholm's Fylkingen group (including Bengt Emil Johnson, Sten Hanson, and Bengt af Klintberg); and the Germans Ferdinand Kriwet and Hans G. Helms. Kriwet has edited U.S. news broadcasts of both the 1969 moonshot and the 1972 American political campaigns into first-rate English-language audio collages, and Helms wrote Fa:m' Aniesgwow (1958), a pioneering book-record which resembles Finnegans Wake in realizing linguistic coherence without observing consistently the vocabulary of any particular language. More specifically, through attentiveness to the sound of language, Helms creates the illusion of a modern tongue:

Mike walked in on the : attense of Chiazzus as they sittith softily sipping sweet okaykes H-flowered, purrhushing 'eir goofhearty offan-on-beats, holding moisturize'-palmy sticks clad in clamp dresses of tissue d'arab, drinks in actionem fellandi promoting protolingamations e state of nascendi, completimented go!scene of hifibrow'n…

The most interesting of the others, in my experience, is Jandl, a Viennese high school teacher of English, who works exclusively in unaided live performance (the pre-WWII way), declaiming published phonetic texts, mostly in German but sometimes in English, which are usually inventive in form and witty in language. In New York, Spring 1972, he did an exceptional performance of a long poem, "Teufelsfalle," which also appears in his book, Der Kunstliche Baum (1970). "Beastiarim," the last piece on his record, Laut und Luise (1968), is a vocal tour-de-force. However, in part because of his anti-technological bias, Jandl's work seems to terminate a style, rather than suggest future developments.

IV

The key issue dividing North American text-sound practitioners from their European counterparts is the use of electronic machinery, for native text-sound art at its best is either more technological or less technological than European. In the first respect, the text-sound artist uses multi-tracking, sound-looping, and microscopic tape-editing to achieve audio tape effects that technically surpass European work. The principal figures here are Steve Reich, Charles Amirkhanian, Glenn Gould, Charles Dodge, Jerome Rothenberg-Charles Morrow, John Giorno, and myself. The other strain of American text-sound artists consists of those who have largely avoided electronic machinery, except of course to record themselves in permanent form: John Cage, Jackson Mac Low, Norman Henry Pritchard, W. Bliem Kern, Bill Bissett, Emmett Williams, Charles Stein, Michael McClure, and the Four Horsemen, a Canadian group.

Steve Reich studied music composition with Luciano Berio and Darius Milhaud before using language to explore the compositional idea of modular variation. Essentially, a limited phrase, or module, whether musical or verbal, is repeated in a gradually changing way; and with overdubbing, a phrase played at one speed can interact with the same phrase played at another speed, sometimes producing a pulsating sound. Reich's earliest verbal work, It's Gonna Rain, was composed in San Francisco in January 1965. As the artist remarks on the record jacket, "The voice belongs to a young black Pentecostal preacher who called himself Brother Walter. I recorded him along with the pigeons one Sunday afternoon in Union Square in downtown San Francisco. Later at home I started playing with tape loops of his voice and, by accident, discovered the process of letting two identical loops go gradually in and out of phase with each other." That is, the two loops begin in unison; but because of mechanical imprecision, they gradually move completely out of phase with each other and then progressively back into unison, the words in relation to each other creating their own serendipitous rhythms and melodies. The first part of this piece realizes an incantatory intensity without equal in audio language art, as the phrase "It's gonna rain" is repeated into a chorus of itself. At one point, for instance, while one track of the tape has the entire phrase, another has only a pulsing "rain"; at later points, "it's a" becomes a ground bass for the aural assemblage. All this repetition of a few words, needless to say, intensifies the invocatory meanings. At times, It's Gonna Rain sounds like the Indonesian monkey chant, except that Reich has used electronics to do the aural work of a hundred men; as machine-assisted art, his work exists only on audiotape or record.
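
The drift of two identical loops out of and back into unison can be modeled numerically. The sketch below is an idealized model, not a reconstruction of Reich's tape setup: it assumes the second machine runs faster by a small constant fraction, and every figure in it (a 2-second loop, a 1/50 speed difference) is invented for illustration. The function name `phase_offset` is likewise hypothetical.

```python
from fractions import Fraction

def phase_offset(repetitions, loop_seconds, drift):
    """Lead of the faster loop over the slower one, in seconds.

    `drift` is the fractional speed difference between the two machines
    (an assumed figure; no measurements are given in the text). The lead
    grows by loop_seconds * drift with each repetition of the slower loop
    and wraps to zero when it reaches one full loop, which is the moment
    the two loops are back in unison.
    """
    return (repetitions * loop_seconds * drift) % loop_seconds

if __name__ == "__main__":
    loop = Fraction(2)        # assumed loop length in seconds
    drift = Fraction(1, 50)   # assumed: second machine runs 2% faster
    for n in (0, 10, 25, 40, 50):
        print(n, float(phase_offset(n, loop, drift)))
```

With these assumed figures the loops are furthest apart (half a loop) after 25 repetitions and return to unison after 50; a smaller drift stretches the same cycle over more repetitions.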

The second part of this piece is less dense than the first, and the words are less comprehensible, especially as the language disintegrates into an obscure belching sound. Reich's other recorded text-sound piece, Come Out (1966), suffers from the same hysteria as It's Gonna Rain; the language disintegrates into a puzzling, sweeping sound that goes on too long. As Reich describes his compositional technique, "The phrase 'come out to show them' was recorded on both channels, first in unison and then with channel 2 slowly beginning to move ahead. As the phrase begins to shift, a gradually increasing reverberation is heard which slowly passes into a sort of canon or round. Eventually the two voices divide into four and then into eight." It is the first work, rather than this, which is Reich's text-sound masterpiece.

The earlier works of the San Francisco text-sound artist, Charles Amirkhanian, reflect Reich's influence. A musician who took his BA in literature, Amirkhanian steeped himself in both contemporary composition and high-quality tape recording as "Sound Sensitivity Information Director" (a.k.a., "Music") at KPFA, the Pacifica Foundation radio station in Berkeley. In 1971, he produced If In Is, which he characterized as "an eleven-minute tape based on strong rhythmic patterns created through the repetition of three words (inini, bullpup, banjo) arranged in phrases on separate tape loops and played simultaneously on multiple tape machines." When the same words aurally coincide, a pulsing sound is produced, much as in Reich's modular art; and this pulse becomes a ground bass for continually varying aural-verbal relationships. A similar compositional technique informs Just (1972), which is the best individual piece on the record anthology 10 + 2: 12 American Text Sound Pieces (1975).

In 1973, Amirkhanian developed a more characteristic way of text-sound working. Essentially, he takes recorded material and then cuts apart the tape in various ways, so that sentences or even words are broken in the middle, or the beginning of one sentence is spliced or overlaid in the middle of its predecessor, or key words are repeated in varying proximities to each other, or a single voice is multiplied into a duet or chorus of itself. On the 10 + 2 anthology is Heavy Aspirations (1973), which is based on the musicologist Nicolas Slonimsky's lecture on "The Revolution in Twentieth Century Music." From a tape of the whole, Amirkhanian extracted Slonimsky's characteristic phrases and speech-patterns. These are aurally repeated, as the tape moves between doctored sound material and straight transcription, abruptly shifting from one kind of material to another, and from one rhythm to another. Amirkhanian even dwells on Slonimsky's reference to Just and "text-sound" art (which he defines as "words alone"). Though Heavy Aspirations is as mocking in detail as its title suggests, the whole is endearing (and appropriate as a 79th birthday present for its subject). Another tighter, better effort in this style is the autobiographical Roussier (not Rouffier) (1973), which ingeniously takes apart the simple phrase, "Charles Amirkhanian, a composer of Armenian extraction," against a background of his earlier text-sound pieces for four full minutes. Both looping and overlaying come together in Seatbelt, Seatbelt (1973), Amirkhanian's single greatest piece, and perhaps the greatest single text-sound work ever produced in North America. It opens with a male voice regularly repeating the paired words of the title, and then varying the rhythm, as the voice is divided over two tracks and the sibilants become more emphatic. Then one voice repeats "seat" while another says "belt," each proceeding at its own rhythm. Then two different voices say "seatbelt" at different speeds as more voices enter, saying, in normal speaking voices, either "seat," "belt," or "seatbelt." Perhaps all five acknowledged performers are now speaking. Suddenly, the chorus shifts to "chung chung quack quack bone" in unison, and then to "cryptic cryptic quack quack" before dividing into two groups, one pair saying the first sequence, the second pair the second sequence. Arrangements like this continue for nearly fifteen minutes. At one point, all the voices say "quack" in different tempi, their rhythms sometimes coinciding; and the piece runs out with two voices saying "quack quack" at the same pace as the initial "seatbelt seatbelt." This piece is dense and witty and ingenious; it is utterly non-representational of anything except itself and, of course, the innovative powers of human imagination.

The Canadian pianist Glenn Gould created a minor masterpiece of text-sound tape editing in the course of something else: a radio documentary on people who live in Canada's northernmost territories. Entitled The Idea of North (1967) and commissioned by the Canadian Broadcasting Corporation, this program opens with a woman saying, "I was fascinated by the country as such. I flew north from Churchill ..." Forty-five seconds later, a male voice enters, saying something different, less appreciative and more cynical about the Canadian north, while the first voice continues undistracted in its characteristic manner. Thirty seconds later, a second male voice enters, saying something yet different. There is perhaps a third male voice in this fugue, all of them articulating themes that are elaborated later in the documentary. The voices change in relative volume, so that one or another predominates at various times, as in a musical fugue; and then they appear to blend evenly into each other, so that one hears not individual expository lines but repetition of the key word "north." And then all four voices slowly fade out, ending this tour-de-force. Gould produced a second text-sound fugue for a later radio documentary, The Latecomers (1969), which deals with Newfoundland; but perhaps because the voices enter too quickly on each other, and there is no key word to connect their talking together, this later example of "contrapuntal radio," as Gould calls it, sounds comparatively jumbled and pointless.

Charles Dodge has developed a singular text-sound art which others value highly, but I find immature. His forte is computer-assisted speech synthesis. The most useful description of his extraordinary compositional procedure appears, curiously, not on the single record of his own text-sound works, Synthesized Speech Music (1976), but in the notes to the Amirkhanian anthology:

The computer speech analysis/synthesis technique involves recording a voice speaking the message to be synthesized, digitizing (through an analogue-to-digital converter) the speech, mathematically analyzing the speech to determine its frequency content with time, and synthesizing the voice (speaking the same passage) from the results of the analysis. On synthesis, any of the components of the analysis (e.g., pitch, speech rate, loudness, formants) may be altered independently of the others. Thus, using synthetic speech (unlike manipulation of tape recording) one may change the speed of vocal articulation without changing the pitch contour of the voice (and vice versa).

This procedure requires so much awesome technical competence that it is perhaps gratuitous to note that little of value comes from it. It is true that Dodge can create various voices, both male and female, a testament to his virtuosity, but they sound more like each other than anyone (or anything) else. In the background are non-vocal (or non-pseudo-vocal) pitched sounds that have the obvious aural defect of resembling the vocal ones, and the work at times suggests that Dodge is creating an alternative universe with a single, all-pervasive Dodgian aural style. Then, the voices sing on pitch some of the time, pushing Dodge's art into song; but these singing voices lack the charm (albeit likewise synthetic) of, say, Walter Carlos's Moog-generated chorus on The Well-Tempered Synthesizer (1970). Dodge draws his language from some trendy poems by Mark Strand, but since there is no perceptible relation between the language and the audio technique, the latter seems as arbitrary as Dodge's freely atonal pitches; and if there is a complementary system, nothing in the commentary suggests a key. Technological invention is so valuable in contemporary art that I am reluctant to dismiss Dodge's work completely, but since the technique itself is suggestive, I hope he knows how far he has to go.

A far more successful electronic text-sound adaptation of a poetic text is Charles Morrow's Sound Work (1968), which is based on "The Beadle's Testimony" in Jerome Rothenberg's Poland/1931 (1974). Morrow, as "sound designer" (his own term), reorganized the one-page text so that all its words were grouped with each other: all "the's," all "jewel's," all "wall's," and so forth were together. He invited Rothenberg to record these separate lists. Morrow then took the isolated words and spliced them back into the proper order of the original poem, producing, in effect, a tape of Rothenberg reading "The Beadle's Testimony" in a stunning, emphatic style that would be impossible in live performance and probably inconceivable without the example of the tape. Both Rothenberg and Morrow have recently done Amerindian chanting which, to repeat my initial distinction, is not text-sound art but theatrical song.
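
Morrow's two-step procedure for Sound Work (group like words together for recording, then splice them back into the poem's order) can be sketched on text alone. The following Python is a loose analogy under stated assumptions: it handles written words rather than audio, the function names `group_words` and `resplice` are invented here, and the sample sentence is a placeholder, not a line from Rothenberg's poem.

```python
from collections import defaultdict

def group_words(text):
    """Group the words of `text` by word, remembering each position."""
    positions = defaultdict(list)
    for index, word in enumerate(text.split()):
        positions[word].append(index)
    return dict(positions)

def resplice(positions):
    """Put every grouped word back at the positions it came from."""
    total = sum(len(indexes) for indexes in positions.values())
    tape = [None] * total
    for word, indexes in positions.items():
        for index in indexes:
            tape[index] = word
    return " ".join(tape)

if __name__ == "__main__":
    # Placeholder sentence, not a line from the poem.
    sample = "the wall and the jewel and the wall"
    lists = group_words(sample)
    print(lists)            # the separate word lists, as if for recording
    print(resplice(lists))  # the words restored to their original order
```

On the page the round trip changes nothing; on tape, each word would carry the delivery it had when recorded in its list, which is where the effect described above comes from.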

My own work arose from an invitation to be guest-artist at WXXI-FM in Rochester, New York; and though I had not worked in a sound-studio before, I brought along some of my more experimental verbal texts. The medium, I discovered, lends itself to my truncated (or minimal) fictions, in part because radio is a much faster medium than live performances. For that reason, the same one-word paragraphs of, say, "Milestones in a Life" or "Plateaux," which seemed terribly rushed in live performance, find a more appropriate temporal format on audiotape. For "Excelsior," which is a dialogue between two single-word speakers, I used stereo distribution of my voice to enhance the aural experience.

With my own more elaborate experimental texts, the medium offered unforeseen possibilities. Recyclings (1974), for instance, is a non-syntactic prose piece composed from earlier essays of mine. Essentially, I took my own prose and subjected it to a reworking procedure that kept the language but destroyed the syntax. Each earlier essay of mine was reduced to a single page of new, recycled text. The first 64 pages (of 192) were published as a book which can be read vertically and diagonally just as feasibly as it can be read horizontally. To reproduce this visual experience aurally, I hit upon the technique of reading each page of Recyclings horizontally, then adding new voices that read the same text a few seconds behind. The result is a non-synchronous canon where words relate to each other in several directions simultaneously. (It also exists on videotape, where the imagery is visually suspended pairs of my lips.) A second non-syntactic text of mine, "The Declaration of Independence," likewise employs an eight-track recording machine to create an amateur Presbyterian chorus of myself (amplified differently on each track), this time reading (or trying to read) the same text in ragged unison. Since the text is the historic Declaration of Independence read backwards, the ironies multiply as one hears familiar locutions reversed.
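The staggered-voice canon described for Recyclings can be pictured with a small Python sketch. It rests on simplifying assumptions that are not in the essay: one word per time step, a fixed delay between voices, and a made-up sample line. The name canon_schedule is hypothetical.

```python
def canon_schedule(words, voices=3, delay=2):
    """List, for each time step, the words sounding at once across all voices."""
    total_steps = len(words) + delay * (voices - 1)
    schedule = []
    for step in range(total_steps):
        sounding = []
        for voice in range(voices):
            # Each later voice reads the same text, `delay` steps behind.
            index = step - voice * delay
            if 0 <= index < len(words):
                sounding.append(words[index])
        schedule.append(sounding)
    return schedule


page = "art sound text voice tape word".split()  # placeholder line
for step, sounding in enumerate(canon_schedule(page, voices=3, delay=2)):
    print(step, sounding)
```

Each printed row is one moment of listening; reading down a column of rows gives the "several directions" in which the words relate.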

After tentative beginnings with a record on which he did not speak at all, Raspberry & Pornographic Poem (1967), John Giorno has become a consummate performer of his own texts. His technique, which has developed considerably in the past decade, consists of chopping apart a prose sentence, so that its words are repeated in different linear arrangements, with different line-breaks, and then duplicated in adjacent columns:

There is
There is
There is nothing
There is nothing there
There is nothing
There is nothing there

Giorno turned to electronic technology for a single capability, echoing, so that he need not say the left-hand column (it could be electronically reproduced as a faint replica of his initial voice), thereby increasing the potential for after-sound analogous to the "after-image" of the visual arts. The principal development in his text-sound artistry has been a complication in the echoing. In his sides of the two-record John Giorno / William Burroughs (1975), Giorno developed a double echo that could be varied in quality, becoming more reverberant (and re-echoed) at times and more distorted at other times. The double echo increases not only Giorno's self-replication, which appears to interest him, but also the audiographic impact of his statements. All this technique notwithstanding, Giorno's work is built not upon isolated words but upon whole phrases; it depends for coherence not upon sound but upon syntax, semantics, and prose narrative (all the traditional baggage) to evoke his macabre vision. Indeed, his recent collaboration with Burroughs becomes an implicit acknowledgment of the literary origins of Giorno's sensibility. To be precise, this is not text-sound art at all, but inventively amplified poetry (which is thus more acceptable to "poetry" circles); and that recognition perhaps explains why genuine text-sound work is so sparsely represented in his anthologies.

Other Americans making electronic text-sound art include Alvin Lucier, whose I Am Sitting in a Room (1970) begins with him reading a 100-word prose statement which is recorded on tape. The recorded version is then played in the same space in which the original statement was made and recorded on tape at one remove from the initial live statement. This procedure of broadcasting and re-recording is continued through several generations, as feedback progressively obliterates the text, which paradoxically becomes more familiar, thanks to repetition, even as it becomes less audible. Francis Schwartz's Score-Painting for Julio Cortazar (1974) is a bilingual visualization of an allusive visual text that overdubs the author's voice, saying various things at various speeds, about his subject.


John Cage, one of the key figures in non-electronic American text-sound art, has curiously also been a pioneer in electronic music, with tape compositions dating back to his Williams Mix of the early fifties. His text-sound works have consisted largely of his rather formal, unemphatic readings of his own mostly non-syntactic texts. Whereas several earlier Cage pieces incorporated spoken language, such as the funny narratives of Indeterminacy (1958), or the aleatory words that happened to be on the twelve radios in Imaginary Landscape IV (1952), Cage began in the seventies to make works composed exclusively of language; and these turned out to be as structurally non-climactic and non-hierarchic as his musical work. Mureau (1970), the first in this series, is based upon Henry David Thoreau's remarks about music, which Cage then scrambles, via I Ching processes, into a mix of syllables, words and phrases. The result is a verbal pastiche in which one can perceive references to music and nature (and thus to Thoreau's characteristic vocabulary).

Cage has since progressed, as he always does, to a yet more severe language mix that he calls Empty Words. This might best be characterized as a progressive reduction of material from Thoreau. Cage's own typically technical description is useful here: "Part II: A mix of words, syllables, and letters obtained by subjecting the Journal of Henry David Thoreau to a series of I Ching chance operations. Pt. I includes phrases. Pt. III omits words. IV omits sentences, phrases, words and syllables; includes only letters and silences."

The live performance of part IV that I heard Cage do in New York (Spring 1975) could be characterized as the most extreme presentation of its kind. Whereas most text-sound art is much faster than spoken language, this was much, much slower. Indeed, the smallest phonetic fragments, succinctly spoken by Cage, were separated by multi-second silences. Musically, the piece seems an extreme extension of Anton Webern or Morton Feldman. More precisely, it is a kind of inferential art whose impact depends upon the audience's contextual awareness of the work's origins and purposes.

Though initially known as a poet, Jackson Mac Low studied music composition with John Cage in the late fifties and even composed the accompanying music to the 1960 Living Theater production of his play, The Marrying Maiden. Much of Mac Low's live text-sound art reflects Cage's aesthetic influence, particularly in allowing his performers spontaneous choices within pre-defined constraints. Most of his live pieces are "simultaneities," which is his term for performances that involve more than one voice. In the sub-set of pieces he calls "Matched Asymmetries" (since 1960), several performers are given a multi-part text and asked to read the verbal material at a pace and volume of their own choice, each of them reading the available parts in a preassigned order different from the others. Ideally, the performers should generate individual rhythms and articulations, as well as interacting inventively with each other. An example of this sub-set is the "Young Turtle Asymmetries," which was published as both a record and a text in the eighth issue of Aspen (1969). Here the aural experience is primarily that of five voices repeating the same words and elongated letter-sounds at different timbres and times. The verbal material is then subjected to an aleatory process that Mac Low calls "through acrostic chance generation." In another sub-set of scores that he calls "Numbered Asymmetries," each of the performers has a completely different text, and the auditory experience is more unrelievedly chaotic.

A third kind of Mac Low score is the "Vocabulary," which is a non-centered, diffuse visual field containing words composed exclusively from the particular letters in a subject's name (e.g., "Sharon Belle Mattlin," "Peter Innisfree Moore"). To declaim these, Mac Low customarily recruits a motley chorus, whose members are instructed to say spontaneously whatever words from the score they wish, at whatever volume and whatever durations, with whatever pauses. A fourth related strain is the "Gatha," which is a collection of related words densely written on graph paper, one letter to a square, in a single direction (i.e., vertical, horizontal, or diagonal). Performers are instructed to read the letters in a geometrical path, which may be horizontal, vertical and diagonal, thus producing letter-sounds, phonemes, syllables, words and neologisms. Again, the aural experience is that of occasional repetition and general cacophony. A fifth kind of live piece is the "Word Event," where the performers improvise on a single, multi-syllabic word, like "environmentally." They are instructed to take this word apart, uttering letters or phonemes and then words drawn from the letters of the initial word (e.g., ellen, ten, leer, tee, toe, nelly, etc.). From such limited material, Mac Low and his collaborators have been known to spin pieces lasting over one-half hour. He sometimes performs a simultaneity against a background tape of a previous performance (or two or three). Mac Low's best text-sound pieces, however, are not the live ones to which he devotes more of his attention, but his fewer primarily electronic works.

Most of these were realized during 1973-74, when Mac Low had access to the New York University Composers' Workshop. For Threnody for Sylvia Plath (1973), he took tapes of Paul Blackburn, Diane Wakoski, Sonia Sanchez, Gregory Corso and Tom Weatherly reading their own poems. Using a battery of tape machines, he fed selections from these tapes simultaneously into a single second-generation mono tape. Sections from this initial Mac Low tape were then fed non-synchronously into both tracks of a stereo tape (the third generation). Thus, while passages from the live reading were repeated, they related to each other in continuously different ways. Here, too, the aural experience is that of repetition within chaos, and the most memorable sections mix Diane Wakoski and Sonia Sanchez in an inadvertent duet. In "Counterpoint for Candy Cohen" (1973) Mac Low explores tape-technique possibilities even further. A single announcement of two-dozen words, spoken by a concert emcee named Candy Cohen, is repeated with irregular pauses to make an initial tape which is then transferred continuously, one channel at a time, onto a four-track tape, which thus has four separate channels of non-synchronous repetition of the initial verbal material. (All the close echoing at this generation is reminiscent of Giorno.) Then, this tape is itself transferred continuously onto each track of a two-track machine, which then has eight different tracks of the same repeated announcement. Then, this tape is transferred onto each track of the initial four-track machine which thus produces a tape with 32 tracks of sound. This fourth-generation is two-tracked into 64 tracks, which is then four-tracked into 256 tracks. As the final piece incorporates all stages in the incremental process, what we hear is the progressive complication of the initial material (two-dozen words and a pause) through several distinct generations into a verbally incomprehensible, but rhythmically pulsing chorus. The experience is extraordinary, and it is perhaps the culmination of Mac Low's interest in non-synchronous repetition.
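The arithmetic of the layering in "Counterpoint for Candy Cohen" is easy to lose in prose, so here is a minimal Python sketch of it. The multipliers are read directly from the description above (four-track, two-track, four-track, two-track, four-track); the function name track_counts is hypothetical, and nothing here models the sound itself.

```python
def track_counts(multipliers=(4, 2, 4, 2, 4)):
    """Return the number of layered voices after each tape generation."""
    counts = [1]  # the initial tape: one voice repeating the announcement
    for multiplier in multipliers:
        # Copying the previous tape onto every track multiplies the layers.
        counts.append(counts[-1] * multiplier)
    return counts


print(track_counts())  # [1, 4, 8, 32, 64, 256]
```

The printed sequence matches the stages named in the text: 4, 8, 32, 64 and finally 256 layers of the same two-dozen words.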

The major device of Norman Henry Pritchard's pioneering text-sound art is repetition of the same phrase, so that something other than the original phrase results. In the only conveniently available recorded example, "Gyre's Galax" (1967), the phrase "above beneath" is rapidly repeated with varying pauses between each line. (The reader repeating these words rapidly to himself will get a faint sense of the effect.) The same device informs "Visitary," which appears in Pritchard's principal collection, The Matrix Poems: 1960-70 (1970). One part of this poem reads as follows:

Dewinged wings
Dewinged wings
wings dewinged
Dewinged wings
wings dewinged
wings dewinged
dewinged wings

Lamentably, Pritchard ceased active publishing around 1971, and his work has not been included in any of the surveys, recordings or exhibitions of language art.

Pritchard's student, W. Bliem Kern, a visual poet as well as a text-sound artist, tends to do aural renditions of his visual texts. His printed text ranges from rather "straight" poetry, which is undistinguished, to visual texts of words and letters in page-space to poems that mix familiar with unfamiliar words, the former becoming semantic touchstones for the latter.

prom enu how ek anu
time was prom
enu how ek anu time was
prom enu how ek anu
time was

And yet other poems are entirely in a fictitious language that Kern calls "Ooloo." Whereas most text-sound work is temporally static, Kern's pieces often have an underlying narrative progression. This becomes more pronounced in his long poem, "Dream to Live," which narrates in words, phrases and phonemes the end of an affair. In the cassette tape accompanying his only book, the piece is movingly read through various kinds of material; and I would classify the piece as "fiction" more than a poem and, as text-sound fiction, an exemplification of its kind. Kern's texts are written to be performed; for whereas most text-sound artists want to create autonomous linguistic structures, Kern's avowed purpose is the communication of personal feeling. "In writing," he declared in a 1973 manifesto not included in his book, "i am exploring the oral world of non-linear phenomena, the inner speech, the dialogues with myself as a child before i am also concerned with feelings and translating the visual into the verbal."

Bill Bissett is a Canadian poet who taped his visually idiosyncratic texts for a record that accompanies his book, Awake in th Red Desert! (1968). His principal technique is emphatically repeating a single phrase, like that of the title, which remains as it is, rather than, as in Pritchard and Kern, becoming something else. Too many pieces on this record have musical instruments that are unnecessary, if not detrimental; for the record is as widely uneven, and as critically challenging, as Bissett's motley books. One of the most suggestive texts in the book is "o a b a," which closes:

sheisa sheisa sheayisa heisasheisa saheis sasheisaheisa sheisa
cumisa cumisa th heart isa cumisa isa cumisa cumisa heisa shes

However, in his record, Bissett imposes a rhythm on the words, rather than letting them suggest their own rhythm; and the result sounds inept and unconsidered. In another work, the phrase "supremely massage" is variously repeated as a ground bass, while a lead voice reads an erotic prose text. Perhaps the most wholly successful audio poem is the simplest, which opens:

it be it so be so
it so be so it so
be so it so be so

And this, unlike other Bissett, is as perfect on the record as it is in the book.

Emmett Williams is, like Bissett, a various and inventive experimentalist; but unlike Bissett, he works sparingly, producing only a few works in each direction he pursues. The best text-sound piece is "Duet," which appears both in his Selected Shorter Poems (1975) and on the initial Dial-a-Poem record (1972). It opens with every second line in boldface type,

art of my dark
arrow of my marrow
butter of my abutter
bode of my abode
cope of my scope
curry of my scurry

becoming a sequence of sweetly archaic internal rhymes that ends:

ye of my aye
y of my my
zip zap zoff of my o zip of zap of zoff
zim zam zoom of my o zim o zam o zoom

The third anthology record from Giorno Poetry Systems, Biting off the Tongue of a Corpse (1975), closes with a gem by Charles Stein, "A Seen Poem," which opens:

rage judge raga
mad judge rage
a mad judge rages
a raga rides
a raga judges
a rug
a jug

It evokes several internal rhymes within a few words and is, needless to say, delightfully comic.

Another older poet who publishes texts that he also declaims is Michael McClure; the works collected in Ghost Tantras (1969) tend to mix syntactically conventional phrases with guttural sounds.

The Four Horsemen consist of four Canadians of independent literary reputation who came together in early 1970 to jam, much as freelance jazz-men do. Bp Nichol, perhaps the most prominent, has published works in several styles, both avant-garde and "trad," as he calls it. Steve McCaffery is a younger writer, London-born, who also collaborates with Nichol in a criticism-combine called The Toronto Research Group. Paul Dutton and Rafael Barreto-Rivera I know only from the record; the latter speaks English with an audible Spanish accent. Their initial text-sound works were collected on a record called Canadada, which is undated. The best piece here is a fugue, entitled "Allegro 108," which opens, "Ben den hen ken len men pen ken fen men yet," with one voice chanting alone on a single note. Then a second voice enters, chanting non-synchronously at first but then in unison with the initial voice, as a third voice enters, chanting separately at first, as before, but then in unison, as the fourth voice enters. The piece develops a steady emphatic rhythm, as the voices are clearly accustomed to working with each other. I take "Allegro 108" to be the most persuasive example of the possibilities of leaderless text-sound collaboration.

Other North Americans doing interesting live text-sound work include Armand Schwerner, whose great long poem, The Tablets (1967 to the present), incorporates a multitude of techniques, both traditional and advanced, typically including both word-imagery and text-sound; Toby Lurie, whose prosey statements make sentimental appeals; Ernest Robson, who has developed a sophisticated method for notating vocal techniques in his syntactically conventional texts; Geoffrey Cook, whose "Jabberwocky" is a modest gem; Beth Anderson, whose "If I Were a Poet" sensitively exploits repetition of choice phrases; Henry Rasof, who prefers a non-syllabic poetry closer to the European example; Peter Harleman, who produced the periodical record Out Loud; A.F. Caldiero, a powerful performer of vocables both pitched and unpitched; Lawrence Weiner, a well-known conceptual artist who has done records of gerunds in two languages; Dick Higgins, whose "Glasslass" exploits the sibilants that others try to avoid; and Larry Wendt, who creates long, ambitious pieces that I find less interesting than the remarkable prose notes accompanying them.

Of course, text-sound is an open art. There are many roads to be explored, many virgins to be seduced, many alternatives to be re-thought, many combinations to be discovered. I suspect as well that there are many more North Americans working independently, unaware not only of what their colleagues are doing, but also of how their own works might be "distributed." In a situation like this, a newcomer could become (and be considered) a major artist quite rapidly. Also, whereas sophisticated Europeans tend to regard text-sound as a familiar form, with an established canon of prominent practitioners, it is open terrain in America; and this perhaps accounts for why American work is already more varied than European.


Text-sound art, it is clear, is interesting and consequential; it is a distinct artistic category, with a small army of practitioners; but the greatest threat to its survival, not to speak of its development, is, simply, its unavailability. If the reader of this essay wanted to hear Amirkhanian's Seatbelt, Seatbelt, for example, the only way he or she could satisfy that curiosity (or challenge my critical judgment) would be to write Amirkhanian himself, asking the artist for a copy; and if he wrote back that he was reluctant to go through the rigamarole of getting the master from a safe storing place, and then lining up two machines for a dubbing (and that he wanted fifty dollars for the tape copy), no one could blame him. Copying audiotapes is neither as easy nor as cheap as copying manuscripts. One reason why the work of Tony Gnazzo is not discussed in this essay is that Gnazzo wrote that he was, not unreasonably, tired of making copies, even for likely supporters, such as myself.

What is needed at the beginning, of course, are selective anthologies, not only to make everyone aware of what is being done, but also to prompt current practitioners to move on to something else. For another thing, it might force artists to make individual pieces more various; too much work so far is based upon a single audio idea, which is introduced at the beginning and then sustained to the piece's conclusion. Except for the ones mentioned earlier, there are no more anthologies of North American work. Some text-sound art has appeared in the periodical Black Box and on the Giorno Poetry Systems records, but no one subscribing to either of these publications can expect a steady stream of text-sound gems. (The former's publisher has announced a cassette periodical devoted exclusively to experimental work, but nothing has yet appeared.) In Europe, the government-funded radio stations take responsibility for the creation and programming of text-sound work; but here, no public radio station, aside from WXXI-FM in Rochester, has supported the art, while literature directors of National Public Radio have never been interested. I have myself written to the larger record companies proposing to edit and introduce a text-sound record; but none of them has accepted my offer. One possible route for American work would involve public funding, but here the new, intermediumistic art becomes a round peg, unable to fit the square holes of funding agencies. Since the program director of NEA's literature department cannot accept visual poetry as "literature," there is no reason to believe he will be any more accepting of sound poetry; and music departments are often reluctant to accept text-sound art as "music composition."

Until records and various printed materials become readily available, North American text-sound will remain a private art that will have public existence only in second-hand forms, such as this essay; and that unavailability becomes, to be frank, an example of de facto censorship that is no longer tolerable.

