Even if there is no actual conscious thought behind documents generated by current language models, they get things right much of the time, and they often succeed in simulating passages that feel as if they were written by a human. If you read one of these documents without knowing that an algorithm wrote it, you can “read” it in the grammaleptic sense described by Cayley, generating language from that text even though there was never any actual intention or meaning behind it. This happens when the algorithm has successfully predicted which surface forms of text are most likely to synthetically evoke language in human readers. The voice that comes from these texts might be a bit perverse or unnatural, but I would argue that it still counts as language once someone grasps or understands it. In these cases, the reader creates language from nothing by reading and understanding a text that was “written” by an algorithm.

      As these computer-generated documents become less and less coherent, things get even more interesting. Take this sentence, for example:

In the West, time for comparison between a savage or shining personality and an encyclopedic knowledge of a subject is greatly devalued, both in the sense that it is mistaken to assume that everything in literature is gory, and in the sense that it is merely another example of the myth of time.

      What happens in a case like this depends entirely on how much faith a reader has that the author of the text is a human trying to communicate a message; that there is ideality to be found somewhere in the text, even if it is more difficult to grasp at first. If you don’t know that it’s possible there is no author at all, you may try much harder to read into the given text. You could even succeed in gleaning some sort of voice from it, which again could be considered a form of language, however unnatural and warped. This was something I had to do daily as an annotator: even if a document seemed unintelligible at first, I still had to at least try to grasp some sort of meaning from it before giving up, because not trying at all would have risked marking a human-written document as algorithm-written simply because I personally couldn’t understand what that person might have been trying to say.

      In these incoherent cases, are humans still creating language from nothing? Is this “language” fundamentally different from the voice we hear when we read more coherent AI-generated documents?