Final Project...

A Portrait of the Text as a Young Line-Fingerprinting Language

Our initial studies in the Mediascapes studio involved creating and mapping profiles. I thought it would be interesting to create a visual profile of a particular text: a kind of fingerprint, if you will.

1. Initial Grid
I designed a grid of the English alphabet according to letter frequency: the most frequent letters sit in the middle of the grid and the least frequent expand outward, with the letter "Z" placed in the corner.
initialgrid.jpg
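A frequency-centered layout like this could be generated along the following lines. This is a minimal sketch in Python, not the grid actually used for the project: the frequency ordering, the 6 x 5 grid size, and the name `frequency_grid` are all assumptions made for illustration.

```python
import math

# One commonly quoted frequency ordering of English letters (an assumption;
# the ordering used for the original grid is not given in the text).
FREQ_ORDER = "ETAOINSHRDLCUMWFGYPBVKJXQZ"

def frequency_grid(order=FREQ_ORDER, cols=6, rows=5):
    """Map each letter to a (col, row) cell, most frequent nearest the center."""
    cx, cy = (cols - 1) / 2, (rows - 1) / 2
    cells = [(c, r) for r in range(rows) for c in range(cols)]
    # Closest cells first; ties are broken by row, then column.
    cells.sort(key=lambda p: (math.hypot(p[0] - cx, p[1] - cy), p[1], p[0]))
    # All letters but the last fill the cells nearest the center...
    grid = dict(zip(order[:-1], cells))
    # ...and the least frequent letter goes in the farthest cell, a corner.
    grid[order[-1]] = cells[-1]
    return grid

grid = frequency_grid()
```

With these assumed settings, "E" and "T" land in the two central cells and "Z" lands in a corner. A different grid size or ordering would give a different layout.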



2. Initial Mapping
The Processing code parses any text letter by letter, drawing lines between consecutive letters at the positions set by my grid, so that the mapping records letter proximities. Each word therefore has its own two-dimensional geometry. For example, as the code parses the text, its mapping of the word "the" consists of a line from "T" to "H" and then a line from "H" to "E".
_the.jpg
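The parsing step can be sketched as follows. This is a Python illustration, not the original Processing code. The coordinates in `GRID` are hypothetical. It also assumes that lines are only drawn within a word, so spaces and punctuation break the chain.

```python
import re

# Hypothetical coordinates for three letters; the real grid positions
# are not given in the text.
GRID = {"T": (3, 2), "H": (1, 2), "E": (2, 2)}

def word_segments(word, grid):
    """Return the line segments joining consecutive letters of one word."""
    points = [grid[ch] for ch in word.upper() if ch in grid]
    return list(zip(points, points[1:]))

def text_segments(text, grid):
    """Split text into words and collect the segments of every word."""
    segments = []
    for word in re.findall(r"[A-Za-z]+", text):
        segments.extend(word_segments(word, grid))
    return segments

# "the" becomes two segments: T -> H, then H -> E.
the_lines = word_segments("the", GRID)
```

Each segment is a pair of grid points, so a drawing routine would only need to loop over the list and draw one line per pair.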



3. Kafka's Metamorphosis
I started by mapping the English text of Kafka's Metamorphosis as a primer. Because the geometries created by the text parser are drawn with a transparency setting, lines that are drawn repeatedly grow darker, so one can see how often letters are connected to one another. The picture below shows the first mapping of Kafka's text. Notice that the heavy lines tend to be near the center of the mapping, confirming that "E, T, A, O, and I" are the most commonly used letters in the English language. What this particular grid fails to show, however (due to poor initial grid settings), is just how often, for example, "L" comes before or after "T, E, I, or U", as all of these letters have the same X-coordinate and their connecting lines fall on top of one another. The same holds true for "F, S, O, H, and V" and other groupings; a similar problem occurs for letter groupings that share the same Y-coordinate.
meta_eng_grid1.png
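The overlap problem can be checked mechanically. The sketch below, in Python, lists every pair of letters whose connecting line passes through a third letter and so cannot be told apart from two shorter lines. The coordinates are hypothetical stand-ins in which "L", "T", and "E" share an X-coordinate, as the text describes. The function names are mine.

```python
def lies_on_segment(p, a, b):
    """True if point p lies on the straight segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    # A zero cross product means the three points are collinear.
    cross = (bx - ax) * (py - ay) - (by - ay) * (px - ax)
    if cross != 0:
        return False
    return min(ax, bx) <= px <= max(ax, bx) and min(ay, by) <= py <= max(ay, by)

def ambiguous_pairs(grid):
    """List (a, b, via) where the line a-b passes through the letter via."""
    found = []
    letters = sorted(grid)
    for i, a in enumerate(letters):
        for b in letters[i + 1:]:
            for via in letters:
                if via not in (a, b) and lies_on_segment(grid[via], grid[a], grid[b]):
                    found.append((a, b, via))
    return found

# Hypothetical positions: L, T and E sit in one column, A sits beside E.
SAMPLE_GRID = {"L": (2, 0), "T": (2, 1), "E": (2, 2), "A": (3, 2)}
overlaps = ambiguous_pairs(SAMPLE_GRID)
```

Here the only ambiguous pair is E-L, whose line runs straight through "T". A layout with no such triples would avoid the problem, at least for lines that pass exactly through a letter.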



2. Second Grid
Informed by the initial grid design, this grid moves away from "the grid" so that the letter connections can be visualized more clearly.
secondgrid2.jpg



3. Kafka's Metamorphosis V2.0
Same text, new grid. A much more precise fingerprint of the text can now be made. We observe that the pairings T/H, H/E, E/R, S/E, and H/A occur frequently. The T/H and H/E pairings in particular suggest that the word "the" is used a lot.
me_180.jpg
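The darkness of each line reflects a count of how often two letters sit next to each other. The Python sketch below tallies those counts directly. It treats a pairing such as T/H as unordered and skips doubled letters, since they have no line to draw; both choices are assumptions. `stacked_opacity` further assumes ordinary alpha blending of identical strokes; the transparency value actually used is not stated.

```python
import re
from collections import Counter

def pair_counts(text):
    """Count adjacent letter pairs within words, ignoring their order."""
    counts = Counter()
    for word in re.findall(r"[A-Za-z]+", text):
        letters = word.upper()
        for a, b in zip(letters, letters[1:]):
            if a != b:  # a doubled letter has no line to draw
                counts[tuple(sorted((a, b)))] += 1
    return counts

def stacked_opacity(alpha, n):
    """Opacity of n identical strokes of opacity alpha drawn on top of each other."""
    return 1.0 - (1.0 - alpha) ** n

counts = pair_counts("the other then")
```

For this toy phrase the T/H and H/E pairs each occur three times and every other pair once. Pairs are stored alphabetically, so T/H appears under the key `("H", "T")`. Calling `counts.most_common()` on a full text would rank its pairings without reading them off the picture.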



4. Ernest Vincent Wright's Gadsby
Notice the heavy reliance on the "N-A, N-I, N-O, R-O, T-A, T-I" pairings. The much more distributed network of lines reveals a glaring linguistic geometry: the novel does not contain a single instance of the letter "E".
gad180.jpg



4. Kafka's Metamorphosis in Portuguese.
What started out as a desire to visualize a text now turns into a potentially interesting project. One could hypothesize that, given a fingerprint like the one above, it is possible to determine what language a given text (not revealed to the audience) is written in. I chose to translate Kafka's text into Portuguese, known for its consonant pairings (similar to Polish). I expected less density at the E, I, O, U vertices...
me_port180.jpg
While my initial hypothesis was not necessarily correct, the letters in the outer ring ("V", "Q", "B", "Z", "F", and "X", all consonants) do appear with much greater frequency than in the previous English mappings.
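The language-guessing hypothesis can be tried numerically: reduce each text to its letter-pair counts and compare an unknown text against a reference profile for each language. The Python sketch below uses cosine similarity for the comparison, which is my choice and not something the project did. The two one-sentence reference texts are toy placeholders. Folding accented letters onto the plain 26 (so that "ç" counts as "C") is also an assumption.

```python
import math
import re
import unicodedata
from collections import Counter

def fold(text):
    """Strip accents and uppercase, so that accented letters map onto A-Z."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).upper()

def profile(text):
    """Unordered adjacent-letter pair counts: a numeric 'fingerprint'."""
    counts = Counter()
    for word in re.findall(r"[A-Z]+", fold(text)):
        for a, b in zip(word, word[1:]):
            if a != b:
                counts[tuple(sorted((a, b)))] += 1
    return counts

def cosine(p, q):
    """Cosine similarity between two pair-count profiles (0.0 if either is empty)."""
    dot = sum(n * q[pair] for pair, n in p.items())
    norm = math.sqrt(sum(n * n for n in p.values())) * math.sqrt(sum(n * n for n in q.values()))
    return dot / norm if norm else 0.0

def guess_language(sample, references):
    """Return the name of the reference profile most similar to the sample."""
    sample_profile = profile(sample)
    return max(references, key=lambda name: cosine(sample_profile, references[name]))

# Toy reference texts for illustration only; real profiles would be built
# from whole books, as in the mappings above.
REFERENCES = {
    "english": profile("the quick brown fox jumps over the lazy dog"),
    "portuguese": profile("o rato roeu a roupa do rei de roma"),
}
```

With these toy references, `guess_language("the dog", REFERENCES)` picks "english" and `guess_language("roupa de rei", REFERENCES)` picks "portuguese". References this small only show the mechanics; they say nothing about how well the idea works on real texts.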



5. 2 X 4
Two languages, four texts. The top row is English and the bottom row is Portuguese. From left to right, the texts are Metamorphosis, Gadsby, Huckleberry Finn, and The Raven. A pattern emerges...
grids2.jpg



6. The English/Portuguese Fingerprint
The four texts in each language are superimposed on top of one another: English, then Portuguese, then a composite.
english.jpg

portuguese.jpg

composite.jpg
This is by no means a scientific analysis, but rather a cursory attempt to map the frequency of letters in a language visually. More data could be collected (e.g. more texts could be used), a better grid could be programmed (perhaps a dynamic one, with the grid collapsing toward the letters that are used more frequently), and of course the visualization itself could be changed (perhaps colored boxes instead of black lines, etc.)...