Are We There Yet?; The Pursuit of the American Dream
This piece serves as a kind of open-ended barometer of the state of mind of the U.S., “measuring” how close we are today to our understanding of the American Dream.
The piece makes this assessment based on the headlines of the day on the New York Times. The applet was trained on the NYTimes headlines from January 2009, and I classified each headline as evidence that we’re living the American Dream (“good”), or as evidence that the American Dream is not a reality yet (“bad”). TF-IDF was used to find the most important words of each headline and to compare to the day’s headlines.
Everyday the headlines are downloaded and classified by finding the labels of the nearest neighbors (most similar training headlines). The ratio of “good” vs “bad” headlines is displayed as a line across the digital canvas (up = American Dream). Currently the applet is hard coded to use the headlines of the presentation day (March 17th). The code for finding the current date was found, but there wasn’t time to put the date into the needed form.
The open-ended appearance intends to encourage viewers to interpret the messages “not there yet” or “almost there” according to their own beliefs and ideals.
Interesting ways to grow:
– create a web widget for NYTimes where people would get to rate headlines, maybe with a scale. This would multiply the training data received and “sharpen” the classifier. Also the classifier then would reflect a collective notion of the American Dream instead of only mine.
– Visualize significant words that are associated or in conflict with the American Dream
The classifier works in the sense that it is generating labels, but it’s not very meaningful. Several problems are occurring:
– NYTimes data has many bugs, so lots of data is getting lost
– The training data is very biased because it is so limited in time (1 month) and it’s mostly 90% “bad”
– The headlines were chosen as the material to observe because each word in a headline is very carefully chosen but it might be that it’s not enough data to accurately predict the content of the piece. An alternative is to use the body of the article instead.
Download zip here: http://joanaricou.com/nytimes_tutorial4.zip