Final Visualizations

In the end, we decided to go with two visualizations:

  • SpamCloud: The interesting or playful one
  • SoundCloud RADAR: The informative one

We believe the two work well together!

This was our final set-up for DemoDay:


SoundCloud is the world’s leading social sound platform, where anyone can create and share sounds. SoundCloud claims that its commenting system connects artists and fans and brings them closer together.

But is this premise actually fulfilled?

We analyzed comments posted on “Petal to the Maxx” by K.L. during April and found that spam and SoundCloud user promotion (users posting links to their SoundCloud profiles) vastly outnumber actual comments. In practice, any real connection that could arise between artist and fans is immediately blocked by BS comments that push real commenters out of view in the comment section.

SpamCloud is our visual representation of this circumstance.

Hovering over a comment displays the name of the commenter, the posting time on the song and the text of the comment.

SoundCloud RADAR

SoundCloud RADAR is an attempt to understand comments posted on SoundCloud. It is an exploratory tool that helps in finding and understanding patterns of commenting and spamming.

The visualization shows comments posted on “Petal to the Maxx” by K.L. during April. As comments are posted over time, they appear on the radial 24 h clock/calendar, revealing patterns in posting time and posting behaviour and highlighting unusual, suspicious clusters of comments.
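One way to read the radial layout is as a polar mapping: the time of day sets the angle (one full turn per 24 h) and the day of the month sets the radius, one ring per day. The following Python sketch is only an illustration, not our Processing code; the function name `radar_position`, the ring spacing and the orientation (midnight at the top, running clockwise) are assumptions.

```python
import math
from datetime import datetime

def radar_position(posted: datetime, cx: float, cy: float,
                   inner_radius: float = 40.0, ring_gap: float = 12.0):
    """Map a posting timestamp to screen (x, y) on a radial 24 h clock.

    Assumptions (hypothetical): midnight points straight up, the clock runs
    clockwise on screen, and day 1 of the month is the innermost ring.
    """
    # Fraction of the day that has elapsed, in [0, 1).
    seconds = posted.hour * 3600 + posted.minute * 60 + posted.second
    angle = 2 * math.pi * seconds / 86400
    # One concentric ring per day of the month.
    radius = inner_radius + (posted.day - 1) * ring_gap
    # Screen y grows downwards, so subtracting the cosine puts 0 rad at 12 o'clock.
    x = cx + radius * math.sin(angle)
    y = cy - radius * math.cos(angle)
    return x, y
```

For example, a comment posted at 06:00 on 1 April lands on the innermost ring at the 3 o'clock position.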

All comments are colour-coded into three categories:

  • Regular Comments
  • SoundCloud (SC) User Promotion
  • Spams

Lines connect comments posted by the same user; bigger circles indicate multiple comments of the same category posted in the same minute.
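A minimal sketch of the two groupings behind this encoding, assuming each comment carries a user name, a posting timestamp and one of the three category labels (the `Comment` record and its field names are hypothetical). It also assumes lines run between a user's consecutive comments rather than between every pair.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Comment:
    # Hypothetical record; field names are illustrative.
    user: str
    posted: datetime
    category: str  # "regular", "promotion" or "spam"

def group_markers(comments):
    """Merge comments of the same category posted in the same minute.

    Returns {(category, minute): count}; a count above 1 is drawn as a bigger circle.
    """
    counts = defaultdict(int)
    for c in comments:
        minute = c.posted.replace(second=0, microsecond=0)
        counts[(c.category, minute)] += 1
    return dict(counts)

def user_links(comments):
    """Pairs of consecutive comments by the same user, in posting order (one line each)."""
    by_user = defaultdict(list)
    for c in sorted(comments, key=lambda c: c.posted):
        by_user[c.user].append(c)
    links = []
    for posts in by_user.values():
        links.extend(zip(posts, posts[1:]))
    return links
```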

A small histogram on top of the music progress bar shows the number of comments posted at a particular time in the song, which is a key feature of SoundCloud’s commenting system. The bottom of the screen displays the ratio of regular comments to spam.
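The histogram and the ratio both come down to simple counting. A sketch, assuming each comment stores its position in the track in milliseconds and a category label; the bin count and the decision to count only the “spam” label (and not SC user promotion) as spam are assumptions.

```python
from collections import Counter

def song_histogram(song_positions_ms, song_length_ms, bins=100):
    """Count comments per bin along the song's progress bar."""
    counts = [0] * bins
    for pos in song_positions_ms:
        # Clamp so comments at the very start/end still land in the first/last bin.
        i = min(max(int(pos / song_length_ms * bins), 0), bins - 1)
        counts[i] += 1
    return counts

def regular_to_spam_ratio(categories):
    """Ratio of regular comments to spam comments; None if there is no spam."""
    c = Counter(categories)
    return c["regular"] / c["spam"] if c["spam"] else None
```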

Hovering over a comment highlights it (and all connected comments) and displays the name of the commenter, the posting time on the song and the text of the comment.
To filter comments according to their category, click on the comment categories in the key.
Hovering over the progress bar histogram highlights the comments that were posted at that particular moment in the song.

Visualization no.2

Screen recording of the (pretty much) finished visualization no.2.

Since we hit a wall while working on the other visualization, we decided to try to finalize this one. What still needs to be done is writing the explanatory text and finding understandable terms for the different comment types.

[Screenshot, 2016-05-18]

(Note: we were unable to record the sound in the screen recording, but if you want to enjoy it with the original soundtrack ☞)

Another Visualization

Aside from the “bubbles” visualization, we were working on another idea for showing our data, which we hoped would allow easier exploration of the data.

To show not only the posting time on the song but also the actual posting date, we plotted the comments on concentric circles, each standing for a specific day. When a comment appears in the visualization depends on its posting time on the song.
The comments are color-coded according to comment type, and comments from the same user are connected by lines.

first functional try-out in Processing ☟


It became apparent to us that most comments are posted in the second half of the day (12:00-24:00).

We then iterated further on the visuals and functionality:

color scheme try-outs ☟

[Screenshots, 2016-05-09]

try-outs on how to show that several comments were posted at the same point in time ☟


highlighting and hover functionality ☟





First Functional Demo

A first try-out of the concept in Processing:

The comments stream in from the sides according to the time they are posted on the song itself and are attracted to the center of the screen.
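The streaming-in behaviour can be approximated with a damped spring pulling each bubble towards the screen centre, plus a check for which comments become due as playback advances. A Python sketch of the idea (our demo ran in Processing); `strength`, `damping` and the helper names are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Bubble:
    x: float
    y: float
    vx: float = 0.0
    vy: float = 0.0

def step(b: Bubble, cx: float, cy: float,
         strength: float = 0.02, damping: float = 0.9) -> None:
    """Advance one frame: accelerate towards (cx, cy), then damp the velocity."""
    b.vx = (b.vx + (cx - b.x) * strength) * damping
    b.vy = (b.vy + (cy - b.y) * strength) * damping
    b.x += b.vx
    b.y += b.vy

def due_comments(song_positions_ms, prev_ms, now_ms):
    """Comments whose position in the track was passed since the last frame."""
    return [p for p in song_positions_ms if prev_ms < p <= now_ms]
```

With these default values the motion is underdamped, so bubbles overshoot slightly before settling in the centre.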

First Improvements

Visual Encoding

In this test the size of the bubbles was random; we decided that we wanted the size to represent the length of the comments. We noticed that real comments tended to be fairly short, while spams were generally much longer, in a way cluttering up the comment space. The visualization should represent that.
To choose good breakpoints and ratios for the circle sizes, we looked at the distribution of the comment lengths and then refined the results by hand.
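One way to derive such breakpoints is to split the observed comment lengths into equally populated classes with quantiles and give each class a fixed diameter; hand-tuned breakpoints can then replace the computed ones. A sketch using Python's `statistics.quantiles` (Python 3.8+); the number of classes and the diameters are made up for illustration.

```python
from bisect import bisect_right
from statistics import quantiles

def length_breakpoints(lengths, n_classes=4):
    """Cut points splitting the comment lengths into n_classes equally populated classes."""
    return quantiles(lengths, n=n_classes)

def bubble_diameter(length, breakpoints, sizes=(8, 14, 22, 34)):
    """Diameter of the class a comment length falls into.

    sizes needs exactly one entry more than there are breakpoints.
    """
    if len(sizes) != len(breakpoints) + 1:
        raise ValueError("need exactly one size per class")
    return sizes[bisect_right(breakpoints, length)]
```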

[Screenshots, 2016-05-04 and 2016-05-09]


We were also not quite happy with the bubbles concentrating in the center, so we tried out different movements:
(the bubbles are already roughly color-coded according to their category)
Here spams (blue and green) come in from the outside, like a foreign body, while the real comments (red) are generated inside the screen. The spams seem to bombard the real comments and disturb their formation. However, we felt it was difficult to follow bubbles appearing from two different locations with different motions.
Here the bubbles once again center in the middle, but like in the above version, spams fly in from outside, while comments are generated in the middle. This gives the appearance of the real comments being trapped in a fog of spam.
visual mock-up ☟

[Screenshot, 2016-05-09]

This however posed the same problem as the above version, and didn’t quite result in the anticipated look and feel.

Ultimately we felt it worked best when comments and spams came in from opposing sides and then formed clusters next to each other. This gives a feeling of struggle or fight and enables a quick comparison of the amount/size of both.


a couple of Illustrator mock-ups ☟


We further added the actual comment text in the background, so it would be easier to understand what the visualization was about.

[Screenshot, 2016-05-09]

Current State

color variations in the actual Processing environment ☟


Current state of the visualization:
After the clusters have formed:

We also added features to allow a deeper exploration of the data:
  • stop/start on mouse-click
  • highlighting of comments from the same user on mouse-over
  • display of the actual comment text on mouse-over
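The mouse-over features rely on a hit test: find the marker under the cursor, then collect every marker by the same user. A sketch assuming circular markers with a centre, a radius and a user name (the `Marker` type is hypothetical); stop/start on click would simply toggle a paused flag that skips the update step.

```python
from dataclasses import dataclass

@dataclass
class Marker:
    x: float
    y: float
    radius: float
    user: str
    text: str

def hovered(markers, mx, my):
    """Top-most marker under the mouse (the last one drawn is on top), or None."""
    for m in reversed(markers):
        if (mx - m.x) ** 2 + (my - m.y) ** 2 <= m.radius ** 2:
            return m
    return None

def highlight_set(markers, mx, my):
    """Markers to highlight on mouse-over: all markers by the hovered marker's user."""
    target = hovered(markers, mx, my)
    if target is None:
        return []
    return [m for m in markers if m.user == target.user]
```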



Visual Ideation and Data Encoding

After last week’s class we restarted the visual ideation for our new focus. Of the ideas we generated, we decided to go with a visualization that represents the comments/spams as a kind of “virtual audience”: a room where spams and comments interact with each other according to their type and properties.
We imagined that the spams would come in to disturb the real comments, which were happily and peacefully “dancing” to the music.


For this visualization we scraped and encoded the data again, this time with additional dimensions that we might want to show in the final version:

[Screenshots, 2016-05-09]

Visualization Design according to Munzner’s Model


We decided to approach our visualization in a more systematic way, so we took a step back and applied Tamara Munzner’s model of visualization design to our data:
Tamara Munzner. Visualization Analysis and Design. CRC Press, 2014.

[Screenshot, 2016-04-17]

Munzner divides the process of visualization into 3 steps:

  1. Data Abstraction – What type of data are you using?
  2. Task Abstraction – Why is the user looking at it?
  3. Idiom – How do you show the data, based on the data type and tasks?

Following these steps, we defined our data and tasks according to her terms.

1. Data Abstraction

First we categorized our data into items, which have attributes that can be nominal, ordinal or quantitative. In our case the items are the (library) words from the comments, which have the following attributes:
  • “word” (the actual English word, e.g. “awesome”, “song”)
  • “category” (grammar type, or maybe also affinity to a specific genre)
  • “time” (the point in time at which the word appears)
  • the derived attribute “frequency”
  • the text of the comment the word originates from
We did the same for the item “song” and finally classified all the attributes.
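The data abstraction can be written down as a small schema. The sketch below shows one possible classification of the word item's attributes (treat these labels as illustrative assumptions, not our final table) and computes the derived “frequency” attribute by counting.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum

class AttrType(Enum):
    NOMINAL = "nominal"            # categories without an order
    ORDINAL = "ordinal"            # categories with an order
    QUANTITATIVE = "quantitative"  # numbers where differences are meaningful

# Illustrative classification of the word item's attributes (an assumption).
WORD_ATTRIBUTES = {
    "word": AttrType.NOMINAL,
    "category": AttrType.NOMINAL,
    "time": AttrType.QUANTITATIVE,
    "frequency": AttrType.QUANTITATIVE,  # derived, see frequencies()
}

@dataclass
class WordItem:
    word: str          # e.g. "awesome"
    category: str      # e.g. grammar type
    time_ms: int       # point in the song where the word appears
    comment_text: str  # kept for showing the word in context

def frequencies(items):
    """Derive the frequency attribute: occurrences of each word across all items."""
    return Counter(item.word for item in items)
```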

This left us with a clear overview of the data we are actually dealing with and of the requirements these data types might impose on the visualization.

[Screenshot, 2016-04-17]



2. Task Abstraction

Next comes task abstraction, which helps to define what the actual goal of the visualization is. What do we want the user to do? What should they achieve?

Munzner provides a set of actions and targets, which are combined into action-target pairs to formulate user tasks.

It helped us immensely to think about our goals in terms of action-target pairs. We defined our highest-ranked task as “discover-features”, namely the “word-composition” of a song. By looking at the visualization, the user should be able to discover which words are used to describe the song, how frequently they appear, at which point in time, etc.

Other, secondary tasks are:
  • “lookup-distribution” (find a specific word and see its distribution in the song)
  • “compare-distribution” (does the word frequency pattern of a specific song follow the trend/distribution of the genre as a whole? This could be a separate, small visualization.)
  • “enjoy”
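The “lookup-distribution” task amounts to finding where a word occurs along the song. A sketch, assuming comments are available as (position in track in ms, text) pairs; it counts comments rather than occurrences and matches whole words case-insensitively, so “awesomeness” does not count as a match for “awesome”.

```python
import re

def word_distribution(comments, word, song_length_ms, bins=10):
    """Count comments containing `word` in each section of the song.

    comments: list of (song_position_ms, text) pairs (hypothetical input format).
    """
    # \b ... \b restricts the match to whole words.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", re.IGNORECASE)
    counts = [0] * bins
    for pos, text in comments:
        if pattern.search(text):
            i = min(max(int(pos / song_length_ms * bins), 0), bins - 1)
            counts[i] += 1
    return counts
```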

[Screenshot, 2016-04-17]



3. Idiom

Based on the data types and tasks we defined in the first two steps, we tried to choose a fitting visual encoding. Which “idiom” (visualization technique) suits our tasks, and how can we “map” our attributes?

Through this thought process we realized that the visualization types we had previously been pivoting towards might not actually work for the tasks we wanted the user to fulfill. We therefore continued to ideate, now with Munzner’s model in mind, and narrowed the possibilities down to two idioms that we felt were more appropriate:

  • Streamgraph
    visual example:
  • Unit chart
    visual example:
    [Screenshot, 2016-04-17]

While the Streamgraph gives the user a better overview of the distribution of the words (also over time), is better for comparing different songs, and is visually more appealing (especially since it resembles the frequency visualizations commonly used for music), the unit chart is better suited for exploring the data, since interactivity would let us show each word in the context of its comment. The attributes “word” and “category” would also be easier to map in the unit chart, since position could be used as a visual variable.
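To make the Streamgraph option concrete: each word becomes a layer whose thickness is its frequency per time bin, and the layers are stacked around a baseline. The sketch uses the simple symmetric (“silhouette”) baseline of minus half the total height, so the stream is mirrored around zero; other baselines that minimise wiggle exist. The input format is an assumption.

```python
def streamgraph_layers(series):
    """Stack word-frequency series around a centred baseline.

    series: list of equal-length lists, one per word, giving the word's
    frequency in consecutive time bins of the song (hypothetical format).
    Returns one (lower, upper) pair of lists per word.
    """
    n_bins = len(series[0])
    # Symmetric baseline: start each column at minus half its total height.
    baseline = [-0.5 * sum(s[t] for s in series) for t in range(n_bins)]
    layers = []
    lower = baseline[:]
    for s in series:
        upper = [lower[t] + s[t] for t in range(n_bins)]
        layers.append((lower, upper))
        lower = upper  # the next layer sits on top of this one
    return layers
```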



Afterthoughts / Questions

  • How do we handle the element of time in our visualization? We want the data to update as the respective song is playing, which means we have an element of animation. We are not sure however how to best present that. Should we mask the Streamgraph and reveal it when the song plays? Would that be too boring? Should it organically grow?
    Where and how do the markers in the unit chart appear? Do they pop up or come in at a specific time?
  • How can we make our visualization visually interesting enough to be enjoyable and novel enough for the user without it being totally arbitrary and unreadable?
  • The unit markers give us something to play with: can we do something “fancy” with them?

Working 21 – 31 March

Working on data acquisition and clean-up.

We started scraping some test data (comments) and used Python to clean, sort and rank the words they contain, creating our “library” of words used in music comments.
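The clean, sort and rank steps can be sketched in a few lines of Python. This is not our actual script; the tokenizing regular expression, the minimum token length and the `top_n` cut-off are assumptions.

```python
import re
from collections import Counter

def rank_words(comments, stop_words, top_n=20):
    """Clean comment texts and rank the remaining words by frequency.

    Lower-cases the text, keeps alphabetic tokens (apostrophes allowed),
    drops stop words and one-letter tokens, then counts.
    """
    counter = Counter()
    for text in comments:
        tokens = re.findall(r"[a-z']+", text.lower())
        counter.update(t for t in tokens if len(t) > 1 and t not in stop_words)
    return counter.most_common(top_n)
```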

It became apparent that we would need to clean and filter the data further. We made a list of stop words (“[…] stop words are words which are filtered out before or after processing of natural language data (text). […] stop words usually refer to the most common words in a language […]”, source: Wikipedia) by combining several stop word lists we found online. Next we created a “spam filter” list to exclude words that originated from spam comments. After applying those “filters” we refined them again, and we repeated this process several times.
Ultimately we modified the clean-up to exclude spam comments before ranking, which immensely improved the resulting list.
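The difference between filtering spam words and excluding spam comments is easy to see in code: when a whole comment is flagged, none of its words reach the ranking, not just the flagged terms. A sketch under the assumption that a comment counts as spam if any of its tokens is on the spam-filter list (a hypothetical rule, not necessarily our actual criterion).

```python
import re
from collections import Counter

def tokens(text):
    """Lower-cased alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())

def is_spam(text, spam_terms):
    """Hypothetical rule: a comment is spam if any token is on the spam-filter list."""
    return any(t in spam_terms for t in tokens(text))

def rank_without_spam(comments, stop_words, spam_terms):
    """Drop whole spam comments first, then count the words of what remains."""
    counter = Counter()
    for text in comments:
        if is_spam(text, spam_terms):
            continue  # every word of a spam comment is excluded, not only the spam terms
        counter.update(t for t in tokens(text) if t not in stop_words)
    return counter
```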

Once the clean-up code was working well enough, we enlarged the data set to check how the code scaled. For that we chose 3 songs from SoundCloud’s charts for 3 genres (Rap/Hip-Hop, Pop, Metal), which we had decided on beforehand.


Meeting 5 March

Meeting on 5 March:

Discussed and further brainstormed the topic. Decided to split the idea into two main (separate) parts:

  1. Emotion visualization of music
  2. Visualization of different (music) subcultures’ lingo based on Soundcloud comments

Sketches / Brainstorming: