Final Visualizations

In the end we decided to go with two visualizations:

  • SpamCloud: The interesting or playful one
  • SoundCloud RADAR: The informative one

We believe the two work well together!

This was our final set-up for DemoDay:


SoundCloud is the world’s leading social sound platform where anyone can create sounds and share them. SoundCloud claims that their commenting system connects artists and fans and brings them closer together.

But is this premise actually fulfilled?

We analyzed comments posted on “Petal to the Maxx” by K.L. for the month of April and found that spam and SoundCloud user promotion (users posting links to their SoundCloud profiles) vastly outnumber actual comments. In reality, any real connection that could arise between artist and fans is instantly blocked by BS comments pushing real commenters out of view in the comment section.

SpamCloud is our visual representation of this circumstance.

Hovering over a comment displays the name of the commenter, the posting time on the song and the text of the comment.

SoundCloud RADAR

SoundCloud RADAR is an attempt to understand comments posted on SoundCloud. It is an exploratory tool that helps in finding and understanding patterns of commenting and spamming.

The visualization shows comments posted on “Petal to the Maxx” by K.L. for the month of April. As they are posted in time, comments appear on the radial 24 h clock/calendar, revealing patterns in posting time and posting behaviour and highlighting unusual, suspicious clusters of comments.
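
A minimal Python sketch of how a comment might be placed on such a radial clock. It assumes that the angle encodes the time of day and that each day gets its own ring; the function name, the radii and the example dates are illustrative assumptions, not the project's Processing code.

```python
import math
from datetime import datetime


def clock_position(posted_at, first_day, inner_radius=50.0, ring_spacing=10.0):
    """Map a posting timestamp to (x, y) on a radial 24 h clock.

    The angle encodes the time of day (00:00 at the top, clockwise);
    the ring encodes the day, one concentric circle per day.
    All parameter names and default values are hypothetical.
    """
    minutes = posted_at.hour * 60 + posted_at.minute
    angle = 2 * math.pi * minutes / (24 * 60)
    day_index = (posted_at.date() - first_day.date()).days
    radius = inner_radius + day_index * ring_spacing
    # Screen coordinates: y grows downwards, so "up" is negative y.
    x = radius * math.sin(angle)
    y = -radius * math.cos(angle)
    return x, y


if __name__ == "__main__":
    first = datetime(2016, 4, 1)  # assumed first day of the data set
    print(clock_position(datetime(2016, 4, 3, 18, 30), first))
```

With this mapping, comments posted at the same time of day on different days line up along one spoke, which is what makes daily posting patterns visible.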

All comments are colour coded according to 3 categories:

  • Regular Comments
  • SoundCloud (SC) User Promotion
  • Spams

Lines connect comments that are posted by the same user; bigger circles indicate multiple comments of the same category posted in the same minute.

A small histogram on top of the music progress bar shows the number of comments posted at a particular time in the song, which is a key feature of SoundCloud’s commenting system. The bottom of the screen displays the ratio of regular comments to spam.
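
The binning behind such a progress-bar histogram could look roughly like the following Python sketch. The function name, the number of bins and the example positions are assumptions for illustration; comment positions are taken to be seconds into the track.

```python
def song_histogram(positions_s, track_length_s, bins=50):
    """Count comments per equal-width slice of the track.

    positions_s: comment positions in seconds into the song.
    Positions outside the track are ignored.
    """
    counts = [0] * bins
    for pos in positions_s:
        if 0 <= pos <= track_length_s:
            # A comment exactly at the end of the track goes in the last bin.
            idx = min(int(pos / track_length_s * bins), bins - 1)
            counts[idx] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical comment positions on a two-minute track, four bins.
    print(song_histogram([0, 5, 59, 60, 120], 120, bins=4))
```

Hovering over a bar then only requires looking up the comments whose position falls into that bin.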

Hovering over a comment highlights it (and all connected comments) and displays the name of the commenter, the posting time on the song and the text of the comment.
To filter comments according to their category, click on the comment categories in the key.
Hovering over the progress bar histogram highlights the comments that were posted at that particular moment in the song.

Finalizing Visualization and final changes

Visualization 1: Colliding Particles

For the final visualization we made some changes to make the visualization a bit more communicative and emotive:

  1. We decided to change the interaction to show how spams and SoundCloud user promotion push the regular comments out, building a wall of BS between the artist and the regular comments.

  2. We added one more category of comment to the initial three: SoundCloud promotion that masks itself as a real comment, such as “Really cool track. Please check me out!”
  3. We changed the circles to talking-face GIFs to make the visualization more playful and emotive.
    We designed the regular comments as smiley faces and the normal SoundCloud profile promotion as very talkative, attention-grabbing faces.
    The SoundCloud profile promotions that mask themselves as real comments are shown as a face halfway between the regular comment and the normal SoundCloud profile promotion.
    Spams are robot faces.

Visualization 2: Concentric Circles

For the final visualization we changed the comments to appear in the sequence in which they were posted on SoundCloud, to make it clearer. To still show the patterns in commenting based on posting time in the song, we introduced a micro-visualization: a histogram over the music progress bar at the top left of the screen. The user can hover over it to see the comments posted at that particular time in the song.

We also made some minor visual changes to make the whole visualization more readable: changing the date format from 01/05 to 01 Apr; increasing the font size by 2 points; making every 5th ring of the concentric circles brighter than the others; and changing the colour scheme of the comment categories.


Visualization no.2

Screen recording of the (pretty much) finished visualization no.2.

Since we hit a wall when working on the other visualization, we decided to try and finalize this visualization. What still needs to be done in this one is writing the explanatory text and finding understandable terms for the different comment types.

(note: we were unable to record the sound in the screen recording, but if you want to enjoy it with the original soundtrack ☞

Another Visualization

Aside from the “bubbles” visualization we were working on another idea for showing our data, which, we hope, would allow an easier exploration of the data.

To show not only the posting time on the song but also the actual posting date, we plotted the comments on concentric circles, each standing for a specific day. Their appearance in the visualization is dependent on the posting time on the song.
The comments are color-coded according to comment type and comments from the same user are connected by lines.

first functional try-out in processing ☟


It became apparent to us that most comments are posted in the second half of the day (12:00-24:00).

We further iterated on the visuals and functionality:

color scheme try-outs ☟

try-outs on how to show that several comments are posted on the same point in time☟


highlighting and hover functionality ☟





First Functional Demo

A first try-out for the concept in Processing:

The comments stream in from the sides according to the time they are posted on the song itself and are attracted to the center of the screen.

First Improvements

Visual Encoding

In this test the size of the bubbles was random; we decided that we wanted the size to represent the length of the comment instead. We noticed that real comments tended to be pretty short, while spams were generally much longer, in a way cluttering up the comment space. The visualization should represent that.
To choose good breakpoints/sizes/ratios for the circle sizes, we looked at the distribution of the comment lengths and afterwards refined the results we got from that manually.
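
As a rough illustration, the mapping from comment length to bubble size could be sketched in Python as below. The breakpoints and radii are hypothetical placeholders, not the values we ended up with, and quartiles are just one possible starting point that would then be refined by hand as described above.

```python
from bisect import bisect_right
from statistics import quantiles

# Hypothetical breakpoints (characters) and radii (pixels).
LENGTH_BREAKS = [20, 60, 140]
RADII = [4, 8, 14, 22]


def bubble_radius(comment_text):
    """Pick a radius from the length bucket the comment falls into."""
    return RADII[bisect_right(LENGTH_BREAKS, len(comment_text))]


def suggest_breaks(lengths):
    """Suggest three initial breakpoints from the length distribution (quartiles)."""
    return quantiles(lengths, n=4)


if __name__ == "__main__":
    print(bubble_radius("nice"))
    print(suggest_breaks([5, 10, 20, 40, 80, 160]))
```

Using a few discrete size classes rather than a continuous scale keeps very long spam comments from dwarfing everything else on screen.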


We were also not quite happy with the bubbles concentrating in the center, so we tried out different movements
(the bubbles are already roughly color-coded according to their category):

Here spams (blue and green) come in from the outside, like a foreign body, while the real comments (red) are generated inside the screen. The spams seem to bombard the real comments and disturb their formation. We felt, however, that it was difficult to follow bubbles appearing from two different locations with different motions.

Here the bubbles once again gather in the middle but, as in the version above, spams fly in from outside while comments are generated in the middle. This gives the appearance of the real comments being trapped in a fog of spam.

visual mock-up ☟

This however posed the same problem as the above version, and didn’t quite result in the anticipated look and feel.

Ultimately we felt it worked best when comments and spams came in from opposing sides and then formed clusters next to each other. This gives the feeling of a struggle or fight and enables a quick comparison between the amount/size of both.


a couple of illustrator mock-ups ☟


We further added the actual comment text in the background, so it would be easier to understand what the visualization was about.

Current State

color variations in the actual processing environment ☟


Current state of the visualization:
After clusters have formed:

Furthermore, we added features to allow a deeper exploration of the data:

  • stop/start on mouse-click
  • highlighting of comments from the same user on mouse-over
  • display of the actual comment text on mouse-over



Visual Ideation and Data Encoding

After last week’s class we started again on the visual ideation for our new focus. Of the ideas we generated, we decided to go with a visualization that would represent the comments/spams as a kind of “virtual audience”, a room where spams and comments would interact with each other according to their type and properties.
We imagined that the spams would come in to disturb the real comments, which were happily and peacefully “dancing” to the music.


For this visualization we again scraped and encoded the data, this time with additional dimensions, which we might want to show in the final thing:

Visualization Design according to Munzner’s Model


We decided to approach our visualization in a more systematic way. We went back a step and applied Tamara Munzner’s model of visualization design to our data:
Tamara Munzner. Visualization Analysis and Design. CRC Press, 2014. [library ebook]

Munzner divides the process of visualization into 3 steps:

  1. Data Abstraction – What type of data are you using?
  2. Task Abstraction – Why is the user looking at it?
  3. Idiom – How do you show the data, based on the data type and tasks?

Following these steps, we defined our data and tasks according to her terms.

1. Data Abstraction

First we categorized our data into items, which have attributes that can be nominal, ordinal or quantitative. In our case the items are the words from the comments (our “library” words), which have the following attributes:

  • “word” (the actual English word, e.g. “awesome”, “song”),
  • “category” (grammar type, or maybe also affinity to a specific genre),
  • “time” (the point in time at which the word appears),
  • the derived attribute “frequency”,
  • the text of the comment it originates from.

We did the same for the item “song” and finally classified all the attributes.
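
To make the abstraction concrete, a word item could be modelled along these lines in Python. The class and field names are assumptions chosen for illustration; "frequency" is derived by counting rather than stored on the item.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class WordItem:
    """One word occurrence taken from a comment (hypothetical structure)."""
    word: str          # the actual English word, e.g. "awesome"
    category: str      # e.g. grammar type
    time_s: float      # point in the song at which the word appears
    comment_text: str  # the comment the word originates from


def word_frequencies(items):
    """Derived attribute: how often each word occurs across all items."""
    return Counter(item.word for item in items)


if __name__ == "__main__":
    items = [
        WordItem("awesome", "adjective", 12.0, "awesome song"),
        WordItem("song", "noun", 12.0, "awesome song"),
        WordItem("awesome", "adjective", 47.5, "awesome drop"),
    ]
    print(word_frequencies(items))
```

Keeping the comment text on every item is what would later allow a word to be shown in the context of its comment.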

This left us with a nice and clear overview of the data we are actually dealing with and which requirements in visualization these data types might carry in them.



2. Task Abstraction

Next, task abstraction. Task abstraction helps to define what the actual goal of the visualization is supposed to be. What do we want the user to do? What should they achieve?

Munzner gives a set of actions and targets, which are combined into action-target pairs to formulate user tasks.

It helped us immensely to think about our goals in terms of action-target pairs. We defined our highest-ranked task as “discover-features”, the feature being the “word composition” of a song. Looking at the visualization, the user should be able to discover which words are used to describe the song, how frequently they appear, at which point in time, etc.

Other, secondary tasks are:

  • “lookup-distribution” (find a specific word and see its distribution in the song),
  • “compare-distribution” (does the word frequency pattern of a specific song follow the trend/distribution of the genre as a whole? This could be a separate, small visualization),
  • “enjoy”.



3. Idiom

Based on the data types and tasks we defined in the first 2 steps we tried to choose a fitting visual encoding. Which “idiom” (visualization technique) suits our tasks, how can we “map” our attributes?

Through this thought process we realized that the visualization types we had previously been pivoting towards might not actually work with the tasks we wanted the user to fulfill. We therefore continued to ideate, now with Munzner’s model in mind, and narrowed down the possibilities to two idioms that we felt were more appropriate:

  • Streamgraph
    visual example:
  • Unit chart
    visual example:

While the Streamgraph provides the user with a better overview of the distribution of the words (also over time), is better for comparison between different songs, and is visually more appealing (especially since it resembles the frequency visualization that is commonly used for music), the unit chart is better suited for exploring the data, as we could show each word in the context of its comment through interactivity. Also, the attributes “word” and “category” would be easier to map in the unit chart, since position could be used as a visual variable.



Afterthoughts / Questions

  • How do we handle the element of time in our visualization? We want the data to update as the respective song is playing, which means we have an element of animation. We are not sure however how to best present that. Should we mask the Streamgraph and reveal it when the song plays? Would that be too boring? Should it organically grow?
    Where and how do the markers in the unit chart appear? Do they pop up or come in from a specific time?
  • How can we make our visualization visually interesting enough to be enjoyable and novel enough for the user without it being totally arbitrary and unreadable?
  • The unit markers give us something to play with: can we do something “fancy” with them?

Working 21 – 31 March

Working on data acquisition and clean-up.

We started to scrape some test data (comments) and to clean, sort and rank the included words with Python to create our “library” of words used in music comments.

It became apparent that we would need to further clean/filter the data. We made a list of stop words (“[..] stop words are words which are filtered out before or after processing of natural language data (text). […] stop words usually refer to the most common words in a language […]”, source: Wikipedia) by combining several stop word lists which we found online. Next we created a “spam filter” list to exclude words that originated from spam comments. After applying those “filters” we refined them again. We repeated this process several times.
Ultimately we modified the clean-up to exclude spam comments before ranking, which immensely improved the resulting list.
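
The clean-up order described above (drop spam comments first, then remove stop words, then rank) can be sketched in Python roughly as follows. The stop-word set, the spam markers and the function names are small hypothetical stand-ins for our combined lists, not the actual script.

```python
import re
from collections import Counter

# Hypothetical, heavily shortened filter lists.
STOP_WORDS = {"the", "a", "is", "this", "and", "to", "my"}
SPAM_MARKERS = ("http", "follow", "check out")


def is_spam(comment):
    """Very naive spam check: does the comment contain a known marker?"""
    text = comment.lower()
    return any(marker in text for marker in SPAM_MARKERS)


def rank_words(comments):
    """Rank words by frequency, skipping spam comments and stop words."""
    counts = Counter()
    for comment in comments:
        if is_spam(comment):
            continue  # exclude the whole comment before ranking
        for word in re.findall(r"[a-z']+", comment.lower()):
            if word not in STOP_WORDS:
                counts[word] += 1
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        "This song is awesome",
        "Awesome drop!",
        "Check out my profile http://x.example",
    ]
    print(rank_words(sample))
```

Filtering at the comment level means words such as "profile" never reach the ranking, instead of having to be caught one by one in a word-level spam list.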

When the clean-up code was working well enough, we enlarged the data set to check the code’s scalability. For that we chose 3 songs from SoundCloud’s charts for 3 genres (Rap/Hip-Hop, Pop, Metal), which we had decided on beforehand.