Friday, January 31, 2014

Ideas for detecting separate dancers in a tango couple

1. Face detection

2. Skin detection -- used to supplement #1 & also to detect ladies' bare legs

3.  Shoe/Foot detection - template matching? look for shoe-like blobs in segmented image. The ones near skin are ladies. The other ones are men.

4. Background subtraction is going to be useful even with depth data (I KNEW IT)

5. Thinning to get skeletal shapes? If I have the head & feet, I could get the CoM & possibly extrapolate which blobs are legs...

6. Optical flow to estimate where things are if they go out of frame.
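As a rough illustration of idea #2, here is a minimal per-pixel skin test. It uses a commonly cited fixed-threshold RGB rule; the thresholds, the function names (`is_skin_rgb`, `skin_mask`), and the nested-list image layout are all assumptions for this sketch, and a rule like this would likely need retuning (or replacing) under stage lighting.

```python
# Sketch only: a fixed-threshold RGB skin rule. The thresholds are a
# commonly cited heuristic for daylight-like lighting, not something
# tuned on tango footage.

def is_skin_rgb(r, g, b):
    """Return True if an 8-bit RGB pixel passes the heuristic skin rule."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15   # enough color spread
        and abs(r - g) > 15                    # red clearly differs from green
        and r > g and r > b                    # red is the dominant channel
    )


def skin_mask(image):
    """Map an image (rows of (r, g, b) tuples) to rows of booleans."""
    return [[is_skin_rgb(r, g, b) for (r, g, b) in row] for row in image]


if __name__ == "__main__":
    # Hypothetical 1x3 image: a skin-like tone, dark gray, pure blue.
    tiny = [[(200, 140, 120), (30, 30, 30), (0, 0, 255)]]
    print(skin_mask(tiny))
```

The resulting mask could then be combined with face detection (#1) and the shoe blobs (#3), e.g. by checking whether a shoe blob sits next to a skin region.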

Thursday, January 30, 2014

Slim hope that Kinect could detect skeletons on tango dancers dashed...






DSP, Audio, & Motion Capture

I was thinking about the connections between DSP for audio and audio processing / synthesis techniques and motion capture. Because you can think of both video & audio in terms of signals, they have a lot of similarities and can use the same techniques. You often have to transform signals into feature spaces... (ie, extract features) then work from there.

One universal problem is defining a perceptual quality (whether it is an action, how an action is done, or a timbre color or pitch) within a computational space. Sometimes there seems to be a cruel quality to both practices: after all, my human brain can track the motion. I understand what the timbre is and when it occurs. But my software doesn't have access to the tools that my brain does (yet). Nor has it been exposed to the years and years of training my brain has had to distinguish these qualities. This is very obvious to anyone in my field, but still, when I step back, it seems a bit poignant.

Of course, my brain can't generate a real-time control signal from movement to send to my audio synthesis routines, so there's that. :) Although I can use the information I have in the form that I have it to make noise via my physical body.


Friday, January 24, 2014

EXC_BAD_ACCESS in OSC + 64-bit Cinder with Xcode 4.6.2 in OS X 10.7

This was enough of a headache that it deserves a post.

I'm using Cinder on OS X in 64-bit in order to run the OpenNI 2 block. The official release of Cinder is still 32-bit, and most of the libraries will work in 64-bit, but it is a pain. I am using the osc block that comes with the official 0.85 release of Cinder.

So I kept getting this weird EXC_BAD_ACCESS (code=13) error in OutboundPacketStream whenever I tried to use the osc::Sender object to send anything.

Unfortunately, I was convinced this was something weird I was doing myself... since all the examples worked & I have been using this library in 32-bit for a very long time... But I wasn't thinking about the fact that they were built in 32-bit & my code was compiling to 64-bit... In any case, it looks like it is fixed in the current version of the oscpack library (someone else had this problem, though not with the Cinder implementation in particular), so I replaced the /ip & /osc parts of the library. It solved my problem. Case closed!

Btw, I should probably move to OS X 10.9, I know. But you can see from the above mess that upgrading anything is generally a big hassle...

Cinder does not get as much curating as other libraries (like, say, Processing) and has a much smaller user base. However, Cinder is leaner, meaner, and faster. And I've been coding in C++ for so long that I can just code in it -- unlike Java, where I'm still occasionally looking up the syntax for something relatively basic, or I'm doing something illegal since hey, you can do whatever I'm trying in C++... Plus, it's pretty easy to incorporate outside libraries. I looked at OpenFrameworks for a little, too, but it looks messier to me....

Wednesday, January 22, 2014

Musical Analysis -- Moving Beyond the technical...


So, the main chunk of my analysis is actually not computational. I match up the musical and movement events in the various performances of excerpts of Variations V (see tables below) & then highlight them on the similarity matrices of the music and movement (see similarity matrices below). I have conclusions and remarks in my paper, which I have a complete first draft of (!).

Structure in Miller's Cage Centennial Performance of Variations V

The sample had two main sections:
1. Louder, and the movement and music are less similar – running through sensors
2. Starts with a transitional relative silence, is softer, with much higher movement & audio similarity, especially at the end.



Event # | Movement | Music
1 | Stillness | High sustaining tone, no onsets
2 | Dancers break stillness and all move at once | Low honking sounds resume
3 | Two dancers still, one dancer running in a wide circle | Noisy, percussive synth sounds that seem triggered by the running, at end glissandos start when all 3 are on stage
4 | All three dancers fall down, dancers turn in unison | High downward glissando in sync with fall, silence for unison turn
5 | Two dancers circle arms in sync, third dancer still | Silence
6 | Two dancers jump & change levels, third dancer still | Low honking sounds in response to level changes, lower sounds that seem to be in response to dancers’ movements
7 | Duo unison ends, all dancers change levels | Loud honking sounds
8 | One dancer rocks back and forth, others move very little | Low sounds roughly in sync with rocking



Structure in decibel's Performance of Variations V


Less obvious relationships between movement and audio than in Miller's version. The work starts out with one dancer moving in front of a sensor, like (1), and these moments have the most audio & movement similarity. Then, she breaks this up by running around the space and setting off a loud burst of sound. The last part involves some dancers moving very near to a sensor, while the other dancer moves away from the triggers. Thus, there are some indications of similarity between the matrices, but they are much less clear than in the beginning.


Event # | Movement | Music
1 | 1 dancer moving arms, in front of projection and sensor, 2 other dancers moving minimally | Glissando sine waves that seem to be affected by the dancer’s arm movements
2 | 2a – dancer in front of the projection moves away; all dancers briefly pause or move very little | Much softer noise, soft phased, saw wave sounds glissando
3 | 1 dancer runs around the stage twice, two dancers move in unison in different corners of the stage area | Quieter noises continue; 3a – loud siren-ish burst, which sounds during the first circle of the dancer. When the dancer passes again, another, slightly different and softer noise triggers
4 | Running dancer stops, unison dancing continues. One of the unison dancers stops & begins to move towards her unison partner for a duo | High glissando sine waves continue, seemingly following the movements of unison dancers
5 | 1 duo, 1 solo (previously the runner) all moving and changing levels, no unison | Again, high glissando continues, seemingly related to the duo's movements, but not the solo dancer's
6 | Solo dancer moves to the side off camera, duo breaks up, and everyone is changing position, perhaps away from sensors | Less glissando until there is a moment of sustain
7 | Two dancers exit from view of the camera, off the stage area | High glissandos begin again




Music / Relationship Structure in 1966 Hamburg Cage and Cunningham performance

The video analysis of this work contained much more noise than in the previous two samples, since there were small camera movements and images obscuring the main viewpoint of the dancers. However, the similarity matrices show less connection between audio and movement, except for the last moments of the clip, when a third dancer enters, and they all seem close to sensors as they move. All the SSMs show this moment. (Interaction #1)

Also, while it is not clear in the similarity matrix, the dancer on the left seems to be controlling white noise bursts with his movements for a short time, exhibiting interaction #1.

Four different sections
            + opening – 2 men, each near a pole
            + 2nd man exits
            + man & woman
            + short section when a second man enters, very apparent on similarity matrices


Cage –1st part, structure and correspondences

Event # | Movement | Music
1 | One dancer stays in position, but arms move. A second dancer enters. | Radio static, feedback sounds, one clank
2 | First dancer lunges & turns, second stays at the same location, but moves back and forth quickly | More sounds of radio static tuning, more feedback
3 | First dancer stays still and balances, 2nd dancer still moves from side to side | Rhythmic clanking sounds enter over the sounds of radios tuning
4 | First dancer collapses then starts jumping up and down and occasionally turning. Second dancer continues in his side to side figures but also occasionally jumps | The clanking changes rhythmic pattern, and the radio sounds get louder. Occasionally there is a burst of white noise which seems related to the first dancer’s movements




Cage – 2nd part, structure and correspondences

Event # | Movement | Music
1 | Left dancer on one leg holds still. On right, dancer also on one leg, moves arms while turning | Quiet white noise with some quiet tones beneath
X | Close-up obscures view, disregard | Disregard
2 | Dancer spins moving slowly left until out of camera view. Meanwhile, dancer in front slowly turns and moves one arm out and back | Soft buzzy clicks
3 | Dancer extends leg out and back | Low bell sound that seems controlled by leg
4 | Dancer from left enters again. Meanwhile, dancer in front leans and lunges | Mostly quiet noises. Loud honk as left dancer takes a backward step
5 | Third dancer enters. All dancers move. | More loud, intermittent honks. Radio static-type noise crescendos.


Monday, January 13, 2014

More results


Amplitudes instead of MFCC's. Interestingly, the Giselle music & movement are closer together using this measure... perhaps hinting that timbre wasn't such a structural element in the work, while variation in amplitude (beats?) was. Makes sense. Using self-similarity matrices, since the recurrence plots needed too much tweaking.
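For reference, a self-similarity matrix of the kind mentioned here can be sketched in a few lines. This is only an illustration under assumptions: it uses plain Euclidean distance between frames (so 0 means identical), and the function name `self_similarity_matrix` and the list-of-lists input are made up for the example. With amplitude data each frame would be a one-element list; with MFCCs, one list of coefficients per frame.

```python
import math

# Sketch only: pairwise Euclidean distances between feature frames.
# Entry [i][j] says how far frame i is from frame j.

def self_similarity_matrix(frames):
    """Return an n x n matrix of distances between all pairs of frames."""
    n = len(frames)
    ssm = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(frames[i], frames[j])))
            ssm[i][j] = d
            ssm[j][i] = d  # the matrix is symmetric
    return ssm


if __name__ == "__main__":
    # Hypothetical amplitude envelope: loud, quiet, loud again.
    amplitude_frames = [[0.9], [0.1], [0.9]]
    for row in self_similarity_matrix(amplitude_frames):
        print(row)
```

Frames 0 and 2 come out at distance 0 from each other, which is the kind of repeated structure that shows up as off-diagonal patterns in the full matrices.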

I used gifs as the inputs instead of text files, which seemed to increase the accuracy. Using text files, two matrices of the same data (movement data), which did look extremely similar, were not clustered together. Switching to gifs solved the problem. Will probably redo the MFCC's that way as well. Seems writing the files to matlab data might not be the best thing for this process.

Tuesday, January 7, 2014

More recurrence plots

I spent most of the day creating a reasonable recurrence plot for the MFCC's. Tweaking parameters. I think basically I got into a rut. Here's the Cage 100 Festival movement data & the MFCC recurrence plot:

MFCC's, 10 bands of MFCCs, all the samples, e = 0.125, t=9, w=2
















Movement data,  e = 0.1, t=0, w=2

















Four more to go... in theory, it should now go faster since I learned all the tricks.

I don't know if it's gonna work. Maybe I should just stick to SSM's. I mean, I spent so much detailed time understanding those. And the paper just needs to be written.


Recurrence Plots, Smoothing

So, I played around and found the best windows for my smoothing functions for the movement data. Then, I finally got around to creating some recurrence plots. I had received my password for the toolbox in the morning, but it turns out that the command-line rp tool is actually more useful since I'm using relatively large datasets. Plus, the toolbox didn't install in Octave, and since there was already a command-line tool...
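To make the "window" idea concrete, here is a minimal sketch of one possible smoothing function, a centered moving average. This is an assumption for illustration: the kind of smoothing actually used is not specified here, and `moving_average` and its odd-window restriction are choices made for the example.

```python
# Sketch only: centered moving average over a 1-D signal.
# Near the edges the window is truncated to the samples that exist.

def moving_average(signal, window):
    """Smooth `signal` with a centered window of odd length `window`."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(signal)):
        lo = max(0, i - half)
        hi = min(len(signal), i + half + 1)
        smoothed.append(sum(signal[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    # Hypothetical movement signal with a single spike.
    print(moving_average([0, 0, 3, 0, 0], 3))
```

A larger window flattens more of the frame-to-frame jitter but also blurs short gestures, which is why the window size needs tuning per dataset.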

One thing I have to figure out is how to combine the 22 MFCCs that I have from the audio data. I was just going to add the data, but this was... INCREDIBLY naive I realized once I thought about it for about thirty seconds. The recurrence plots I'm using are binary. Dur! Anyways, turns out I need to just have them all in one file, and have them as embedded vectors. I am going to do that tomorrow as it is almost 3am. I thought I was so close to more results. Le sigh.
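The point about keeping the coefficients together as vectors can be sketched like this. It is a toy illustration, not a description of how the rp tool works internally: `delay_embed`, `recurrence_plot`, the Euclidean norm, and the mapping of `eps` and `delay` onto the e and t settings are all assumptions made for the example.

```python
import math

# Sketch only: a binary recurrence plot. Each frame is treated as one
# vector (e.g. all MFCC coefficients of that frame), and two frames
# "recur" when their distance is at most eps.

def delay_embed(series, dim, delay):
    """Time-delay embedding of a scalar series into dim-dimensional vectors."""
    n = len(series) - (dim - 1) * delay
    return [[series[i + k * delay] for k in range(dim)] for i in range(n)]


def recurrence_plot(vectors, eps):
    """Return an n x n matrix of 0/1: 1 where two vectors lie within eps."""
    def dist(u, v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))
    n = len(vectors)
    return [[1 if dist(vectors[i], vectors[j]) <= eps else 0
             for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    # Hypothetical frames with 3 coefficients each (stand-ins for 22 MFCCs).
    frames = [[0.1, 0.2, 0.3], [0.1, 0.2, 0.35], [0.9, 0.8, 0.7]]
    for row in recurrence_plot(frames, eps=0.1):
        print(row)
```

Because the thresholding happens after the distance is computed on whole vectors, there is one binary plot per recording instead of 22 separate ones that would somehow have to be added together.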

In any case, here is the recurrence plot of the Cage 100 Festival's Variations V movement data:





















And here is the audio data (represented by MFCC's) -- okay so I had to try out the embedding...

Ok, it is still running. My CPU is around 100%, and it is still 0% done. Going to get some tea. I may have to finally sign up for a supercomputer account... or maybe I need to downsample. We'll see if a miracle happens. It is a lot of data. Like around (4000 * 22)^2 things need to get juggled around.

Tomorrow I will:
Get all the recurrence plots for the MFCCs done
Redo the movement ones, since I need to set the threshold lower
FINALLY run the NCD on all this.

I hope it was all worth it. Because then I need to start my by hand, eye, and ear musical analysis and pull out interesting tidbits from the mass of data. AND THEN, then I need to write the prose and spend hours and hours on my citations and references. I kid you not because I am writing in Chicago style because this is a music history paper, people. And people in the humanities do NOT kid around with citation styles, just compare the detail between IEEE and Chicago -- it is a whole world of detail.

UPDATE @ 3:30am:
3% Done. Well, maybe I'll let this one run the night?

Sunday, January 5, 2014

Actual Progress!!! Clustering Self-Similarity Matrices via Normalized Compression Distance (NCD)

See last post for awesome paper that led me to this!


Ok, so what does that MEAN??! you may ask. Well, this is the result of my clustering analysis (via NCD) of all the similarity matrices I have so far, and they roughly correspond with how similar I perceive them to be. I used the NCD command from CompLearn.  I highly recommend it! Simple to use! You have to install graphviz, for the neato command -- which they seem to assume you already have in the CompLearn documentation.

***The closer the ovals are to each other, the more similar they are perceived to be.

Why is .DS_Store there? BC I haven't taken the time to figure out how to exclude it. Grrrr!

Abbreviations are:
v5cover_move - Movement data from Cage 100 Festival @ Chicago performance of Variations V
v5cover_music - MFCC (audio) data from Cage 100 Festival @ Chicago performance of Variations V
giselle_move - Movement data from Giselle ballet
giselle_music - MFCC (audio) data from Giselle ballet
v5_original_move - Movement data from Cage/Cunningham's Variations V (1966) Hamburg perf.
v5_original_music - MFCC (audio) data from Cage/Cunningham's Variations V (1966) Hamburg perf.
decibel_v5_move - Movement data from decibel's Variations V
decibel_v5_music - MFCC (audio) data from decibel's Variations V

So, the Cage 100 movement/music data pair (ie, v5cover_), in which the movement corresponds heavily to the music, appears super close together. The other music/movement pairs are sort of spread out, but still kinda close... y'know. AND the furthest away music/movement pair is the Giselle data, the only non-interactive work in the analysis (the only work that isn't a performance of Variations V). So, this is good! If this is the only computational measure of similarity I have for my paper for the class, I think that would be good enough!

So, clearly, this is a good start. I would like to try this with recurrence plots, like the previous paper mentioned. However, apparently they screen people who have access to CRP toolkit, so I'll have to wait for the download link email (or reapply in 4 days?). If, on the off-chance I can't get access, I'll just make sure my similarity maps are up to snuff. These results aren't publication-ready, by any means, but OMG! a start. I definitely would also like more data for comparison.

Maybe I will be able to publish on some of my results and methods. I was feeling very gloomy about that prospect these past couple days until now. 

Finally! A paper for self-similarity matrix comparison in the Music Information Retrieval literature -- doing what I want it to do!

Measuring Structural Similarity in Music - Juan P. Bello

I can definitely use this to compare music to movement.... It must be sensitive to small variations since he is using it to compare different performances of the same piece of music. He is using it for tonal music, but I can just use MFCCs (& maybe amplitude or STSMPS?) instead of the chroma-based features and CENs he's using, since the music I am analyzing is not necessarily pitch-based (except for the Giselle null comparison example). But anyways, that won't matter. I just need the comparison methods.

He is also using Recurrence Plots instead of self-similarity matrices, but they are similar measures. He then uses the Normalized Compression Distance (NCD) -- something that is common to use in bioinformatics for genetics comparison -- to come up with a similarity score between the recurrence plots. Then, it looks like he finds threshold values for pairwise similarity and then uses the measure to retrieve performances of the same work of music. The measure is not tolerant of global structure changes (eg, if someone repeats a section and another does not) but that actually doesn't even matter for my application.
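To give a feel for what the NCD computes, here is a minimal sketch using the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed size. Using zlib as the compressor and the names `compressed_size` and `ncd` are assumptions for the example; this is not a claim about which compressor CompLearn or the paper uses.

```python
import zlib

# Sketch only: Normalized Compression Distance with zlib standing in
# for the compressor. Values near 0 mean "very similar", values near 1
# mean "little shared structure".

def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed data (max compression)."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = compressed_size(x)
    cy = compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    # Hypothetical stand-ins for two serialized matrices.
    repetitive = b"ABAB" * 200
    varied = bytes((i * 37 + (i * i) // 7) % 256 for i in range(800))
    print("same structure:     ", ncd(repetitive, repetitive))
    print("different structure:", ncd(repetitive, varied))
```

In practice the inputs would be the serialized recurrence plots or similarity matrices (for example, the raw bytes of each image file), and the pairwise NCD values form the distance matrix that gets clustered.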

Tools that he uses:

Toolbox for Recurrence plots:
http://tocsy.pik-potsdam.de/CRPtoolbox/

Normalized Compression Distance:
http://www.complearn.org/ncd.html

I also had this idea in the shower that perhaps I could define the relationship between the structure of dance and music by their distance matrices. So if I had enough data, I could do ANOTHER NCD on the NCD data and see if the different performances of Variations V music and movement relationship (ie, the distance matrix between their similarity measures  (whether RP or SSM) ) cluster together -- as opposed to other interactive and non-interactive dance.

I found this link, which provides a lot of good information about music information retrieval -- mostly because I accessed the textbook for the course through my university's library (he had a lot of papers on structural analysis of audio via SSM, so I just looked up his name). I thought about using the method described in the textbook for the segmentation of self-similarity matrices to determine what the actual repetitive structure is, but I think that Bello's approach is more apropos to my musical analysis problem. I would definitely apply the segmentation / path method as well if I had time... but man, do I need this paper to be over.