1. Face detection
2. Skin detection -- used to supplement #1 & also to detect ladies' bare legs
3. Shoe/Foot detection - template matching? Look for shoe-like blobs in the segmented image. The ones near skin are the ladies'. The others are the men's.
4. Background subtraction is going to be useful even when we have depth (I KNEW IT)
5. Thinning to get skeletal shapes? If we have the head & feet, we could get the CoM & possibly extrapolate which blobs are legs...
6. Optical flow to estimate where things are if they go out of frame.
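A rough sketch of how #2 might start (my own toy version, not tested on any real footage): threshold in YCbCr space with numpy. The ranges below are generic textbook-ish values and would absolutely need tuning for real lighting conditions.

```python
import numpy as np

def skin_mask(rgb):
    """Return a boolean mask of likely skin pixels.

    Uses a crude fixed threshold in YCbCr space; the ranges below are
    rough literature values, not tuned for any particular footage.
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # RGB -> YCbCr (ITU-R BT.601 conversion)
    y  = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (y > 40) & (77 <= cb) & (cb <= 127) & (133 <= cr) & (cr <= 173)

# Tiny example: one skin-toned pixel, one blue pixel
img = np.array([[[200, 140, 110], [0, 0, 255]]], dtype=np.uint8)
print(skin_mask(img))  # → [[ True False]]
```

In practice this would feed the leg/shoe logic in #3: blobs adjacent to skin regions get flagged as candidate bare legs.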
Research Blog about experimental music, dinosaurs, motion capture, Argentine Tango, music information retrieval (MIR), and accordions
Friday, January 31, 2014
Thursday, January 30, 2014
DSP, Audio, & Motion Capture
I was thinking about the connections between DSP for audio (audio processing and synthesis techniques) and motion capture. Because you can think of both video & audio in terms of signals, they have a lot of similarities and can use the same techniques. You often have to transform signals into feature spaces (ie, extract features), then work from there.
One universal problem is defining a perceptual quality (whether it is an action, how an action is done, or a timbre color or pitch) within a computational space. Sometimes there seems to be a cruel quality in both practices: after all, my human brain can track the motion. I understand what the timbre is and when it occurs. But my software doesn't have access to the tools that my brain does (yet). Nor has it been exposed to the years and years of training my brain has had to distinguish these qualities. This is very obvious to anyone in my field, but still, when I step back, it seems a bit poignant.
Of course, my brain can't generate a real-time control signal from movement to send to my audio synthesis routines, so there's that. :) Although I can use the information I have in the form that I have it to make noise via my physical body.
Friday, January 24, 2014
EXC_BAD_ACCESS in OSC + 64-bit Cinder with Xcode 4.6.2 in OS X 10.7
This was enough of a headache that it deserves a post.
I'm using Cinder on OS X, 64-bit, in order to run the OpenNI 2 block. The official release of Cinder is still 32-bit, but most of the libraries will work in 64-bit -- though it is a pain. I am using the osc block that comes with the official 0.8.5 release of Cinder.
So I kept getting this weird EXC_BAD_ACCESS (code=13) error in OutboundPacketStream whenever I tried to use the osc::Sender object to send anything.
Unfortunately, I was convinced this was something weird I was doing myself, since all the examples worked & I have been using this library in 32-bit for a very long time. But I wasn't thinking about the fact that the examples were 32-bit while my code was compiling to 64-bit. In any case, it looks like the issue is fixed in the current version of the oscpack library (someone else had this problem, though not with the Cinder implementation in particular), so I replaced the /ip & /osc parts of the library. That solved my problem. Case closed!
Btw, I should probably move to OS X 10.9, I know. But you can see from the above mess that upgrading anything is generally a big hassle...
Cinder does not get as much curating as other libraries like, say, Processing, and has a much smaller user base. However, Cinder is leaner, meaner, and faster. And I've been coding in C++ for so long that I can just code in it -- unlike Java, where I'm still occasionally looking up the syntax for something relatively basic, or I'm doing something illegal since, hey, you can do whatever I'm trying in C++. Plus, it's pretty easy to incorporate outside libraries. I looked at OpenFrameworks for a little while, too, but it looks messier to me...
Wednesday, January 22, 2014
Musical Analysis -- Moving Beyond the Technical...
So, the main chunk of my analysis is actually not computational. I match up the musical and movement events in the various performances of excerpts of Variations V (see tables below) & then highlight them on the similarity matrices of the music and movement (see similarity matrices below). I have conclusions and remarks in my paper, of which I now have a complete first draft (!).
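For reference, the similarity matrices themselves are simple to compute; here's a minimal sketch (my own illustration, not the exact feature pipeline I used): each frame is a feature vector, and cell (i, j) holds the distance between frames i and j, so repeated material shows up as low-distance diagonals.

```python
import numpy as np

def ssm(features):
    """Self-similarity matrix: features is (n_frames, n_dims);
    returns an (n_frames, n_frames) Euclidean distance matrix.
    Repeated material appears as low-distance diagonal stripes."""
    diff = features[:, None, :] - features[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# e.g. an A-B-A structure in a toy 1-D feature
feat = np.array([[0.0], [0.0], [5.0], [5.0], [0.0], [0.0]])
d = ssm(feat)
print(d[0, 4])  # → 0.0 (the two A sections match)
print(d[0, 2])  # → 5.0 (A vs B)
```

The same function applies to both MFCC frames and movement features, which is what makes the side-by-side comparison possible.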
Structure in Miller's Cage Centennial Performance of Variations V
The sample had two main sections; the first is louder, and in it the movement and music are less similar (running through sensors).
| Event # | Movement | Music |
| 1 | Stillness | High sustaining tone, no onsets |
| 2 | Dancers break stillness and all move at once | Low honking sounds resume |
| 3 | Two dancers still, one dancer running in a wide circle | Noisy, percussive synth sounds that seem triggered by the running; at end, glissandos start when all 3 are on stage |
| 4 | All three dancers fall down, dancers turn in unison | High downward glissando in sync with fall, silence for unison turn |
| 5 | Two dancers circle arms in sync, third dancer still | Silence |
| 6 | Two dancers jump & change levels, third dancer still | Low honking sounds in response to level changes, lower sounds that seem to be in response to dancers' movements |
| 7 | Duo unison ends, all dancers change levels | Loud honking sounds |
| 8 | One dancer rocks back and forth, others move very little | Low sounds roughly in sync with rocking |
The relationships between movement and audio are less obvious than in Miller's version. The work starts out with one dancer moving in front of a sensor, as in (1), and these moments have the most audio & movement similarity. Then, she breaks this up by running around the space and setting off a loud burst of sound. The last part involves some dancers moving very near to a sensor, while the other dancer moves away from the triggers. Thus, there are some indications of similarity between the matrices, but they are much less clear than in the beginning.
| Event # | Movement | Music |
| 1 | 1 dancer moving arms, in front of projection and sensor; 2 other dancers moving minimally | Glissando sine waves that seem to be affected by the dancer's arm movements |
| 2 | 2a – dancer in front of the projection moves away; all dancers briefly pause or move very little | Much softer noise; soft, phased saw-wave glissando sounds |
| 3 | 1 dancer runs around the stage twice; two dancers move in unison in different corners of the stage area | Quieter noises continue. 3a – loud siren-ish burst, which sounds during the first circle of the dancer. When the dancer passes again, another, slightly different and softer noise triggers |
| 4 | Running dancer stops, unison dancing continues. One of the unison dancers stops & begins to move towards her unison partner for a duo | High glissando sine waves continue, seemingly following the movements of the unison dancers |
| 5 | 1 duo, 1 solo (previously the runner), all moving and changing levels, no unison | Again, high glissando continues, seemingly related to the duo's movements but not the solo dancer's |
| 6 | Solo dancer moves to the side off camera, duo breaks up, and everyone is changing position, perhaps away from sensors | Less glissando until there is a moment of sustain |
| 7 | Two dancers exit from view of the camera, off the stage area | High glissandos begin again |
The video analysis of this work contained much more noise than in the previous two samples, since there were small camera movements and images obscuring the main viewpoint of the dancers. However, the similarity matrices show less connection between audio and movement, except for the last moments of the clip, when a third dancer enters and they all seem close to sensors as they move. All the SSMs show this moment. (Interaction #1)
Also, while it is not clear in the similarity matrix, the dancer on the left seems to be controlling white noise bursts with his movements for a short time, exhibiting interaction #1.
Four different sections:
+ opening – 2 men, each near a pole
+ 2nd man exits
+ man & woman
+ short section when a second man enters, very apparent on similarity matrices
Cage – 1st part, structure and correspondences

| Event # | Movement | Music |
| 1 | One dancer stays in position, but arms move. A second dancer enters. | Radio static, feedback sounds, one clank |
| 2 | First dancer lunges & turns; second stays at the same location, but moves back and forth quickly | More sounds of radio static tuning, more feedback |
| 3 | First dancer stays still and balances; 2nd dancer still moves from side to side | Rhythmic clanking sounds enter over the sounds of radios tuning |
| 4 | First dancer collapses then starts jumping up and down and occasionally turning. Second dancer continues his side-to-side figures but also occasionally jumps | The clanking changes rhythmic pattern, and the radio sounds get louder. Occasionally there is a burst of white noise which seems related to the first dancer's movements |
Cage – 2nd part, structure and correspondences

| Event # | Movement | Music |
| 1 | Left dancer on one leg holds still. On right, dancer also on one leg moves arms while turning | Quiet white noise with some quiet tones beneath |
| X | Close-up obscures view, disregard | Disregard |
| 2 | Dancer spins, moving slowly left until out of camera view. Meanwhile, dancer in front slowly turns and moves one arm out and back | Soft buzzy clicks |
| 3 | Dancer extends leg out and back | Low bell sound that seems controlled by the leg |
| 4 | Dancer from left enters again. Meanwhile, dancer in front leans and lunges | Mostly quiet noises. Loud honk as left dancer takes a backward step |
| 5 | Third dancer enters. All dancers move. | More loud, intermittent honks. Radio static-type noise crescendos. |
Monday, January 13, 2014
More results
I used gifs as the inputs instead of text files, which seemed to increase the accuracy. Using text files, two matrices of the same data (movement data), which did look extremely similar, were not clustered together. Switching to gifs solved the problem. I will probably redo the MFCCs that way as well. It seems that writing the files out as MATLAB data might not be the best thing for this process.
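For anyone curious, the NCD underneath all this is simple enough to sketch without CompLearn; here's a toy version with zlib (CompLearn can use stronger compressors, which is probably part of why the input format matters so much):

```python
import zlib

def C(data: bytes) -> int:
    """Compressed size, the stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance: near 0 for near-identical
    inputs, near 1 for unrelated ones."""
    cx, cy, cxy = C(x), C(y), C(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

a = b"abcabcabc" * 100      # highly repetitive
b_ = b"abcabcabc" * 100     # same structure as a
c = bytes(range(256)) * 4   # unrelated content
print(ncd(a, b_) < ncd(a, c))  # → True
```

The intuition: if knowing x helps compress y, the two share structure, and the distance comes out small.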
Tuesday, January 7, 2014
More recurrence plots
I spent most of the day creating a reasonable recurrence plot for the MFCCs, tweaking parameters. I think basically I got into a rut. Here's the Cage 100 Festival movement data & the MFCC recurrence plot:
MFCCs: 10 bands, all the samples, e = 0.125, t = 9, w = 2
Movement data: e = 0.1, t = 0, w = 2
Four more to go... in theory, it should now go faster since I learned all the tricks.
I don't know if it's gonna work. Maybe I should just stick to SSMs. I mean, I spent so much detailed time understanding those. And the paper just needs to be written.
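For the record, here's roughly what those parameters mean in a binary recurrence plot, as I understand the tool: e is the distance threshold and t the time delay for embedding (the window w is left out of this sketch, and the embedding dimension here is an assumption):

```python
import numpy as np

def recurrence_plot(series, eps, delay=1, dim=2):
    """Binary recurrence plot of a 1-D series: time-delay embed,
    then mark pairs of embedded points closer than eps."""
    n = len(series) - (dim - 1) * delay
    emb = np.column_stack(
        [series[i * delay: i * delay + n] for i in range(dim)]
    )
    d = np.linalg.norm(emb[:, None, :] - emb[None, :, :], axis=-1)
    return (d < eps).astype(np.uint8)

sig = np.sin(np.linspace(0, 4 * np.pi, 100))  # two periods
rp = recurrence_plot(sig, eps=0.1, delay=2)
print(rp.shape)   # → (98, 98)
print(rp[0, 0])   # → 1 (every point recurs with itself)
```

A periodic signal like this produces the characteristic diagonal stripes; noisy data mostly lights up the main diagonal.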
Recurrence Plots, Smoothing
So, I played around and found the best windows for my smoothing functions for the movement data. Then, I finally got around to creating some recurrence plots. I had received my password in the morning, but it turns out that the command-line rp tool is actually more useful, since I'm using relatively large datasets. Plus, the toolbox didn't install in Octave, and since there was already a command-line tool...
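The smoothing itself is nothing fancy; a minimal moving-average sketch (the window size here is just a placeholder, since I picked mine by eye):

```python
import numpy as np

def smooth(x, window):
    """Centered moving average; zero-pads at the edges so the output
    keeps the input length (edge values are biased low)."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")

noisy = np.array([0.0, 10.0, 0.0, 10.0, 0.0, 10.0])
print(smooth(noisy, 3))  # jittery values pulled toward the mean
```

Too small a window leaves frame-to-frame jitter that clutters the recurrence plot; too large a window smears real movement events, which is why the window had to be tuned per dataset.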
One thing I have to figure out is how to combine the 22 MFCCs that I have from the audio data. I was just going to add the data, but this was... INCREDIBLY naive, I realized, once I thought about it for about thirty seconds. The recurrence plots I'm using are binary. Dur! Anyways, it turns out I just need to have them all in one file, as embedded vectors. I am going to do that tomorrow, as it is almost 3am. I thought I was so close to more results. Le sigh.
In any case, here is the recurrence plot of the Cage 100 Festival's Variations V movement data:
And here is the audio data (represented by MFCC's) -- okay so I had to try out the embedding...
Ok, it is still running. My CPU is around 100%, and it is still 0% done. Going to get some tea. I may have to finally sign up for a supercomputer account... or maybe I need to downsample. We'll see if a miracle happens. It is a lot of data: something like (4000 * 22)^2 things need to get juggled around.
Tomorrow I will:
Get all the recurrence plots for the MFCCs done
Redo the movement ones, since I need to set the threshold lower
FINALLY run the NCD on all this.
I hope it was all worth it. Because then I need to start my by-hand, -eye, and -ear musical analysis and pull out interesting tidbits from the mass of data. AND THEN I need to write the prose and spend hours and hours on my citations and references. I kid you not, because I am writing in Chicago style, because this is a music history paper, people. And people in the humanities do NOT kid around with citation styles; just compare the level of detail between IEEE and Chicago -- it is a whole world of detail.
UPDATE @ 3:30am:
3% Done. Well, maybe I'll let this one run the night?
Sunday, January 5, 2014
Actual Progress!!! Clustering Self-Similarity Matrices via Normalized Compression Distance (NCD)
See last post for awesome paper that led me to this!
Ok, so what does that MEAN??! you may ask. Well, this is the result of my clustering analysis (via NCD) of all the similarity matrices I have so far, and the clusters roughly correspond with how similar I perceive the matrices to be. I used the ncd command from CompLearn. I highly recommend it! Simple to use! You do have to install graphviz for the neato command, which the CompLearn documentation seems to assume you already have.
***The closer the ovals are to each other, the more similar they are perceived to be.
Why is .DS_Store there? BC I haven't taken the time to figure out how to exclude it. Grrrr!
Abbreviations are:
v5cover_move - Movement data from Cage 100 Festival @ Chicago performance of Variations V
v5cover_music - MFCC (audio) data from Cage 100 Festival @ Chicago performance of Variations V
giselle_move - Movement data from Giselle ballet
giselle_music - MFCC (audio) data from Giselle ballet
v5_original_move - Movement data from Cage/Cunningham's Variations V (1966) Hamburg perf.
v5_original_music - MFCC (audio) data from Cage/Cunningham's Variations V (1966) Hamburg perf.
decibel_v5_move - Movement data from decibel's Variations V
decibel_v5_music - MFCC (audio) data from decibel's Variations V
So, the Cage 100 movement/music data pair (ie, v5cover_), in which the movement corresponds heavily to the music, appears super close together. The other music/movement pairs are sort of spread out, but still kinda close... y'know. AND the furthest-apart music/movement pair is the Giselle data, the only non-interactive work in the analysis (the only work that isn't a performance of Variations V). So, this is good! If this is the only computational measure of similarity I have for my paper for the class, I think that would be good enough!
So, clearly, this is a good start. I would like to try this with recurrence plots, like the previous paper mentioned. However, apparently they screen people who get access to the CRP toolbox, so I'll have to wait for the download-link email (or reapply in 4 days?). If, on the off chance, I can't get access, I'll just make sure my similarity maps are up to snuff. These results aren't publication-ready, by any means, but OMG! a start. I definitely would also like more data for comparison.
Maybe I will be able to publish on some of my results and methods. I was feeling very gloomy about that prospect these past couple days until now.
Finally! A paper for self-similarity matrix comparison in the Music Information Retrieval literature -- doing what I want it to do!
Measuring Structural Similarity in Music - Juan P. Bello
I can definitely use this to compare music to movement... It must be sensitive to small variations, since he is using it to compare different performances of the same piece of music. He is using it for tonal music, but I can just use MFCCs (& maybe amplitude or STSMPS?) instead of the chroma-based features and CENS he's using, since (except for the Giselle null-comparison example) the music I am analyzing is not necessarily pitch-based. But anyways, that won't matter. I just need the comparison methods.
He is also using recurrence plots instead of self-similarity matrices, but they are similar measures. He then uses the Normalized Compression Distance (NCD) -- something that is commonly used in bioinformatics for genetic comparison -- to come up with a similarity score between the recurrence plots. Then, it looks like he finds threshold values for pairwise similarity and uses the measure to retrieve performances of the same work of music. The measure is not tolerant of global structure changes (eg, if someone repeats a section and another does not), but that actually doesn't even matter for my application.
Tools that he uses:
Toolbox for Recurrence plots:
http://tocsy.pik-potsdam.de/CRPtoolbox/
Normalized Compression Distance:
http://www.complearn.org/ncd.html
I also had this idea in the shower: perhaps I could define the relationship between the structure of dance and music by their distance matrices. So, if I had enough data, I could do ANOTHER NCD on the NCD data and see if the different performances of Variations V's music-and-movement relationships (ie, the distance matrices between their similarity measures, whether RP or SSM) cluster together, as opposed to other interactive and non-interactive dance.
I found this link, which provides a lot of good information about music information retrieval -- mostly because I accessed the textbook for the course through my university's library (he had a lot of papers on structural analysis of audio via SSMs, so I just looked up his name). I thought about using the method described in the textbook for the segmentation of self-similarity matrices to determine what the actual repetitive structure is, but I think that Bello's approach is more apropos to my musical analysis problem. I would definitely apply the segmentation / path method as well if I had time... but man, do I need this paper to be over.