
Matching Kinect Audio with Video

I have a project dealing with video conferencing using the Kinect (or, more likely, four of them). Right now, my company uses these stupidly expensive cameras for our VTC rooms. The hope is that, using a couple of Kinects linked together, we can reduce the costs. The plan is to have four or five of them covering a 180 degree arc so the Kinects can see the entire room/table (still a lot cheaper than our current cameras!). The application would choose a video stream coming from a Kinect based on who at the table is talking. The plan is fine in theory, but I've run into a snag.

As far as I can tell, there is no way to tell which microphone array corresponds to which Kinect Runtime object. I can get an object representing each Kinect using:

Device device = new Device();
Runtime[] kinects = new Runtime[device.Count];
for( int i = 0; i < kinects.Length; i ++ )
    kinects[i] = new Runtime(i);

And every microphone array using:

var source = new KinectAudioSource();
IEnumerable<AudioDeviceInfo> devices = source.FindCaptureDevices();
foreach (AudioDeviceInfo device in devices)
{
    KinectAudioSource devSpecificSource = new KinectAudioSource();
    devSpecificSource.MicrophoneIndex = (short)device.DeviceIndex;
}

but I cannot find any way to know that Runtime A corresponds to KinectAudioSource B. This isn't a huge problem for the two Kinects I'm using (I'll just guess which is which, and switch them if they're wrong), but when we get up to four or five Kinects, I don't want to need to do any kind of calibration every time the application runs. I've considered assuming that the Runtime and KinectAudioSource objects will be in the same order (Runtime index 0 corresponds to the first AudioDeviceInfo in devices), but that seems risky.

So, the question: is there any way to match a Runtime object with its KinectAudioSource? If not, is it guaranteed that they will be in the correct order so I can match Runtime 0 with the first KinectAudioSource microphone index in devices?

UPDATE: Finally slammed my face against WPF's single-threaded apartment requirement and the Kinect audio's multithreaded apartment requirement enough to get the two to behave together. Problem is, as far as I can tell, the order of the Kinect Runtime objects and the order of the KinectAudioSources do not line up. I'm in a rather loud lab (I'm one of.. maybe 40 interns in the room), so it's hard to test, but I'm fairly certain that the order is switched for the two Kinects I have plugged in. I have two Runtime objects and two KinectAudioSource objects. When the first KinectAudioSource reports that a sound is coming from directly in front of it, I'm actually standing in front of the Kinect associated with the second Runtime object. So there's no guarantee that the orders of the two will line up. So now, to repeat the question: how do I match up the KinectAudioSource object with the Nui.Runtime object? Right now, I only have two Kinects hooked up, but since the goal is four or five.. I need a concrete way to do this.

UPDATE 2: Brought the two Kinects I have at work back home to play with. Three Kinects, one computer. Fun stuff (it was actually a pain to get them all installed at once, and one of the video feeds doesn't seem to be working, so I'm back to 2 for now). musefan's answer got me hoping that I had missed something in the AudioDeviceInfo objects that would shed some light on this problem, but no luck. I found an interesting looking field in Runtime objects called NuiCamera.UniqueDeviceName, but I can't find any link between that and anything in AudioDeviceInfo.

Output from those fields, in the hopes Sherlock Holmes sees the thread and notices a connection:

Console.WriteLine("Nui{0}: {1}", i, nuis[i].NuiCamera.UniqueDeviceName);
//Nui0: USB\VID_0409&PID_005A\6&1F9D61BF&0&4
//Nui1: USB\VID_0409&PID_005A\6&356AC357&0&3

Console.WriteLine("AudioDeviceInfo{0}: {1}, {2}, {3}", audios.IndexOf(device), device.DeviceID, device.DeviceIndex, device.DeviceName);
//AudioDeviceInfo0: {0.0.1.00000000}.{1945437e-2d55-45e5-82ba-fc3021441b17}, 0, Microphone Array (Kinect USB Audio)
//AudioDeviceInfo1: {0.0.1.00000000}.{6002e98f-2429-459a-8e82-9810330a8e25}, 1, Microphone Array (2- Kinect USB Audio)

UPDATE 3: I'm not looking for calibration techniques. I'm looking for a way to match the Kinect camera with its microphone array within the application at runtime, with no previous setup required. Please stop posting possible calibration techniques. The entire point of posting the question was to find a way to avoid needing the user to do setup.

UPDATE 4: WMI definitely seems like the way to go. Unfortunately, I haven't had a lot of time to work on it, as I've been struggling just to get 3 Kinects to play nice with each other. Something about USB hubs not being able to handle the bandwidth? I've informed my boss that there doesn't seem to be any easy way to connect 3+ Kinects to a regular computer and not blue screen. I might still try to work on this in my free time, but as far as work goes.. it's pretty much a dead end.

Thanks for the answers guys, sorry I couldn't post a working solution.


The API provided by Microsoft Research doesn't actually provide this capability. A Kinect is essentially multiple cameras and a microphone array, and each sensor has its own driver stack, so the API exposes no link back to the physical hardware device. The best way to achieve this would be to use the Windows API instead, by way of WMI: take the device IDs you get for the NUI camera and the microphones, and use WMI to find which USB bus they are attached to (each Kinect sensor has to be on its own bus). Then you'll know which device matches what. This will be an expensive operation, so I would recommend doing it at start-up, or when the devices are detected, and keeping the result until you know the hardware configuration has changed or the application is reset. Using WMI through .NET is pretty well documented, but here is one article that specifically talks about USB devices through WMI/.NET: http://www.developerfusion.com/article/84338/making-usb-c-friendly/.
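A minimal sketch of the controller-grouping step described above, assuming the standard WMI association class Win32_USBControllerDevice (its Antecedent is the USB controller, its Dependent a device attached under it). The class and method names (UsbTopology, ExtractDeviceId, GroupByController, QueryPairs) are made up for illustration. It only lists PnP instance paths per controller. Matching a path to the camera's UniqueDeviceName looks plausible from the output in the question, but how to map an AudioDeviceInfo.DeviceID endpoint string to a PnP entry is not shown here and remains an open step.

```csharp
using System;
using System.Collections.Generic;
using System.Management; // .NET Framework: add a reference to System.Management.dll

static class UsbTopology
{
    // Pulls the DeviceID key out of a WMI object path such as
    //   \\PC\root\cimv2:Win32_PnPEntity.DeviceID="USB\\VID_0409&PID_005A\\6&1F9D61BF&0&4"
    // Returns null if the path has no quoted DeviceID key.
    public static string ExtractDeviceId(string wmiPath)
    {
        const string key = "DeviceID=\"";
        if (wmiPath == null) return null;
        int start = wmiPath.IndexOf(key, StringComparison.OrdinalIgnoreCase);
        int end = wmiPath.LastIndexOf('"');
        if (start < 0 || end < start + key.Length) return null;
        start += key.Length;
        // WMI doubles the backslashes inside quoted key values; undo that.
        return wmiPath.Substring(start, end - start).Replace(@"\\", @"\");
    }

    // Groups (controller id, device id) pairs by controller id.
    public static Dictionary<string, List<string>> GroupByController(
        IEnumerable<KeyValuePair<string, string>> controllerDevicePairs)
    {
        var groups = new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
        foreach (var pair in controllerDevicePairs)
        {
            List<string> devices;
            if (!groups.TryGetValue(pair.Key, out devices))
            {
                devices = new List<string>();
                groups[pair.Key] = devices;
            }
            devices.Add(pair.Value);
        }
        return groups;
    }

    // Asks WMI for every controller/device association on this machine.
    public static List<KeyValuePair<string, string>> QueryPairs()
    {
        var pairs = new List<KeyValuePair<string, string>>();
        using (var searcher = new ManagementObjectSearcher("SELECT * FROM Win32_USBControllerDevice"))
        using (ManagementObjectCollection links = searcher.Get())
        {
            foreach (ManagementBaseObject link in links)
            {
                string controller = ExtractDeviceId(link["Antecedent"] as string);
                string device = ExtractDeviceId(link["Dependent"] as string);
                if (controller != null && device != null)
                    pairs.Add(new KeyValuePair<string, string>(controller, device));
            }
        }
        return pairs;
    }

    static void Main()
    {
        // Print each controller followed by the devices attached under it.
        foreach (var group in GroupByController(QueryPairs()))
        {
            Console.WriteLine(group.Key);
            foreach (string device in group.Value)
                Console.WriteLine("    " + device);
        }
    }
}
```

Since the query is slow, the grouped result would be computed once at start-up and cached, as suggested above.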


Mannimarco,

the only link I see is that a camera's UniqueDeviceName property equals its 'device instance path'.

Doing a little research in the device manager on my computer I can tell that the last 2 numbers at the end of the camera's UniqueDeviceName (0&3, 0&4) are incrementing values (based on controller + port?).

My suggestion is that you sort your list of cameras based on those last digits, and sort your audio devices on their DeviceID property. This way, I suppose, when you iterate over your camera list, you can use the corresponding index in the audio device list to match the two together.
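A rough sketch of that sort-and-pair idea, using the identifier strings posted in the question. The helper name TrailingNumber is hypothetical, and the code assumes the part after the last '&' is a decimal number, as it is in those two examples. Nothing in the thread confirms that either ordering reflects the physical wiring, so treat the resulting pairs as a guess to be checked, not as a reliable mapping.

```csharp
using System;
using System.Linq;

static class InstancePathOrder
{
    // Returns the integer after the last '&' in a device instance path,
    // or -1 when there is no '&' or the tail is not a decimal number.
    public static int TrailingNumber(string instancePath)
    {
        int amp = instancePath.LastIndexOf('&');
        int value;
        if (amp >= 0 && int.TryParse(instancePath.Substring(amp + 1), out value))
            return value;
        return -1;
    }

    static void Main()
    {
        // Values copied from the output in the question.
        string[] cameras =
        {
            @"USB\VID_0409&PID_005A\6&1F9D61BF&0&4",
            @"USB\VID_0409&PID_005A\6&356AC357&0&3"
        };
        string[] audioIds =
        {
            "{0.0.1.00000000}.{1945437e-2d55-45e5-82ba-fc3021441b17}",
            "{0.0.1.00000000}.{6002e98f-2429-459a-8e82-9810330a8e25}"
        };

        // Sort cameras by trailing number and audio devices by DeviceID text.
        string[] sortedCameras = cameras.OrderBy(c => TrailingNumber(c)).ToArray();
        string[] sortedAudio = audioIds.OrderBy(a => a, StringComparer.OrdinalIgnoreCase).ToArray();

        // Pair the two lists by position.
        for (int i = 0; i < Math.Min(sortedCameras.Length, sortedAudio.Length); i++)
            Console.WriteLine("{0}  <->  {1}", sortedCameras[i], sortedAudio[i]);
    }
}
```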

Btw, this is my first post so please be gentle if I'm wrong...


I have had a look at the SDK documentation and it is not great, in all honesty. Furthermore, I do not have any Kinect devices to test this on.

The first thing I would do, though, is create an output list of all useful property values for each device. Then I would start to look for matches across the two that look like they can be used as links. For each one I find, I would test to see if it does the job.

So I would have a simple console application to output the following property values:

For Each AudioDeviceInfo

  • DeviceID = X
  • DeviceIndex = X
  • DeviceName = X

For Each KinectAudioSource

  • MicrophoneIndex = X

For Each Runtime

  • InstanceIndex = X

then look for any matches in values. Nothing else in the SDK seems really useful. But there must be some internal logic to the SDK when it returns arrays of AudioDeviceInfo and Runtime.

Anyway, I hope you get it right somehow


I would get the audio stream from all of them and then compare volume levels. Once you have that, you can determine the "object" or person in the Kinect's 3D space that is actually speaking.

From there you need to determine which cameras this object / person is visible in ...

yeh this is one complex project ... kinect is pretty awesome though ... I don't know much about the API but does it not give you distances and such of people?

good luck with it :)


I would just calibrate the Kinects one by one, writing the unique device identifier pairs (camera id, microphone id) to a file. In your application you can then use that file at startup to match microphone instances with camera instances (i.e. create a table that relates one camera instance to one microphone instance). As the camera and microphone inside the Kinect probably each have their own USB interface IC (connected via an internal USB hub), there is technically no way to relate the two without prior calibration, as the two device identifiers are probably completely unrelated. You might also want to put labels on the Kinect units and reference these labels inside your initialization file.
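A small sketch of loading such a pairing table at startup. The file format (one "camera id, TAB, microphone id" pair per line, with '#' comment lines) and the names CalibrationTable and kinect-pairs.txt are assumptions made for this example, not anything defined by the Kinect SDK.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class CalibrationTable
{
    // Parses lines of the form "<camera id><TAB><microphone id>".
    // Blank lines and lines starting with '#' are skipped.
    public static Dictionary<string, string> Parse(IEnumerable<string> lines)
    {
        var map = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0 || line[0] == '#') continue;
            string[] parts = line.Split('\t');
            if (parts.Length != 2)
                throw new FormatException("Expected two tab-separated fields: " + line);
            map[parts[0].Trim()] = parts[1].Trim();
        }
        return map;
    }

    static void Main(string[] args)
    {
        // Hypothetical default file name; pass another path as the first argument.
        string path = args.Length > 0 ? args[0] : "kinect-pairs.txt";
        if (!File.Exists(path))
        {
            Console.WriteLine("No calibration file at " + path);
            return;
        }
        foreach (var pair in Parse(File.ReadAllLines(path)))
            Console.WriteLine("camera {0} -> microphone {1}", pair.Key, pair.Value);
    }
}
```

At startup the application would look up each camera's identifier in the returned dictionary to find the identifier of the microphone array recorded for it.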


Sounds interesting, maybe you need some "automatic calibration".

Maybe with some "remote power switches for each usb connection" (io card connected to the usb powerlines). So you could power-on one Kinect after the other automatically and now you know which microphone belongs to which camera.

Or something like that...

Regards! Stefan
