Patrick Godwin's Blog-O-Rama!

Developers, developers, developers!

Possible Kinect Fun Labs Dev Kit in the works?

If you haven’t heard, Microsoft announced yesterday that the Imagine Cup will now have a “Kinect Fun Labs Challenge” for this year’s competition. Students are tasked to “solve some of the world’s toughest problems” with the help of technology, in this case Kinect. I personally think this is incredibly exciting, because it gives students a chance to build innovative solutions with the Kinect Sensor.

The first thing I did was browse the contest rules to get as much information as possible. I learned that all participants who move on to Round 2 of the competition will receive a free Kinect for Windows Sensor… and something a bit interesting. The following is taken from the official rules for the contest:

[Image: excerpt from the official contest rules]

I’m not reading much into this, but it’s interesting to see that students will be getting a new SDK for Kinect Fun Labs. Perhaps we’ll see something come out for all developers in the near future? Only time will tell.

In the meantime, if you are a student passionate about changing the world through technology, sign up for the Imagine Cup today. Not only can you win a Kinect for Windows Sensor, but if your idea is great enough, you could win a free trip to Australia, as well as up to $8,000 USD. Sign up today!

Intro to the Kinect SDK–Adding Speech Recognition

Those of you who frequent this blog know that a few days ago I wrote an introductory article on Kinect and XNA (link). In that article, I modified the Primitive 3D Sample from App Hub to render Joints from Kinect as Primitive Spheres. I’ve decided to build upon that sample and use the Kinect’s NUI Microphone and the Microsoft Speech Recognition SDK to replace the touch/keyboard input in the sample. I also refactored the previous sample a bit.

Before we get started, you need to make sure you have some pre-requisites installed:

(Note: The SDK and Runtime are x86, as the Kinect Language Pack is only x86 for now)

Now let’s dive into it. First thing we need to do is add a few more using statements to the project:

using Microsoft.Research.Kinect.Audio;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;
using System.IO;

Next, add these variables under the variables we created last time:

KinectAudioSource kinectSource;
SpeechRecognitionEngine speechEngine;
Stream stream;
string RecognizerId = "SR_MS_en-US_Kinect_10.0";
bool speechNotRecognized;

These variables go with the rest of the fields we declared in the last tutorial. I’ll explain what each one of these does later. Next, take the Kinect code from our LoadContent function:

nui = new Runtime();
nui.Initialize(RuntimeOptions.UseSkeletalTracking);
nui.SkeletonFrameReady += new EventHandler<SkeletonFrameReadyEventArgs>(nui_SkeletonFrameReady);
nui.NuiCamera.ElevationAngle = 0;

And move it to a new function called InitalizeKinect. Your LoadContent function should look like this now:

protected override void LoadContent()
{
    spriteBatch = new SpriteBatch(GraphicsDevice);
    spriteFont = Content.Load<SpriteFont>("hudfont");

    primitives.Add(new CubePrimitive(GraphicsDevice));
    primitives.Add(new SpherePrimitive(GraphicsDevice));
    primitives.Add(new CylinderPrimitive(GraphicsDevice));
    primitives.Add(new TorusPrimitive(GraphicsDevice));
    primitives.Add(new TeapotPrimitive(GraphicsDevice));

    wireFrameState = new RasterizerState()
    {
        FillMode = FillMode.WireFrame,
        CullMode = CullMode.None,
    };


    InitalizeKinect();

}

Let’s dive into that new InitalizeKinect function:

private void InitalizeKinect()
{

    nui = new Runtime();
    nui.Initialize(RuntimeOptions.UseSkeletalTracking);
    nui.SkeletonFrameReady += new EventHandler<SkeletonFrameReadyEventArgs>(nui_SkeletonFrameReady);
    nui.NuiCamera.ElevationAngle = 0;

    kinectSource = new KinectAudioSource();

    kinectSource.FeatureMode = true;
    kinectSource.AutomaticGainControl = false;
    kinectSource.SystemMode = SystemMode.OptibeamArrayOnly;

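    var rec = (from r in SpeechRecognitionEngine.InstalledRecognizers() where r.Id == RecognizerId select r).FirstOrDefault();

    // FirstOrDefault returns null when the Kinect recognizer is not installed.
    if (rec == null)
    {
        throw new InvalidOperationException("Speech recognizer not found: " + RecognizerId);
    }

    speechEngine = new SpeechRecognitionEngine(rec.Id);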

    var choices = new Choices();
    choices.Add("color");
    choices.Add("shape");
    choices.Add("wireframe");
    choices.Add("exit");

    GrammarBuilder gb = new GrammarBuilder();
    gb.Culture = rec.Culture;
    gb.Append(choices);

    var g = new Grammar(gb);

    speechEngine.LoadGrammar(g);
    speechEngine.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(sre_SpeechHypothesized);
    speechEngine.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);
    speechEngine.SpeechRecognitionRejected += new EventHandler<SpeechRecognitionRejectedEventArgs>(sre_SpeechRecognitionRejected);

    Console.WriteLine("Recognizing Speech");

    stream = kinectSource.Start();

    speechEngine.SetInputToAudioStream(stream,
                  new SpeechAudioFormatInfo(
                      EncodingFormat.Pcm, 16000, 16, 1,
                      32000, 2, null));


    speechEngine.RecognizeAsync(RecognizeMode.Multiple);

}

This is a lengthy function, but each part is important. Notice at the top, we initialize our Runtime and Skeletal Tracking features like last time.

The next thing to notice is the KinectAudioSource, kinectSource. This is how we access the four-microphone array on the Kinect Sensor in code. Here we create a new instance of KinectAudioSource, then turn off Automatic Gain Control and Echo Cancellation so that speech recognition can work properly.

Now we need to grab our Kinect Language Recognizer. We do that by checking which speech recognizers are installed on the machine and picking the one whose Id matches RecognizerId.

We then use the recognizer information we grabbed to create a new instance of SpeechRecognitionEngine. We then create a new Choices object, and add all of the words we want the speech recognition engine to recognize.

Next, we need to create a GrammarBuilder that will help us build the Grammar object used by the SpeechRecognitionEngine object. We set the GrammarBuilder’s Culture using the Speech Recognizer we got earlier, and add our words to the GrammarBuilder.

Next we’re going to wire up a few event handlers. I’m not going to dive into each event handler here, as the only one that’s really important is the SpeechRecognized handler. I’ll explain that in a bit.

Finally, we set our stream variable to the Kinect’s audio stream, and tell the speech engine to use this stream for audio recognition. The last thing this function does is call the RecognizeAsync function, which tells the SpeechRecognitionEngine object to start listening for the words in our grammar.
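The numbers passed to SpeechAudioFormatInfo are not independent of each other. As I understand the constructor, the last two numeric arguments are the average bytes per second and the block alignment, and both follow from the sample rate, bit depth, and channel count. Here is a minimal sketch of that arithmetic. The PcmFormat class and its method names are made up for illustration and are not part of the Kinect or Speech SDK.

```csharp
using System;

// Hypothetical helper (not part of the Kinect or Speech SDK) that derives
// the dependent PCM format values from the three independent ones.
public static class PcmFormat
{
    // Bytes needed to store one sample across all channels.
    public static int BlockAlign(int bitsPerSample, int channelCount)
    {
        return (bitsPerSample / 8) * channelCount;
    }

    // Bytes of audio produced per second of recording.
    public static int AverageBytesPerSecond(int samplesPerSecond, int bitsPerSample, int channelCount)
    {
        return samplesPerSecond * BlockAlign(bitsPerSample, channelCount);
    }
}
```

For the stream used here (16000 samples per second, 16 bits per sample, 1 channel), PcmFormat.BlockAlign(16, 1) gives 2 and PcmFormat.AverageBytesPerSecond(16000, 16, 1) gives 32000, which match the values passed to SetInputToAudioStream.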

Let’s dive into our SpeechRecognized event handler:

void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    speechNotRecognized = false;
    if (e.Result.Text == "color")
    {
        currentColorIndex = (currentColorIndex + 1) % colors.Count;
    }
    else if (e.Result.Text == "wireframe")
    {
        isWireframe = !isWireframe;
    }
    else if (e.Result.Text == "shape")
    {
        currentPrimitiveIndex = (currentPrimitiveIndex + 1) % primitives.Count;
    }
    else if (e.Result.Text == "exit")
    {
        Exit();
    }
    Console.Write("\rSpeech Recognized: \t{0} \n", e.Result.Text);
}

This function is where the recognized audio gets processed. The SpeechRecognizedEventArgs contains the result of the recognized speech. Its Result property has a Text property which you can compare against your choices. The code in here is pretty self-explanatory: we simply change our rendering properties, currentPrimitiveIndex, isWireframe, and currentColorIndex, which are used later in the Draw function.
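As the list of commands grows, the if/else chain can get unwieldy. One alternative, sketched here as an assumption rather than what the sample does, is a dictionary that maps each recognized word to an action. VoiceCommands and its members are hypothetical names, and the class has no dependency on the Speech SDK so it compiles on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical dispatcher: maps a recognized word to the action it triggers.
public class VoiceCommands
{
    private readonly Dictionary<string, Action> handlers =
        new Dictionary<string, Action>(StringComparer.OrdinalIgnoreCase);

    // Register the action to run when 'word' is recognized.
    public void Register(string word, Action action)
    {
        handlers[word] = action;
    }

    // Run the action for 'word'; returns false if the word is unknown or null.
    public bool TryRun(string word)
    {
        Action action;
        if (word != null && handlers.TryGetValue(word, out action))
        {
            action();
            return true;
        }
        return false;
    }

    // The registered words, e.g. to add to a Choices object.
    public IEnumerable<string> Words
    {
        get { return handlers.Keys; }
    }
}
```

In this sample that would mean something like commands.Register("wireframe", () => isWireframe = !isWireframe); during setup, and commands.TryRun(e.Result.Text) inside sre_SpeechRecognized. A side benefit is that the same keys could populate the grammar, so the word list lives in one place.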

Those are most of the important changes. Take a look at the attached sample to see the rest of the minor changes to the Draw function and the other Event Handlers. You can download the entire sample here.

Let me know if you have any feedback on these samples, or if there is anything you can add. I’m always looking for advice, so I’m happy to hear from my readers.

Until next time,

Patrick Godwin

Intro to the Kinect SDK–Drawing Joints in XNA

It’s been a few days since Microsoft released the Kinect for Windows SDK, and we’re already seeing a lot of work being done. I decided to get my hands dirty and try out the fancy new skeletal tracking. First things first, you’re going to need to make sure you have some pre-requisites:

With all of these requirements satisfied, you can get started.

The first thing you need to do is add a few variables to the project:

SkeletonData skeleton;
Runtime nui;

We’ll use these later: nui is the Kinect runtime, and skeleton holds the data for the skeleton that the Kinect SDK is tracking.

Next, in the LoadContent function of the Primitives3DGame class, add the following code:

nui = new Runtime();
nui.Initialize(RuntimeOptions.UseSkeletalTracking);
nui.SkeletonFrameReady += new EventHandler<SkeletonFrameReadyEventArgs>(nui_SkeletonFrameReady);
nui.NuiCamera.ElevationAngle = 0;

In these four lines of code we’ve initialized the Kinect Runtime to use the Skeletal Tracking feature of the Kinect, registered an event handler for the SkeletonFrameReady event, and reset the elevation angle to 0 in case the Kinect was left tilted from a previous use.

Next we’ll want to add some code to that nui_SkeletonFrameReady event handler:

void nui_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    foreach (SkeletonData s in e.SkeletonFrame.Skeletons)
    {
        if (s.TrackingState == SkeletonTrackingState.Tracked)
        {
            skeleton = s;
        }
    }
}

This function is called every time the program gets skeleton data from the Kinect. We make sure that the SkeletonEngine is currently tracking a skeleton, and then we make a reference to that skeleton so it can be rendered later.
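Note that the loop above overwrites skeleton each time it finds a tracked entry, so when more than one person is tracked, the last one in the array wins. If you would rather take the first tracked skeleton, and get null when nobody is tracked, a LINQ query is one option. The sketch below uses stand-in types (TrackState, SkeletonInfo) instead of the SDK’s SkeletonTrackingState and SkeletonData so that it compiles on its own; SkeletonPicker is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Linq;

// Stand-ins for the SDK's SkeletonTrackingState and SkeletonData types.
public enum TrackState { NotTracked, PositionOnly, Tracked }

public class SkeletonInfo
{
    public int Id;
    public TrackState State;
}

public static class SkeletonPicker
{
    // Returns the first fully tracked skeleton, or null if there is none.
    public static SkeletonInfo FirstTracked(IEnumerable<SkeletonInfo> skeletons)
    {
        return skeletons.FirstOrDefault(s => s.State == TrackState.Tracked);
    }
}
```

In the real handler this would look roughly like skeleton = e.SkeletonFrame.Skeletons.FirstOrDefault(s => s.TrackingState == SkeletonTrackingState.Tracked);, which also clears the reference when no one is in view.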

Now we want to modify the Draw function of the sample:

protected override void Draw(GameTime gameTime)
{
    GraphicsDevice.Clear(Color.CornflowerBlue);

    if (isWireframe)
    {
        GraphicsDevice.RasterizerState = wireFrameState;
    }
    else
    {
        GraphicsDevice.RasterizerState = RasterizerState.CullCounterClockwise;
    }

    Matrix view = Matrix.CreateLookAt(new Vector3(0, 0, -20), new Vector3(0, 0, 100), Vector3.Up);
    Matrix projection = Matrix.CreatePerspectiveFieldOfView(MathHelper.PiOver4,
                                                GraphicsDevice.Viewport.AspectRatio,
                                                1.0f,
                                                100);

    // Draw the current primitive.
    GeometricPrimitive currentPrimitive = primitives[currentPrimitiveIndex];
    Color color = colors[currentColorIndex];

    DrawPrimitveSkeleton(currentPrimitive, view, projection, color);

    // Reset the fill mode renderstate.
    GraphicsDevice.RasterizerState = RasterizerState.CullCounterClockwise;

    // Draw overlay text.
    string text = "A or tap top of screen = Change primitive\n" +
                  "B or tap bottom left of screen = Change color\n" +
                  "Y or tap bottom right of screen = Toggle wireframe";

    spriteBatch.Begin();
    spriteBatch.DrawString(spriteFont, text, new Vector2(48, 48), Color.White);
    spriteBatch.End();

    base.Draw(gameTime);
}

There are a few things to note here, as this function is quite different from the Draw function of the original sample. The first is the view and projection matrices: we move the camera further back from the origin to get a better view of the skeleton. Then we simply pass the relevant data to the DrawPrimitveSkeleton function, and let it draw all of the joints:

private void DrawPrimitveSkeleton(GeometricPrimitive primitive, Matrix view, Matrix projection, Color color)
{
    try
    {
        // Only draw when we have a skeleton that is actively tracked.
        if (skeleton != null && skeleton.TrackingState == SkeletonTrackingState.Tracked)
        {
            foreach (Joint joint in skeleton.Joints)
            {
                // Place the primitive at the joint's scaled position.
                var position = ConvertRealWorldPoint(joint.Position);
                Matrix world = Matrix.CreateTranslation(position);
                primitive.Draw(world, view, projection, color);
            }
        }
    }
    catch
    {
        // Any exception while drawing is deliberately ignored.
    }
}

So all we do in this function is check to see if the current Skeleton exists, and then we enumerate through each joint of the skeleton, drawing it to the screen. Note the ConvertRealWorldPoint function:

private Vector3 ConvertRealWorldPoint(Vector position)
{
    var returnVector = new Vector3();
    returnVector.X = position.X * 10;
    returnVector.Y = position.Y * 10;
    returnVector.Z = position.Z;
    return returnVector;
}

All we’re doing here is taking the position from the Kinect SDK and scaling its X and Y values by 10 (Z is passed through unchanged) so it can be used in a 3D World. There are probably better approaches to this, so I’d like to see how anyone else does this. We then take that point and create a World Matrix to draw the primitive for each joint.
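One possible refinement, offered as a sketch rather than a known better approach, is to make the scale factors explicit parameters so the mapping can be tuned without editing the function. Point3 below is a stand-in for both the Kinect SDK’s Vector and XNA’s Vector3 so the snippet compiles on its own; JointScaler and its members are hypothetical names.

```csharp
// Stand-in for the SDK's Vector / XNA's Vector3 so this compiles on its own.
public struct Point3
{
    public float X, Y, Z;

    public Point3(float x, float y, float z)
    {
        X = x;
        Y = y;
        Z = z;
    }
}

// Hypothetical helper: scales a sensor-space position into world space.
public class JointScaler
{
    private readonly float scaleX, scaleY, scaleZ;

    public JointScaler(float scaleX, float scaleY, float scaleZ)
    {
        this.scaleX = scaleX;
        this.scaleY = scaleY;
        this.scaleZ = scaleZ;
    }

    // Multiply each axis by its own factor.
    public Point3 ToWorld(Point3 sensor)
    {
        return new Point3(sensor.X * scaleX, sensor.Y * scaleY, sensor.Z * scaleZ);
    }
}
```

Constructing it as new JointScaler(10f, 10f, 1f) reproduces the scaling in ConvertRealWorldPoint, and the factors can then be adjusted in one place if the skeleton looks too flat or too stretched.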

Once you’re done, run the sample and stand in front of your Kinect Sensor. The result should look something like this:

[Screenshots: screen1, screen2]

I’ve uploaded my version of the sample here so you can take a look and compare. Credit for the 3D Primitive Sample goes to Microsoft and the XNA Community Team. Let me know what you think in the comments, and please give me any feedback.

Enjoy,

Patrick Godwin

Microsoft releases Kinect for Windows SDK Beta

That’s right, the long-awaited Kinect for Windows SDK, first demonstrated at MIX 2011, was released today during a Channel 9 Live event. My fellow Microsoft Student Insider, Dennis Delimarsky, took part in the presentation, helping show off the beta tools for Kinect.

The Kinect for Windows SDK has a lot to offer, from hundreds of pages of API documentation to multiple samples, and it is a great way to interact with your technology. It supports skeletal tracking and audio processing, as well as raw access to the data from the Kinect Sensor.

You can go ahead and download the Kinect SDK, free of charge for educational use, from here.

Until next time,
Patrick Godwin