Patrick Godwin's Blog-O-Rama!

Developers, developers, developers!

Spotlight Imagine Cup–Team A41 (Software Design)

[Photo: Team A41]

The first team I met with as part of my Imagine Cup 2011 coverage was Team A41, a Software Design team from Puerto Rico. Team A41 consists of Francisco Fernández, Amarilys Méndez, Roberto Durand, and Juan Martí of Universidad Metropolitana in Puerto Rico. They are mentored by Pedro Maldonado.

The team has developed a project called All 4 One Solutions, a suite of software that lets scientists and organizations crowdsource research on environmental issues. It’s a mobile and web-based solution designed to collect as much data as possible.

The team identified an issue in their local ecosystem and developed their project with that in mind. They feel the database of information created by these crowdsourced efforts can help scientists confront wildlife that destroys ecosystems around the world. One example they gave me was coral reef bleaching: the data gathered can be sent to scientists around the world, allowing them to study affected areas without having to be there.

You can follow and root for Team A41 on Twitter and Facebook. You can also support them in the People’s Choice voting here. Be sure to follow me on Twitter as I get ready to head to New York for the Worldwide Imagine Cup Finals next week, and follow the MSPSMT hashtag here.

Until next time,
Patrick Godwin

Intro to the Kinect SDK–Adding Speech Recognition

For those of you who frequent this blog, you know that a few days ago I wrote an introductory article on Kinect and XNA (link). In that article, I modified the Primitive 3D sample from App Hub to render joints from the Kinect as primitive spheres. I’ve decided to build on that sample and leverage the Kinect’s NUI microphone array and the Microsoft Speech Recognition SDK to replace the touch/keyboard input in the sample. I also refactored the previous sample a bit.

Before we get started, you need to make sure you have some prerequisites installed: the Kinect for Windows SDK beta, the Microsoft Speech Platform SDK and Runtime, and the Kinect Language Pack.

(Note: the Speech SDK and Runtime must be the x86 versions, as the Kinect Language Pack is only x86 for now.)

Now let’s dive into it. First thing we need to do is add a few more using statements to the project:

using Microsoft.Research.Kinect.Audio;
using Microsoft.Speech.AudioFormat;
using Microsoft.Speech.Recognition;
using System.IO;
using System.Linq; // needed for the InstalledRecognizers() query below

Next, add these variables under the variables we created last time:

KinectAudioSource kinectSource;        // access to the Kinect microphone array
SpeechRecognitionEngine speechEngine;  // the Microsoft Speech recognition engine
Stream stream;                         // audio stream coming from the Kinect
string RecognizerId = "SR_MS_en-US_Kinect_10.0"; // ID of the Kinect English recognizer
bool speechNotRecognized;              // tracks whether the last utterance went unrecognized

These variables go with the rest of the fields we declared in the last tutorial. I’ll explain what each one of these does later. Next, take the Kinect code from our LoadContent function:

nui = new Runtime();
nui.Initialize(RuntimeOptions.UseSkeletalTracking);
nui.SkeletonFrameReady += new EventHandler<SkeletonFrameReadyEventArgs>(nui_SkeletonFrameReady);
nui.NuiCamera.ElevationAngle = 0;

And move it to a new function called InitializeKinect. Your LoadContent function should look like this now:

protected override void LoadContent()
{
    spriteBatch = new SpriteBatch(GraphicsDevice);
    spriteFont = Content.Load<SpriteFont>("hudfont");

    primitives.Add(new CubePrimitive(GraphicsDevice));
    primitives.Add(new SpherePrimitive(GraphicsDevice));
    primitives.Add(new CylinderPrimitive(GraphicsDevice));
    primitives.Add(new TorusPrimitive(GraphicsDevice));
    primitives.Add(new TeapotPrimitive(GraphicsDevice));

    wireFrameState = new RasterizerState()
    {
        FillMode = FillMode.WireFrame,
        CullMode = CullMode.None,
    };

    InitializeKinect();

}

Let’s dive into that new InitializeKinect function:

private void InitializeKinect()
{
    // Initialize the Kinect runtime with skeletal tracking, just like last time.
    nui = new Runtime();
    nui.Initialize(RuntimeOptions.UseSkeletalTracking);
    nui.SkeletonFrameReady += new EventHandler<SkeletonFrameReadyEventArgs>(nui_SkeletonFrameReady);
    nui.NuiCamera.ElevationAngle = 0;

    // Open the Kinect's microphone array. FeatureMode must be on before we
    // can change the gain and system mode settings; we disable automatic
    // gain control and pick the optibeam array mode (no echo cancellation)
    // so speech recognition works properly.
    kinectSource = new KinectAudioSource();
    kinectSource.FeatureMode = true;
    kinectSource.AutomaticGainControl = false;
    kinectSource.SystemMode = SystemMode.OptibeamArrayOnly;

    // Find the Kinect English recognizer among the installed speech recognizers.
    var rec = (from r in SpeechRecognitionEngine.InstalledRecognizers() where r.Id == RecognizerId select r).FirstOrDefault();

    speechEngine = new SpeechRecognitionEngine(rec.Id);

    // The words we want the engine to listen for.
    var choices = new Choices();
    choices.Add("color");
    choices.Add("shape");
    choices.Add("wireframe");
    choices.Add("exit");

    // Build a Grammar from those choices, using the recognizer's culture.
    GrammarBuilder gb = new GrammarBuilder();
    gb.Culture = rec.Culture;
    gb.Append(choices);

    var g = new Grammar(gb);

    speechEngine.LoadGrammar(g);
    speechEngine.SpeechHypothesized += new EventHandler<SpeechHypothesizedEventArgs>(sre_SpeechHypothesized);
    speechEngine.SpeechRecognized += new EventHandler<SpeechRecognizedEventArgs>(sre_SpeechRecognized);
    speechEngine.SpeechRecognitionRejected += new EventHandler<SpeechRecognitionRejectedEventArgs>(sre_SpeechRecognitionRejected);

    Console.WriteLine("Recognizing Speech");

    // Feed the Kinect's audio stream to the speech engine: 16 kHz, 16-bit mono PCM.
    stream = kinectSource.Start();
    speechEngine.SetInputToAudioStream(stream,
                  new SpeechAudioFormatInfo(
                      EncodingFormat.Pcm, 16000, 16, 1,
                      32000, 2, null));

    // Keep recognizing continuously instead of stopping after one result.
    speechEngine.RecognizeAsync(RecognizeMode.Multiple);
}

This is a lengthy function, but each part is important. Notice at the top, we initialize our Runtime and Skeletal Tracking features like last time.

The next thing to notice is the KinectAudioSource, kinectSource. This is how we access the Kinect sensor’s four-microphone array in code. Right here we create a new instance of KinectAudioSource. We also turn off automatic gain control and skip acoustic echo cancellation (by choosing SystemMode.OptibeamArrayOnly), so the speech recognition capabilities can work properly.

Now we need to grab the Kinect language recognizer. We do that by querying the speech recognizers installed on the machine and selecting the one whose Id matches the RecognizerId string we defined earlier.

We then use the recognizer information we grabbed to create a new instance of SpeechRecognitionEngine. Next, we create a Choices object and add all of the words we want the speech recognition engine to recognize.

Next, we create a GrammarBuilder that will help us build the Grammar object used by the SpeechRecognitionEngine. We set the GrammarBuilder’s Culture using the speech recognizer we got earlier, and append our words to it.
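As an aside, Choices isn’t limited to single words. If you wanted multi-word commands, you could append whole phrases the same way inside InitializeKinect. Here’s a hypothetical sketch (these phrases aren’t in the attached sample):

var phrases = new Choices();
phrases.Add("change color");   // hypothetical multi-word command
phrases.Add("next shape");     // hypothetical multi-word command

GrammarBuilder phraseBuilder = new GrammarBuilder();
phraseBuilder.Culture = rec.Culture;
phraseBuilder.Append(phrases);

speechEngine.LoadGrammar(new Grammar(phraseBuilder));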

Next we’re going to wire up a few event handlers. I’m not going to dive into each one here, as the only one that’s really important is the SpeechRecognized handler. I’ll explain that in a bit.
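If you’re curious, minimal versions of the other two handlers look something like this; check the attached sample for the full versions, as these bodies are just a sketch of the idea:

void sre_SpeechHypothesized(object sender, SpeechHypothesizedEventArgs e)
{
    // Fired while the engine is still deciding what was said.
    Console.Write("\rSpeech Hypothesized: \t{0}", e.Result.Text);
}

void sre_SpeechRecognitionRejected(object sender, SpeechRecognitionRejectedEventArgs e)
{
    // Fired when the engine hears audio it can't match to our grammar.
    speechNotRecognized = true;
}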

Finally, we set our stream variable to the Kinect’s audio stream and tell the speech engine to use that stream for recognition. The last thing this function does is call RecognizeAsync with RecognizeMode.Multiple, which tells the SpeechRecognitionEngine to keep listening for recognized words rather than stopping after the first result.

Let’s dive into our SpeechRecognized event handler:

void sre_SpeechRecognized(object sender, SpeechRecognizedEventArgs e)
{
    speechNotRecognized = false;
    if (e.Result.Text == "color")
    {
        currentColorIndex = (currentColorIndex + 1) % colors.Count;
    }
    else if (e.Result.Text == "wireframe")
    {
        isWireframe = !isWireframe;
    }
    else if (e.Result.Text == "shape")
    {
        currentPrimitiveIndex = (currentPrimitiveIndex + 1) % primitives.Count;
    }
    else if (e.Result.Text == "exit")
    {
        Exit();
    }
    Console.Write("rSpeech Recognized: t{0} n", e.Result.Text);
}

This function is where the processing of the recognized audio happens. The SpeechRecognizedEventArgs contains the result of the recognized speech; its Result property has a Text property you can compare against your choices. The code in here is pretty self-explanatory: we simply change our rendering properties, currentPrimitiveIndex, isWireframe, and currentColorIndex, which are used later in the Draw function.
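One thing worth noting: the handler above acts on every result the engine reports. If it starts triggering on background chatter, the result also exposes a Confidence value you can check first. A minimal sketch, with an arbitrary threshold, would go at the top of the handler:

// Hypothetical guard, not in the attached sample: skip results the
// engine isn't confident about. The 0.7 threshold is arbitrary.
if (e.Result.Confidence < 0.7f)
{
    return;
}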

Those are most of the important changes. Take a look at the attached sample to see the rest of the minor changes to the Draw function and the other event handlers. You can download the entire sample here.
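One housekeeping note: since we start the audio stream and the recognizer ourselves, it’s polite to shut them down when the game exits. A minimal sketch, using XNA’s UnloadContent override, might look like this:

protected override void UnloadContent()
{
    // Stop continuous recognition, then release the Kinect audio
    // stream and the runtime.
    speechEngine.RecognizeAsyncStop();
    stream.Close();
    nui.Uninitialize();
}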

Let me know if you have any feedback on these samples, or if there is anything you can add. I’m always looking for advice, so I’m happy to hear from my readers.

Until next time,

Patrick Godwin

Intro to the Kinect SDK–Drawing Joints in XNA

It’s been a few days since Microsoft released the Kinect for Windows SDK, and we’re already seeing a lot of work being done. I decided to get my hands dirty and try out the fancy new skeletal tracking. First things first, you’re going to need some prerequisites: the Kinect for Windows SDK beta, XNA Game Studio 4.0, and the Primitives3D sample from App Hub.

With all of these requirements satisfied, you can get started.

The first thing you need to do is add a few variables to the project:

SkeletonData skeleton;  // the most recently tracked skeleton
Runtime nui;            // the Kinect runtime engine

These will hold the Kinect runtime engine and the skeleton data it provides; we’ll use both later.

Next, in the LoadContent function of the Primitives3DGame class, add the following code:

nui = new Runtime();
nui.Initialize(RuntimeOptions.UseSkeletalTracking);
nui.SkeletonFrameReady += new EventHandler<SkeletonFrameReadyEventArgs>(nui_SkeletonFrameReady);
nui.NuiCamera.ElevationAngle = 0;

In these four lines of code we’ve initialized the Kinect runtime engine to use the skeletal tracking feature of the Kinect, created an event handler for the SkeletonFrameReady event, and made sure the Kinect is not elevated from a previous use.

Next we’ll want to add some code to that nui_SkeletonFrameReady event handler:

void nui_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    foreach (SkeletonData s in e.SkeletonFrame.Skeletons)
    {
        if (s.TrackingState == SkeletonTrackingState.Tracked)
        {
            skeleton = s;
        }
    }
}

This function is called every time the program gets skeleton data from the Kinect. We look for a skeleton the skeleton engine is actively tracking, and keep a reference to it so it can be rendered later.
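One caveat: as written, we hold on to the last tracked skeleton even after the player leaves the frame. A small, hypothetical variation (not in the sample) clears the reference first, so stale joints aren’t drawn once tracking is lost:

void nui_SkeletonFrameReady(object sender, SkeletonFrameReadyEventArgs e)
{
    // Reset the reference each frame so we stop drawing joints
    // once no skeleton is being tracked.
    skeleton = null;
    foreach (SkeletonData s in e.SkeletonFrame.Skeletons)
    {
        if (s.TrackingState == SkeletonTrackingState.Tracked)
        {
            skeleton = s;
            break;
        }
    }
}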

Now we want to modify the Draw function of the sample:

protected override void Draw(GameTime gameTime)
{
    GraphicsDevice.Clear(Color.CornflowerBlue);

    if (isWireframe)
    {
        GraphicsDevice.RasterizerState = wireFrameState;
    }
    else
    {
        GraphicsDevice.RasterizerState = RasterizerState.CullCounterClockwise;
    }

    Matrix view = Matrix.CreateLookAt(new Vector3(0, 0, -20), new Vector3(0, 0, 100), Vector3.Up);
    Matrix projection = Matrix.CreatePerspectiveFieldOfView(MathHelper.PiOver4,
                                                GraphicsDevice.Viewport.AspectRatio,
                                                1.0f,
                                                100);

    // Draw the current primitive.
    GeometricPrimitive currentPrimitive = primitives[currentPrimitiveIndex];
    Color color = colors[currentColorIndex];

    DrawPrimitiveSkeleton(currentPrimitive, view, projection, color);

    // Reset the fill mode renderstate.
    GraphicsDevice.RasterizerState = RasterizerState.CullCounterClockwise;

    // Draw overlay text.
    string text = "A or tap top of screen = Change primitiven" +
                  "B or tap bottom left of screen = Change colorn" +
                  "Y or tap bottom right of screen = Toggle wireframe";

    spriteBatch.Begin();
    spriteBatch.DrawString(spriteFont, text, new Vector2(48, 48), Color.White);
    spriteBatch.End();

    base.Draw(gameTime);
}

There are a few things to note here, as this function is quite different from the Draw function of the original sample. The first is the view and projection matrices: we need to move the view matrix further back from the origin to get a better view of the skeleton. Then we simply pass the relevant data to the DrawPrimitiveSkeleton function and let it draw all of the joints:

private void DrawPrimitiveSkeleton(GeometricPrimitive primitive, Matrix view, Matrix projection, Color color)
{
    try
    {
        if (skeleton != null)
        {
            if (skeleton.TrackingState == SkeletonTrackingState.Tracked)
            {
                // Draw one primitive at each joint of the tracked skeleton.
                foreach (Joint joint in skeleton.Joints)
                {
                    var position = ConvertRealWorldPoint(joint.Position);
                    Matrix world = Matrix.CreateTranslation(position);
                    primitive.Draw(world, view, projection, color);
                }
            }
        }
    }
    catch
    {
        // The skeleton reference is updated from the SkeletonFrameReady
        // event; swallow any transient errors if it changes mid-draw.
    }
}

So all we do in this function is check whether the current skeleton exists, then enumerate each joint of the skeleton, drawing it to the screen. Note the ConvertRealWorldPoint function:

private Vector3 ConvertRealWorldPoint(Vector position)
{
    // Kinect reports joint positions in meters; scale X and Y up
    // so the joints spread out in our 3D world.
    var returnVector = new Vector3();
    returnVector.X = position.X * 10;
    returnVector.Y = position.Y * 10;
    returnVector.Z = position.Z;
    return returnVector;
}

All we’re doing here is taking the position from the Kinect SDK and scaling it up for use in a 3D world. There are probably better approaches to this, so I’d like to see how anyone else does it. We then take that point and create a world matrix to draw the primitive for each joint.
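For example, one possible alternative (a hypothetical helper, not in the attached sample) would be to position each joint relative to the hip center, so the skeleton stays centered on screen no matter where the player stands in the room:

// Hypothetical alternative, not in the attached sample: offset each
// joint from the hip center so the skeleton stays centered.
private Vector3 ConvertRelativePoint(Vector position, Vector hipCenter)
{
    return new Vector3((position.X - hipCenter.X) * 10,
                       (position.Y - hipCenter.Y) * 10,
                       position.Z - hipCenter.Z);
}

You’d grab the hip center once before the joint loop, e.g. from skeleton.Joints[JointID.HipCenter].Position.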

Once you’re done, run the sample and stand in front of your Kinect Sensor. The result should look something like this:

[Screenshots: primitives rendered at each joint of the tracked skeleton]

I’ve uploaded my version of the sample here so you can take a look and compare. Credit for the 3D Primitive Sample goes to Microsoft and the XNA Community Team. Let me know what you think in the comments, and please give me any feedback.

Enjoy,

Patrick Godwin

Microsoft releases Kinect for Windows SDK Beta

That’s right, the long-awaited Kinect for Windows SDK, first demonstrated at MIX 2011, was released today during a Channel 9 Live event. My fellow Microsoft Student Insider, Dennis Delimarsky, took part in the presentation, helping show off the beta tools for Kinect.

The Kinect for Windows SDK has many features, from hundreds of pages of API documentation to multiple samples, making it a great way to interact with your technology. The SDK supports skeletal tracking and audio processing, as well as raw access to the data from the Kinect sensor.

You can go ahead and download the Kinect SDK, free for educational use, from here.

Until next time,
Patrick Godwin

Introducing the 2011 Imagine Cup MSP Social Media Team!

For those of you who follow the Imagine Cup on Facebook, you may have noticed that a new tab appeared today. The Inside Track, a new feature on the Imagine Cup Facebook Page, exists to highlight the new Microsoft Student Partner Social Media Team that will be covering this year’s Imagine Cup Worldwide Finals in New York City.

You may be asking: what is the Microsoft Student Partner Social Media Team? The SMT is a group of enthusiastic Microsoft Student Partners who will be attending the Worldwide Finals in New York City to share the stories of the different teams at the competition.

I will be one of the members of the team in attendance. I plan on bringing you detailed blog posts about each team and the overall experience of the competition, as I have with past Imagine Cup events. Feel free to check out the rest of the team here.

Until Next Time,
Patrick Godwin