Teensy 3.0 Serial Lag

For Adafruit customers who seek help with microcontrollers

Moderators: adafruit_support_bill, adafruit

Please be positive and constructive with your questions and comments.
redfieldp
 
Posts: 11
Joined: Thu Apr 11, 2013 4:15 pm

Teensy 3.0 Serial Lag

Post by redfieldp »

Hello!

I am currently running a really basic Cinder sketch and sending colors to 250 WS2801 LED pixels (10 strands). The strips are wired to a Teensy 3.0.

The issue I'm having is the lag between the computer (Cinder sketch) and the Teensy over serial. If you code pixel changes on the Teensy, it's super fast, and the Cinder sketch is very basic and fast. However, when I start sending pixel data (index, r, g, b) over serial, I cap out at 5 fps before the Teensy crashes under the load.

I'm trying to figure out if there's an optimization I could use on the Teensy, or an alternate protocol I could use. Currently the Cinder sketch writes a string such as this at a rate of 5 fps:
0,78,78,78 1,78,78,78 2,78,78,78 3,78,78,78 4,78,78,78 5,78,78,78 6,78,78,78 7,78,78,78 8,78,78,78 9,78,78,78 10,78,78,78 11,78,78,78 12,78,78,78 13,78,78,78 14,78,78,78 15,78,78,78 16,78,78,78 17,78,78,78 18,78,78,78 19,78,78,78 20,78,78,78 21,78,78,78 22,78,78,78 23,78,78,78 24,78,78,78 25,78,78,78 26,78,78,78 27,78,78,78 28,78,78,78 29,78,78,78 30,78,78,78 31,78,78,78 32,78,78,78 33,78,78,78 34,78,78,78 35,78,78,78 36,78,78,78 37,78,78,78 38,78,78,78 39,78,78,78 40,78,78,78 41,78,78,78 42,78,78,78 43,78,78,78 44,78,78,78 45,78,78,78 46,78,78,78 47,78,78,78 48,78,78,78 49,78,78,78 50,78,78,78 51,78,78,78 52,78,78,78 53,78,78,78 54,78,78,78 55,78,78,78 56,78,78,78 57,78,78,78 58,78,78,78 59,78,78,78 60,78,78,78 61,78,78,78 62,78,78,78 63,78,78,78 ... 249,78,78,78 E
(Ellipses added to reduce size of data snippet)

Any faster than that, and the teensy crashes. On the Teensy side I am simply doing a Serial.available() check every loop and then reading in the data until it terminates with the E.

Any thoughts would be very much appreciated!

redfieldp
 
Posts: 11
Joined: Thu Apr 11, 2013 4:15 pm

Re: Teensy 3.0 Serial Lag

Post by redfieldp »

One addendum: when I yank any sort of delay from the Teensy sketch, I can actually get up to about 10fps, but an attempt to get higher than that still causes the Teensy to crash...

westfw
 
Posts: 2010
Joined: Fri Apr 27, 2007 1:01 pm

Re: Teensy 3.0 Serial Lag

Post by westfw »

I'm trying to figure out if there's an optimization I could use on the Teensy, or an alternate protocol I could use.
Well, you could certainly speed things up by using some sort of binary-based protocol instead of the ASCII you're using now. 3:1 compression on the serial link, plus getting rid of the decimal->binary conversion...

But first, you ought to figure out why it's "crashing." Merely overloading the cpu ought not cause a "crash."
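
As a rough illustration of the binary protocol suggested above, here is a minimal host-side sketch. The `Pixel` type, the `encodeFrame` name, and the layout (a 2-byte big-endian length followed by raw r, g, b bytes in pixel order) are assumptions made for this example, not something specified in the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical pixel type for illustration; not from the original sketch.
struct Pixel {
    uint8_t r, g, b;
};

// Pack a frame as: 2-byte big-endian payload length, then r,g,b per pixel.
// Pixel order is implied by position, so no index is transmitted.
std::vector<uint8_t> encodeFrame(const std::vector<Pixel>& pixels) {
    const std::size_t payload = pixels.size() * 3;
    assert(payload <= 0xFFFF);  // the length must fit in two bytes

    std::vector<uint8_t> out;
    out.reserve(payload + 2);
    out.push_back(static_cast<uint8_t>(payload >> 8));    // length, high byte
    out.push_back(static_cast<uint8_t>(payload & 0xFF));  // length, low byte
    for (const Pixel& p : pixels) {
        out.push_back(p.r);
        out.push_back(p.g);
        out.push_back(p.b);
    }
    return out;
}
```

With this layout a 250-pixel frame is 752 bytes, and the receiver never has to convert decimal text back to numbers.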

tldr
 
Posts: 466
Joined: Thu Aug 30, 2012 1:34 am

Re: Teensy 3.0 Serial Lag

Post by tldr »

westfw wrote:But first, you ought to figure out why it's "crashing." Merely overloading the cpu ought not cause a "crash."
i'd guess buffer overflow.

redfieldp
 
Posts: 11
Joined: Thu Apr 11, 2013 4:15 pm

Re: Teensy 3.0 Serial Lag

Post by redfieldp »

I hear what you're saying about buffer overflow. The problem is that when I input the same data to the Teensy directly, there is a lag as it receives it, but the parsing is super-quick. It seems like the "read" is what's taking time, not the actual parsing. That's why I'm trying to figure out a way to get it over the wire faster.

One thought I had was that I could pre-buffer 10-15 frames and send them all at once, since the USB packet would handle that in one go, and maybe that would reduce the I/O lag. Regardless, when I hard-code it on either side (Teensy or Cinder) it's super fast, so the serial I/O is definitely what's slowing me down.

Thanks again.

paulstoffregen
 
Posts: 444
Joined: Sun Oct 11, 2009 11:23 am

Re: Teensy 3.0 Serial Lag

Post by paulstoffregen »

How are you transmitting the data? And are you using Windows?

The main problem that usually comes up is attempting to transmit data 1 byte at a time. Only Macintosh OS-X is smart enough to recognize you've queued up hundreds or even thousands of single-byte transfers and combine them together. Windows and Linux will happily put each individual byte into its own USB transfer. The result is a USB packet with only a single byte, and the computer's host controller producing an interrupt and requiring service from the host controller driver and the serial device driver. It's horribly inefficient.

When you transmit a large block of data as a single write, as in a single call to WIN32 WriteFile() or Linux/Mac write(), the entire block of data is given to the host controller chip (in your computer) as a single transfer. The host controller automatically partitions the data into 64 byte USB packets and sends them as fast as possible (which depends on many factors, but it is usually very fast). Your computer doesn't spend any CPU time on each packet. The data is all moved automatically by DMA from the controller chip and your computer gets a single interrupt when all the data has been sent.

The single-byte-at-a-time approach makes terrible use of USB bandwidth and requires a lot more CPU time on your computer, but it's actually not too bad on Linux and Mac. However, Windows has a terrible deficiency in its USB drivers. Windows can only schedule a single USB transaction from the serial driver in each 1ms USB frame. If you write 1 byte at a time, that 1 byte uses up your only opportunity to send a transfer until the next frame. You can never get more than 1 kbyte/sec speed on Windows this way. Linux and Mac can run about 50 to 100 times faster (which is still far short of what USB can do with large block transfers).

So that's my guess. Had you posted a link to your project's actual code, I would have spent the time writing this message actually figuring out why it's so slow. Without any info, all I can do is guess. But pretty much every time these horribly slow speed issues come up, it's almost always delays in the code or 1-byte-at-a-time on Windows.
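
To make the "one large write" advice concrete, here is a minimal POSIX sketch. It builds a whole frame in memory in the ASCII format shown earlier in the thread, then hands it to the OS in a single `write()` call where possible. The helper names `buildAsciiFrame` and `writeAll` are hypothetical, and `fd` is assumed to be an already-opened serial device.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <string>
#include <unistd.h>  // POSIX write()

// Build one frame in the thread's ASCII format: "i,r,g,b " per pixel, then "E".
std::string buildAsciiFrame(int pixelCount, int r, int g, int b) {
    std::string frame;
    frame.reserve(static_cast<std::size_t>(pixelCount) * 16 + 1);
    for (int i = 0; i < pixelCount; ++i) {
        frame += std::to_string(i) + ',' + std::to_string(r) + ',' +
                 std::to_string(g) + ',' + std::to_string(b) + ' ';
    }
    frame += 'E';
    return frame;
}

// Hand the whole buffer to the OS in as few write() calls as possible.
// Returns true once every byte has been accepted.
bool writeAll(int fd, const std::string& data) {
    std::size_t sent = 0;
    while (sent < data.size()) {
        ssize_t n = write(fd, data.data() + sent, data.size() - sent);
        if (n < 0) {
            if (errno == EINTR) continue;  // interrupted by a signal, retry
            return false;                  // genuine error
        }
        sent += static_cast<std::size_t>(n);
    }
    return true;
}
```

The point is only that the loop runs over large chunks, not over individual bytes; in the common case it makes exactly one system call per frame.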

redfieldp
 
Posts: 11
Joined: Thu Apr 11, 2013 4:15 pm

Re: Teensy 3.0 Serial Lag

Post by redfieldp »

I am on a Mac, and I am sending the data all as one chunk, identical to the one quoted in my first post. I read a frame for each cycle of loop() that goes by. I have tried an alternate version where I read all the data in the buffer each time, and the giant reads simply slow it down so much that it's the same result for a different reason. Since it is frame-based data, my Teensy code, which is below, processes one frame at a time. Anything more than 14.5 fps, and the serial on the Teensy crashes...

#include "SPI.h"
#include "Adafruit_WS2801.h"

int RGB[4]; // RGB Values: First is pixel ID, Next 3 are R,G,B

uint8_t dataPin  = 2;    // Yellow wire on Adafruit Pixels
uint8_t clockPin = 3;    // Green wire on Adafruit Pixels

int stripLength = 250;

boolean batchComplete = false;

String inputString = "";

Adafruit_WS2801 strip = Adafruit_WS2801(stripLength, dataPin, clockPin);

void setup()
{
  Serial.begin(115200); // USB is always 12 Mbit/sec
  
  
  strip.begin();
  // Update LED contents, to start they are all 'off'
  strip.show();
}

void loop()
{
  if (batchComplete) {
    int ledIndex = 0;
    int firstLEDIndex = 0;
    String currentLED;

    ledIndex = inputString.indexOf(" ");

    //Serial.println(inputString.length());

    while (ledIndex != -1){

      //Serial.println(ledIndex);
      currentLED = inputString.substring(firstLEDIndex, ledIndex);

      // Parse commas out of string
      //Serial.print("Parsing string: ");
      //Serial.println(currentLED);

      int data[4];
      int numArgs = 0;

      int beginIdx = 0;
      int idx = currentLED.indexOf(",");
      int endPoint = currentLED.indexOf("E");

      String arg;
      char charBuffer[16];

      while (idx != -1)
      {
        arg = currentLED.substring(beginIdx, idx);
        arg.toCharArray(charBuffer, 16);
        RGB[numArgs++] = atoi(charBuffer);
        beginIdx = idx + 1;
        idx = currentLED.indexOf(",", beginIdx);
      }

      // Grab last arg
      arg = currentLED.substring(beginIdx);
      arg.toCharArray(charBuffer, 16);
      RGB[numArgs++] = atoi(charBuffer);

//      for (int i = 0; i < 4; i++) {
//        Serial.print("Parsed: ");
//        Serial.println(RGB[i]);
//      }
//      Serial.print("Setting Pixel ");
//      Serial.println(RGB[0]);
//     
//      Serial.print("To Color: ");
//      Serial.print(RGB[1]);
//      Serial.print(", ");
//      Serial.print(RGB[2]);
//      Serial.print(", ");
//      Serial.print(RGB[3]);
//      Serial.println("");
      
      strip.setPixelColor(RGB[0], Color(RGB[1],RGB[2], RGB[3]));

      firstLEDIndex = ledIndex+1;
      ledIndex = inputString.indexOf(" ", firstLEDIndex);

      if (ledIndex >= endPoint) {
        //Serial.print("Breaking at LED Index ");
        //Serial.println(ledIndex);
        break;
      }
    }

    //Serial.println("Resetting input String");
    inputString = "";

    strip.show();
    batchComplete = false;
  }

  while (Serial.available()) {
    char inChar = (char)Serial.read();
    if (!batchComplete) {
      if (inChar == 'E') {
        inputString += inChar;
        batchComplete = true;
        //Serial.println(inputString);
      }
      else {
        inputString += inChar;
        //Serial.println(inputString);
      }
    }
  }
}

/* Helper functions */
// Create a 24 bit color value from R,G,B
uint32_t Color(byte r, byte g, byte b)
{
  uint32_t c;
  c = r;
  c <<= 8;
  c |= g;
  c <<= 8;
  c |= b;
  return c;
}



paulstoffregen
 
Posts: 444
Joined: Sun Oct 11, 2009 11:23 am

Re: Teensy 3.0 Serial Lag

Post by paulstoffregen »

But what software is transmitting from your Mac to Teensy? Are you sure it's written efficiently?

Perhaps you should give this latency test code a try?

http://forum.pjrc.com/threads/7826-USB- ... -I-O-delay

As you can see from the test results, 8000 bytes to Teensy 3.0 should take about 45 ms. If you run this benchmark and it is fast, perhaps you can use its source code as a model for your application?

redfieldp
 
Posts: 11
Joined: Thu Apr 11, 2013 4:15 pm

Re: Teensy 3.0 Serial Lag

Post by redfieldp »

The transmitting code is an extremely efficient Cinder program that can run at 60fps if necessary, and does a send for each frame.

Perhaps that's the issue then? Based on the sample I provided, I'm transmitting 2750 bytes per frame. If the Teensy can only do 8000 bytes per 45 ms, that's about 22.2 8000-byte transfers per second, or 177,777 bytes per second, if you're doing nothing else. This would mean that at absolute maximum I could get 64 fps (177,777 / 2750). However, once you factor in the probable overhead from my smaller packet size, and that I have to set all the LEDs, it seems conceivable that I could lose that much speed. Is there any other (faster) way to communicate with the Teensy and set values?
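
The arithmetic above can be checked with a couple of one-line helpers. This is only a back-of-the-envelope sketch; the function names are made up for illustration, and the inputs are the figures quoted in the thread (8000 bytes in 45 ms, 2750 bytes per frame).

```cpp
#include <cassert>

// Sustained throughput implied by a benchmark that moved `bytes` in `millis` ms.
constexpr double bytesPerSecond(double bytes, double millis) {
    return bytes / (millis / 1000.0);
}

// Upper bound on frame rate if the link did nothing but carry frames.
constexpr double maxFps(double linkBytesPerSecond, double bytesPerFrame) {
    return linkBytesPerSecond / bytesPerFrame;
}
```

For example, `maxFps(bytesPerSecond(8000, 45), 2750)` gives a ceiling of roughly 64 frames per second. Assuming a hypothetical 752-byte binary frame instead (250 pixels at 3 bytes each plus a 2-byte length), the same link speed would allow a ceiling above 200 frames per second.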

westfw
 
Posts: 2010
Joined: Fri Apr 27, 2007 1:01 pm

Re: Teensy 3.0 Serial Lag

Post by westfw »

String inputString = "";
Rewrite the code to not use Strings!
I'll assume that Teensy-3 has all the needed fixes for the String class, since Paul found most of those bugs, but it's still going to be a relatively inefficient way to get data in and out of your comm protocol.
If the teensy can only do ... 177,777 bytes per second
177kBytes/sec is 1.4Mbps, which is an awful lot to expect from a "Serial port", even if it's been implemented "virtually" over USB.
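
For readers wondering what "not using Strings" might look like while keeping the existing ASCII format, here is one possible sketch that walks a plain `char` buffer with `strtol`. The function name `parseAsciiFrame` and the `rgb[][3]` output layout are assumptions for this example. It uses only the C standard library, so it can be tried on a desktop compiler before moving it into a Teensy sketch.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Parse "i,r,g,b i,r,g,b ... E" from a NUL-terminated char buffer and store
// each colour in rgb[i]. Returns the number of pixels set, or -1 on bad input.
int parseAsciiFrame(const char* text, uint8_t rgb[][3], int pixelCount) {
    int parsed = 0;
    const char* p = text;
    while (true) {
        while (*p == ' ') ++p;          // skip separators between pixels
        if (*p == 'E') return parsed;   // end-of-frame marker
        long v[4];                      // index, r, g, b
        for (int k = 0; k < 4; ++k) {
            char* end;
            v[k] = std::strtol(p, &end, 10);
            if (end == p) return -1;    // no digits where a number was expected
            p = end;
            if (k < 3) {
                if (*p != ',') return -1;
                ++p;
            }
        }
        if (v[0] < 0 || v[0] >= pixelCount) return -1;  // index out of range
        for (int k = 0; k < 3; ++k) {
            if (v[k + 1] < 0 || v[k + 1] > 255) return -1;
            rgb[v[0]][k] = static_cast<uint8_t>(v[k + 1]);
        }
        ++parsed;
    }
}
```

No memory is allocated and no substrings are copied; each character of the frame is visited once.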

paulstoffregen
 
Posts: 444
Joined: Sun Oct 11, 2009 11:23 am

Re: Teensy 3.0 Serial Lag

Post by paulstoffregen »

I am going to work on this speed issue. 177 kbytes/sec is pretty good compared to normal Arduino boards, but it really ought to be possible to get about 1 Mbyte/sec if no other USB devices are consuming much bandwidth.

A big part of the problem is reading the data 1 byte at a time.

String is the Arduino functionality that always gets the blame for every problem. I fixed the bugs years ago, but the Arduino Team did not accept the malloc/realloc/free fix I submitted until Arduino 1.0.4 (they did merge most of my String fixes, but with the malloc bugs remaining it was unusable). Until recently, everyone who's tried String on official Arduino has found it terribly unstable, so String has a horrible reputation.

String does add some overhead, especially when you grow the length of the string. However, in a program like this where the String object is global scope, it will retain the memory from previous runs, so this slowness is only suffered when you receive a longer string than you've previously ever received. String has a reserve() function (which I added, but the Arduino Team never documented it on their website) which lets you reserve the memory ahead of time. It will eliminate most of String's slowness for a problem like this.

But processing 1 byte at a time is always going to be slow. The usual approach of calling Serial.available() first and then Serial.read() is doubly slow. They're both virtual functions, so there's extra overhead calling them, and both have to do pretty much the same work. You can get a minor speedup by calling only Serial.read(), storing the value into an "int" (on Teensy 3.0, 32 bit int and 8 bit char are the same speed) and only using it if it's non-negative. Negative values indicate no data was actually available.
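
A minimal sketch of the read()-only idiom described above. It is written as a template so it can be exercised on a desktop; `drain` and `FakeReadPort` are hypothetical names, and the only assumption about the port is an Arduino-style `int read()` that returns -1 when nothing is waiting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Drain whatever is currently buffered, calling only read().
// `Port` is any type with an Arduino-style `int read()` that returns -1
// when no byte is waiting. Returns the number of bytes stored in `dst`.
template <typename Port>
std::size_t drain(Port& port, uint8_t* dst, std::size_t capacity) {
    std::size_t n = 0;
    while (n < capacity) {
        int c = port.read();  // one call per byte instead of available() + read()
        if (c < 0) break;     // negative: nothing available right now
        dst[n++] = static_cast<uint8_t>(c);  // note that 0 is a valid data byte
    }
    return n;
}

// Host-side stand-in for a serial port, for off-target testing only.
struct FakeReadPort {
    const uint8_t* data;
    std::size_t len;
    std::size_t pos;
    int read() { return pos < len ? data[pos++] : -1; }
};
```

On the Teensy the same function would presumably be called as `drain(Serial, buffer, sizeof buffer)` from `loop()`.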

The big performance problem, and the main reason why I haven't done much on this so far, is because Arduino's API lacks a block read function. It has block write, so you can send data quickly. That makes a huge difference.

I've brought up the block read API on the Arduino developers mail list a few times. That's the official path to building consensus for Arduino API changes. So far, there's been very little interest. Few people have ever replied, and none of the Arduino Team have. I'm usually happy to add small extensions on top of the normal Arduino functions, but big API changes are pretty serious. I prefer to work with the Arduino folks and contribute the improvements back. But in this case, there seems to be nearly zero interest.

Stream does have a readBytes() function defined. Currently Stream's implementation just calls read() one byte at a time. I'm not a C++ expert, but I think this function might be overridable with an optimized version. The other ones that are overridden are virtual, but maybe this is possible without using virtual, as long as it's not used through the base Stream class.

I really do care about building a fast platform that can let you access the full USB speed (in practice, the max is about 1 Mbyte/sec) with easy Arduino functions. I will work on improving this speed. Ultimately, when I do improve this, you'll need to use readBytes() or something similar to fetch the incoming data. Processing 1 byte at a time will always be slow.
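
To show what user code built on a block read could look like, here is a hedged sketch around the Stream-style `readBytes(char* buffer, size_t length)` call. The frame layout (2-byte big-endian length, then payload), the `readFrame` helper, and the `FakeBlockPort` test double are all assumptions for this example. Whether a given core implements `readBytes()` efficiently is exactly the open question raised above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Read one length-prefixed frame using block reads.
// `Port` is any type offering `size_t readBytes(char* buffer, size_t length)`.
// Returns the payload size, or 0 on a short read or an implausible length.
template <typename Port>
std::size_t readFrame(Port& port, uint8_t* payload, std::size_t capacity) {
    char header[2];
    if (port.readBytes(header, 2) != 2) return 0;
    const std::size_t len =
        (static_cast<std::size_t>(static_cast<uint8_t>(header[0])) << 8) |
        static_cast<uint8_t>(header[1]);
    if (len == 0 || len > capacity) return 0;  // refuse to overflow the buffer
    const std::size_t got =
        port.readBytes(reinterpret_cast<char*>(payload), len);
    return got == len ? len : 0;
}

// Host-side stand-in for a serial port, for off-target testing only.
struct FakeBlockPort {
    const uint8_t* data;
    std::size_t len;
    std::size_t pos;
    std::size_t readBytes(char* buffer, std::size_t length) {
        const std::size_t n = (len - pos < length) ? (len - pos) : length;
        std::memcpy(buffer, data + pos, n);
        pos += n;
        return n;
    }
};
```

One caveat: Arduino's stock `readBytes()` waits up to the Stream timeout for missing bytes, so a real sketch would likely want a short timeout or a check that enough data has arrived before calling it.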

redfieldp
 
Posts: 11
Joined: Thu Apr 11, 2013 4:15 pm

Re: Teensy 3.0 Serial Lag

Post by redfieldp »

I totally get you on the communications issues, and it sounds like it may be a combination of the single-byte read, as well as the limitations of the serial interface.

Will I do better if I connect over ethernet? I'm happy to use any options the board offers, my main concern is just getting the data across the wire at a speedy data rate, and doing so reliably. If the teensy is the wrong board for the job, I can change platforms too, but it seems like there should be a viable solution...

Thanks for taking all the time to look into this Paul!

westfw
 
Posts: 2010
Joined: Fri Apr 27, 2007 1:01 pm

Re: Teensy 3.0 Serial Lag

Post by westfw »

Doesn't Teensy3 support enumerating as other types of USB device? Ones that wouldn't be constrained by the APIs developed for low-speed serial interfaces? Fundamentally, if you're trying to update 250 pixel values at significant frame-rates, shouldn't you be using some sort of bulk transfer protocol? Maybe Base Class 6 ("Still image" devices)?

(alas, using less common device types can make the host-side software much more complicated.)

(You mentioned Ethernet. I think a full-fledged ethernet interface would be a disaster, but having the teensy enumerate as a "USB Ethernet" device, and treating each "packet" as a display update might provide a usable (on both sides) system.)

paulstoffregen
 
Posts: 444
Joined: Sun Oct 11, 2009 11:23 am

Re: Teensy 3.0 Serial Lag

Post by paulstoffregen »

USB virtual serial does indeed use USB bulk protocol. The underlying protocol is actually very fast, with the caveat that you must transmit in large blocks (especially on Windows). The problem is Arduino's API, not the communication protocol.

Teensy3 does support several USB protocols, but the others currently supported are all based on USB interrupt type, which does not allow using the entire USB bandwidth. I have been considering adding USB ethernet (RNDIS) support, but at this point it's only in the planning phase. USB ethernet also uses the bulk protocol. The RNDIS protocol adds many bytes between packets, and of course TCP/IPv4 adds 40 bytes of overhead per packet or UDP/IPv4 adds 28 bytes. So at the protocol level, RNDIS+IPv4+UDP is not as fast as CDC-ACM based virtual serial which has no protocol overhead (other than the low-level USB protocol itself). CDC-ACM simply sends all the data directly using USB bulk protocol.

USB ethernet is really compelling to support latency sensitive applications like OSC and E1.31 (DMX lighting), where the software is already written to use UDP/IP networking. Especially for OSC, people currently use proxy applications to convert serial protocols, but those proxies are subjected to non-realtime operating system scheduling latency. That could also be solved by kernel-level driver programming, and possibly custom protocols, but that is difficult programming. USB ethernet (hopefully) would allow a better low-latency path on the PC side. But implementing USB ethernet on a microcontroller and integrating it nicely with Arduino's APIs isn't exactly easy either.....

But the real speed problem here is the Arduino API. The blocking nature of Adafruit's WS2801 library also doesn't help, but that too is related to the lack of a non-blocking SPI library (which isn't feasible on 8 bit AVR without DMA, but could be created on ARM). Making 2 vtable-based function calls (Serial.available and Serial.read) for each individual byte, both of which incur the overhead of managing USB packet details, is always going to be slow.

The protocol is fast. The hardware is capable. But software design is the limitation. Even on the PC side, just look at the terribly slow 1 kbyte/sec speed Windows achieves with applications that write 1 byte at a time, despite running on a many-GHz 64 bit multi-core CPU with gigabytes of RAM and incredibly fast buses. Likewise on the microcontroller side, software is the limitation. Arduino just doesn't define an efficient many-byte read API.

I'm going to have to add this (realistically, sometime after Maker Faire) to support these types of applications where people need more PC-to-Teensy speed. Maybe I'll give it another try for discussion on the Arduino developers mail list? But I've already tried a few times. Maybe I should just do it, and hope it doesn't cause all sorts of compatibility trouble later if the Arduino Team ever adds something similar to official Arduino?

westfw
 
Posts: 2010
Joined: Fri Apr 27, 2007 1:01 pm

Re: Teensy 3.0 Serial Lag

Post by westfw »

at the protocol level, RNDIS+IPv4+UDP is not as fast as CDC-ACM based virtual serial which has no protocol overhead (other than the low-level USB protocol itself). CDC-ACM simply sends all the data directly using USB bulk protocol.
I wasn't expecting IP/TCP or even real ethernet; just something with an API along the lines of "here's an array of bytes" and having it show up at the other end looking the same (raw packet interface, essentially.) Is there a CDC-ACM sub-class that implements a message structure rather than a byte stream? My general experience is that you can piddle away a lot of performance when you embed a block of data in a bytestream, just by having to deal with the parsing of the bytestream to find the beginning and end of the block. It's hard to imagine that a block read function for arduino's serial (whose byte API is pretty lightweight, all things considered, at least for pure serial) would result in much overall improvement when the core user code still looks like:
****EDIT**** I left out the bad code that I was complaining about. Here it is!

      while (idx != -1)
      {
        arg = currentLED.substring(beginIdx, idx);
        arg.toCharArray(charBuffer, 16);
        RGB[numArgs++] = atoi(charBuffer);
        beginIdx = idx + 1;
        idx = currentLED.indexOf(",", beginIdx);
      }
The next segment of code is the suggested improvement!
****End of EDIT****

/*
 * readstate, paklen, datalen, datap, newdata, newcount and batchcomplete are
 * globals declared elsewhere. The state machine expects to start with
 * readstate = READ_LEN, paklen = 0 and datalen = 2.
 */
void loop()
{
    if (!batchcomplete) {
        /*
         * Read new data as it comes in, without blocking
         * (note: don't read new data if we haven't used the last frame.)
         */
        int c = Serial.read();
        if (c >= 0) {  // -1 means no data; 0 is a valid byte in a binary stream
            switch (readstate) {
            case READ_LEN:
                // Parentheses are required: '+' binds tighter than '<<'.
                paklen = (paklen << 8) + c;  // assemble multi-byte length
                if (--datalen == 0) {  // end of length field?
                    readstate = READ_DATA;
                    datalen = paklen;
                    datap = &newdata[0];
                }
                break;
            case READ_DATA:
                *datap++ = (uint8_t) c;
                if (--datalen == 0) { // end of data?
                    newcount = paklen;
                    batchcomplete = 1;
                    paklen = 0;   // reset length
                    datalen = 2;  // two bytes of "length"
                    readstate = READ_LEN;
                }
                break;
            }
        }
    } else {
        /*
         * Batch is complete; copy the new frame to the display 'buffer'
         */
        bcopy(&newdata[0], &RGB[0], newcount);
        batchcomplete = 0;    // ready for next frame
    }

    display_strip_nonblocking(RGB, ...);
}
A nice thing about this code is that if you DO get a block-oriented replacement for serial.read, it would drop right in to the READ_DATA state code...
Last edited by westfw on Sat May 04, 2013 8:07 pm, edited 1 time in total.
