Friday, January 29, 2016

Arduino 32-bit Speed Test Shootout!

I benchmarked some Arduino 32-bit boards, including an Arduino 101, Arduino Due, Arduino Zero clone, and a chipKIT Uno32. I threw an Arduino Uno into the mix for comparison. Here are the results.

Spoiler: The chipKIT Uno32 won. :)

Test Setup

To benchmark these boards, I used an excerpt from the Arduino Show Info program. The sketch includes several speed tests that examine GPIO manipulation speed and raw computational speed. While each of the 32-bit boards tested is very fast, optimization in the IDE also comes into play when measuring performance. Even an 84 MHz processor won't make up for poor code structure and inefficient register manipulation in the libraries. So instead of just comparing specs, let's put each board through tests that measure how long commonly used program functions take to complete.

For the Arduino boards, I used Arduino IDE 1.6.7. For the chipKIT Uno32, I used Mpide 0023.

Benchmarking Program Download: Shared on GitHub

Please examine this program to see exactly how each test is structured.


Results

Here are the benchmarking results. Each data point is the time to complete the test, in microseconds, so lower is better. The fastest board in each test is highlighted in green. I also highlighted two concerning results. Two of the tests could not be run on two of the boards due to compilation problems, as indicated.


The results of the speed tests.

Analysis

The Arduino Due and chipKIT Uno32 were the clear winners here. Despite the release of newer Arduino boards, the Due can still hold its own in raw computational speed, and with its SAM3X8E microcontroller and large number of GPIO pins, there is no doubt it will stay relevant for some time to come. The dark horse in the test, the chipKIT Uno32, did extremely well. In fact, I have to say it was the winner. It had very fast and consistent GPIO manipulation speed, awesome floating point performance, and it nearly matched the Due in integer math. How much of this is due to the microcontroller versus IDE optimization? I cannot say. But hats off to the creators of that nice little board. It seems to be discontinued, but you can still find it for sale for about $15 less than a Due!

The Arduino 101 did pretty well considering the lower clock speed. It is not a "Due killer" and was never intended to be. While the raw computational speed isn't as fast as I was hoping for, we may see improvements for this board as the IDE is optimized and we get access to the RTOS under the hood. Also, keep in mind that speed is only part of the story. Do any of the other boards have Bluetooth and an accelerometer on board? No. As a potential Arduino Uno successor, the Arduino 101 is a good addition to the lineup.

Speaking of an Arduino Uno successor, the Arduino Zero has been billed as just such a board. I don't have an official Zero, but I did test SparkFun's take on it, their SAMD21 Dev Board. Sadly, I was a bit disappointed. Sure, the SAMD21 chip turned in some good numbers, but I feel that the Arduino Zero is too expensive for what it offers. It costs slightly more than a Due, yet has lower performance and fewer GPIO pins. With the Arduino 101 selling for $25 less than the Zero, where does the Zero fit in? Less expensive Zero-compatible boards might be good alternatives if you want to explore the Cortex-M0+ SAMD21 microcontroller.

The Arduino Uno rocked it! Well, ok, it looks pretty slow compared to the other boards. However, remember that it is extremely unfair to pit an 8-bit microcontroller against modern 32-bit devices in speed tests. The Uno still has enough processing power for the majority of hobbyist projects. It is a classic board that will be around for years to come.

There were two concerning results that I highlighted in the table. Analog read performance on the Arduino Zero clone is abysmal! There has to be something wrong in the IDE there. Nearly half a millisecond to read an analog pin is unacceptable. Hopefully it will be fixed in the near future. Also, dtostrf() speed on the Arduino 101 was far too slow. Once again, this has to be due to optimization problems in the IDE. That is not a commonly used function, so I doubt many people will notice.

Conclusion

That's it! I hope you found this shootout useful. Download the sketch above and benchmark your own boards and microcontrollers; it's a lot of fun.

If you would like to see a similar shootout with small AVR, PIC and ARM microcontrollers, such as the ATtiny and STM32F030 devices, please post in the comments below. Also post any comments and corrections.

Thanks for reading!

- Dan W.

6 comments:

  1. Nice work! By the way, do you know the floating-point precision of the Arduino 101?

  2. Thanks for your benchmark! I just ran it on the Teensy 3.6 at the default 180 MHz.


    Speed test
    ----------
    F_CPU = 180000000 Hz
    1/F_CPU = 0.0056 us
    nop : 0.006 us
    digitalRead : 0.084 us
    digitalWrite : 0.192 us
    pinMode : 0.192 us
    multiply byte : 0.039 us
    divide byte : 0.047 us
    add byte : 0.039 us
    multiply integer : 0.033 us
    divide integer : 0.039 us
    add integer : 0.033 us
    multiply long : 0.032 us
    divide long : 0.049 us
    add long : 0.033 us
    multiply float : 0.044 us
    divide float : 0.124 us
    add float : 0.044 us
    itoa() : 0.279 us
    ltoa() : 1.099 us
    dtostrf() : 15.474 us
    random() : 0.399 us
    y |= (1<<x) : 0.027 us
    bitSet() : 0.028 us
    analogReference() : 0.122 us
    analogRead() : 6.599 us
    analogWrite() PWM : 0.534 us
    delay(1) : 1000.499 us
    delay(100) : 100000.000 us
    delayMicroseconds(2) : 2.001 us
    delayMicroseconds(5) : 5.004 us
    delayMicroseconds(100) : 100.049 us
    -----------

  3. ESP32 at 160 MHz with the analog code removed (no analogRead/analogWrite), as that part of the SDK is not finished yet.

    Speed test
    ----------
    F_CPU = 160000000 Hz
    1/F_CPU = 0.0062 us
    nop : 0.006 us
    digitalRead : 0.216 us
    digitalWrite : 0.168 us
    pinMode : 0.526 us
    multiply byte : 0.056 us
    divide byte : 0.056 us
    add byte : 0.050 us
    multiply integer : 0.080 us
    divide integer : 0.083 us
    add integer : 0.080 us
    multiply long : 0.078 us
    divide long : 0.073 us
    add long : 0.080 us
    multiply float : 0.078 us
    divide float : 1.398 us
    add float : 0.078 us
    itoa() : 1.083 us
    ltoa() : 1.098 us
    dtostrf() : 17.198 us
    random() : 0.673 us
    y |= (1<<x) : 0.067 us
    bitSet() : 0.067 us
    delay(1) : 999.998 us
    delay(100) : 100000.000 us
    delayMicroseconds(2) : 2.010 us
    delayMicroseconds(5) :

  4. ESP8266 at 160 MHz, with analogReference commented out. The WDT triggers on the longer delay check.

    Speed test
    ----------
    F_CPU = 160000000 Hz
    1/F_CPU = 0.0062 us
    nop : 0.006 us
    digitalRead : 0.299 us
    digitalWrite : 0.216 us
    pinMode : 0.781 us
    multiply byte : 0.050 us
    divide byte : 0.201 us
    add byte : 0.050 us
    multiply integer : 0.074 us
    divide integer : 0.229 us
    add integer : 0.067 us
    multiply long : 0.074 us
    divide long : 0.224 us
    add long : 0.068 us
    multiply float : 0.369 us
    divide float : 1.874 us
    add float : 0.344 us
    itoa() : 0.634 us
    ltoa() : 4.599 us
    dtostrf() : 22.674 us
    random() : 1.274 us
    y |= (1<<x) : 0.055 us
    bitSet() : 0.056 us
    analogRead() : 0.399 us
    analogWrite() PWM : 5.314 us
    delay(1) : 1009.999 us
    delay(100) : 100025.000 us
    delayMicroseconds(2) :
    Soft WDT reset

  5. Hi Dan, thanks for your great speed test! I was wondering if you still have a copy of the spreadsheet you used to create your comparison chart? Could you maybe upload it to Google docs and share? Maybe send it to me and I could do that? Thanks!

  6. ESP32 at 240 MHz with the 11-29-2016 Arduino version


    Speed test
    ----------
    F_CPU = 240000000 Hz
    1/F_CPU = 0.0042 us
    nop : 0.004 us
    digitalRead : 0.154 us
    digitalWrite : 0.111 us
    pinMode : 2.559 us
    multiply byte : 0.037 us
    divide byte : 0.036 us
    add byte : 0.033 us
    multiply integer : 0.053 us
    divide integer : 0.054 us
    add integer : 0.053 us
    multiply long : 0.051 us
    divide long : 0.049 us
    add long : 0.053 us
    multiply float : 0.051 us
    divide float : 0.924 us
    add float : 0.054 us
    itoa() : 0.719 us
    ltoa() : 0.699 us
    dtostrf() : 11.449 us
    random() : 0.449 us
    y |= (1<<x) : 0.045 us
    bitSet() : 0.045 us
    delay(1) : 999.999 us
    delay(100) : 100000.000 us
    delayMicroseconds(2) : 2.006 us
    delayMicroseconds(5) : 4.999 us
    delayMicroseconds(100) : 99.999 us
    -----------
