14 June 2023

Energy efficiency of programming languages on ARM and Intel

Currently FinOps is gaining traction. It’s all about running your software infrastructure in a cost-effective way, and oftentimes this goes hand in hand with running software in an energy-friendly way. Both are interesting from a technical efficiency point of view. Since I work in software development, I wondered how this would translate to programming language efficiency. Of course there’s a lot of research out there already, like this extensive paper https://greenlab.di.uminho.pt/wp-content/uploads/2017/10/sleFinal.pdf by Rui Pereira et al. from 2017. It circulates widely on social media.

This is an interesting study with a nice set of tests. And if you’re interested: C was the winner, both in speed and in energy efficiency. Java used about twice as much energy. But... the study only uses the Intel architecture (Haswell Intel(R) Core(TM) i5-4460 CPU @ 3.20GHz to be specific).

I thought it would be really interesting to see how this would translate to ARM architectures. Luckily I was planning to attend a hackathon, so I set up a little research lab to investigate. Since I lacked a research team and the backing of a university, I went for a poor man’s benchmarking approach with the following programming languages and hardware:

Languages & Architectures

  • Java 20 (Azul OpenJDK)
  • Python 3.11
  • JavaScript running on Node.js v20.3 (V8)
  • C (C11 on Intel, C18 on ARM)

I used Visual Studio Code for all but Java, where I used IntelliJ.

The reason for this selection: Java, Python and JavaScript have been in the top 5 of several programming language rankings, such as the one from RedMonk, which is based on Stack Overflow and GitHub data: https://redmonk.com/sogrady/2022/03/28/language-rankings-1-22/

Hardware

  • Apple MacBook Pro M1 Pro (3,20 GHz) 32 GB RAM
  • Apple MacBook Pro M2 Max* (3,7 GHz) 96 GB RAM
  • Apple MacBook Pro 2,8 GHz Quad Core Intel i7

I chose this hardware because these MacBooks were available during the hackathon and met my criteria.

Note: I couldn’t find the clock speeds of the M1 and M2 on the Apple site, probably because clock speed is no longer the key indicator of performance anyway. The clock speeds mentioned are based on info from several websites.

Energy consumption measurement approach

To measure the energy consumption, I bought smart plugs from Iqonic after a bit of research. Iqonic is one of the smart plug brands based on Tuya hardware, and I took the gamble that I could use the Tuya developer platform to read the plug’s power usage via an API that’s normally only accessible through the Smart Life app. It worked as expected, mostly thanks to the excellent (Python-based) TinyTuya project (https://github.com/jasonacox/tinytuya).

To get a fair reading, I first measured the idle power usage of each laptop after it was fully charged, with minimal background applications running. Once I had multiple readings (at least 5) with a consistent value, I started the test, which I also repeated several times. Interesting finding: the M1 and M2 MacBooks use just a few watts at rest. During a test I took around 50 power readings and used their average. All readings turned out to be very consistent, which gave me confidence they were representative.
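The idle-baseline check and the averaging can be sketched in C. This is a minimal illustration under assumptions, not the measurement scripts themselves (those were written in Python on top of TinyTuya): it assumes the plug readings in watts are already in an array, and the helper names and the tolerance that decides whether idle readings count as "consistent" are hypothetical.

```c
#include <stddef.h>

/* Average of n power readings in watts (readings assumed already collected). */
double mean_watts(const double *readings, size_t n) {
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += readings[i];
    }
    return n > 0 ? sum / (double)n : 0.0;
}

/* Returns 1 if there are at least min_count readings and the spread between the
   highest and lowest reading is at most tolerance_w watts, i.e. the idle baseline
   looks stable. The tolerance is an assumed value, not one from the measurements. */
int baseline_is_stable(const double *readings, size_t n, size_t min_count,
                       double tolerance_w) {
    if (n == 0 || n < min_count) {
        return 0;
    }
    double lo = readings[0];
    double hi = readings[0];
    for (size_t i = 1; i < n; i++) {
        if (readings[i] < lo) lo = readings[i];
        if (readings[i] > hi) hi = readings[i];
    }
    return (hi - lo) <= tolerance_w;
}

/* Net power of a test run: average reading during the test minus the idle baseline. */
double net_watts(const double *test_readings, size_t n_test, double idle_watts) {
    return mean_watts(test_readings, n_test) - idle_watts;
}
```

For example, baseline_is_stable(idle, 5, 5, 0.5) accepts five idle readings that differ by at most 0,5 W (an assumed threshold), after which net_watts(test, 50, mean_watts(idle, 5)) gives the net power for a run of 50 readings.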

The actual benchmark test

This is perhaps the weakest part of my research. I created just one test, but a classic one: calculating prime numbers. In this case every number below 200 million is tested with straightforward trial division, as shown below (the example is C code; the calculation is the same for all languages):

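#include <stdio.h>
#include <time.h>

/* Trial division: returns 1 if n is prime, 0 otherwise. */
int is_prime(int n) {
    if (n <= 1) {
        return 0;
    }
    /* Divisors only need checking up to sqrt(n); i * i stays far below INT_MAX here. */
    for (int i = 2; i * i <= n; i++) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}

int main(void) {
    time_t start = time(NULL);
    /* Test every number below 200 million; the result is deliberately not used. */
    for (int i = 2; i < 200000000; i++) {
        is_prime(i);
    }
    time_t end = time(NULL);
    /* On macOS and Linux time() counts whole seconds, so this is whole seconds in ms. */
    double execution_time = difftime(end, start) * 1000.0;
    printf("Execution time: %.2f ms\n", execution_time);
    return 0;
}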

I chose 200 million because all languages needed at least 10 seconds to complete the test and were able to finish within 10 minutes. That made meaningful readings possible without taking up too much hackathon time. However... Python took just waaaaaay too long for this test case. On the M1 it needed almost an hour to complete one test run, so we already have a clear loser in terms of performance. I therefore let Python calculate primes up to 20 million instead, so I could at least get some data. A funny side note: all my test scripts were written in Python, which already shows that speed is not everything.

After running a bunch of tests, I also tested Java in a multithreaded setup. Here’s the code for that:

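public class ParallelFindPrime {

  // Same trial-division check as the C version.
  static boolean isPrime(int n) {
    if (n <= 1) {
      return false;
    }
    for (int i = 2; i * i <= n; i++) {
      if (n % i == 0) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) throws InterruptedException {
    int n = 200_000_000;
    int numberOfThreads = Runtime.getRuntime().availableProcessors();
    Thread[] threads = new Thread[numberOfThreads];

    long startTime = System.currentTimeMillis();
    for (int i = 0; i < numberOfThreads; i++) {
      // Each thread checks one contiguous slice [from, to). The bounds are computed
      // in long arithmetic: as an int, (i + 1) * n overflows from 11 threads on.
      int from = (int) ((long) i * n / numberOfThreads);
      int to = (int) ((long) (i + 1) * n / numberOfThreads);
      threads[i] = new Thread(() -> {
        for (int j = from; j < to; j++) {
          isPrime(j); // result deliberately unused, as in the C version
        }
      });
    }
    for (Thread thread : threads) { thread.start(); }
    for (Thread thread : threads) { thread.join(); }
    System.out.println("Execution time: " + (System.currentTimeMillis() - startTime) + " milliseconds");
  }
}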
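The Java code splits the numbers 0 to n - 1 into one contiguous slice per thread: thread i handles i·n/T up to, but not including, (i+1)·n/T, where T is the number of threads. A minimal C sketch of that split (slice_bounds is a hypothetical helper, not part of the benchmark) shows that neighbouring slices meet exactly, and why the multiplication needs 64 bits: 11 × 200 million is 2,2 billion, which is larger than the largest 32-bit int (2147483647).

```c
#include <stdint.h>

/* Computes the half-open range [*from, *to) of 0..n-1 that slice i out of
   `slices` handles, the same split the Java threads use. The products are
   done in 64 bits: in 32-bit int arithmetic, (i + 1) * n overflows for
   n = 200 million once i + 1 reaches 11. */
void slice_bounds(int32_t i, int32_t slices, int32_t n,
                  int32_t *from, int32_t *to) {
    *from = (int32_t)((int64_t)i * n / slices);
    *to = (int32_t)((int64_t)(i + 1) * n / slices);
}
```

Because slice i ends exactly where slice i + 1 starts, the slices cover the whole range without gaps or overlap; with 12 slices, for example, the last one covers 183333333 up to 200000000.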

Test results

With that out of the way, here are the outcomes after lots of measurements:

Architecture  Language       Time (s)  Power (W)*  Energy (J)**  Test
M1            Java           102       6,1         622,2         Prime <200m
M1            JavaScript     140       4,6         644           Prime <200m
M1            C              159       5,7         906,3         Prime <200m
M1            Python         360       5,6         2016          Prime <20m
M1            Java parallel  17        38          646           Prime <200m
M2            C              145       9           1305          Prime <200m
M2            Java           93        10          930           Prime <200m
M2            JavaScript     121       6           726           Prime <200m
M2            Python         125       16          2000          Prime <20m
M2            Java parallel  12        38          456           Prime <200m
Intel         C              451       19          8569          Prime <200m
Intel         JavaScript     557       15          8355          Prime <200m
Intel         Java           462       14          6468          Prime <200m
Intel         Python         305       23          7015          Prime <20m
Intel         Java parallel  119       43          5117          Prime <200m

* Power: the average reading during the test minus the idle usage of the system.
** Energy: the total energy used, calculated as power (W) multiplied by the time (s) needed to run the test.
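The energy figures follow directly from the other two columns: energy (J) = power (W) × time (s). A small C sketch of that arithmetic, assuming the net power has already been computed as the footnote describes; the result struct and the function names are hypothetical. Summing a platform’s rows gives the per-platform totals used in the comparison in the conclusions.

```c
#include <stddef.h>

/* One row of the results table (net power = average reading minus idle usage). */
struct result {
    const char *label;
    double seconds;
    double net_watts;
};

/* Energy of one run in joules: watts multiplied by seconds. */
double energy_joules(const struct result *r) {
    return r->net_watts * r->seconds;
}

/* Summed energy of several runs, e.g. all tests on one platform. */
double total_energy_joules(const struct result *rows, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += energy_joules(&rows[i]);
    }
    return total;
}
```

With the M1 Java row (6,1 W for 102 s), energy_joules gives 622,2 J, matching the table; the five M1 rows add up to 4834,5 J.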

And put into graphs:

Conclusions

Okay, Python is inefficient because it’s an interpreted language, right? Well, JavaScript is often thought of as interpreted too, although V8 compiles hot code just in time, and on the M1 and M2 it’s quite energy efficient.
And what happened to C on Intel? It runs faster than Java and JavaScript there, but in my measurements it wasn’t very energy efficient. For those wondering: I compiled it with the gcc command that ships with macOS (which is actually Apple Clang), using the -O2 optimization flag. With -O3 the compiler concluded it could skip the prime number calculations altogether, since the output wasn’t used. That seemed a bit unfair, so I didn’t use it.
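A common way to keep -O3 from discarding the work is to give the loop a result the program actually uses, for example by counting the primes and printing the count. The sketch below is that variant, not the code behind the table: it adds a counter increment per number, so its timings would not be directly comparable, and count_primes_below is a hypothetical name.

```c
/* Same trial division as the benchmark. */
static int is_prime(int n) {
    if (n <= 1) {
        return 0;
    }
    for (int i = 2; i * i <= n; i++) {
        if (n % i == 0) {
            return 0;
        }
    }
    return 1;
}

/* Counts the primes below limit. Because the count is returned (and, in a real
   run, printed), the optimizer has no unused result it can throw away. */
long count_primes_below(int limit) {
    long count = 0;
    for (int i = 2; i < limit; i++) {
        count += is_prime(i);
    }
    return count;
}
```

In main, the timed loop would become a single call such as printf("Primes: %ld\n", count_primes_below(200000000)).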

But what is really fascinating to see is the power of the ARM architecture. Compared to Intel it is several times faster for most tests and far more energy efficient. One way to express the relative efficiency is to add up the energy consumption of all tests per platform and compare the totals with the M1. The M2 then uses 1,12 times as much energy, and Intel a whopping 7,35 times as much. JavaScript and Java have comparable energy consumption on the M1. In the single-threaded tests the M2 is less energy efficient than the M1, except for Python, where both come out about even; among the other languages the gap is smallest for JavaScript. However, multithreading changes the equation dramatically: the M2 wins. It not only completes the test extremely fast, in 12 seconds or about 30% faster than the M1, it also has the lowest energy consumption of all readings.
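The per-platform comparison is plain arithmetic: sum each platform’s energy values over the same five tests and divide by the M1 sum. A C sketch of that calculation (the function names are hypothetical):

```c
#include <stddef.h>

/* Sum of the energy (J) of all tests on one platform. */
double sum_joules(const double *joules, size_t n) {
    double total = 0.0;
    for (size_t i = 0; i < n; i++) {
        total += joules[i];
    }
    return total;
}

/* Energy a platform uses relative to a baseline over the same set of tests:
   1.0 means equal, 2.0 means twice as much energy (half as efficient). */
double relative_energy(const double *platform, const double *baseline, size_t n) {
    return sum_joules(platform, n) / sum_joules(baseline, n);
}
```

Feeding in the energy values from the table (Java, JavaScript, C, Python and Java parallel, in the same order for each platform) gives about 1,12 for the M2 and 7,35 for Intel.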

This last point made me extra happy, since I’m biased toward Java and was a little shocked to see JavaScript perform comparably to or better than Java. Of course I understand multithreading isn’t always easy or achievable, but when push comes to shove, there is a way for Java to make a difference. I’m also sure C programmers could point out ways to make the C version run more efficiently. I was already happy that I got it to run, since I had no prior experience with C.

In general, my measurements show that an ARM architecture is very beneficial for both speed and energy efficiency. As long as you don’t use Python, the architecture matters more than which programming language you’re using. So, if you need an excuse to get that new MacBook from your boss, here you have it.

Bonus

Our CTO Bert Jan Schrijver rewrote the parallel processing Java code as follows:

IntStream.range(0, n).parallel().forEach(ParallelFindPrime::isPrime);

About the same performance, just one line ;-)


Roy Wasse

Co-Founder and Director at OpenValue. Passionate about software development.