Performance when Scripting with Kotlin

Jul 3, 2023 · 5 minute read
code
kotlin

Not too long ago, I came across Kaushik’s blog post and repo about Kotlin scripting. Around the same time, I was trying to automate a process of cleaning up some images I had. Given a set of transparent pngs, I wanted to find out if the last line was a solid line, and, if so, and if nothing is touching that line, remove it. While typically, I turn to using Pillow and Python for image manipulation scripts, I decided to give writing a Kotlin script a try instead.

I wrote the script (using javax.imageio) and tested it and everything was great. I then proceeded to combine my Kotlin script with the shell to run it on the set of images like this:

# look at directories 351, 352, ... 360
for i in `seq 350 360`; do
   DIR="/path/$i"
   cd $DIR
   # we have 15 images in each directory, sub_image_0.png, sub_image_1.png, ...
   for j in `seq 0 14`; do
      IMAGE=sub_image_$j.png
      kotlin /Users/ahmedre/Documents/code/kotlin/line_finder/finder.main.kts $IMAGE
   done
   cd -
done

The script loops on a set of directories, each with a number, and on a set of 15 images inside that directory. For each image, I run the Kotlin script, passing it as a parameter. The surprise to me came when I found that running this took ~2.5 minutes! 150 images isn’t a massive amount, and these aren’t very large images, so why does it take so long?

KScript

At this point, I remembered having come across KScript at some point. It provides a wrapper around kotlinc, caching script compilation among other things. I replaced /usr/bin/env kotlin with /usr/bin/env kscript in the script, and replaced kotlin with kscript. This brought the time down to ~1 minute and 12 seconds.

I continued reading, after which I found that KScript has a package option used to deploy scripts as standalone binaries. I ran kscript --package finder.main.kts and took the finder.main binary and replaced it in the for loop above. This brought the time down to ~29 seconds.

Compiled Jar

Maybe it’s slow due to Kotlin Scripting, I thought. What if I make a compiled jar instead? I modified the script (adding a main method, etc), and used kotlinc to build a jar, using kotlinc finder.kt -include-runtime -d finder.jar to generate a new jar, and used java -jar finder.jar in place of the existing Kotlin command in the loop. Running this took ~28 seconds also.

This makes sense, since it seems that KScript is precompiling and caching the compiled script.

The Culprit

Bringing down the run to 28 seconds is great, but still felt way too long for a set of 150 images. As I was thinking about this while rerunning the script, I noticed something interesting - every few runs of the script, I’d get a dock icon in macOS, which would then disappear, followed by another one. This brought me to a realization - what if the reason this is so slow is that we are processing a single file per run, causing the jvm to spawn once for each image processed.

Going back to the original script and measuring the time of the entire method, I saw that processing an image takes roughly 75ms. Based on this, the expected time would be 75ms * 150 images =~ 11.25 seconds. Moreover, updating the initial shell loop to add a time in front of each run shows runs that are taking a bit longer (over 110ms) per run. The combined signal from these should have been enough to consider this optimization sooner.

What if we modified the script to run on each directory of images instead of on each image? I modified the script to handle a directory at a time, and re-ran the loop, without the inner loop from 0-14. The updated times were really surprising.

Running the Kotlin Script with kotlin: 15 seconds.
Running the Kotlin Script with kscript: 16 seconds.
Running the Kotlin Script with a packaged kscript: 10 seconds.
Running with a compiled Jar: 11 seconds.

The real issue here was the cost of starting up a jvm for each file. Partially combining the files (to do 10 jvm process starts instead of 150), brought the time from ~28 seconds, to close to 10 seconds (or, with a vanilla Kotlin script, from ~2.5 minutes to 15 seconds). Using this information, combining the 10 runs down to a single run by further modifying the code to support nested directories brings the run time down to ~1.5 seconds (compiled), or ~5 seconds using Kotlin scripting ¹. Not bad!

An Untested Idea

Kotlin Multiplatform is very powerful, and provides us the ability to compile Kotlin for non-JVM platforms (by going through LLVM). KMP could easily allow us to take our script and compile it for non-JVM platforms, therefore making it native. In other words, given something like finder.kt, I could do something like:

# install kotlin-native - on macOS, we can do:
brew install kotlin-native

# for the first time running kotlin-native on macOS, we'd have to clear the
# quarantine extended attribute.
xattr -d com.apple.quarantine '/opt/homebrew/Caskroom/kotlin-native/1.8.21/kotlin-native-macos-aarch64-1.8.21/konan/nativelib'/*

# now we can run it through the kotlin-native compiler
# won't work if we have any java.* or android.* imports
kotlinc-native finder.kt -o finder

This would give us a native command line application, without the jvm. We’d expect the performance to be better than what we’ve seen so far, due to not going through the jvm. I didn’t test this approach, however, due to the fact that ImageIO is a jvm only construct. To do this, I’d have to use skia, or expect/actual methods for reading and manipulating pixels on the various platforms.

Takeaways

There are three key takeaways here:

I was again reminded that the adages of “Measure before optimizing,” and “Premature optimization is the root of all evil” are both true. Measure first, and it becomes clear where to spend time to get the most impact.
There is a cost to spinning up a jvm. Keep this in mind when writing Kotlin scripts.
Converting a vanilla Kotlin script to a native script is very compelling, and something I will consider in the future.

This is ~10ms per image, which is a lot less than the estimate of 75ms per image (since that 75ms was a measure of the entire execution). ↩︎