
Multik 0.2: Multiplatform, With Support for Android and Apple Silicon

Introducing Multik 0.2.0! Now a multiplatform library, it allows you to use multidimensional arrays in your favorite multiplatform projects. Let’s take a closer look at what’s new in v0.2.0.

Multik on GitHub

Multiplatform

We are very grateful to Luca Spinazzola for his huge contribution to the multiplatform capabilities included in this release of the library.

Before we move on to reviewing Multik’s new multiplatform structure, we need to say a few words about the new naming conventions. Once we multiplied the number of artifacts and added platform suffixes such as jvm, macosX64, and js, collisions with the older names appeared. To solve this problem, we’ve renamed some of the modules.

Let’s reacquaint ourselves with the modules and get a sense of which platforms each module now supports.

multik-core

As the name suggests, this is the main module. It provides multidimensional arrays and the ability to perform transformations, iteration, and arithmetic operations with them. It also defines an API for computations that require complex algorithms and resources. There are currently three such APIs: mathematics, linear algebra, and statistics. Other modules are responsible for implementing these APIs. Remember – Multik lets you replace these implementations at runtime.
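
To give a feel for the core API, here is a minimal sketch of creating and combining arrays with the operations mentioned above (the values are illustrative, and the snippet assumes the usual org.jetbrains.kotlinx.multik imports):

// Create a 2x2 matrix of doubles
val a = mk.ndarray(mk[mk[1.0, 2.0], mk[3.0, 4.0]])

// Element-wise arithmetic and single-element access
val b = a * 2.0
val c = a + b
println(c[1, 1])        // 12.0

// A transposed view of the original array
println(a.transpose())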

We have added support for all major platforms for this module. Note that JavaScript uses a new IR and Kotlin/Native uses a new memory model, so these artifacts will only be compatible with projects that support the new IR and new memory model, respectively.

The supported targets for each module are listed below:

  • multik-core: jvm, js, linuxX64, mingwX64, macosX64, macosArm64, iosArm64, iosX64, iosSimulatorArm64
  • multik-kotlin: jvm, js, linuxX64, mingwX64, macosX64, macosArm64, iosArm64, iosX64, iosSimulatorArm64
  • multik-openblas: jvm (only on linuxX64, mingwX64, macosX64, macosArm64, and androidArm64), linuxX64, mingwX64, macosX64, macosArm64, androidArm64
  • multik-default: the union of the platforms above

Other platforms are not supported.

multik-kotlin

The first module that implements the above API is multik-kotlin. In this module, all algorithms and logic are written in pure Kotlin. Even though it may be slower than native libraries, it provides more stability and allows for easier code debugging.

Because everything is written in Kotlin, it was also possible to support most of the important platforms, including JVM, Desktop, iOS, and JavaScript.

multik-openblas

The next module is multik-openblas. Here, the OpenBLAS library, along with a C wrapper over the Fortran libraries LAPACK and BLAS, is responsible for all of the linear algebra, while C++ code is responsible for mathematics and statistics.

This module, unlike the previous one, is quite demanding on the environment and the platform it’s launched on. As shown in the platform list above, the JVM artifact will only work on the listed systems and architectures. On these platforms, we ensure that it works out of the box, and users are rewarded with excellent performance.

multik-default

multik-default, the last of the four modules available at the moment, has kept its old name. It includes the two previous modules, multik-kotlin and multik-openblas. The idea is to combine the pros of both modules while doing away with the cons.

It supports all of the same platforms as the previous modules.

Support for Android and Apple Silicon processors

multik-openblas now supports Android and macOS on the new Apple processors. You can enjoy fast applications on Android devices with ARMv8 processors, along with native support for Apple’s M1 and M2 processors.

Random, matrix norm, easy creation of complex numbers, and more

In this release, we have also improved the usability of the library. For example, we wrapped Kotlin’s random number generator to make it easy to create arrays of random numbers:

val ndarray = mk.rand<Float>(3, 5, 2)

We have also changed the matrix norm calculation function and added a native implementation of it:

val ndarray = mk.ndarray(mk[mk[1.0, 2.0], mk[3.0, 4.0]])
mk.linalg.norm(ndarray)
mk.linalg.norm(ndarray, Norm.Inf)
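
The linear algebra API covers more than norms. For instance, matrix products and inverses are exposed as well (a short sketch, assuming the dot and inv entry points keep their existing signatures):

val a = mk.ndarray(mk[mk[1.0, 2.0], mk[3.0, 4.0]])
val b = mk.ndarray(mk[mk[5.0, 6.0], mk[7.0, 8.0]])

val product = mk.linalg.dot(a, b)   // matrix-matrix product
val inverse = mk.linalg.inv(a)      // inverse of a square matrix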

And now you can create complex numbers easily and naturally. Credit for this contribution goes to Marcus Dunn.

val complexNumber: ComplexDouble = 1.0 + 1.0.i

For more details about this new release, please check out the changelog.

How to try it

To try Multik 0.2.0 in your project, do the following:

  • Make sure that you have mavenCentral() in your list of repositories:
repositories {
    mavenCentral()
}
  • Add the Multik module you need as a dependency:
dependencies {
    implementation("org.jetbrains.kotlinx:multik-core:0.2.0")
}

For a multiplatform project, put the Multik dependency in the common set:

kotlin {
    sourceSets {
        val commonMain by getting {
            dependencies {
                implementation("org.jetbrains.kotlinx:multik-core:0.2.0")
            }
        }
    }
}

Or put the dependency in a specific source set.
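
For example, if you only need Multik on the JVM, a hypothetical jvmMain block in build.gradle.kts could look like this (shown with multik-default; pick whichever module you need):

kotlin {
    sourceSets {
        val jvmMain by getting {
            dependencies {
                implementation("org.jetbrains.kotlinx:multik-default:0.2.0")
            }
        }
    }
}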

Multik is also available in Kotlin Jupyter notebooks.

%use multik
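
After the %use line, the mk entry point is available directly in notebook cells. For example, mirroring the rand example above:

val m = mk.rand<Float>(2, 3)
m.transpose()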

Try it in Datalore.

Conclusion

We are on our way to a stable release and could really use your feedback.

Try out Multik 0.2.0 and share your experience with us! Report any issues you encounter to the project’s issue tracker.


Kotlin DataFrame Library Preview

TL;DR: We at the Kotlin team have developed a Kotlin library for data frames. Today we’re releasing its first public preview version. It provides a readable and powerful DSL for data wrangling and I/O via CSV, JSON, Excel, and Apache Arrow, as well as interop with Kotlin data classes and hierarchical data schemas. The library is ready for you to try, and we’re keen to get your feedback.

Kotlin DataFrame on GitHub


Today we’re unveiling a new member of the collection of Kotlin libraries for data science. We’ve previously written about KotlinDL for deep learning and Multik for tensors. Now we’d like to introduce Kotlin DataFrame, a library for working with data frames.

Overview

One blog post is not enough to cover every aspect of the library, so we started a series of videos about Kotlin DataFrame. Below you’ll find the first video, which covers basic operations and the process of working with plain (non-hierarchical) tables. More videos are in the works, so please let us know whether you like this format and what we can improve.

What does it look like?

A simple example using the DataFrame library in Datalore

What is a data frame?

So what is a data frame? A data frame is a convenient abstraction for working with structured data. Essentially, it’s a 2-dimensional table with labeled columns of potentially different types. You can think of it as a spreadsheet or SQL table, or a dictionary of collections. If you’ve ever worked with Excel or CSV files, you are already more or less familiar with the concept of data frames.

But what makes this abstraction so convenient is not the table itself, but rather the set of operations defined on it. And Kotlin DataFrame is an idiomatic, Kotlin DSL-based language for defining such operations. The process of working with data frames is often called data wrangling. This involves transforming and mapping data from a “raw” format into another format that is more appropriate for analytics and visualization. The goal of data wrangling is to ensure that the data is useful and of high quality. Data analysts typically spend the majority of their time wrangling data rather than analyzing it. That’s why it is so important to make this process easy, smooth, and enjoyable.
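
As a toy illustration of such operations, here is a small wrangling chain written with Kotlin DataFrame’s string-based column accessors (the column names and values are made up for this example):

import org.jetbrains.kotlinx.dataframe.api.*

val df = dataFrameOf("name", "age")(
    "Alice", 15,
    "Bob", 20,
    "Charlie", 22
)

// Keep the rows we care about, derive a new column, and sort the result
val adults = df
    .filter { "age"<Int>() >= 18 }
    .add("ageInMonths") { "age"<Int>() * 12 }
    .sortBy("name")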

I’m not a data scientist, why should I care?

First of all, who knows? Maybe you will become a data scientist one day. 🙂

Analyzing data is not restricted to the field of data science. We often do it in our roles as software developers. For example, we analyze what’s actually inside collections when debugging, dig into memory dumps or databases, work with REST APIs, and receive JSON files with data in them. Having a typesafe and easy DSL for these sorts of tasks would be really beneficial.

Why a new library?

Why are we developing a new library if several JVM-based data frames already exist?

Kotlin DataFrame was inspired by the Krangl library and started as a wrapper on top of it. Over time, however, we had to rewrite more and more of the library, and we ended up changing almost all of it. While rewriting it, we’ve followed these guiding principles:

  • Idiomatic – The API should be natural for Kotlin developers and consistent with the Kotlin standard library.
  • Hierarchical – Data frames should be able to read and present data from different sources, including not only plain CSV but also JSON, e.g. directly from REST APIs. That’s why data frames have been designed hierarchically and allow the nesting of columns and cells.
  • Generic – Data frames should be able to store not just a limited set of value types, but any Kotlin object, providing null-safety support for all of them.
  • Interoperable – Data frames should integrate seamlessly with Kotlin collections, converting any object structure in application memory to a data frame and vice versa:
data class Person(val name: String, val age: Int)
val persons = listOf(Person("Alice", 15), Person("Bob", 20), Person("Charlie", 22))
// Convert collection to DataFrame
val df = persons.convertToDataFrame()
// Convert DataFrame to Kotlin collection
val persons1 = df.toListOf<Person>()
  • Typesafe and practical – Data frames are highly dynamic objects. Their labels depend on the input source, and new columns can also be added or removed during data wrangling. To make it possible to access them in a safe and convenient way, we’ve developed a mechanism for the on-the-fly generation of extension properties that correspond to the columns of a data frame. In interactive notebooks like Jupyter or Datalore, this generation runs after the execution of each cell. Currently, we’re working on a compiler plugin that infers and transforms the data frame schema while typing.
    The generated properties ensure you’ll never misspell column names or mess up their types. And of course, nullability is also preserved.
Properties correspond to the names, types, and nullability of the columns
  • Polymorphic – If all the columns of one data frame are present in some other data frame, then the former can be a supertype for the latter. This means we can define a function for a data frame with an expected set of columns and later safely execute it on any data frame that contains them.

Where to start?

Let’s Kotlin!


KotlinDL 0.4 Is Out With Pose Detection API, EfficientDet for Object Detection, and EfficientNet for Image Recognition

Version 0.4 of our deep learning library, KotlinDL, is out!

KotlinDL 0.4 is now available on Maven Central with a variety of new features – check out all of the changes coming in the new release! It introduces new models in ModelHub (including the EfficientNet and EfficientDet model families), an experimental high-level Kotlin API for Pose Detection, new layers and preprocessors contributed by community members, and many other changes.

KotlinDL on GitHub

In this post, we’ll walk you through the changes to the Kotlin Deep Learning library in the 0.4 release:

  1. Pose Detection
  2. NoTop models in the ModelHub
  3. New models: EfficientDet and EfficientNet
  4. Multiple callbacks
  5. Breaking changes in the Image Preprocessing DSL
  6. 4 new layers and 2 new activation functions
  7. Learn more and share your feedback


Pose Detection

Pose detection is the task of using an ML model to detect the pose of a person in an image or a video by estimating the spatial locations of key body joints (keypoints).

We’re excited to launch the MoveNet family of pose detection models with our new pose detection API in KotlinDL. MoveNet is a fast and accurate model that detects 17 keypoints on the body. The model is offered on ONNXModelHub in two variants, MoveNetSinglePoseLighting and MoveNetSinglePoseThunder. MoveNetSinglePoseLighting is intended for latency-critical applications, while MoveNetSinglePoseThunder is intended for applications that require high accuracy.

If you need to detect a few poses on a given image or video frame, try MoveNetMultiPoseLighting. This model is able to detect multiple people in the image frame at the same time, while still achieving real-time speed.

There are two ways to detect poses within KotlinDL: parsing the model output manually or using our LightAPI for Pose Detection (the recommended way).

Just load the model:

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetSinglePoseLighting.pretrainedModel(modelHub)

Run the predictions and print out the pose landmarks and edges connecting the detected pose landmarks:

model.use { poseDetectionModel ->
       val imageFile = …
       val detectedPose = poseDetectionModel.detectPose(imageFile = imageFile)

       detectedPose.poseLandmarks.forEach {
           println("Found ${it.poseLandmarkLabel} with probability ${it.probability}")
       }

       detectedPose.edges.forEach {
           println("The ${it.poseEdgeLabel} starts at ${it.start.poseLandmarkLabel} and ends with ${it.end.poseLandmarkLabel}")
       }
}

Some visualization examples, where we drew landmarks and edges on the given images, are below.

The complete example can be found here.

If you want to run the MoveNet model to detect multiple poses on the given image, you need to make some minor changes to your code.

First, load the model:

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetMultiPoseLighting.pretrainedModel(modelHub)

Second, run the model and get the MultiPoseDetectionResult object, which contains the list of pairs <DetectedObject, DetectedPose>. As a result, we have access not only to the landmarks’ coordinates and labels but also to the coordinates of the bounding box for the whole person.

model.use { poseDetectionModel ->
       val imageFile = …
       val detectedPoses = poseDetectionModel.detectPoses(imageFile = imageFile, confidence = 0.0f)

       detectedPoses.multiplePoses.forEach { detectedPose ->
           println("Found ${detectedPose.first.classLabel} with probability ${detectedPose.first.probability}")
           detectedPose.second.poseLandmarks.forEach {
               println("Found ${it.poseLandmarkLabel} with probability ${it.probability}")
           }

           detectedPose.second.edges.forEach {
               println("The ${it.poseEdgeLabel} starts at ${it.start.poseLandmarkLabel} and ends with ${it.end.poseLandmarkLabel}")
           }
       }
}

Some visualization examples, where we drew the bounding boxes, landmarks, and edges on the images are below.

The complete example can be found here.


NoTop models in the ModelHub

Running predictions on ready-made models is good, but what about fine-tuning them for your tasks?

The classic approach to Transfer Learning is to freeze all layers except the last few and then train the top few layers (the fully connected layers at the top of the network) on a new piece of data, often changing the number of model outputs.
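
Roughly speaking, that manual freezing step looks like the sketch below (a rough sketch only: it assumes each layer exposes an isTrainable flag, and which layers to unfreeze depends on your architecture):

// Freeze everything except the last two (fully connected) layers,
// so only those layers are updated during fit().
model.layers.forEach { it.isTrainable = false }
model.layers.takeLast(2).forEach { it.isTrainable = true }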

Before the 0.4 release, KotlinDL users needed to remove the last layers manually, but with the 0.4 release, TensorFlowModelHub provides an option to download “noTop” models – equivalents of the previously available models, but without the weights and configuration for the last few layers.

The following “noTop” models are now available:

  • VGG’16
  • VGG’19
  • ResNet50
  • ResNet101
  • ResNet152
  • ResNet50V2
  • ResNet101V2
  • ResNet152V2
  • MobileNet
  • MobileNetV2
  • NasNetMobile
  • NasNetLarge
  • DenseNet121
  • DenseNet169
  • DenseNet201
  • Xception
  • Inception

In the example below, we load the ResNet50 model from our TensorFlowModelHub and fine-tune it to classify cats and dogs (using the embedded Dogs-vs-Cats dataset):

val modelHub = TFModelHub(cacheDirectory = File("cache/pretrainedModels"))

val modelType = TFModels.CV.ResNet50(noTop = true, inputShape = intArrayOf(IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))

val noTopModel = modelHub.loadModel(modelType)

The topModel is a simple neural network that can be trained quickly, as it has few parameters.

val topModel = Sequential.of(
   GlobalAvgPool2D(
       name = "top_avg_pool",
   ),
   Dense(
       name = "top_dense",
       kernelInitializer = GlorotUniform(),
       biasInitializer = GlorotUniform(),
       outputSize = 200,
       activation = Activations.Relu
   ),
   Dense(
       name = "pred",
       kernelInitializer = GlorotUniform(),
       biasInitializer = GlorotUniform(),
       outputSize = NUM_CLASSES,
       activation = Activations.Linear
   ),
   noInput = true
)

A new helper function joins the two models, noTopModel and topModel, together:

val model = Functional.of(pretrainedModel = noTopModel, topModel = topModel)

After that, load the weights for the frozen layers from the noTop model; the weights for the unfrozen layers of topModel will be initialized during the fit method call.

model.use {
   it.compile(
       optimizer = Adam(),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.loadWeightsForFrozenLayers(hdfFile)

   it.fit(
       dataset = train,
       batchSize = TRAINING_BATCH_SIZE,
       epochs = EPOCHS
   )

   val accuracy = it.evaluate(dataset = test, batchSize = TEST_BATCH_SIZE).metrics[Metrics.ACCURACY]

   println("Accuracy: $accuracy")
}

The complete example can be found here.


New models: EfficientDet and EfficientNet

Until v0.4, our ModelHub contained only one model (SSD) suitable for solving the Object Detection problem. Starting with this release, we’re gradually expanding the library’s capabilities for solving the Object Detection problem. We’d like to introduce to you a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior object detectors across a wide spectrum of resource constraints.

All models in this family share the same internal architecture, which scales to different inputs (image resolutions). Users can choose from a range of models: from the smallest, EfficientDet-D0, with 3.9 million parameters and 10.2 ms latency on a V100, up to EfficientDet-D7, with 52 million parameters and 122 ms latency on a V100.

Internally, EfficientDet models use another famous model, EfficientNet, as a backbone. It extracts features from input images and passes them to the next component of the object detection model.

EfficientDet Architecture

An example of EfficientDet-D2 usage can be found here.
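
For orientation, such usage follows the same ModelHub pattern as the pose detection snippets above. The sketch below is an assumption-heavy outline (the EfficientDetD2 object name and the detectObjects call are assumed; see the linked example for the authoritative code):

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.ObjectDetection.EfficientDetD2.pretrainedModel(modelHub)

model.use { detectionModel ->
    val imageFile = File("image.jpg")
    val detectedObjects = detectionModel.detectObjects(imageFile = imageFile)
    detectedObjects.forEach {
        println("Found ${it.classLabel} with probability ${it.probability}")
    }
}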

The EfficientNet model family is also available in the ONNXModelHub. There are 8 different types of models and each model is presented in two variants: full and “noTop” for fine-tuning.

These models achieve better accuracy on the ImageNet dataset with 10x fewer parameters than ResNet or NasNet. If you need fast and accurate image recognition, EfficientNet is a good choice.

An example of EfficientNet0 usage can be found here.


Multiple callbacks

Previously, callback support in KotlinDL was quite basic and not fully compatible with Keras. As a result, users had difficulty implementing their neural networks, building custom validation processes, and monitoring the neural network’s training.

The callback object was passed during compilation and was shared across every stage of the model’s lifecycle. However, model compilation can be located in a very different place in the code from fit/predict/evaluate, meaning that users may need different callbacks for different purposes.

Let’s assume that we need to define EarlyStopping and TerminateOnNaN for training to handle exceptional cases, and also add two custom callbacks for the prediction and evaluation phases:

val earlyStopping = EarlyStopping(
   monitor = EpochTrainingEvent::valLossValue,
   minDelta = 0.0,
   patience = 2,
   verbose = true,
   mode = EarlyStoppingMode.AUTO,
   baseline = 0.1,
   restoreBestWeights = false
)
val terminateOnNaN = TerminateOnNaN()


class EvaluateCallback : Callback() {
   override fun onTestBatchEnd(batch: Int, batchSize: Int, event: BatchEvent?, logs: History) {
       println("Test batch $batch ends with loss ${event!!.lossValue}..")
   }

   override fun onTestEnd(logs: History) {
       println("Train ends with last loss ${logs.lastBatchEvent().lossValue}")
   }
}

class PredictCallback : Callback() {
   override fun onPredictBatchBegin(batch: Int, batchSize: Int) {
       println("Prediction batch $batch begins.")
   }

   override fun onPredictBatchEnd(batch: Int, batchSize: Int) {
       println("Prediction batch $batch ends.")
   }
}

Let’s pass these callbacks to the model methods:

model.use {
   it.compile(
       optimizer = Adam(clipGradient = ClipGradientByValue(0.1f)),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.logSummary()

   it.fit(
       dataset = train,
       epochs = EPOCHS,
       batchSize = TRAINING_BATCH_SIZE,
       callbacks = listOf(earlyStopping, terminateOnNaN)
   )

   val accuracy = it.evaluate(
       dataset = test,
       batchSize = TEST_BATCH_SIZE,
       callback = EvaluateCallback()
   ).metrics[Metrics.ACCURACY]


   val predictions = it.predict(
       dataset = test,
       batchSize = TEST_BATCH_SIZE,
       callback = PredictCallback()
   )
}

The output from these callbacks can be seen in the training logs.

The complete example can be found here.


4 new layers and 2 new activation functions

Many contributors to this release have added layers to KotlinDL for performing non-trivial logic. With these added layers, you can start working with autoencoders and load GAN models.

There are also two new activation functions.

These activation functions are not available in the TensorFlow core package, but we decided to add them after seeing how they’ve been widely used in recent papers.

We’d be delighted to look at your pull requests if you’d like to contribute a layer, activation function, callback, or initializer from a recent paper!


Breaking changes in the Image Preprocessing DSL

There are a few major changes in the Image Preprocessing DSL:

  • CustomPreprocessor was removed.
  • The loading section was moved from image preprocessing to the Dataset API.
  • A few new Preprocessors were added:
    • Padding
    • CenterCrop
    • Convert
    • Grayscale
    • Normalizing

Here is an example of some of the new operations:

val preprocessing = preprocess {
 transformImage {
   centerCrop {
     size = 214
   }
   pad {
     top = 10
     bottom = 10
     left = 10
     right = 10
     mode = PaddingMode.Fill(Color.BLACK)
   }
   convert {
     colorMode = ColorMode.BGR
   }
 }
 transformTensor {
   normalize {
     mean = floatArrayOf(103.939f, 116.779f, 123.68f)
     std = floatArrayOf(57.375f, 57.12f, 58.395f)
   }
 }
}

Because of the removal of the loading section, the same preprocessing instance can now be used with several datasets:

val trainDataset = OnHeapDataset.create(File(datasetPath, "train"), labelGenerator, preprocessing)
val valDataset = OnHeapDataset.create(File(datasetPath, "val"), labelGenerator, preprocessing)


Standing on the shoulders of giants

We’d like to express our deep gratitude to Alexey Zinoviev for his great work developing the framework from a minimum viable product to its current state, his efforts toward creating a community, his skillful release management, and his competent marketing support.

His passion for democratizing AI and his continuous work to improve the ability of Kotlin and Java developers to use ML/DL models deserve great respect and inspire us to continue our work.

We’d also like to express our gratitude to Veniamin Viflyantsev, who’s invested a lot of time and effort into changing the architecture of the api module. Many of his changes are now part of this release.

Our team has expanded! Julia Beliaeva (author of the new version of the Image Preprocessing DSL) and Nikita Ermolenko have joined us on a permanent basis. We wish them good luck and look forward to new releases!


Learn more and share your feedback

We hope you enjoyed this brief overview of the new features in KotlinDL 0.4! For more information, including the up-to-date Readme file, visit the project’s home on GitHub. Be sure to check out the KotlinDL guide, which contains detailed information about the library’s basic and advanced features and covers many of the topics mentioned in this blog post in more detail.

If you’ve previously used KotlinDL, use the changelog to find out what has changed and how to upgrade your projects to the stable release.

We’d be very thankful if you’d report any bugs you find to our issue tracker. We’ll try to fix all of the critical issues in the 0.4.1 release.

You’re also welcome to join the #kotlindl channel in Kotlin Slack (get an invite here). In this channel, you can ask questions, participate in discussions, and get notifications about the new preview releases and models in ModelHub.

Continue ReadingKotlinDL 0.4 Is Out With Pose Detection API, EfficientDet for Object Detection, and EfficientNet for Image Recognition

KotlinDL 0.4 Is Out With Pose Detection API, EfficientDet for Object Detection, and EfficientNet for Image Recognition

Version 0.4 of our deep learning library, KotlinDL, is out!

KotlinDL 0.4 is now available on Maven Central with a variety of new features – check out all of the changes that are coming to the new release! We’re currently introducing new models in ModelHub (including the EfficientNet and EfficientDet model families), the experimental high-level Kotlin API for Pose Detection, new layers and preprocessors contributed by the community members, and many other changes.

KotlinDL on GitHub

In this post, we’ll walk you through the changes to the Kotlin Deep Learning library in the 0.4 release:

  1. Pose Detection
  2. NoTop models in the ModelHub
  3. New models: EfficientDet and EfficientNet
  4. Multiple callbacks
  5. Breaking changes in the Image Preprocessing DSL
  6. 4 new layers and 2 new activation functions
  7. Learn more and share your feedback


Pose Detection

Pose detection is using an ML model to detect the pose of a person from an image or a video by detecting the spatial locations of key body joints (keypoints).

We’re excited to launch the MoveNet family of pose detection modes with our new pose detection API in KotlinDL. MoveNet is a fast and accurate model that detects 17 keypoints on the body. The model is offered on ONNXModelHub with two variants, MoveNetSinglePoseLighting and MoveNetSinglePoseThunder. MoveNetSinglePoseLighting is intended for latency-critical applications, while MoveNetSinglePoseThunder is intended for applications that require high accuracy.

If you need to detect a few poses on a given image or video frame, try MoveNetMultiPoseLighting. This model is able to detect multiple people in the image frame at the same time, while still achieving real-time speed.

There are two ways to detect poses within the KotlinDL: parsing the model output manually or using our LightAPI for Pose Detection (the recommended way).

Just load the model:

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetSinglePoseLighting.pretrainedModel(modelHub)

Run the predictions and print out the pose landmarks and edges connecting the detected pose landmarks:

model.use { poseDetectionModel ->
       val imageFile = …
       val detectedPose = poseDetectionModel.detectPose(imageFile = imageFile)

       detectedPose.poseLandmarks.forEach {
           println("Found ${it.poseLandmarkLabel} with probability ${it.probability}")
       }

       detectedPose.edges.forEach {
           println("The ${it.poseEdgeLabel} starts at ${it.start.poseLandmarkLabel} and ends with ${it.end.poseLandmarkLabel}")
       }
}

Some visualization examples, where we drew landmarks and edges on the given images, are below.

The complete example can be found here.

If you want to run the MoveNet model to detect multiple poses on the given image, you need to make some minor changes to your code.

First, load the model:

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetSinglePoseLighting.pretrainedModel(modelHub)

Secondly, run the model and get the MultiPoseDetectionResult object, which contains the list of pairs <DetectedObject, DetectedPose>. As a result, we have access not only to the landmarks’ coordinates and labels, but also to the coordinates of the bounding box for the whole person.

model.use { poseDetectionModel ->
       val imageFile = …
       val detectedPoses = poseDetectionModel.detectPoses(imageFile = imageFile, confidence = 0.0f)

       detectedPoses.multiplePoses.forEach { detectedPose ->
           println("Found ${detectedPose.first.classLabel} with probability ${detectedPose.first.probability}")
           detectedPose.second.poseLandmarks.forEach {
               println("Found ${it.poseLandmarkLabel} with probability ${it.probability}")
           }

           detectedPose.second.edges.forEach {
               println("The ${it.poseEdgeLabel} starts at ${it.start.poseLandmarkLabel} and ends with ${it.end.poseLandmarkLabel}")
           }
       }
}

Some visualization examples, where we drew the bounding boxes, landmarks, and edges on the images are below.

The complete example can be found here.


NoTop models in the ModelHub

Running predictions on ready-made models is good, but what about fine-tuning them for your tasks?

The classic approach to Transfer Learning is to freeze all layers except the last few and then train the top few layers (the fully connected layers at the top of the network) on a new piece of data, often changing the number of model outputs.

Before the 0.4 release, KotlinDL users needed to remove the last layers manually, but with the 0.4 release, TensorFlowModelHub provides an option to download “noTop” models  – equivalent to earlier available models, but without weights and configurations for the last few layers.

The following “noTop” models are now available:

  • VGG’16
  • VGG’19
  • ResNet50
  • ResNet101
  • ResNet152
  • ResNet50V2
  • ResNet101V2
  • ResNet152V2
  • MobileNet
  • MobileNetV2
  • NasNetMobile
  • NasNetLarge
  • DenseNet121
  • DenseNet169
  • DenseNet201
  • Xception
  • Inception

In the example below, we load the ResNet50 model from our TensorFlowModelHub and fine-tune it to classify cats and dogs (using the embedded Dogs-vs-Cats dataset):

val modelHub = TFModelHub(cacheDirectory = File("cache/pretrainedModels"))

val modelType = TFModels.CV.ResNet50(noTop = true, inputShape = intArrayOf(IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))

val noTopModel = modelHub.loadModel(modelType)

The topModel is the simplest neural network and can be trained quickly, as it has few parameters.

val topModel = Sequential.of(
   GlobalAvgPool2D(
       name = "top_avg_pool",
   ),
   Dense(
       name = "top_dense",
       kernelInitializer = GlorotUniform(),
       biasInitializer = GlorotUniform(),
       outputSize = 200,
       activation = Activations.Relu
   ),
   Dense(
       name = "pred",
       kernelInitializer = GlorotUniform(),
       biasInitializer = GlorotUniform(),
       outputSize = NUM_CLASSES,
       activation = Activations.Linear
   ),
   noInput = true
)

The new helper function could join two models together: noTop and topModel: val model = Functional.of(pretrainedModel = noTopModel, topModel = topModel)

After that, load weights for the frozen layers from the noTop model, and the weights for the unfrozen layers from the topModel will be initialized during the fit method call.

model.use {
   it.compile(
       optimizer = Adam(),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.loadWeightsForFrozenLayers(hdfFile)

   it.fit(
       dataset = train,
       batchSize = TRAINING_BATCH_SIZE,
       epochs = EPOCHS
   )

   val accuracy = it.evaluate(dataset = test, batchSize = TEST_BATCH_SIZE).metrics[Metrics.ACCURACY]

   println("Accuracy: $accuracy")
}

The complete example can be found here.


New models: EfficientDet and EfficientNet

Until v0.4, our ModelHub contained only one model (SSD) suitable for solving the Object Detection problem. Starting with this release, we’re gradually expanding the library’s capabilities for solving the Object Detection problem. We’d like to introduce to you a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior object detectors across a wide spectrum of resource constraints.

All models from this family have the same internal architecture which scales for different inputs (image resolution). The final user has a choice of models: from the smallest EfficientDet-D0, model with 3.9 million parameters and 10.2 ms latency on the V100 up to the EfficientDet-D7, with 52 million parameters and 122 ms latency on the V100.

Internally, EfficientDet models use another famous model, EfficientNet, as a backbone. It extracts features from input images and passes them to the next component of the Object Detection model).

EfficientDet Architecture

An example of EfficientDet-D2 usage can be found here.

The EfficientNet model family is also available in the ONNXModelHub. There are 8 different types of models and each model is presented in two variants: full and “noTop” for fine-tuning.

These models achieve better accuracy on the ImageNet dataset with 10x fewer parameters than ResNet or NasNet. If you need fast and accurate image recognition, EfficientNet is a good choice.

An example of EfficientNet0 usage can be found here.


Multiple callbacks

Earlier, Callback support for KotlinDL was pretty simple and not fully compatible with Keras. As a result, users faced difficulties in implementing their neural networks, building the custom validation process, and monitoring the neural network’s training.

The callback object was passed during compilation and was unique for each stage in the model’s lifecycle. However, model compilation can be located in very different places in the code than fit/predict/evaluate, meaning that users may need to create different callbacks for different purposes.

Let’s assume that we need to define EarlyStopping and  TerminateOnNaN for training to handle exceptional cases, and also add two custom callbacks for the prediction and evaluation phases:

val earlyStopping = EarlyStopping(
   monitor = EpochTrainingEvent::valLossValue,
   minDelta = 0.0,
   patience = 2,
   verbose = true,
   mode = EarlyStoppingMode.AUTO,
   baseline = 0.1,
   restoreBestWeights = false
)
val terminateOnNaN = TerminateOnNaN()


class EvaluateCallback : Callback() {
   override fun onTestBatchEnd(batch: Int, batchSize: Int, event: BatchEvent?, logs: History) {
       println("Test batch $batch ends with loss ${event!!.lossValue}..")
   }

   override fun onTestEnd(logs: History) {
       println("Train ends with last loss ${logs.lastBatchEvent().lossValue}")
   }
}

class PredictCallback : Callback() {
   override fun onPredictBatchBegin(batch: Int, batchSize: Int) {
       println("Prediction batch $batch begins.")
   }

   override fun onPredictBatchEnd(batch: Int, batchSize: Int) {
       println("Prediction batch $batch ends.")
   }
}

Let’s pass these callbacks to the model methods:

model.use {
   it.compile(
       optimizer = Adam(clipGradient = ClipGradientByValue(0.1f)),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.logSummary()

   it.fit(
       dataset = train,
       epochs = EPOCHS,
       batchSize = TRAINING_BATCH_SIZE,
       callbacks = listOf(earlyStopping, terminateOnNaN)
   )

   val accuracy = it.evaluate(
       dataset = test,
       batchSize = TEST_BATCH_SIZE,
       callback = EvaluateCallback()
   ).metrics[Metrics.ACCURACY]


   val predictions = it.predict(
       dataset = test,
       batchSize = TEST_BATCH_SIZE,
       callback = PredictCallback()
   )
}

Found below in the logs:

The complete example can be found here.


4 new layers and 2 new activation functions

Many contributors to this release have added layers to Kotlin for performing non-trivial logic. With these added layers, you can start working with autoencoders and load the GAN models:

There are also two new activation functions:

These activation functions are not available in the TensorFlow core package, but we decided to add them after seeing how they’ve been widely used in recent papers.

We’d be delighted to look at your pull requests if you’d like to contribute a layer, activation function, callback, or initializer from a recent paper!


Breaking changes in the Image Preprocessing DSL

There are a few major changes in the Image Preprocessing DSL:

  • CustomPreprocessor was removed.
  • The loading section was moved from image preprocessing to the Dataset API
  • A few new Preprocessors were added:
    • Padding
    • CenterCrop
    • Convert
    • Grayscale
    • Normalizing

Here is an example of some of the new operations:

val preprocessing = preprocess {
 transformImage {
   centerCrop {
     size = 214
   }
   pad {
     top = 10
     bottom = 10
     left = 10
     right = 10
     mode = PaddingMode.Fill(Color.BLACK)
   }
   convert {
     colorMode = ColorMode.BGR
   }
 }
 transformTensor {
   normalize {
     mean = floatArrayOf(103.939f, 116.779f, 123.68f)
     std = floatArrayOf(57.375f, 57.12f, 58.395f)
   }
 }
}

Because of the removal of the loading section, the same preprocessing instance could now be used in several datasets:

val trainDataset = OnHeapDataset.create(File(datasetPath, "train"), labelGenerator, preprocessing)
val valDataset = OnHeapDataset.create(File(datasetPath, "val"), labelGenerator, preprocessing)


Standing on the shoulders of giants

We’d like to express our deep gratitude to Alexey Zinoviev for his great work developing the framework from minimum viable product to the current state, efforts towards creating a community, skillful release management, and competent marketing support.

His passion for democratizing AI and his continuous work to improve the ability of Kotlin and Java developers to use ML/DL models deserves great respect and inspires us to continue our work.

We’d also like to express our gratitude to Veniamin Viflyantsev, who’s invested a lot of time and effort into changing the architecture of the api module. Many of his changes are now part of this release.

Our team has expanded! Julia Beliaeva (author of the new version of Image Preprocessing DSL) and Nikita Ermolenko have joined us on a permanent basisWe wish them good luck and look forward to new releases!


Learn more and share your feedback

We hope you enjoyed this brief overview of the new features in KotlinDL 0.4! For more information, including the up-to-date Readme file, visit the project’s home on GitHub. Be sure to check out the KotlinDL guide, which contains detailed information about the library’s basic and advanced features and covers many of the topics mentioned in this blog post in more detail.

If you’ve previously used KotlinDL, use the changelog to find out what has changed and how to upgrade your projects to the stable release.

We’d be very thankful if you’d report any bugs you find to our issue tracker. We’ll try to fix all of the critical issues in the 0.4.1 release.

You’re also welcome to join the #kotlindl channel in Kotlin Slack (get an invite here). In this channel, you can ask questions, participate in discussions, and get notifications about the new preview releases and models in ModelHub.

Continue ReadingKotlinDL 0.4 Is Out With Pose Detection API, EfficientDet for Object Detection, and EfficientNet for Image Recognition

KotlinDL 0.4 Is Out With Pose Detection API, EfficientDet for Object Detection, and EfficientNet for Image Recognition

Version 0.4 of our deep learning library, KotlinDL, is out!

KotlinDL 0.4 is now available on Maven Central with a variety of new features – check out all of the changes that are coming to the new release! We’re currently introducing new models in ModelHub (including the EfficientNet and EfficientDet model families), the experimental high-level Kotlin API for Pose Detection, new layers and preprocessors contributed by the community members, and many other changes.

KotlinDL on GitHub

In this post, we’ll walk you through the changes to the Kotlin Deep Learning library in the 0.4 release:

  1. Pose Detection
  2. NoTop models in the ModelHub
  3. New models: EfficientDet and EfficientNet
  4. Multiple callbacks
  5. Breaking changes in the Image Preprocessing DSL
  6. 4 new layers and 2 new activation functions
  7. Learn more and share your feedback


Pose Detection

Pose detection is using an ML model to detect the pose of a person from an image or a video by detecting the spatial locations of key body joints (keypoints).

We’re excited to launch the MoveNet family of pose detection modes with our new pose detection API in KotlinDL. MoveNet is a fast and accurate model that detects 17 keypoints on the body. The model is offered on ONNXModelHub with two variants, MoveNetSinglePoseLighting and MoveNetSinglePoseThunder. MoveNetSinglePoseLighting is intended for latency-critical applications, while MoveNetSinglePoseThunder is intended for applications that require high accuracy.

If you need to detect a few poses on a given image or video frame, try MoveNetMultiPoseLighting. This model is able to detect multiple people in the image frame at the same time, while still achieving real-time speed.

There are two ways to detect poses within the KotlinDL: parsing the model output manually or using our LightAPI for Pose Detection (the recommended way).

Just load the model:

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetSinglePoseLighting.pretrainedModel(modelHub)

Run the predictions and print out the pose landmarks and edges connecting the detected pose landmarks:

model.use { poseDetectionModel ->
       val imageFile = …
       val detectedPose = poseDetectionModel.detectPose(imageFile = imageFile)

       detectedPose.poseLandmarks.forEach {
           println("Found ${it.poseLandmarkLabel} with probability ${it.probability}")
       }

       detectedPose.edges.forEach {
           println("The ${it.poseEdgeLabel} starts at ${it.start.poseLandmarkLabel} and ends with ${it.end.poseLandmarkLabel}")
       }
}

Some visualization examples, where we drew landmarks and edges on the given images, are below.

The complete example can be found here.

If you want to run the MoveNet model to detect multiple poses on the given image, you need to make some minor changes to your code.

First, load the model:

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetSinglePoseLighting.pretrainedModel(modelHub)

Secondly, run the model and get the MultiPoseDetectionResult object, which contains the list of pairs <DetectedObject, DetectedPose>. As a result, we have access not only to the landmarks’ coordinates and labels, but also to the coordinates of the bounding box for the whole person.

model.use { poseDetectionModel ->
       val imageFile = …
       val detectedPoses = poseDetectionModel.detectPoses(imageFile = imageFile, confidence = 0.0f)

       detectedPoses.multiplePoses.forEach { detectedPose ->
           println("Found ${detectedPose.first.classLabel} with probability ${detectedPose.first.probability}")
           detectedPose.second.poseLandmarks.forEach {
               println("Found ${it.poseLandmarkLabel} with probability ${it.probability}")
           }

           detectedPose.second.edges.forEach {
               println("The ${it.poseEdgeLabel} starts at ${it.start.poseLandmarkLabel} and ends with ${it.end.poseLandmarkLabel}")
           }
       }
}

Some visualization examples, where we drew the bounding boxes, landmarks, and edges on the images are below.

The complete example can be found here.


NoTop models in the ModelHub

Running predictions on ready-made models is good, but what about fine-tuning them for your tasks?

The classic approach to Transfer Learning is to freeze all layers except the last few and then train the top few layers (the fully connected layers at the top of the network) on a new piece of data, often changing the number of model outputs.

Before the 0.4 release, KotlinDL users needed to remove the last layers manually, but with the 0.4 release, TensorFlowModelHub provides an option to download “noTop” models  – equivalent to earlier available models, but without weights and configurations for the last few layers.

The following “noTop” models are now available:

  • VGG’16
  • VGG’19
  • ResNet50
  • ResNet101
  • ResNet152
  • ResNet50V2
  • ResNet101V2
  • ResNet152V2
  • MobileNet
  • MobileNetV2
  • NasNetMobile
  • NasNetLarge
  • DenseNet121
  • DenseNet169
  • DenseNet201
  • Xception
  • Inception

In the example below, we load the ResNet50 model from our TensorFlowModelHub and fine-tune it to classify cats and dogs (using the embedded Dogs-vs-Cats dataset):

val modelHub = TFModelHub(cacheDirectory = File("cache/pretrainedModels"))

val modelType = TFModels.CV.ResNet50(noTop = true, inputShape = intArrayOf(IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))

val noTopModel = modelHub.loadModel(modelType)

The topModel is the simplest neural network and can be trained quickly, as it has few parameters.

val topModel = Sequential.of(
   GlobalAvgPool2D(
       name = "top_avg_pool",
   ),
   Dense(
       name = "top_dense",
       kernelInitializer = GlorotUniform(),
       biasInitializer = GlorotUniform(),
       outputSize = 200,
       activation = Activations.Relu
   ),
   Dense(
       name = "pred",
       kernelInitializer = GlorotUniform(),
       biasInitializer = GlorotUniform(),
       outputSize = NUM_CLASSES,
       activation = Activations.Linear
   ),
   noInput = true
)

The new helper function could join two models together: noTop and topModel: val model = Functional.of(pretrainedModel = noTopModel, topModel = topModel)

After that, load weights for the frozen layers from the noTop model, and the weights for the unfrozen layers from the topModel will be initialized during the fit method call.

model.use {
   it.compile(
       optimizer = Adam(),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.loadWeightsForFrozenLayers(hdfFile)

   it.fit(
       dataset = train,
       batchSize = TRAINING_BATCH_SIZE,
       epochs = EPOCHS
   )

   val accuracy = it.evaluate(dataset = test, batchSize = TEST_BATCH_SIZE).metrics[Metrics.ACCURACY]

   println("Accuracy: $accuracy")
}

The complete example can be found here.


New models: EfficientDet and EfficientNet

Until v0.4, our ModelHub contained only one model (SSD) suitable for solving the Object Detection problem. Starting with this release, we’re gradually expanding the library’s capabilities for solving the Object Detection problem. We’d like to introduce to you a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior object detectors across a wide spectrum of resource constraints.

All models from this family have the same internal architecture which scales for different inputs (image resolution). The final user has a choice of models: from the smallest EfficientDet-D0, model with 3.9 million parameters and 10.2 ms latency on the V100 up to the EfficientDet-D7, with 52 million parameters and 122 ms latency on the V100.

Internally, EfficientDet models use another famous model, EfficientNet, as a backbone. It extracts features from input images and passes them to the next component of the Object Detection model).

EfficientDet Architecture

An example of EfficientDet-D2 usage can be found here.

The EfficientNet model family is also available in the ONNXModelHub. There are 8 different types of models and each model is presented in two variants: full and “noTop” for fine-tuning.

These models achieve better accuracy on the ImageNet dataset with 10x fewer parameters than ResNet or NasNet. If you need fast and accurate image recognition, EfficientNet is a good choice.

An example of EfficientNet0 usage can be found here.


Multiple callbacks

Earlier, Callback support for KotlinDL was pretty simple and not fully compatible with Keras. As a result, users faced difficulties in implementing their neural networks, building the custom validation process, and monitoring the neural network’s training.

The callback object was passed during compilation and was unique for each stage in the model’s lifecycle. However, model compilation can be located in very different places in the code than fit/predict/evaluate, meaning that users may need to create different callbacks for different purposes.

Let’s assume that we need to define EarlyStopping and  TerminateOnNaN for training to handle exceptional cases, and also add two custom callbacks for the prediction and evaluation phases:

val earlyStopping = EarlyStopping(
   monitor = EpochTrainingEvent::valLossValue,
   minDelta = 0.0,
   patience = 2,
   verbose = true,
   mode = EarlyStoppingMode.AUTO,
   baseline = 0.1,
   restoreBestWeights = false
)
val terminateOnNaN = TerminateOnNaN()


class EvaluateCallback : Callback() {
   override fun onTestBatchEnd(batch: Int, batchSize: Int, event: BatchEvent?, logs: History) {
       println("Test batch $batch ends with loss ${event!!.lossValue}..")
   }

   override fun onTestEnd(logs: History) {
       println("Train ends with last loss ${logs.lastBatchEvent().lossValue}")
   }
}

class PredictCallback : Callback() {
   override fun onPredictBatchBegin(batch: Int, batchSize: Int) {
       println("Prediction batch $batch begins.")
   }

   override fun onPredictBatchEnd(batch: Int, batchSize: Int) {
       println("Prediction batch $batch ends.")
   }
}

Let’s pass these callbacks to the model methods:

model.use {
   it.compile(
       optimizer = Adam(clipGradient = ClipGradientByValue(0.1f)),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.logSummary()

   it.fit(
       dataset = train,
       epochs = EPOCHS,
       batchSize = TRAINING_BATCH_SIZE,
       callbacks = listOf(earlyStopping, terminateOnNaN)
   )

   val accuracy = it.evaluate(
       dataset = test,
       batchSize = TEST_BATCH_SIZE,
       callback = EvaluateCallback()
   ).metrics[Metrics.ACCURACY]


   val predictions = it.predict(
       dataset = test,
       batchSize = TEST_BATCH_SIZE,
       callback = PredictCallback()
   )
}

Found below in the logs:

The complete example can be found here.


4 new layers and 2 new activation functions

Many contributors to this release have added layers to Kotlin for performing non-trivial logic. With these added layers, you can start working with autoencoders and load the GAN models:

There are also two new activation functions:

These activation functions are not available in the TensorFlow core package, but we decided to add them after seeing how they’ve been widely used in recent papers.

We’d be delighted to look at your pull requests if you’d like to contribute a layer, activation function, callback, or initializer from a recent paper!


Breaking changes in the Image Preprocessing DSL

There are a few major changes in the Image Preprocessing DSL:

  • CustomPreprocessor was removed.
  • The loading section was moved from image preprocessing to the Dataset API.
  • A few new Preprocessors were added:
    • Padding
    • CenterCrop
    • Convert
    • Grayscale
    • Normalizing

Here is an example of some of the new operations:

val preprocessing = preprocess {
 transformImage {
   centerCrop {
     size = 214
   }
   pad {
     top = 10
     bottom = 10
     left = 10
     right = 10
     mode = PaddingMode.Fill(Color.BLACK)
   }
   convert {
     colorMode = ColorMode.BGR
   }
 }
 transformTensor {
   normalize {
     mean = floatArrayOf(103.939f, 116.779f, 123.68f)
     std = floatArrayOf(57.375f, 57.12f, 58.395f)
   }
 }
}

Because the loading section has been removed, the same preprocessing instance can now be used with several datasets:

val trainDataset = OnHeapDataset.create(File(datasetPath, "train"), labelGenerator, preprocessing)
val valDataset = OnHeapDataset.create(File(datasetPath, "val"), labelGenerator, preprocessing)
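
The new Grayscale preprocessor from the list above plugs into the same DSL. Here is a minimal sketch, assuming the operation is exposed as a grayscale block inside transformImage (the exact DSL name is an assumption):

val grayscalePreprocessing = preprocess {
 transformImage {
   // Assumed DSL name for the new Grayscale preprocessor.
   grayscale()
   centerCrop {
     size = 214
   }
 }
}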


Standing on the shoulders of giants

We’d like to express our deep gratitude to Alexey Zinoviev for his great work developing the framework from a minimum viable product to its current state, his efforts to build a community, his skillful release management, and his marketing support.

His passion for democratizing AI and his continuous work to improve the ability of Kotlin and Java developers to use ML/DL models deserve great respect and inspire us to continue our work.

We’d also like to express our gratitude to Veniamin Viflyantsev, who’s invested a lot of time and effort into changing the architecture of the api module. Many of his changes are now part of this release.

Our team has expanded! Julia Beliaeva (author of the new version of the Image Preprocessing DSL) and Nikita Ermolenko have joined us on a permanent basis. We wish them good luck and look forward to new releases!


Learn more and share your feedback

We hope you enjoyed this brief overview of the new features in KotlinDL 0.4! For more information, including the up-to-date Readme file, visit the project’s home on GitHub. Be sure to check out the KotlinDL guide, which contains detailed information about the library’s basic and advanced features and covers many of the topics mentioned in this blog post in more detail.

If you’ve previously used KotlinDL, use the changelog to find out what has changed and how to upgrade your projects to the stable release.

We’d be very thankful if you’d report any bugs you find to our issue tracker. We’ll try to fix all of the critical issues in the 0.4.1 release.

You’re also welcome to join the #kotlindl channel in Kotlin Slack (get an invite here). In this channel, you can ask questions, participate in discussions, and get notifications about the new preview releases and models in ModelHub.

Continue ReadingKotlinDL 0.4 Is Out With Pose Detection API, EfficientDet for Object Detection, and EfficientNet for Image Recognition

KotlinDL 0.4 Is Out With Pose Detection API, EfficientDet for Object Detection, and EfficientNet for Image Recognition

Version 0.4 of our deep learning library, KotlinDL, is out!

KotlinDL 0.4 is now available on Maven Central with a variety of new features – check out all of the changes that are coming to the new release! We’re currently introducing new models in ModelHub (including the EfficientNet and EfficientDet model families), the experimental high-level Kotlin API for Pose Detection, new layers and preprocessors contributed by the community members, and many other changes.

KotlinDL on GitHub

In this post, we’ll walk you through the changes to the Kotlin Deep Learning library in the 0.4 release:

  1. Pose Detection
  2. NoTop models in the ModelHub
  3. New models: EfficientDet and EfficientNet
  4. Multiple callbacks
  5. Breaking changes in the Image Preprocessing DSL
  6. 4 new layers and 2 new activation functions
  7. Learn more and share your feedback


Pose Detection

Pose detection is using an ML model to detect the pose of a person from an image or a video by detecting the spatial locations of key body joints (keypoints).

We’re excited to launch the MoveNet family of pose detection modes with our new pose detection API in KotlinDL. MoveNet is a fast and accurate model that detects 17 keypoints on the body. The model is offered on ONNXModelHub with two variants, MoveNetSinglePoseLighting and MoveNetSinglePoseThunder. MoveNetSinglePoseLighting is intended for latency-critical applications, while MoveNetSinglePoseThunder is intended for applications that require high accuracy.

If you need to detect a few poses on a given image or video frame, try MoveNetMultiPoseLighting. This model is able to detect multiple people in the image frame at the same time, while still achieving real-time speed.

There are two ways to detect poses within the KotlinDL: parsing the model output manually or using our LightAPI for Pose Detection (the recommended way).

Just load the model:

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetSinglePoseLighting.pretrainedModel(modelHub)

Run the predictions and print out the pose landmarks and edges connecting the detected pose landmarks:

model.use { poseDetectionModel ->
       val imageFile = …
       val detectedPose = poseDetectionModel.detectPose(imageFile = imageFile)

       detectedPose.poseLandmarks.forEach {
           println("Found ${it.poseLandmarkLabel} with probability ${it.probability}")
       }

       detectedPose.edges.forEach {
           println("The ${it.poseEdgeLabel} starts at ${it.start.poseLandmarkLabel} and ends with ${it.end.poseLandmarkLabel}")
       }
}

Some visualization examples, where we drew landmarks and edges on the given images, are below.

The complete example can be found here.

If you want to run the MoveNet model to detect multiple poses on the given image, you need to make some minor changes to your code.

First, load the model:

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetSinglePoseLighting.pretrainedModel(modelHub)

Secondly, run the model and get the MultiPoseDetectionResult object, which contains the list of pairs <DetectedObject, DetectedPose>. As a result, we have access not only to the landmarks’ coordinates and labels, but also to the coordinates of the bounding box for the whole person.

model.use { poseDetectionModel ->
       val imageFile = …
       val detectedPoses = poseDetectionModel.detectPoses(imageFile = imageFile, confidence = 0.0f)

       detectedPoses.multiplePoses.forEach { detectedPose ->
           println("Found ${detectedPose.first.classLabel} with probability ${detectedPose.first.probability}")
           detectedPose.second.poseLandmarks.forEach {
               println("Found ${it.poseLandmarkLabel} with probability ${it.probability}")
           }

           detectedPose.second.edges.forEach {
               println("The ${it.poseEdgeLabel} starts at ${it.start.poseLandmarkLabel} and ends with ${it.end.poseLandmarkLabel}")
           }
       }
}

Some visualization examples, where we drew the bounding boxes, landmarks, and edges on the images are below.

The complete example can be found here.


NoTop models in the ModelHub

Running predictions on ready-made models is good, but what about fine-tuning them for your tasks?

The classic approach to Transfer Learning is to freeze all layers except the last few and then train the top few layers (the fully connected layers at the top of the network) on a new piece of data, often changing the number of model outputs.

Before the 0.4 release, KotlinDL users needed to remove the last layers manually, but with the 0.4 release, TensorFlowModelHub provides an option to download “noTop” models  – equivalent to earlier available models, but without weights and configurations for the last few layers.

The following “noTop” models are now available:

  • VGG’16
  • VGG’19
  • ResNet50
  • ResNet101
  • ResNet152
  • ResNet50V2
  • ResNet101V2
  • ResNet152V2
  • MobileNet
  • MobileNetV2
  • NasNetMobile
  • NasNetLarge
  • DenseNet121
  • DenseNet169
  • DenseNet201
  • Xception
  • Inception

In the example below, we load the ResNet50 model from our TensorFlowModelHub and fine-tune it to classify cats and dogs (using the embedded Dogs-vs-Cats dataset):

val modelHub = TFModelHub(cacheDirectory = File("cache/pretrainedModels"))

val modelType = TFModels.CV.ResNet50(noTop = true, inputShape = intArrayOf(IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))

val noTopModel = modelHub.loadModel(modelType)

The topModel is the simplest neural network and can be trained quickly, as it has few parameters.

val topModel = Sequential.of(
   GlobalAvgPool2D(
       name = "top_avg_pool",
   ),
   Dense(
       name = "top_dense",
       kernelInitializer = GlorotUniform(),
       biasInitializer = GlorotUniform(),
       outputSize = 200,
       activation = Activations.Relu
   ),
   Dense(
       name = "pred",
       kernelInitializer = GlorotUniform(),
       biasInitializer = GlorotUniform(),
       outputSize = NUM_CLASSES,
       activation = Activations.Linear
   ),
   noInput = true
)

The new helper function could join two models together: noTop and topModel: val model = Functional.of(pretrainedModel = noTopModel, topModel = topModel)

After that, load weights for the frozen layers from the noTop model, and the weights for the unfrozen layers from the topModel will be initialized during the fit method call.

model.use {
   it.compile(
       optimizer = Adam(),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.loadWeightsForFrozenLayers(hdfFile)

   it.fit(
       dataset = train,
       batchSize = TRAINING_BATCH_SIZE,
       epochs = EPOCHS
   )

   val accuracy = it.evaluate(dataset = test, batchSize = TEST_BATCH_SIZE).metrics[Metrics.ACCURACY]

   println("Accuracy: $accuracy")
}

The complete example can be found here.


New models: EfficientDet and EfficientNet

Until v0.4, our ModelHub contained only one model (SSD) suitable for solving the Object Detection problem. Starting with this release, we’re gradually expanding the library’s capabilities for solving the Object Detection problem. We’d like to introduce to you a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior object detectors across a wide spectrum of resource constraints.

All models from this family have the same internal architecture which scales for different inputs (image resolution). The final user has a choice of models: from the smallest EfficientDet-D0, model with 3.9 million parameters and 10.2 ms latency on the V100 up to the EfficientDet-D7, with 52 million parameters and 122 ms latency on the V100.

Internally, EfficientDet models use another famous model, EfficientNet, as a backbone. It extracts features from input images and passes them to the next component of the Object Detection model).

EfficientDet Architecture

An example of EfficientDet-D2 usage can be found here.

The EfficientNet model family is also available in the ONNXModelHub. There are 8 different types of models and each model is presented in two variants: full and “noTop” for fine-tuning.

These models achieve better accuracy on the ImageNet dataset with 10x fewer parameters than ResNet or NasNet. If you need fast and accurate image recognition, EfficientNet is a good choice.

An example of EfficientNet0 usage can be found here.


Multiple callbacks

Earlier, Callback support for KotlinDL was pretty simple and not fully compatible with Keras. As a result, users faced difficulties in implementing their neural networks, building the custom validation process, and monitoring the neural network’s training.

The callback object was passed during compilation and was unique for each stage in the model’s lifecycle. However, model compilation can be located in very different places in the code than fit/predict/evaluate, meaning that users may need to create different callbacks for different purposes.

Let’s assume that we need to define EarlyStopping and  TerminateOnNaN for training to handle exceptional cases, and also add two custom callbacks for the prediction and evaluation phases:

val earlyStopping = EarlyStopping(
   monitor = EpochTrainingEvent::valLossValue,
   minDelta = 0.0,
   patience = 2,
   verbose = true,
   mode = EarlyStoppingMode.AUTO,
   baseline = 0.1,
   restoreBestWeights = false
)
val terminateOnNaN = TerminateOnNaN()


class EvaluateCallback : Callback() {
   override fun onTestBatchEnd(batch: Int, batchSize: Int, event: BatchEvent?, logs: History) {
       println("Test batch $batch ends with loss ${event!!.lossValue}..")
   }

   override fun onTestEnd(logs: History) {
       println("Train ends with last loss ${logs.lastBatchEvent().lossValue}")
   }
}

class PredictCallback : Callback() {
   override fun onPredictBatchBegin(batch: Int, batchSize: Int) {
       println("Prediction batch $batch begins.")
   }

   override fun onPredictBatchEnd(batch: Int, batchSize: Int) {
       println("Prediction batch $batch ends.")
   }
}

Let’s pass these callbacks to the model methods:

model.use {
   it.compile(
       optimizer = Adam(clipGradient = ClipGradientByValue(0.1f)),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.logSummary()

   it.fit(
       dataset = train,
       epochs = EPOCHS,
       batchSize = TRAINING_BATCH_SIZE,
       callbacks = listOf(earlyStopping, terminateOnNaN)
   )

   val accuracy = it.evaluate(
       dataset = test,
       batchSize = TEST_BATCH_SIZE,
       callback = EvaluateCallback()
   ).metrics[Metrics.ACCURACY]


   val predictions = it.predict(
       dataset = test,
       batchSize = TEST_BATCH_SIZE,
       callback = PredictCallback()
   )
}

Found below in the logs:

The complete example can be found here.


4 new layers and 2 new activation functions

Many contributors to this release have added layers to Kotlin for performing non-trivial logic. With these added layers, you can start working with autoencoders and load the GAN models:

There are also two new activation functions:

These activation functions are not available in the TensorFlow core package, but we decided to add them after seeing how they’ve been widely used in recent papers.

We’d be delighted to look at your pull requests if you’d like to contribute a layer, activation function, callback, or initializer from a recent paper!


Breaking changes in the Image Preprocessing DSL

There are a few major changes in the Image Preprocessing DSL:

  • CustomPreprocessor was removed.
  • The loading section was moved from image preprocessing to the Dataset API
  • A few new Preprocessors were added:
    • Padding
    • CenterCrop
    • Convert
    • Grayscale
    • Normalizing

Here is an example of some of the new operations:

val preprocessing = preprocess {
 transformImage {
   centerCrop {
     size = 214
   }
   pad {
     top = 10
     bottom = 10
     left = 10
     right = 10
     mode = PaddingMode.Fill(Color.BLACK)
   }
   convert {
     colorMode = ColorMode.BGR
   }
 }
 transformTensor {
   normalize {
     mean = floatArrayOf(103.939f, 116.779f, 123.68f)
     std = floatArrayOf(57.375f, 57.12f, 58.395f)
   }
 }
}

Because of the removal of the loading section, the same preprocessing instance could now be used in several datasets:

val trainDataset = OnHeapDataset.create(File(datasetPath, "train"), labelGenerator, preprocessing)
val valDataset = OnHeapDataset.create(File(datasetPath, "val"), labelGenerator, preprocessing)


Standing on the shoulders of giants

We’d like to express our deep gratitude to Alexey Zinoviev for his great work developing the framework from minimum viable product to the current state, efforts towards creating a community, skillful release management, and competent marketing support.

His passion for democratizing AI and his continuous work to improve the ability of Kotlin and Java developers to use ML/DL models deserves great respect and inspires us to continue our work.

We’d also like to express our gratitude to Veniamin Viflyantsev, who’s invested a lot of time and effort into changing the architecture of the api module. Many of his changes are now part of this release.

Our team has expanded! Julia Beliaeva (author of the new version of Image Preprocessing DSL) and Nikita Ermolenko have joined us on a permanent basisWe wish them good luck and look forward to new releases!


Learn more and share your feedback

We hope you enjoyed this brief overview of the new features in KotlinDL 0.4! For more information, including the up-to-date Readme file, visit the project’s home on GitHub. Be sure to check out the KotlinDL guide, which contains detailed information about the library’s basic and advanced features and covers many of the topics mentioned in this blog post in more detail.

If you’ve previously used KotlinDL, use the changelog to find out what has changed and how to upgrade your projects to the stable release.

We’d be very thankful if you’d report any bugs you find to our issue tracker. We’ll try to fix all of the critical issues in the 0.4.1 release.

You’re also welcome to join the #kotlindl channel in Kotlin Slack (get an invite here). In this channel, you can ask questions, participate in discussions, and get notifications about the new preview releases and models in ModelHub.

Continue ReadingKotlinDL 0.4 Is Out With Pose Detection API, EfficientDet for Object Detection, and EfficientNet for Image Recognition

KotlinDL 0.4 Is Out With Pose Detection API, EfficientDet for Object Detection, and EfficientNet for Image Recognition

Version 0.4 of our deep learning library, KotlinDL, is out!

KotlinDL 0.4 is now available on Maven Central with a variety of new features – check out all of the changes that are coming to the new release! We’re currently introducing new models in ModelHub (including the EfficientNet and EfficientDet model families), the experimental high-level Kotlin API for Pose Detection, new layers and preprocessors contributed by the community members, and many other changes.

KotlinDL on GitHub

In this post, we’ll walk you through the changes to the Kotlin Deep Learning library in the 0.4 release:

  1. Pose Detection
  2. NoTop models in the ModelHub
  3. New models: EfficientDet and EfficientNet
  4. Multiple callbacks
  5. Breaking changes in the Image Preprocessing DSL
  6. 4 new layers and 2 new activation functions
  7. Learn more and share your feedback


Pose Detection

Pose detection is using an ML model to detect the pose of a person from an image or a video by detecting the spatial locations of key body joints (keypoints).

We’re excited to launch the MoveNet family of pose detection modes with our new pose detection API in KotlinDL. MoveNet is a fast and accurate model that detects 17 keypoints on the body. The model is offered on ONNXModelHub with two variants, MoveNetSinglePoseLighting and MoveNetSinglePoseThunder. MoveNetSinglePoseLighting is intended for latency-critical applications, while MoveNetSinglePoseThunder is intended for applications that require high accuracy.

If you need to detect a few poses on a given image or video frame, try MoveNetMultiPoseLighting. This model is able to detect multiple people in the image frame at the same time, while still achieving real-time speed.

There are two ways to detect poses within the KotlinDL: parsing the model output manually or using our LightAPI for Pose Detection (the recommended way).

Just load the model:

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetSinglePoseLighting.pretrainedModel(modelHub)

Run the predictions and print out the pose landmarks and edges connecting the detected pose landmarks:

model.use { poseDetectionModel ->
       val imageFile = …
       val detectedPose = poseDetectionModel.detectPose(imageFile = imageFile)

       detectedPose.poseLandmarks.forEach {
           println("Found ${it.poseLandmarkLabel} with probability ${it.probability}")
       }

       detectedPose.edges.forEach {
           println("The ${it.poseEdgeLabel} starts at ${it.start.poseLandmarkLabel} and ends with ${it.end.poseLandmarkLabel}")
       }
}

Some visualization examples, where we drew landmarks and edges on the given images, are below.

The complete example can be found here.

If you want to run the MoveNet model to detect multiple poses on the given image, you need to make some minor changes to your code.

First, load the model:

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetSinglePoseLighting.pretrainedModel(modelHub)

Secondly, run the model and get the MultiPoseDetectionResult object, which contains the list of pairs <DetectedObject, DetectedPose>. As a result, we have access not only to the landmarks’ coordinates and labels, but also to the coordinates of the bounding box for the whole person.

model.use { poseDetectionModel ->
       val imageFile = …
       val detectedPoses = poseDetectionModel.detectPoses(imageFile = imageFile, confidence = 0.0f)

       detectedPoses.multiplePoses.forEach { detectedPose ->
           println("Found ${detectedPose.first.classLabel} with probability ${detectedPose.first.probability}")
           detectedPose.second.poseLandmarks.forEach {
               println("Found ${it.poseLandmarkLabel} with probability ${it.probability}")
           }

           detectedPose.second.edges.forEach {
               println("The ${it.poseEdgeLabel} starts at ${it.start.poseLandmarkLabel} and ends with ${it.end.poseLandmarkLabel}")
           }
       }
}

Some visualization examples, where we drew the bounding boxes, landmarks, and edges on the images are below.

The complete example can be found here.


NoTop models in the ModelHub

Running predictions on ready-made models is good, but what about fine-tuning them for your tasks?

The classic approach to Transfer Learning is to freeze all layers except the last few and then train the top few layers (the fully connected layers at the top of the network) on a new piece of data, often changing the number of model outputs.

Before the 0.4 release, KotlinDL users needed to remove the last layers manually, but with the 0.4 release, TensorFlowModelHub provides an option to download “noTop” models  – equivalent to earlier available models, but without weights and configurations for the last few layers.

The following “noTop” models are now available:

  • VGG’16
  • VGG’19
  • ResNet50
  • ResNet101
  • ResNet152
  • ResNet50V2
  • ResNet101V2
  • ResNet152V2
  • MobileNet
  • MobileNetV2
  • NasNetMobile
  • NasNetLarge
  • DenseNet121
  • DenseNet169
  • DenseNet201
  • Xception
  • Inception

In the example below, we load the ResNet50 model from our TensorFlowModelHub and fine-tune it to classify cats and dogs (using the embedded Dogs-vs-Cats dataset):

val modelHub = TFModelHub(cacheDirectory = File("cache/pretrainedModels"))

val modelType = TFModels.CV.ResNet50(noTop = true, inputShape = intArrayOf(IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))

val noTopModel = modelHub.loadModel(modelType)

The topModel is the simplest neural network and can be trained quickly, as it has few parameters.

val topModel = Sequential.of(
   GlobalAvgPool2D(
       name = "top_avg_pool",
   ),
   Dense(
       name = "top_dense",
       kernelInitializer = GlorotUniform(),
       biasInitializer = GlorotUniform(),
       outputSize = 200,
       activation = Activations.Relu
   ),
   Dense(
       name = "pred",
       kernelInitializer = GlorotUniform(),
       biasInitializer = GlorotUniform(),
       outputSize = NUM_CLASSES,
       activation = Activations.Linear
   ),
   noInput = true
)

The new helper function could join two models together: noTop and topModel: val model = Functional.of(pretrainedModel = noTopModel, topModel = topModel)

After that, load weights for the frozen layers from the noTop model, and the weights for the unfrozen layers from the topModel will be initialized during the fit method call.

model.use {
   it.compile(
       optimizer = Adam(),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.loadWeightsForFrozenLayers(hdfFile)

   it.fit(
       dataset = train,
       batchSize = TRAINING_BATCH_SIZE,
       epochs = EPOCHS
   )

   val accuracy = it.evaluate(dataset = test, batchSize = TEST_BATCH_SIZE).metrics[Metrics.ACCURACY]

   println("Accuracy: $accuracy")
}

The complete example can be found here.


New models: EfficientDet and EfficientNet

Until v0.4, our ModelHub contained only one model (SSD) suitable for solving the Object Detection problem. Starting with this release, we’re gradually expanding the library’s capabilities for solving the Object Detection problem. We’d like to introduce to you a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior object detectors across a wide spectrum of resource constraints.

All models from this family have the same internal architecture which scales for different inputs (image resolution). The final user has a choice of models: from the smallest EfficientDet-D0, model with 3.9 million parameters and 10.2 ms latency on the V100 up to the EfficientDet-D7, with 52 million parameters and 122 ms latency on the V100.

Internally, EfficientDet models use another famous model, EfficientNet, as a backbone. It extracts features from input images and passes them to the next component of the Object Detection model).

EfficientDet Architecture

An example of EfficientDet-D2 usage can be found here.

The EfficientNet model family is also available in the ONNXModelHub. There are 8 different types of models and each model is presented in two variants: full and “noTop” for fine-tuning.

These models achieve better accuracy on the ImageNet dataset with 10x fewer parameters than ResNet or NasNet. If you need fast and accurate image recognition, EfficientNet is a good choice.

An example of EfficientNet0 usage can be found here.


Multiple callbacks

Earlier, Callback support for KotlinDL was pretty simple and not fully compatible with Keras. As a result, users faced difficulties in implementing their neural networks, building the custom validation process, and monitoring the neural network’s training.

The callback object was passed during compilation and was unique for each stage in the model’s lifecycle. However, model compilation can be located in very different places in the code than fit/predict/evaluate, meaning that users may need to create different callbacks for different purposes.

Let’s assume that we need to define EarlyStopping and  TerminateOnNaN for training to handle exceptional cases, and also add two custom callbacks for the prediction and evaluation phases:

val earlyStopping = EarlyStopping(
   monitor = EpochTrainingEvent::valLossValue,
   minDelta = 0.0,
   patience = 2,
   verbose = true,
   mode = EarlyStoppingMode.AUTO,
   baseline = 0.1,
   restoreBestWeights = false
)
val terminateOnNaN = TerminateOnNaN()


class EvaluateCallback : Callback() {
   override fun onTestBatchEnd(batch: Int, batchSize: Int, event: BatchEvent?, logs: History) {
       println("Test batch $batch ends with loss ${event!!.lossValue}..")
   }

   override fun onTestEnd(logs: History) {
       println("Train ends with last loss ${logs.lastBatchEvent().lossValue}")
   }
}

class PredictCallback : Callback() {
   override fun onPredictBatchBegin(batch: Int, batchSize: Int) {
       println("Prediction batch $batch begins.")
   }

   override fun onPredictBatchEnd(batch: Int, batchSize: Int) {
       println("Prediction batch $batch ends.")
   }
}

Let’s pass these callbacks to the model methods:

model.use {
   it.compile(
       optimizer = Adam(clipGradient = ClipGradientByValue(0.1f)),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.logSummary()

   it.fit(
       dataset = train,
       epochs = EPOCHS,
       batchSize = TRAINING_BATCH_SIZE,
       callbacks = listOf(earlyStopping, terminateOnNaN)
   )

   val accuracy = it.evaluate(
       dataset = test,
       batchSize = TEST_BATCH_SIZE,
       callback = EvaluateCallback()
   ).metrics[Metrics.ACCURACY]


   val predictions = it.predict(
       dataset = test,
       batchSize = TEST_BATCH_SIZE,
       callback = PredictCallback()
   )
}

Found below in the logs:

The complete example can be found here.


4 new layers and 2 new activation functions

Many contributors to this release have added layers to Kotlin for performing non-trivial logic. With these added layers, you can start working with autoencoders and load the GAN models:

There are also two new activation functions:

These activation functions are not available in the TensorFlow core package, but we decided to add them after seeing how they’ve been widely used in recent papers.

We’d be delighted to look at your pull requests if you’d like to contribute a layer, activation function, callback, or initializer from a recent paper!


Breaking changes in the Image Preprocessing DSL

There are a few major changes in the Image Preprocessing DSL:

  • CustomPreprocessor was removed.
  • The loading section was moved from image preprocessing to the Dataset API
  • A few new Preprocessors were added:
    • Padding
    • CenterCrop
    • Convert
    • Grayscale
    • Normalizing

Here is an example of some of the new operations:

val preprocessing = preprocess {
 transformImage {
   centerCrop {
     size = 214
   }
   pad {
     top = 10
     bottom = 10
     left = 10
     right = 10
     mode = PaddingMode.Fill(Color.BLACK)
   }
   convert {
     colorMode = ColorMode.BGR
   }
 }
 transformTensor {
   normalize {
     mean = floatArrayOf(103.939f, 116.779f, 123.68f)
     std = floatArrayOf(57.375f, 57.12f, 58.395f)
   }
 }
}

Because of the removal of the loading section, the same preprocessing instance could now be used in several datasets:

val trainDataset = OnHeapDataset.create(File(datasetPath, "train"), labelGenerator, preprocessing)
val valDataset = OnHeapDataset.create(File(datasetPath, "val"), labelGenerator, preprocessing)


Standing on the shoulders of giants

We’d like to express our deep gratitude to Alexey Zinoviev for his great work developing the framework from minimum viable product to the current state, efforts towards creating a community, skillful release management, and competent marketing support.

His passion for democratizing AI and his continuous work to improve the ability of Kotlin and Java developers to use ML/DL models deserves great respect and inspires us to continue our work.

We’d also like to express our gratitude to Veniamin Viflyantsev, who’s invested a lot of time and effort into changing the architecture of the api module. Many of his changes are now part of this release.

Our team has expanded! Julia Beliaeva (author of the new version of Image Preprocessing DSL) and Nikita Ermolenko have joined us on a permanent basisWe wish them good luck and look forward to new releases!


Learn more and share your feedback

We hope you enjoyed this brief overview of the new features in KotlinDL 0.4! For more information, including the up-to-date Readme file, visit the project’s home on GitHub. Be sure to check out the KotlinDL guide, which contains detailed information about the library’s basic and advanced features and covers many of the topics mentioned in this blog post in more detail.

If you’ve previously used KotlinDL, use the changelog to find out what has changed and how to upgrade your projects to the stable release.

We’d be very thankful if you’d report any bugs you find to our issue tracker. We’ll try to fix all of the critical issues in the 0.4.1 release.

You’re also welcome to join the #kotlindl channel in Kotlin Slack (get an invite here). In this channel, you can ask questions, participate in discussions, and get notifications about the new preview releases and models in ModelHub.

Continue ReadingKotlinDL 0.4 Is Out With Pose Detection API, EfficientDet for Object Detection, and EfficientNet for Image Recognition

KotlinDL 0.4 Is Out With Pose Detection API, EfficientDet for Object Detection, and EfficientNet for Image Recognition

Version 0.4 of our deep learning library, KotlinDL, is out!

KotlinDL 0.4 is now available on Maven Central with a variety of new features – check out all of the changes that are coming to the new release! We’re currently introducing new models in ModelHub (including the EfficientNet and EfficientDet model families), the experimental high-level Kotlin API for Pose Detection, new layers and preprocessors contributed by the community members, and many other changes.

KotlinDL on GitHub

In this post, we’ll walk you through the changes to the Kotlin Deep Learning library in the 0.4 release:

  1. Pose Detection
  2. NoTop models in the ModelHub
  3. New models: EfficientDet and EfficientNet
  4. Multiple callbacks
  5. Breaking changes in the Image Preprocessing DSL
  6. 4 new layers and 2 new activation functions
  7. Learn more and share your feedback


Pose Detection

Pose detection is using an ML model to detect the pose of a person from an image or a video by detecting the spatial locations of key body joints (keypoints).

We’re excited to launch the MoveNet family of pose detection modes with our new pose detection API in KotlinDL. MoveNet is a fast and accurate model that detects 17 keypoints on the body. The model is offered on ONNXModelHub with two variants, MoveNetSinglePoseLighting and MoveNetSinglePoseThunder. MoveNetSinglePoseLighting is intended for latency-critical applications, while MoveNetSinglePoseThunder is intended for applications that require high accuracy.

If you need to detect a few poses on a given image or video frame, try MoveNetMultiPoseLighting. This model is able to detect multiple people in the image frame at the same time, while still achieving real-time speed.

There are two ways to detect poses within the KotlinDL: parsing the model output manually or using our LightAPI for Pose Detection (the recommended way).

Just load the model:

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetSinglePoseLighting.pretrainedModel(modelHub)

Run the predictions and print out the pose landmarks and edges connecting the detected pose landmarks:

model.use { poseDetectionModel ->
       val imageFile = …
       val detectedPose = poseDetectionModel.detectPose(imageFile = imageFile)

       detectedPose.poseLandmarks.forEach {
           println("Found ${it.poseLandmarkLabel} with probability ${it.probability}")
       }

       detectedPose.edges.forEach {
           println("The ${it.poseEdgeLabel} starts at ${it.start.poseLandmarkLabel} and ends with ${it.end.poseLandmarkLabel}")
       }
}

Some visualization examples, where we drew landmarks and edges on the given images, are below.

The complete example can be found here.

If you want to run the MoveNet model to detect multiple poses on the given image, you need to make some minor changes to your code.

First, load the model:

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetSinglePoseLighting.pretrainedModel(modelHub)

Secondly, run the model and get the MultiPoseDetectionResult object, which contains the list of pairs <DetectedObject, DetectedPose>. As a result, we have access not only to the landmarks’ coordinates and labels, but also to the coordinates of the bounding box for the whole person.

model.use { poseDetectionModel ->
       val imageFile = …
       val detectedPoses = poseDetectionModel.detectPoses(imageFile = imageFile, confidence = 0.0f)

       detectedPoses.multiplePoses.forEach { detectedPose ->
           println("Found ${detectedPose.first.classLabel} with probability ${detectedPose.first.probability}")
           detectedPose.second.poseLandmarks.forEach {
               println("Found ${it.poseLandmarkLabel} with probability ${it.probability}")
           }

           detectedPose.second.edges.forEach {
               println("The ${it.poseEdgeLabel} starts at ${it.start.poseLandmarkLabel} and ends with ${it.end.poseLandmarkLabel}")
           }
       }
}

Some visualization examples, where we drew the bounding boxes, landmarks, and edges on the images are below.

The complete example can be found here.


NoTop models in the ModelHub

Running predictions on ready-made models is good, but what about fine-tuning them for your tasks?

The classic approach to Transfer Learning is to freeze all layers except the last few and then train the top few layers (the fully connected layers at the top of the network) on a new piece of data, often changing the number of model outputs.

Before the 0.4 release, KotlinDL users needed to remove the last layers manually, but with the 0.4 release, TensorFlowModelHub provides an option to download “noTop” models  – equivalent to earlier available models, but without weights and configurations for the last few layers.

The following “noTop” models are now available:

  • VGG’16
  • VGG’19
  • ResNet50
  • ResNet101
  • ResNet152
  • ResNet50V2
  • ResNet101V2
  • ResNet152V2
  • MobileNet
  • MobileNetV2
  • NasNetMobile
  • NasNetLarge
  • DenseNet121
  • DenseNet169
  • DenseNet201
  • Xception
  • Inception

In the example below, we load the ResNet50 model from our TensorFlowModelHub and fine-tune it to classify cats and dogs (using the embedded Dogs-vs-Cats dataset):

val modelHub = TFModelHub(cacheDirectory = File("cache/pretrainedModels"))

val modelType = TFModels.CV.ResNet50(noTop = true, inputShape = intArrayOf(IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))

val noTopModel = modelHub.loadModel(modelType)

The topModel is the simplest neural network and can be trained quickly, as it has few parameters.

val topModel = Sequential.of(
   GlobalAvgPool2D(
       name = "top_avg_pool",
   ),
   Dense(
       name = "top_dense",
       kernelInitializer = GlorotUniform(),
       biasInitializer = GlorotUniform(),
       outputSize = 200,
       activation = Activations.Relu
   ),
   Dense(
       name = "pred",
       kernelInitializer = GlorotUniform(),
       biasInitializer = GlorotUniform(),
       outputSize = NUM_CLASSES,
       activation = Activations.Linear
   ),
   noInput = true
)

The new helper function could join two models together: noTop and topModel: val model = Functional.of(pretrainedModel = noTopModel, topModel = topModel)

After that, load weights for the frozen layers from the noTop model, and the weights for the unfrozen layers from the topModel will be initialized during the fit method call.

model.use {
   it.compile(
       optimizer = Adam(),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.loadWeightsForFrozenLayers(hdfFile)

   it.fit(
       dataset = train,
       batchSize = TRAINING_BATCH_SIZE,
       epochs = EPOCHS
   )

   val accuracy = it.evaluate(dataset = test, batchSize = TEST_BATCH_SIZE).metrics[Metrics.ACCURACY]

   println("Accuracy: $accuracy")
}

The complete example can be found here.


New models: EfficientDet and EfficientNet

Until v0.4, our ModelHub contained only one model (SSD) suitable for solving the Object Detection problem. Starting with this release, we’re gradually expanding the library’s capabilities for solving the Object Detection problem. We’d like to introduce to you a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior object detectors across a wide spectrum of resource constraints.

All models from this family have the same internal architecture which scales for different inputs (image resolution). The final user has a choice of models: from the smallest EfficientDet-D0, model with 3.9 million parameters and 10.2 ms latency on the V100 up to the EfficientDet-D7, with 52 million parameters and 122 ms latency on the V100.

Internally, EfficientDet models use another famous model, EfficientNet, as a backbone. It extracts features from input images and passes them to the next component of the Object Detection model).

EfficientDet Architecture

An example of EfficientDet-D2 usage can be found here.

The EfficientNet model family is also available in the ONNXModelHub. There are 8 different types of models and each model is presented in two variants: full and “noTop” for fine-tuning.

These models achieve better accuracy on the ImageNet dataset with 10x fewer parameters than ResNet or NasNet. If you need fast and accurate image recognition, EfficientNet is a good choice.

An example of EfficientNet0 usage can be found here.


Multiple callbacks

Earlier, Callback support for KotlinDL was pretty simple and not fully compatible with Keras. As a result, users faced difficulties in implementing their neural networks, building the custom validation process, and monitoring the neural network’s training.

The callback object was passed during compilation and was unique for each stage in the model’s lifecycle. However, model compilation can be located in very different places in the code than fit/predict/evaluate, meaning that users may need to create different callbacks for different purposes.

Let’s assume that we need to define EarlyStopping and  TerminateOnNaN for training to handle exceptional cases, and also add two custom callbacks for the prediction and evaluation phases:

val earlyStopping = EarlyStopping(
   monitor = EpochTrainingEvent::valLossValue,
   minDelta = 0.0,
   patience = 2,
   verbose = true,
   mode = EarlyStoppingMode.AUTO,
   baseline = 0.1,
   restoreBestWeights = false
)
val terminateOnNaN = TerminateOnNaN()


class EvaluateCallback : Callback() {
   override fun onTestBatchEnd(batch: Int, batchSize: Int, event: BatchEvent?, logs: History) {
       println("Test batch $batch ends with loss ${event!!.lossValue}..")
   }

   override fun onTestEnd(logs: History) {
       println("Train ends with last loss ${logs.lastBatchEvent().lossValue}")
   }
}

class PredictCallback : Callback() {
   override fun onPredictBatchBegin(batch: Int, batchSize: Int) {
       println("Prediction batch $batch begins.")
   }

   override fun onPredictBatchEnd(batch: Int, batchSize: Int) {
       println("Prediction batch $batch ends.")
   }
}

Let’s pass these callbacks to the model methods:

model.use {
   it.compile(
       optimizer = Adam(clipGradient = ClipGradientByValue(0.1f)),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.logSummary()

   it.fit(
       dataset = train,
       epochs = EPOCHS,
       batchSize = TRAINING_BATCH_SIZE,
       callbacks = listOf(earlyStopping, terminateOnNaN)
   )

   val accuracy = it.evaluate(
       dataset = test,
       batchSize = TEST_BATCH_SIZE,
       callback = EvaluateCallback()
   ).metrics[Metrics.ACCURACY]


   val predictions = it.predict(
       dataset = test,
       batchSize = TEST_BATCH_SIZE,
       callback = PredictCallback()
   )
}

Found below in the logs:

The complete example can be found here.


4 new layers and 2 new activation functions

Many contributors to this release have added layers to Kotlin for performing non-trivial logic. With these added layers, you can start working with autoencoders and load the GAN models:

There are also two new activation functions:

These activation functions are not available in the TensorFlow core package, but we decided to add them after seeing how they’ve been widely used in recent papers.

We’d be delighted to look at your pull requests if you’d like to contribute a layer, activation function, callback, or initializer from a recent paper!


Breaking changes in the Image Preprocessing DSL

There are a few major changes in the Image Preprocessing DSL:

  • CustomPreprocessor was removed.
  • The loading section was moved from image preprocessing to the Dataset API
  • A few new Preprocessors were added:
    • Padding
    • CenterCrop
    • Convert
    • Grayscale
    • Normalizing

Here is an example of some of the new operations:

val preprocessing = preprocess {
 transformImage {
   centerCrop {
     size = 214
   }
   pad {
     top = 10
     bottom = 10
     left = 10
     right = 10
     mode = PaddingMode.Fill(Color.BLACK)
   }
   convert {
     colorMode = ColorMode.BGR
   }
 }
 transformTensor {
   normalize {
     mean = floatArrayOf(103.939f, 116.779f, 123.68f)
     std = floatArrayOf(57.375f, 57.12f, 58.395f)
   }
 }
}

Because of the removal of the loading section, the same preprocessing instance could now be used in several datasets:

val trainDataset = OnHeapDataset.create(File(datasetPath, "train"), labelGenerator, preprocessing)
val valDataset = OnHeapDataset.create(File(datasetPath, "val"), labelGenerator, preprocessing)


Standing on the shoulders of giants

We’d like to express our deep gratitude to Alexey Zinoviev for his great work developing the framework from minimum viable product to the current state, efforts towards creating a community, skillful release management, and competent marketing support.

His passion for democratizing AI and his continuous work to improve the ability of Kotlin and Java developers to use ML/DL models deserves great respect and inspires us to continue our work.

We’d also like to express our gratitude to Veniamin Viflyantsev, who’s invested a lot of time and effort into changing the architecture of the api module. Many of his changes are now part of this release.

Our team has expanded! Julia Beliaeva (author of the new version of Image Preprocessing DSL) and Nikita Ermolenko have joined us on a permanent basisWe wish them good luck and look forward to new releases!


Learn more and share your feedback

We hope you enjoyed this brief overview of the new features in KotlinDL 0.4! For more information, including the up-to-date Readme file, visit the project’s home on GitHub. Be sure to check out the KotlinDL guide, which contains detailed information about the library’s basic and advanced features and covers many of the topics mentioned in this blog post in more detail.

If you’ve previously used KotlinDL, use the changelog to find out what has changed and how to upgrade your projects to the stable release.

We’d be very thankful if you’d report any bugs you find to our issue tracker. We’ll try to fix all of the critical issues in the 0.4.1 release.

You’re also welcome to join the #kotlindl channel in Kotlin Slack (get an invite here). In this channel, you can ask questions, participate in discussions, and get notifications about the new preview releases and models in ModelHub.

Continue ReadingKotlinDL 0.4 Is Out With Pose Detection API, EfficientDet for Object Detection, and EfficientNet for Image Recognition

KotlinDL 0.4 Is Out With Pose Detection API, EfficientDet for Object Detection, and EfficientNet for Image Recognition

Version 0.4 of our deep learning library, KotlinDL, is out!

KotlinDL 0.4 is now available on Maven Central with a variety of new features – check out all of the changes that are coming to the new release! We’re currently introducing new models in ModelHub (including the EfficientNet and EfficientDet model families), the experimental high-level Kotlin API for Pose Detection, new layers and preprocessors contributed by the community members, and many other changes.

KotlinDL on GitHub

In this post, we’ll walk you through the changes to the Kotlin Deep Learning library in the 0.4 release:

  1. Pose Detection
  2. NoTop models in the ModelHub
  3. New models: EfficientDet and EfficientNet
  4. Multiple callbacks
  5. Breaking changes in the Image Preprocessing DSL
  6. 4 new layers and 2 new activation functions
  7. Learn more and share your feedback


Pose Detection

Pose detection is using an ML model to detect the pose of a person from an image or a video by detecting the spatial locations of key body joints (keypoints).

We’re excited to launch the MoveNet family of pose detection modes with our new pose detection API in KotlinDL. MoveNet is a fast and accurate model that detects 17 keypoints on the body. The model is offered on ONNXModelHub with two variants, MoveNetSinglePoseLighting and MoveNetSinglePoseThunder. MoveNetSinglePoseLighting is intended for latency-critical applications, while MoveNetSinglePoseThunder is intended for applications that require high accuracy.

If you need to detect a few poses on a given image or video frame, try MoveNetMultiPoseLighting. This model is able to detect multiple people in the image frame at the same time, while still achieving real-time speed.

There are two ways to detect poses within the KotlinDL: parsing the model output manually or using our LightAPI for Pose Detection (the recommended way).

Just load the model:

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetSinglePoseLighting.pretrainedModel(modelHub)

Run the predictions and print out the pose landmarks and edges connecting the detected pose landmarks:

model.use { poseDetectionModel ->
       val imageFile = …
       val detectedPose = poseDetectionModel.detectPose(imageFile = imageFile)

       detectedPose.poseLandmarks.forEach {
           println("Found ${it.poseLandmarkLabel} with probability ${it.probability}")
       }

       detectedPose.edges.forEach {
           println("The ${it.poseEdgeLabel} starts at ${it.start.poseLandmarkLabel} and ends with ${it.end.poseLandmarkLabel}")
       }
}

Some visualization examples, where we drew landmarks and edges on the given images, are below.

The complete example can be found here.

If you want to run the MoveNet model to detect multiple poses on the given image, you need to make some minor changes to your code.

First, load the model:

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.PoseDetection.MoveNetSinglePoseLighting.pretrainedModel(modelHub)

Secondly, run the model and get the MultiPoseDetectionResult object, which contains the list of pairs <DetectedObject, DetectedPose>. As a result, we have access not only to the landmarks’ coordinates and labels, but also to the coordinates of the bounding box for the whole person.

model.use { poseDetectionModel ->
       val imageFile = …
       val detectedPoses = poseDetectionModel.detectPoses(imageFile = imageFile, confidence = 0.0f)

       detectedPoses.multiplePoses.forEach { detectedPose ->
           println("Found ${detectedPose.first.classLabel} with probability ${detectedPose.first.probability}")
           detectedPose.second.poseLandmarks.forEach {
               println("Found ${it.poseLandmarkLabel} with probability ${it.probability}")
           }

           detectedPose.second.edges.forEach {
               println("The ${it.poseEdgeLabel} starts at ${it.start.poseLandmarkLabel} and ends with ${it.end.poseLandmarkLabel}")
           }
       }
}

Some visualization examples, where we drew the bounding boxes, landmarks, and edges on the images are below.

The complete example can be found here.


NoTop models in the ModelHub

Running predictions on ready-made models is good, but what about fine-tuning them for your tasks?

The classic approach to Transfer Learning is to freeze all layers except the last few and then train the top few layers (the fully connected layers at the top of the network) on a new piece of data, often changing the number of model outputs.

Before the 0.4 release, KotlinDL users needed to remove the last layers manually, but with the 0.4 release, TensorFlowModelHub provides an option to download “noTop” models  – equivalent to earlier available models, but without weights and configurations for the last few layers.

The following “noTop” models are now available:

  • VGG’16
  • VGG’19
  • ResNet50
  • ResNet101
  • ResNet152
  • ResNet50V2
  • ResNet101V2
  • ResNet152V2
  • MobileNet
  • MobileNetV2
  • NasNetMobile
  • NasNetLarge
  • DenseNet121
  • DenseNet169
  • DenseNet201
  • Xception
  • Inception

In the example below, we load the ResNet50 model from our TensorFlowModelHub and fine-tune it to classify cats and dogs (using the embedded Dogs-vs-Cats dataset):

val modelHub = TFModelHub(cacheDirectory = File("cache/pretrainedModels"))

val modelType = TFModels.CV.ResNet50(noTop = true, inputShape = intArrayOf(IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))

val noTopModel = modelHub.loadModel(modelType)

The topModel is the simplest neural network and can be trained quickly, as it has few parameters.

val topModel = Sequential.of(
   GlobalAvgPool2D(
       name = "top_avg_pool",
   ),
   Dense(
       name = "top_dense",
       kernelInitializer = GlorotUniform(),
       biasInitializer = GlorotUniform(),
       outputSize = 200,
       activation = Activations.Relu
   ),
   Dense(
       name = "pred",
       kernelInitializer = GlorotUniform(),
       biasInitializer = GlorotUniform(),
       outputSize = NUM_CLASSES,
       activation = Activations.Linear
   ),
   noInput = true
)

The new helper function could join two models together: noTop and topModel: val model = Functional.of(pretrainedModel = noTopModel, topModel = topModel)

After that, load weights for the frozen layers from the noTop model, and the weights for the unfrozen layers from the topModel will be initialized during the fit method call.

model.use {
   it.compile(
       optimizer = Adam(),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.loadWeightsForFrozenLayers(hdfFile)

   it.fit(
       dataset = train,
       batchSize = TRAINING_BATCH_SIZE,
       epochs = EPOCHS
   )

   val accuracy = it.evaluate(dataset = test, batchSize = TEST_BATCH_SIZE).metrics[Metrics.ACCURACY]

   println("Accuracy: $accuracy")
}

The complete example can be found here.


New models: EfficientDet and EfficientNet

Until v0.4, our ModelHub contained only one model (SSD) suitable for solving the Object Detection problem. Starting with this release, we’re gradually expanding the library’s capabilities for solving the Object Detection problem. We’d like to introduce to you a new family of object detectors, called EfficientDet, which consistently achieve much better efficiency than prior object detectors across a wide spectrum of resource constraints.

All models from this family have the same internal architecture which scales for different inputs (image resolution). The final user has a choice of models: from the smallest EfficientDet-D0, model with 3.9 million parameters and 10.2 ms latency on the V100 up to the EfficientDet-D7, with 52 million parameters and 122 ms latency on the V100.

Internally, EfficientDet models use another famous model, EfficientNet, as a backbone. It extracts features from input images and passes them to the next component of the Object Detection model).

EfficientDet Architecture

An example of EfficientDet-D2 usage can be found here.

The EfficientNet model family is also available in the ONNXModelHub. There are 8 different types of models and each model is presented in two variants: full and “noTop” for fine-tuning.

These models achieve better accuracy on the ImageNet dataset with 10x fewer parameters than ResNet or NasNet. If you need fast and accurate image recognition, EfficientNet is a good choice.

An example of EfficientNet0 usage can be found here.


Multiple callbacks

Earlier, Callback support for KotlinDL was pretty simple and not fully compatible with Keras. As a result, users faced difficulties in implementing their neural networks, building the custom validation process, and monitoring the neural network’s training.

The callback object was passed during compilation and was unique for each stage in the model’s lifecycle. However, model compilation can be located in very different places in the code than fit/predict/evaluate, meaning that users may need to create different callbacks for different purposes.

Let’s assume that we need to define EarlyStopping and  TerminateOnNaN for training to handle exceptional cases, and also add two custom callbacks for the prediction and evaluation phases:

val earlyStopping = EarlyStopping(
   monitor = EpochTrainingEvent::valLossValue,
   minDelta = 0.0,
   patience = 2,
   verbose = true,
   mode = EarlyStoppingMode.AUTO,
   baseline = 0.1,
   restoreBestWeights = false
)
val terminateOnNaN = TerminateOnNaN()


class EvaluateCallback : Callback() {
   override fun onTestBatchEnd(batch: Int, batchSize: Int, event: BatchEvent?, logs: History) {
       println("Test batch $batch ends with loss ${event!!.lossValue}.")
   }

   override fun onTestEnd(logs: History) {
       println("Test ends with last loss ${logs.lastBatchEvent().lossValue}")
   }
}

class PredictCallback : Callback() {
   override fun onPredictBatchBegin(batch: Int, batchSize: Int) {
       println("Prediction batch $batch begins.")
   }

   override fun onPredictBatchEnd(batch: Int, batchSize: Int) {
       println("Prediction batch $batch ends.")
   }
}

Let’s pass these callbacks to the model methods:

model.use {
   it.compile(
       optimizer = Adam(clipGradient = ClipGradientByValue(0.1f)),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.logSummary()

   it.fit(
       dataset = train,
       epochs = EPOCHS,
       batchSize = TRAINING_BATCH_SIZE,
       callbacks = listOf(earlyStopping, terminateOnNaN)
   )

   val accuracy = it.evaluate(
       dataset = test,
       batchSize = TEST_BATCH_SIZE,
       callback = EvaluateCallback()
   ).metrics[Metrics.ACCURACY]


   val predictions = it.predict(
       dataset = test,
       batchSize = TEST_BATCH_SIZE,
       callback = PredictCallback()
   )
}

The output produced by these callbacks can be found in the training, evaluation, and prediction logs.

The complete example can be found here.


4 new layers and 2 new activation functions

Many contributors to this release have added layers implementing non-trivial logic to KotlinDL. With these new layers, you can start working with autoencoders and load GAN models:

There are also two new activation functions:

These activation functions are not available in the TensorFlow core package, but we decided to add them after seeing how widely they’ve been used in recent papers.

We’d be delighted to look at your pull requests if you’d like to contribute a layer, activation function, callback, or initializer from a recent paper!


Breaking changes in the Image Preprocessing DSL

There are a few major changes in the Image Preprocessing DSL:

  • CustomPreprocessor was removed.
  • The loading section was moved from image preprocessing to the Dataset API.
  • A few new Preprocessors were added:
    • Padding
    • CenterCrop
    • Convert
    • Grayscale
    • Normalizing

Here is an example of some of the new operations:

val preprocessing = preprocess {
 transformImage {
   centerCrop {
     size = 214
   }
   pad {
     top = 10
     bottom = 10
     left = 10
     right = 10
     mode = PaddingMode.Fill(Color.BLACK)
   }
   convert {
     colorMode = ColorMode.BGR
   }
 }
 transformTensor {
   normalize {
     mean = floatArrayOf(103.939f, 116.779f, 123.68f)
     std = floatArrayOf(57.375f, 57.12f, 58.395f)
   }
 }
}

Because the loading section has been removed, the same preprocessing instance can now be used with several datasets:

val trainDataset = OnHeapDataset.create(File(datasetPath, "train"), labelGenerator, preprocessing)
val valDataset = OnHeapDataset.create(File(datasetPath, "val"), labelGenerator, preprocessing)
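
For reference, the labelGenerator used above can be defined with the standard label generator from the Dataset API that maps class subfolder names to indices. The concrete mapping below is an assumption for a Dogs-vs-Cats folder layout:

// Assumed layout: the "train" and "val" directories each contain "cat" and "dog" subfolders.
val labelGenerator = FromFolders(mapping = mapOf("cat" to 0, "dog" to 1))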


Standing on the shoulders of giants

We’d like to express our deep gratitude to Alexey Zinoviev for his great work developing the framework from a minimum viable product to its current state, his efforts towards creating a community, his skillful release management, and his competent marketing support.

His passion for democratizing AI and his continuous work to improve the ability of Kotlin and Java developers to use ML/DL models deserve great respect and inspire us to continue our work.

We’d also like to express our gratitude to Veniamin Viflyantsev, who’s invested a lot of time and effort into changing the architecture of the api module. Many of his changes are now part of this release.

Our team has expanded! Julia Beliaeva (author of the new version of the Image Preprocessing DSL) and Nikita Ermolenko have joined us on a permanent basis. We wish them good luck and look forward to new releases!


Learn more and share your feedback

We hope you enjoyed this brief overview of the new features in KotlinDL 0.4! For more information, including the up-to-date Readme file, visit the project’s home on GitHub. Be sure to check out the KotlinDL guide, which contains detailed information about the library’s basic and advanced features and covers many of the topics mentioned in this blog post in more detail.

If you’ve previously used KotlinDL, use the changelog to find out what has changed and how to upgrade your projects to the stable release.

We’d be very thankful if you’d report any bugs you find to our issue tracker. We’ll try to fix all of the critical issues in the 0.4.1 release.

You’re also welcome to join the #kotlindl channel in Kotlin Slack (get an invite here). In this channel, you can ask questions, participate in discussions, and get notifications about the new preview releases and models in ModelHub.
