Object Detection with KotlinDL and Ktor

I presented the webinar “Object Detection and Image Recognition with Kotlin,” where I explored a deep learning library written in Kotlin, described how to detect objects of different types in images, and explained how to create a Kotlin Web Application using Ktor and KotlinDL that recognizes cars and people in photos. I decided there was more I would like to share with you on the subject, so here is an extended article.

If you are new to Deep Learning, don’t worry about it. You don’t need any high-level calculus knowledge to start using the Object Detection Light API in the KotlinDL library.

However, when writing this article, I did assume you would be familiar with basic web development fundamentals, e.g., HTML, web servers, HTTP, and client-server communication.

This article will take you through how to detect objects in different images and create a Kotlin Web Application using Ktor and KotlinDL.

What is Object Detection?

It’s a pretty simple term from the Deep Learning world and just means the task of detecting instances of objects of a certain class within an image.

You are probably already familiar with Image Recognition, where the idea is to recognize the class or type of only one object within an image without having any coordinates for the recognized object.

Unlike Image Recognition, Object Detection tries to detect multiple objects (sometimes a significant number, 100 or even 1,000) and their locations, which are usually represented as the four coordinates of a rectangle (x_min, x_max, y_min, y_max) containing the detected object.
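
To make this concrete, here is an illustrative sketch (not the actual KotlinDL class) of roughly the information a single detection carries:

// Illustrative only: roughly what one detection result contains.
data class Detection(
   val classLabel: String,            // e.g. "car" or "person"
   val probability: Float,            // the model's confidence for this box
   val xMin: Float, val xMax: Float,  // horizontal bounds of the rectangle
   val yMin: Float, val yMax: Float   // vertical bounds of the rectangle
)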

For example, this screenshot of the example application shows how a few objects have been recognized, and their positions annotated:

OK – now for the fun stuff! It’s time to write some Kotlin code to detect objects within an image.

Object Detection Example

Let’s say we have the following image. We see a typical street: several cars, pedestrians crossing, traffic lights, and even someone using the pedestrian crossing on a bicycle.

With a few lines of code, we can obtain a list of the detected objects, sorted by score, or probability (the model’s degree of confidence that a certain rectangle contains an object of a certain type).

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))

val model = modelHub.loadPretrainedModel(ONNXModels.ObjectDetection.SSD)

model.use { detectionModel ->
   println(detectionModel)

   val imageFile = getFileFromResource("detection/image2.jpg")
   val detectedObjects = detectionModel.detectObjects(imageFile = imageFile, topK = 20)

   detectedObjects.forEach {
       println("Found ${it.classLabel} with probability ${it.probability}")
   }
}

This code prints the following:

Found car with probability 0.9872914
Found bicycle with probability 0.9547764
Found car with probability 0.93248314
Found person with probability 0.85994
Found person with probability 0.8397419
Found car with probability 0.7488473
Found person with probability 0.49446288
Found person with probability 0.48537987
Found person with probability 0.40268868
Found person with probability 0.3972058
Found person with probability 0.38047826
Found traffic light with probability 0.36501375
Found car with probability 0.30308443
Found traffic light with probability 0.30084336
Found person with probability 0.27078137
Found car with probability 0.26892117
Found person with probability 0.26232794
Found person with probability 0.23597576
Found person with probability 0.23156123
Found person with probability 0.21393918

OK, it looks like the model can detect objects, just like our eyes can do, but how do we go about marking the objects?

We can use the Swing framework to draw rectangles over the image. This also requires simple image preprocessing before visualization.

First, we need to add a simple visualization using JPanel, BufferedImage, and Graphics2D objects in the visualise function.

model.use { detectionModel ->
  …

   visualise(imageFile, detectedObjects)
}

Drawing rectangles on an image with the Graphics2D API may not be the best approach, but we can use it as a good starting point for our research.

private fun visualise(
   imageFile: File,
   detectedObjects: List<DetectedObject>
) {
   val frame = JFrame("Detected Objects")
   @Suppress("UNCHECKED_CAST")
   frame.contentPane.add(JPanel(imageFile, detectedObjects))
   frame.pack()
   frame.setLocationRelativeTo(null)
   frame.isVisible = true
   frame.defaultCloseOperation = JFrame.EXIT_ON_CLOSE
   frame.isResizable = false
}

class JPanel(
   val image: File,
   private val detectedObjects: List<DetectedObject>
) : javax.swing.JPanel() { // fully qualified, since this class shares the JPanel name
   private var bufferedImage = ImageIO.read(image)

   override fun paint(graphics: Graphics) {
       super.paint(graphics)
       graphics.drawImage(bufferedImage, 0, 0, null)

       detectedObjects.forEach {
           val top = it.yMin * bufferedImage.height
           val left = it.xMin * bufferedImage.width
           val bottom = it.yMax * bufferedImage.height
           val right = it.xMax * bufferedImage.width
           if (abs(top - bottom) > 300 || abs(right - left) > 300) return@forEach

           graphics.color = Color.ORANGE
           graphics.font = Font("Courier New", 1, 17)
           graphics.drawString(" ${it.classLabel} : ${it.probability}", left.toInt(), bottom.toInt() - 8)

           graphics as Graphics2D
           val stroke1: Stroke = BasicStroke(6f)
           graphics.setColor(Color.RED)
           graphics.stroke = stroke1
           graphics.drawRect(left.toInt(), bottom.toInt(), (right - left).toInt(), (top - bottom).toInt())
       }
   }

   override fun getPreferredSize(): Dimension {
       return Dimension(bufferedImage.width, bufferedImage.height)
   }

   override fun getMinimumSize(): Dimension {
       return Dimension(bufferedImage.width, bufferedImage.height)
   }
}

The result is the following image:

As you can see, the Object Detection Light API returns not only the class label and score but also the relative image coordinates, which can be used to draw rectangles, or boxes, around the detected objects.

We can also play with the color palette a little and use different colors to differentiate people, bicycles, cars, and traffic lights:

when(it.classLabel) {
   "person" -> graphics.setColor(Color.WHITE)
   "car" -> graphics.setColor(Color.GREEN)
   "traffic light" -> graphics.setColor(Color.YELLOW)
   "bicycle" -> graphics.setColor(Color.MAGENTA)
   else -> graphics.setColor(Color.RED)
}

That looks significantly better!

You can continue experimenting with the visualization, but we need to move on!

Client-Server Application with Ktor

In this section, I will use Ktor to write two simple programs: client and server. The client application will send the image to the server application. If you have never used Ktor before, it’s an excellent time to see how easy it is to deal with classic web stuff like HTTP requests, headers, MIME types, and so on.

When the code below is run, the client application sends a POST request via the submitFormWithBinaryData method. You can read more about how this works in the Ktor documentation. The result with the added boxes for the detected objects can be found in the clientFiles folder.

runBlocking {
   val client = HttpClient(CIO)

   val response: HttpResponse = client.submitFormWithBinaryData(
       url = "http://localhost:8001/detect",
       formData = formData {
           append("image", getFileFromResource("detection/image2.jpg").readBytes(), Headers.build {
               append(HttpHeaders.ContentType, "image/jpg")
               append(HttpHeaders.ContentDisposition, "filename=image2.jpg")
           })
       }
   )

   val imageFile = File("clientFiles/detectedObjects2.jpg")
   imageFile.writeBytes(response.readBytes())
}

Unfortunately, the Ktor client has no dedicated API for receiving files from the server side. But we’re programmers, right? Let’s just write the bytes received over the network to a File object.

The server part is a little more difficult. I’ll need to explain some parts of the code below.

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.ObjectDetection.SSD.pretrainedModel(modelHub)

Because model creation is a time-consuming step (due to loading and initializing), we need to create the model before we can run the server.

embeddedServer(Netty, 8001) {
   routing {
       post("/detect") {
           val multipartData = call.receiveMultipart()
           var newFileName = ""
           multipartData.forEachPart { part ->
               when (part) {
                   is PartData.FileItem -> {
                       val fileName = part.originalFileName as String
                       newFileName = fileName.replace("image", "detectedObjects")
                       val fileBytes = part.streamProvider().readBytes()
                       val imageFile = File("serverFiles/$fileName")
                       imageFile.writeBytes(fileBytes)

                       val detectedObjects =
                           model.detectObjects(imageFile = imageFile, topK = 20)

                       val filteredObjects =
                           detectedObjects.filter { it.classLabel == "car" || it.classLabel == "person" || it.classLabel == "bicycle" }

                       drawRectanglesForDetectedObjects(newFileName, imageFile, filteredObjects)

The intermediate result will be saved to the serverFiles folder. After that, the server application will send this file back to the client.

To send the file back, the server sets the Content-Disposition header on the response and then responds with the file itself, using the call.response.header and call.respondFile functions, respectively.

                       call.response.header(
                           HttpHeaders.ContentDisposition,
                           ContentDisposition.Attachment.withParameter(ContentDisposition.Parameters.FileName, newFileName)
                               .toString()
                       )
                        call.respondFile(File("serverFiles/$newFileName"))
                   }
               }
           }
       }
   }
}.start(wait = true)

At the end, we need to close our model to release all the resources.

model.close()

Run the server, then run the client several times with different images. Check the clientFiles and serverFiles folders to find all the images that were sent, with the detected objects drawn on them.

Yet another example of Object Detection

The complete example, including drawing and saving files to the serverFiles folder, can be found here in the GitHub repository.
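
The drawRectanglesForDetectedObjects helper used above is part of that example rather than the library API. A minimal sketch of such a helper, assuming the same relative coordinates as in the Swing example and writing the result with ImageIO, could look like this:

// A hypothetical sketch of the drawing helper; see the repository for the real implementation.
private fun drawRectanglesForDetectedObjects(
   newFileName: String,
   imageFile: File,
   detectedObjects: List<DetectedObject>
) {
   val image = ImageIO.read(imageFile)
   val graphics = image.createGraphics()
   graphics.stroke = BasicStroke(6f)
   graphics.color = Color.RED

   detectedObjects.forEach {
       // Convert relative coordinates to pixel coordinates.
       val x = (it.xMin * image.width).toInt()
       val y = (it.yMin * image.height).toInt()
       val width = ((it.xMax - it.xMin) * image.width).toInt()
       val height = ((it.yMax - it.yMin) * image.height).toInt()
       graphics.drawRect(x, y, width, height)
       graphics.drawString("${it.classLabel} : ${it.probability}", x, y - 8)
   }

   graphics.dispose()
   ImageIO.write(image, "jpg", File("serverFiles/$newFileName"))
}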

Web Application

It’s time to write a complete Web Application with an HTML page rendered on the server, a few inputs, and a button. I’d like to upload an image, fill in some input fields with parameters, and press a button to download the image with the detected objects to my laptop.

The application will contain only the server part, but it has a few interesting aspects we will need to consider. It should handle two HTTP requests: the POST request, which handles multipart data with FileItem and FormItem handlers, and the GET request, which returns a simple HTML page.

From multipartData we can extract not only binary data, as in the previous example, but also the values of the form parameters. These parameters, topK and classLabelNames, will be explained later.

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.ObjectDetection.SSD.pretrainedModel(modelHub)

embeddedServer(Netty, 8002) {
   routing {
       post("/detect") {
           val multipartData = call.receiveMultipart()
           var imageFile: File? = null
           var newFileName = ""
           var topK = 20
           val classLabels = mutableListOf<String>()
           multipartData.forEachPart { part ->
               when (part) {
                   is PartData.FileItem -> {
                       val fileName = part.originalFileName as String
                       val fileBytes = part.streamProvider().readBytes()

                       newFileName = fileName.replace("image", "detectedObjects")
                       imageFile = File("serverFiles/$fileName")
                       imageFile!!.writeBytes(fileBytes)
                   }
                   is PartData.FormItem -> {
                       when (part.name) {
                           "topK" -> topK = if (part.value.isNotBlank()) part.value.toInt() else 20
                           "classLabelNames" -> part.value.split(",").forEach {
                               classLabels += it.trim()
                           }
                       }
                   }
                   is PartData.BinaryItem -> TODO()
               }
           }

           val detectedObjects =
               model.detectObjects(imageFile = imageFile!!, topK = topK)

           val filteredObjects = detectedObjects.filter {
                   if (classLabels.isNotEmpty()) {
                       it.classLabel in classLabels
                   } else {
                       it.classLabel == "car" || it.classLabel == "person" || it.classLabel == "bicycle"
                   }
               }

           drawRectanglesForDetectedObjects(newFileName, imageFile!!, filteredObjects)

           call.response.header(
               HttpHeaders.ContentDisposition,
               ContentDisposition.Attachment.withParameter(ContentDisposition.Parameters.FileName, newFileName)
                   .toString()
           )
           call.respondFile(File("serverFiles/$newFileName"))
       }

To describe the HTML page with a nice DSL, Ktor integrates with kotlinx.html, as described in the documentation. This integration allows you to respond to a client with HTML blocks. With the HTML DSL, you can write pure HTML in Kotlin, interpolate variables into views, and build complex HTML layouts using templates.
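
Note that respondHtml and the DSL builders come from Ktor’s HTML integration, which has to be on the classpath. With Ktor 1.x the Gradle dependency looks roughly like this (the version variable is illustrative):

dependencies {
    implementation "io.ktor:ktor-html-builder:$ktor_version"
}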

get("/") {
           call.respondHtml {
               body {
                   form(action = "/detect", encType = FormEncType.multipartFormData, method = FormMethod.post) {
                       p {
                           +"Your image: "
                           fileInput(name = "image")
                       }
                       p {
                           +"TopK: "
                           numberInput(name = "topK")
                       }
                       p {
                           +"Classes to detect: "
                           textInput(name = "classLabelNames")
                       }
                       p {
                           submitInput() { value = "Detect objects" }
                       }
                   }
               }
           }
       }
   }
}.start(wait = true)

model.close()

Run the server and open http://localhost:8002 in a browser. You’ll find a form there. Simply upload an image, fill in the request parameters (or leave them empty), and press the “Detect objects” button. The new image will start downloading in a few seconds.

You can also play with the topK and classLabelNames parameters to obtain different results. The topK parameter determines how many detected objects (sorted by score from highest to lowest) will be drawn on the image. The classLabelNames parameter takes a comma-separated list of labels (from the model’s supported label set) to filter which categories of detected objects will be enclosed in rectangles.
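
If you would rather call the endpoint from code than from the browser, the same parameters can be sent as plain form fields with the Ktor client, in the style of the earlier client example (the values below are only an illustration):

runBlocking {
   val client = HttpClient(CIO)

   val response: HttpResponse = client.submitFormWithBinaryData(
       url = "http://localhost:8002/detect",
       formData = formData {
           // Plain form fields are appended as simple name/value pairs.
           append("topK", "10")
           append("classLabelNames", "car, traffic light")
           append("image", getFileFromResource("detection/image2.jpg").readBytes(), Headers.build {
               append(HttpHeaders.ContentType, "image/jpg")
               append(HttpHeaders.ContentDisposition, "filename=image2.jpg")
           })
       }
   )

   File("clientFiles/detectedObjects2.jpg").writeBytes(response.readBytes())
}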

The complete example can be found here in the GitHub repository. 

This represents only a small fraction of what you can do with the full power of Ktor. For example, you could build a REST API for Object Detection or Image Recognition, or build a helpful microservice. It is your choice!

In conclusion

Release 0.3 shipped with only one effective object detection model: SSD. The new release, 0.4, brings seven new object detection models with different speed and accuracy characteristics, as well as the ability to detect more complex objects.

We strongly recommend using Compose for Desktop, instead of Swing, for your visualization needs. The community is working on moving these examples to the new framework.

This is not the only improvement you can expect in the Object Detection Light API. In future releases, we will add helpful methods for filtering and merging overlapping boxes, in the YOLO style, to avoid situations where a single object in the image has multiple rectangles drawn around it.
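
Until those methods arrive, a rough version of this filtering can already be written on top of the results returned by detectObjects. The sketch below is a do-it-yourself helper, not a library API, and assumes the Float coordinates and probability used in the snippets above:

// Intersection over Union of two boxes in relative coordinates.
private fun iou(a: DetectedObject, b: DetectedObject): Float {
   val interWidth = maxOf(0f, minOf(a.xMax, b.xMax) - maxOf(a.xMin, b.xMin))
   val interHeight = maxOf(0f, minOf(a.yMax, b.yMax) - maxOf(a.yMin, b.yMin))
   val intersection = interWidth * interHeight
   val areaA = (a.xMax - a.xMin) * (a.yMax - a.yMin)
   val areaB = (b.xMax - b.xMin) * (b.yMax - b.yMin)
   return if (intersection == 0f) 0f else intersection / (areaA + areaB - intersection)
}

// Greedy suppression: keep the highest-scoring boxes and drop any box that overlaps an already kept one too much.
private fun suppressDuplicates(
   detectedObjects: List<DetectedObject>,
   iouThreshold: Float = 0.5f
): List<DetectedObject> {
   val kept = mutableListOf<DetectedObject>()
   detectedObjects.sortedByDescending { it.probability }.forEach { candidate ->
       if (kept.none { iou(it, candidate) > iouThreshold }) kept += candidate
   }
   return kept
}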

If you have any thoughts or user experience related to this use case, just make an issue on GitHub or ask in the Kotlin Slack (kotlindl channel).

KotlinDL 0.3 Is Out With ONNX Integration, Object Detection API, 20+ New Models in ModelHub, and Many New Layers

Introducing version 0.3 of our deep learning library, KotlinDL.

KotlinDL 0.3 is available now on Maven Central with a variety of new features – check out all the changes in this release! It introduces a lot of new models in ModelHub (including the first Object Detection and Face Alignment models), the ability to fine-tune Image Recognition models saved in ONNX format from Keras and PyTorch, an experimental high-level Kotlin API for image recognition, many new layers contributed by community members, and many other changes.

KotlinDL on GitHub

In this post, we’ll walk you through the changes to the Kotlin Deep Learning library in the 0.3 release:

  1. ONNX integration
  2. Fine-tuning of ONNX models
  3. ModelHub: support for the DenseNet, Inception, and NasNet model families 
  4. Object detection with the SSD model
  5. Sound classification with the SoundNet architecture
  6. Experimental high-level API for Image Recognition
  7. 23 new layers, 6 new activation functions, and 2 new initializers
  8. How to add KotlinDL to your project
  9. Learn more and share your feedback

ONNX integration

Over the past year, library users have been asking us to add support for working with models saved in the ONNX format.

Open Neural Network Exchange (ONNX) is an open-source format for AI models. It defines an extensible computation graph model and definitions of built-in operators and standard data types. It works well with both of today’s most popular frameworks, TensorFlow and PyTorch. 

We use the ONNX Runtime Java API to parse and execute models saved in the `.onnx` file format. You can read more about this API in the project’s documentation.

KotlinDL has a separate `onnx` module that provides the ONNX integration support. To use it in your project, you need to add a separate dependency, shown below.
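
For reference, the `onnx` module has its own artifact; its Gradle coordinates (also listed in the dependency section at the end of this post) are:

dependencies {
    implementation 'org.jetbrains.kotlinx:kotlin-deeplearning-onnx:0.3.0'
}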

There are two ways to run predictions on an ONNX model. If you want to use LeNet-5, one of the models from ModelHub, you can load it in the following manner:

val (train, test) = mnist()

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))

val modelType = ONNXModels.CV.Lenet
val model = modelHub.loadModel(modelType)

model.use {
   val prediction = it.predict(train.getX(0))

   println("Predicted Label is: $prediction")
   println("Correct Label is: " + train.getY(0))
}

The second way is to load an arbitrary model saved in the ONNX format: instantiate OnnxInferenceModel and run the predictions.

OnnxInferenceModel.load(PATH_TO_MODEL).use {
    val prediction = it.predict(...)
}

If the model has a very complex output, for example several tensors, as with YOLOv4 or SSD (both of which can also be loaded from ModelHub), you may want to call the predictRaw method:

val yhat = it.predictRaw(inputData)

to get access to all the outputs and parse them manually.

Finding an appropriate model in ONNXModelHub is easy: just start your search from the top-level object ONNXModels and go deeper into CV or ObjectDetection. Use this chain of inner objects as a unique model identifier to obtain the model itself or its preprocessing method. For example, the SOTA model from 2020 for the Image Recognition task, called EfficientNet, can be found by following ONNXModels.CV.EfficientNet4Lite.
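
For example, loading it follows the same pattern as the other hub models (a quick sketch using the identifier from the sentence above):

val modelHub = ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = modelHub.loadModel(ONNXModels.CV.EfficientNet4Lite)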

Fine-tuning of ONNX models

Of course, running predictions on ready-made models is good, but what about fine-tuning them a little for our tasks?

Unfortunately, the ONNX Java API does not support training mode for models, but we do not need to train the entire model as a whole to perform Transfer Learning tasks.

The classical approach to Transfer Learning is to freeze all layers except for the last few, and then train those top layers (the fully connected layers at the top of the network) on new data, often changing the number of model outputs.

These top layers can be viewed as a small neural network whose input is the output of the frozen part of the model. The frozen layers, in turn, can be considered a preprocessing stage for this small top model.

We have implemented this approach in the library: the ONNX model acts as a preprocessing stage, and the top model is a small KotlinDL neural network.

Suppose you have a huge model in Keras or PyTorch that you want to fine-tune in KotlinDL: cut off the last layers from it, export to the ONNX format, load into KotlinDL as an additional preprocessing layer via ONNXModelPreprocessor, describe the missing layers using the KotlinDL API, and train them.

Figure 1. ONNX model’s fine-tuning.

In the example below, we load the ResNet50 model from our ONNXModelHub and fine-tune it to classify cats and dogs (the embedded Dogs-vs-Cats dataset is used):

val modelHub = ONNXModelHub(
   cacheDirectory = File("cache/pretrainedModels")
)
val model = modelHub.loadModel(ONNXModels.CV.ResNet50noTopCustom)

val dogsVsCatsDatasetPath = dogsCatsDatasetPath()

model.use {
   it.reshape(64, 64, 3)

   val preprocessing: Preprocessing = preprocess {
       load {
           pathToData = File(dogsVsCatsDatasetPath)
           imageShape = ImageShape(channels = NUM_CHANNELS)
           colorMode = ColorOrder.BGR
           labelGenerator = FromFolders(mapping = mapOf("cat" to 0, "dog" to 1))
       }
       transformImage {
           resize {
               outputHeight = IMAGE_SIZE.toInt()
               outputWidth = IMAGE_SIZE.toInt()
               interpolation = InterpolationType.BILINEAR
           }
       }
       transformTensor {
           sharpen {
               modelType = TFModels.CV.ResNet50
           }
           onnx {
               onnxModel = model
           }
       }
   }

   val dataset = OnFlyImageDataset.create(preprocessing).shuffle()
   val (train, test) = dataset.split(TRAIN_TEST_SPLIT_RATIO)

   topModel.use {
       topModel.compile(
           optimizer = Adam(),
           loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
           metric = Metrics.ACCURACY
       )

       topModel.fit(dataset = train, epochs = EPOCHS, batchSize = TRAINING_BATCH_SIZE)

       val accuracy = topModel.evaluate(dataset = test, batchSize = TEST_BATCH_SIZE).metrics[Metrics.ACCURACY]

       println("Accuracy: $accuracy")
   }
}

The topModel is just the simplest neural network and can be trained quickly as it has few parameters.

/**
* This is a simple model based on Dense layers only.
*/
private val topModel = Sequential.of(
   Input(2, 2, 2048),
   GlobalAvgPool2D(),
   Dense(2, Activations.Linear, kernelInitializer = HeNormal(12L), biasInitializer = Zeros())
)

The complete example can be found here.

NOTE: Since there is no API for cutting layers and weights from the model saved in the ONNX format, you need to perform these operations yourself before exporting to the ONNX format. We’re going to add to ModelHub a lot of models from PyTorch and Keras prepared this way in the 0.4 release.

ModelHub: support for the DenseNet, Inception, and NasNet model families

The 0.2 release of KotlinDL added a repository of models that could be loaded from JetBrains S3 storage and cached on disk. We first called it ModelZoo and have since renamed it to ModelHub.

ModelHub contains a collection of Deep Learning models that are pre-trained on large datasets like ImageNet or COCO.

There are two ModelHubs currently: the basic TFModelHub, available in the `api` module, and an additional ONNXModelHub, available in the `onnx` module.

Figure 2. ModelHub class hierarchy.

TFModelHub currently supports the following models:

  • VGG’16
  • VGG’19
  • ResNet18
  • ResNet34
  • ResNet50
  • ResNet101
  • ResNet152
  • ResNet50v2
  • ResNet101v2
  • ResNet152v2
  • MobileNet
  • MobileNetv2
  • Inception
  • Xception
  • DenseNet121
  • DenseNet169
  • DenseNet201
  • NASNetMobile
  • NASNetLarge

ONNXModelHub currently supports the following models:

  • CV
    • Lenet
    • ResNet18
    • ResNet34
    • ResNet50
    • ResNet101
    • ResNet152
    • ResNet50v2
    • ResNet101v2
    • ResNet152v2
  • ObjectDetection
    • SSD
  • FaceAlignment
    • Fan2d106

All models in TFModelHub include a special loader of model configs and model weights, as well as the special data preprocessing function that was applied when the models were trained on the ImageNet dataset.

Here’s an example of how you can use one of these models, ResNet50, for prediction:

val modelHub = TFModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = modelHub.loadModel(TFModels.CV.ResNet50)

val imageNetClassLabels = modelHub.loadClassLabels()

model.use {
    val hdfFile = modelHub.loadWeights(TFModels.CV.ResNet50)
    it.loadWeights(hdfFile)
…
}

Now you’ve got a model and weights and you can use them in KotlinDL.

NOTE: Don’t forget to apply model-specific preprocessing for the new data. All the preprocessing functions are included in ModelHub and can be called via the preprocessInput function:

val inputData = modelType.preprocessInput(...)
val res = it.predict(inputData)

A complete example of how to use ResNet50 for prediction and transfer learning with additional training on a custom dataset can be found in this tutorial.

NOTE: When working with ONNX models, you do not have to load the weights separately (see the ONNX integration section above).

Object detection with the SSD model

Until v0.3, our ModelHub contained models suitable for solving the Image Recognition problem. But starting with this release, we are gradually expanding the library’s capabilities for working with images. We’d like to introduce to you the Single Shot MultiBox Detector (SSD) model, which is capable of solving the Object Detection problem.

Figure 3. The architecture of a convolutional neural network with an SSD detector.

Object detection is the task of detecting instances of objects of a certain class within an image.

The SSD model is trained on the COCO dataset, which consists of 328,000 images, each with bounding boxes and per-instance segmentation masks for 80 object categories. The model can be used for real-time object detection.

We designed an API for Object Detection that hides the details of image pre- and post-processing.

val modelHub =
   ONNXModelHub(cacheDirectory = File("cache/pretrainedModels"))
val model = ONNXModels.ObjectDetection.SSD.pretrainedModel(modelHub)

model.use { detectionModel ->
   val detectedObjects =
       detectionModel.detectObjects(imageFile = File (...), topK = 50)

   detectedObjects.forEach {
       println("Found ${it.classLabel} with probability ${it.probability}")
   }
}

The result of the object detection is drawn on a Swing Panel; the image is preprocessed for SSD input.

Figure 4. Object detection via ObjectDetection API.

The famous YOLOv4 model is also available in ONNXModelHub. However, we haven’t added the postprocessing of YOLOv4 output because some operations are missing in the Multik library (the Kotlin analog of NumPy). We’re looking for contributions from the community, so please don’t hesitate to join the effort! 

NOTE: Of course, you can use the standard API to load the model and call the predictRaw method to handle the result manually, but we suggest avoiding these difficulties.

Experimental high-level API for Image Recognition

With the ObjectDetection task, we offered a simplified API for predictions. Likewise, with the Image Recognition task, we can simplify the interaction with models loaded from ModelHub by hiding the image preprocessing, compilation, and model initialization from the user.

To illustrate how this works, let’s load (and cache on disk) a pre-trained model of the special ImageRecognitionModel type. Models of this type are not capable of additional training – they can only make predictions. On the other hand, they are extremely easy to work with.

val modelHub =
   TFModelHub(cacheDirectory = File("cache/pretrainedModels"))

val model = modelHub[TFModels.CV.ResNet50]

The syntax for obtaining pre-trained models uses bracket indexing, which is nice.

model.use {
   for (i in 1..8) {
       val imageFile = getFileFromResource("datasets/vgg/image$i.jpg")

       val recognizedObject = it.predictObject(imageFile = imageFile)
       println(recognizedObject)
      
       val top5 = it.predictTopKObjects(imageFile = imageFile, topK = 5)
       println(top5.toString())
   }
}

ImageRecognitionModel has methods that immediately return human-readable labels and accept image files as input.

This is an experimental API for hardcore backend engineers, for whom the model is a black box with entry and exit. We’d love to hear about your experience with and your thoughts about this approach.

Sound classification with the SoundNet architecture

The KotlinDL library is taking its first steps in the audio domain. This release adds a few layers required for building a SoundNet-like model, such as Conv1D, MaxPooling1D, Cropping1D, UpSampling1D, and other layers with the “1D” suffix.

Let’s build a toy neural network inspired by the SoundNet model’s architecture:

val soundNet = Sequential.of(
   Input(
       FSDD_SOUND_DATA_SIZE,
       NUM_CHANNELS
   ),
   *soundBlock(
       filters = 4,
       kernelSize = 8,
       poolStride = 2
   ),
   *soundBlock(
       filters = 4,
       kernelSize = 16,
       poolStride = 4
   ),
   *soundBlock(
       filters = 8,
       kernelSize = 16,
       poolStride = 4
   ),
   *soundBlock(
       filters = 8,
       kernelSize = 16,
       poolStride = 4
   ),
   Flatten(),
   Dense(
       outputSize = 1024,
       activation = Activations.Relu,
       kernelInitializer = HeNormal(SEED),
       biasInitializer = HeNormal(SEED)
   ),
   Dense(
       outputSize = NUMBER_OF_CLASSES,
       activation = Activations.Linear,
       kernelInitializer = HeNormal(SEED),
       biasInitializer = HeNormal(SEED)
   )
)

This is a CNN that uses only 1D convolution and max-pooling operations on the input sound data. This network was able to achieve approximately 55% accuracy on FSDD test data after ten epochs and approximately 85% after 100 epochs.

soundBlock is a pretty simple composition of two Conv1D layers and one MaxPool1D layer:

fun soundBlock(filters: Long, kernelSize: Long, poolStride: Long): Array<Layer> =
   arrayOf(
       Conv1D(
           filters = filters,
           kernelSize = kernelSize,
           strides = longArrayOf(1, 1, 1),
           activation = Activations.Relu,
           kernelInitializer = HeNormal(SEED),
           biasInitializer = HeNormal(SEED),
           padding = ConvPadding.SAME
       ),
       Conv1D(
           filters = filters,
           kernelSize = kernelSize,
           strides = longArrayOf(1, 1, 1),
           activation = Activations.Relu,
           kernelInitializer = HeNormal(SEED),
           biasInitializer = HeNormal(SEED),
           padding = ConvPadding.SAME
       ),
       MaxPool1D(
           poolSize = longArrayOf(1, poolStride, 1),
           strides = longArrayOf(1, poolStride, 1),
           padding = ConvPadding.SAME
       )
   )

When the model is ready, we can load the Free Spoken Digits Dataset (FSDD) and train the model. The FSDD dataset is a simple audio/speech dataset consisting of recordings of spoken digits in .wav files at 8 kHz.

Figure 5. WavFile visualization (randomly sampled from FSDD).

The trained model will correctly recognize the digit from a sound recorded as a .wav file. Feel free to train your pronunciation with our toy SoundNet model!

val (train, test) = freeSpokenDigits()

soundNet.use {
   it.compile(
       optimizer = Adam(),
       loss = Losses.SOFT_MAX_CROSS_ENTROPY_WITH_LOGITS,
       metric = Metrics.ACCURACY
   )

   it.init()

   var accuracy = it.evaluate(dataset = test, batchSize = TEST_BATCH_SIZE).metrics[Metrics.ACCURACY]
   println("Accuracy before: $accuracy")

   it.fit(dataset = train, epochs = EPOCHS, batchSize = TRAINING_BATCH_SIZE)

   accuracy = it.evaluate(dataset = test, batchSize = TEST_BATCH_SIZE).metrics[Metrics.ACCURACY]
   println("Accuracy after: $accuracy")
}

The complete code of the example can be found here.

23 new layers, 6 new activation functions, and 2 new initializers

Many of the contributors to this release have added new layers to KotlinDL for performing non-trivial logic. With these added layers, you can start working with neural networks that process not only photos but also sound, video, and 3D images.

Two new initializers:

And six new activation functions:

These activation functions are not available in the TensorFlow core package, but we decided to add them, seeing how they’ve been widely used in recent papers.

In the next release, we’re going to achieve layer parity with the current set of layers in Keras and, perhaps, go further by adding several popular layers from the SOTA implementations of recent models that are not yet included in the main Keras distribution.

We would be delighted to look at your pull requests if you would like to contribute a layer, activation function, callback, or initializer from a recent paper!

How to add KotlinDL to your project

To use the full power of KotlinDL (including the `onnx` and `visualization` modules) in your project, add the following dependencies to your build.gradle file:

repositories {
    mavenCentral()
}

dependencies {
    implementation 'org.jetbrains.kotlinx:kotlin-deeplearning-api:0.3.0'
    implementation 'org.jetbrains.kotlinx:kotlin-deeplearning-onnx:0.3.0'
    implementation 'org.jetbrains.kotlinx:kotlin-deeplearning-visualization:0.3.0'
}

Or add just one dependency if you don’t need ONNX and visualization:

dependencies {
    implementation 'org.jetbrains.kotlinx:kotlin-deeplearning-api:0.3.0'
}

You can also take advantage of KotlinDL’s functionality in any existing Java project, even if you don’t have any other Kotlin code in it yet. Here is an example of the LeNet-5 model entirely written in Java.

Learn more and share your feedback

We hope you enjoyed this brief overview of the new features in KotlinDL 0.3! For more information, including the up-to-date Readme file, visit the project’s home on GitHub. Be sure to check out the KotlinDL guide, which contains detailed information about the library’s basic and advanced features and covers many of the topics mentioned in this blog post in more detail.

If you have previously used KotlinDL, use the changelog to find out what has changed and how to upgrade your projects to the stable release.

We’d be very thankful if you would report any bugs you find to our issue tracker. We’ll try to fix all the critical issues in the 0.3.1 release.

You are also welcome to join the #kotlindl channel in Kotlin Slack (get an invite here). In this channel, you can ask questions, participate in discussions, and get notifications about the new preview releases and models in ModelHub.

Let’s Kotlin!
