(re)Creating a Ktor DSL

My mission at the moment is to understand the JetBrains Ktor HTTP library. So last week I wrote some tests for one of the example projects (https://ktor.io/docs/server-create-http-apis.html), with a view to comparing Ktor with http4k, the server-as-a-function alternative.

For the comparison to be fair, I want to run the same tests against both Ktor and http4k. This is something of a problem, as they take quite different approaches to in-memory testing.

This week then I’ll look at reimplementing part of Ktor’s domain specific language to target http4k, so that the same tests can be used to ensure that we implement the same behaviours. On the way we’ll learn how to implement a cheap and cheerful DSL in Kotlin.
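As a rough illustration of the kind of cheap and cheerful DSL meant here, the sketch below uses a lambda with receiver as the DSL entry point. Every name in it (Handler, FakeClient, TestContext, testApp) is a hypothetical stand-in invented for this example; none of them are Ktor or http4k types, and the video's actual code may well differ.

```kotlin
// Hypothetical names throughout: these are illustrative stand-ins,
// not Ktor or http4k types.

// A handler in the server-as-a-function style: request path in, response body out.
typealias Handler = (String) -> String

// The object a test talks to inside the DSL block.
class FakeClient(private val handler: Handler) {
    fun get(path: String): String = handler(path)
}

// The receiver of the DSL block: its properties are in scope inside the braces.
class TestContext(handler: Handler) {
    val client = FakeClient(handler)
}

// The DSL entry point: a function that takes a lambda with receiver.
fun testApp(handler: Handler, block: TestContext.() -> Unit) {
    TestContext(handler).block()
}

fun main() {
    val handler: Handler = { path -> if (path == "/ping") "pong" else "not found" }
    testApp(handler) {
        // `client` resolves against the TestContext receiver.
        println(client.get("/ping"))
    }
}
```

Swapping the receiver type is what lets the body of a test stay unchanged while the machinery behind it is replaced.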

In this episode

00:00:52 Reviewing our tests
00:02:13 I have a cunning plan
00:02:38 Substitute our DSL entry point
00:03:03 Change the receiver of the DSL body block
00:04:07 Add the properties we require in the DSL block
00:04:33 Adapt http4k HttpHandler to look like Ktor HttpClient
00:06:04 We can extend assertions to avoid other conversions
00:07:37 Rinse and repeat for other operations
00:08:27 Nesting DSLs
00:09:12 Add another context for the nesting
00:14:14 That turns out to compile
00:14:29 and our test source is unchanged
00:15:33 Review

This video is in a playlist of Ktor episodes (https://www.youtube.com/playlist?list=PL1ssMPpyqochUiQEM9PZ_P-9CbV1Il81B) and http4k (https://www.youtube.com/playlist?list=PL1ssMPpyqocg5TKqmiGWlvi3O5L8XPe8Q)

If you like this, you’ll probably like my book Java to Kotlin, A Refactoring Guidebook (http://java-to-kotlin.dev). It’s about far more than just the syntax differences between the languages – it shows how to upgrade your thinking to a more functional style.

submitted by /u/dmcg

Exploring Data Analysis with Kotlin, Part I

Hey there, Kotlin enthusiasts! Excited about delving into data manipulation and visualization? A new article has just been released, diving into the world of Kotlin DataFrame and Kandy libraries.


📝 Overview:

The latest piece is tailored for regular Kotlin developers, offering insights into data manipulation without requiring prior data expertise.

In this article, you’ll find:

1️⃣ Importing data from CSV files to create DataFrames.

2️⃣ Common data tasks made easy.

3️⃣ Effortless presentation of findings.

💡 Kotlin Notebook Convenience:

Discover the convenience of Kotlin Notebook for a smooth learning experience, applicable to Gradle-based projects too!

🌐 Real-Life Insights:

Explore data about top Stack Overflow answerers, unraveling exciting trends.

📥 Access the Data:

Head to the StackExchange Data Explorer for sample data downloads.

Ready to explore Kotlin’s data analysis capabilities? Dive into the new article and embark on a journey of discovery!

submitted by /u/NetHairy4282

K2 Compiler Performance Benchmarks and How to Measure Them on Your Projects

With the Kotlin 2.0.0 release drawing ever closer, the K2 compiler is now available for you to try! In this blog post, we explore the performance of the K2 compiler in various projects and give you the tools to collect your own performance statistics.

The road to the Stable version of the K2 compiler has been a long one. Since we made the decision to rewrite the compiler from scratch, we’ve added a new type inference algorithm, new JVM and JS IR (Intermediate Representation) backends, and finally, a new frontend.

The fundamental change in the frontend is the use of one unified data structure that contains more semantic information. This change makes compilation more efficient and helps IntelliJ IDEA analyze your Kotlin code.

The driving force behind this initiative was the desire to speed up the development of new language features, unify all platforms that Kotlin supports, and improve performance for Kotlin developers.

To receive the full benefits of the K2 compiler in IntelliJ IDEA, enable K2 Kotlin mode. The K2 Kotlin mode is in Alpha, so not all IDE features are supported yet.

Key performance improvements

  • The K2 compiler brings up to 94% compilation speed gains. For example, in the Anki-Android project, clean build times were reduced from 57.7 seconds in Kotlin 1.9.23 to 29.7 seconds in Kotlin 2.0.0.
  • The initialization phase is up to 488% faster with the K2 compiler. For example, in the Anki-Android project, the initialization phase for incremental builds was cut from 0.126 seconds in Kotlin 1.9.23 to just 0.022 seconds in Kotlin 2.0.0.
  • The Kotlin K2 compiler is up to 376% quicker in the analysis phase compared to the previous compiler. For example, in the Anki-Android project, analysis times for incremental builds were slashed from 0.581 seconds in Kotlin 1.9.23 to only 0.122 seconds in Kotlin 2.0.0.
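The percentages above appear to express speed gains rather than time saved. A minimal sketch, assuming the gain is computed as old time divided by new time, minus one (a formula inferred from the quoted figures, not stated in the post), reproduces the clean build and analysis numbers:

```kotlin
import kotlin.math.roundToInt

// Speed gain in percent, assuming gain = (old time / new time - 1) * 100.
// This formula is an assumption inferred from the quoted figures.
fun speedGainPercent(oldSeconds: Double, newSeconds: Double): Double =
    (oldSeconds / newSeconds - 1.0) * 100.0

fun main() {
    println(speedGainPercent(57.7, 29.7).roundToInt())   // clean build: prints 94
    println(speedGainPercent(0.581, 0.122).roundToInt()) // analysis phase: prints 376
}
```

Read this way, a 94% gain corresponds to a build taking roughly half the time, not 94% less time.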

Set up

The Kotlin project that we used to run performance tests is available on GitHub. If you’d like to use it to collect your own statistics from your Kotlin projects, see Collect your own measurements.

Projects

We ran our performance tests on the following open-source projects:

  • Anki-Android
  • Exposed

Tools

To collect performance measurements, we used the Gradle Profiler.

We also used Kotlin build reports to collect detailed measurements about the different compilation phases.

Test scenarios

We created three scenarios to cover the most common compilation activities in a Kotlin project:

  1. Clean build: We built the whole project from scratch without using any pre-set configuration or build cache. This scenario happens when you compile a project for the first time or after you’ve made a project configuration change.
  2. Incremental build with no ABI (application binary interface) changes: We made changes in one file, but the changes were such that no further subprojects need to be recompiled.
  3. Incremental build with ABI changes: We made changes in one file, but these changes meant that a module needed to be recompiled because the external interface of the module was modified.

We compared performance data collected from Kotlin 1.9.23 and Kotlin 2.0.0 and used Gradle 8.5 throughout.

The Gradle configuration cache, project isolation, and build cache were disabled for all scenarios. Before the performance measurements were collected, we completed 10 warm-up rounds. We performed 10 measurement rounds in total.
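The warm-up and measurement rounds follow a common benchmarking pattern: run the workload several times without recording anything, then time the rounds that count. The sketch below shows that pattern in plain Kotlin. The measureRounds helper and its workload are hypothetical; this is not how Gradle Profiler is implemented.

```kotlin
// Hypothetical helper, not part of Gradle Profiler: runs `block` for some
// warm-up rounds whose timings are discarded, then times the measured rounds.
fun measureRounds(warmUps: Int, rounds: Int, block: () -> Unit): List<Long> {
    repeat(warmUps) { block() }
    return List(rounds) {
        val start = System.nanoTime()
        block()
        System.nanoTime() - start
    }
}

fun main() {
    val timings = measureRounds(warmUps = 10, rounds = 10) {
        // Stand-in workload; a real benchmark would trigger a build here.
        (1..100_000).sumOf { it.toLong() }
    }
    println("rounds measured: ${timings.size}")
    println("mean nanoseconds: ${timings.average()}")
}
```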

Types of measurements collected

In addition to the total Kotlin compilation time and Gradle build time, our tests used build reports to collect data on the time spent by the compiler in each of its compilation phases:

Phase | Description
Initialization | The compiler initializes and reads all available symbols from the compilation classpath.
Analysis | The compiler reads the source code, character by character, and breaks it down into meaningful tokens. These are analyzed to determine the structure of the code, after which the compiler performs semantic checks.
IR translation | The compiler converts the semantic information into the internal representation.
IR lowering | The compiler transforms the internal representation into a simpler form and finally de-sugars all special language constructs.
IR generation | The compiler translates the optimized internal representation into the final bytecode.

Results

When comparing the results across our different scenarios and projects, we saw that Gradle build speeds were consistently higher by at least 9%. The most significant improvement was seen in the Anki-Android project, where the recorded build speed gains were around 20%, regardless of the compiler scenario.

Graph with Gradle build time for clean build scenario – Anki-Android
Graphs with Gradle build time for incremental build scenarios – Anki-Android

Clean builds

The benefits of the new K2 compiler architecture were most clearly seen in the clean build scenario, with both projects compiling significantly faster in Kotlin 2.0.0.

The Exposed project demonstrated an 80% increase in compiler speeds, with compilation times falling from 5.8 seconds in Kotlin 1.9.23 to 3.22 seconds in Kotlin 2.0.0. Similarly, the Anki-Android project showed a 94% increase in compiler performance, as compilation times were cut to 30 seconds in Kotlin 2.0.0 compared to 58 seconds in Kotlin 1.9.23.

Graph with Kotlin compilation time for clean build scenario – Exposed
Graph with Kotlin compilation time for clean build scenario – Anki-Android

Delving deeper into the performance of the different compilation phases, we found that the biggest improvement came in the analysis phase, with the Exposed and Anki-Android projects reporting speed gains of 156% and 194%, respectively. This is a direct result of the new unified data structure that we implemented in the K2 frontend. Since the new compiler only has to manage one data structure instead of two and has additional semantic information available to it, it’s unsurprising that we saw such an improvement here.

Graph with compilation breakdown for clean build scenario – Exposed
Blue = analysis phase
Graph with compilation breakdown for clean build scenario – Anki-Android
Blue = analysis phase

Incremental builds

For the incremental build scenarios, we also saw improved compiler performance, especially in the case of the Anki-Android project, where compilation speeds increased by as much as 275% for both incremental build scenarios.

Graph with Kotlin compilation time for incremental build scenarios – Anki-Android

However, the results for the Exposed project weren’t as impressive. We saw a gain of just 35% for incremental builds with ABI changes and 7% for incremental builds without ABI changes, so there’s still room for improvement in this area.

While investigating the performance of the different compilation phases, we observed that the biggest performance gains were in the initialization and analysis phases, with each becoming around 400% faster in the Anki-Android project.

Graph with compilation breakdown for incremental build for ABI scenario – Anki-Android 
Graph with compilation breakdown for incremental build for non-ABI scenario – Anki-Android

Again, the Exposed project demonstrated a smaller improvement in the initialization and analysis phases, with a relatively modest 12%–55% performance boost. In fact, for incremental builds with ABI changes, performance actually degraded in the IR lowering and IR generation phases. However, the time taken in these phases is so small to begin with that we believe the impact to be imperceptible. For example, the IR lowering phase takes 0.01 seconds in Kotlin 1.9.23, rising to 0.014 seconds in Kotlin 2.0.0.

Collect your own measurements 

As we edge closer to the release of Kotlin 2.0.0, the spotlight is firmly on the K2 compiler, and it’s your turn to put it to the test. This section guides you through collecting your own Kotlin project performance statistics.

To facilitate an easy and quick benchmarking process for your own Kotlin projects, we’ve prepared a performance management tool especially for you. This is a Kotlin-based project that allows you to gather performance measurements for any Kotlin project, local or remote. By default, this project runs benchmark tests for Kotlin versions 1.9.23 and 2.0.0-RC1. However, you have the option to customize your own test scenarios as required. For visualization purposes, you can analyze the raw data generated by these scenarios, or use the provided Kotlin notebook for a more convenient visual representation.

Step 1: Open the project in IntelliJ IDEA

  1. Clone the k2-performance-metrics repository.
  2. Download and install the latest version of IntelliJ IDEA.
  3. On the IntelliJ IDEA welcome screen, click Open or select File | Open in the menu bar.
  4. Navigate to the k2-performance-metrics folder and click Open.

Step 2: Set up your project

  1. Verify environment variables:
    1. Confirm that the JAVA_HOME variable is set, as it will be used to compile your project.
    2. Ensure that ANDROID_HOME is set if your project involves Android development.
  2. In the gradle.properties file, add the project that you want to collect measurements for in one of the following ways:
    • Add the local path in project.path.
    • Add the GitHub URL in project.git.url and the commit that you want the tests to run from in project.git.commit.sha.
  3. For the incremental build scenarios, add the path to the file where you’re making changes:
    1. scenario.abi.changes for the incremental build with ABI changes scenario.
    2. scenario.non.abi.changes for the incremental build without ABI changes scenario.
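
For orientation, a gradle.properties fragment following these steps might look like the sketch below. Only the property names come from the steps above. Every value is a hypothetical placeholder, and whether the scenario paths should be absolute or relative to the measured project is an assumption to check against the repository's README.

```properties
# Hypothetical values; only the property names come from the steps above.

# Option A: a project on the local disk
project.path=/home/me/projects/my-kotlin-app

# Option B: a project fetched from GitHub (use instead of project.path)
# project.git.url=https://github.com/example/my-kotlin-app
# project.git.commit.sha=<commit-sha>

# Files edited during the incremental build scenarios (paths are assumptions)
scenario.abi.changes=src/main/kotlin/com/example/PublicApi.kt
scenario.non.abi.changes=src/main/kotlin/com/example/internal/Helper.kt
```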

Step 3: Collect measurements

  1. In IntelliJ IDEA, open the Gradle tool window by selecting View | Tool Windows | Gradle.
  2. In Tasks | benchmarks, select and run the runBenchmarks task.

Alternatively, you can run the following command in the terminal from the k2-performance-metrics root directory:

./gradlew runBenchmarks

By default, the build reports are available in the reports/{kotlin-version}/{scenario-name} folders.

If you want to create your own custom test scenarios, you can do so by extending the task class in your build.gradle.kts file. For more information, see the Create custom build scenarios section in the k2-performance-metrics repository’s README.md.

Step 4: Analyze your results with Kotlin Notebook

You need IntelliJ IDEA Ultimate to use Kotlin notebooks.

  1. In IntelliJ IDEA, go to Settings/Preferences | Plugins and select the Marketplace tab.
  2. In the search bar, search for “Kotlin Notebook” and select Install.
  3. Open benchmarkResult.ipynb and click the Run All button in the gutter to run all cells.

Conclusion

In summary, the new K2 compiler architecture can significantly reduce compilation times in your projects. This is particularly true for clean builds and the analysis phase of compilation, though Gradle build times are also shortened by the new compiler. Nevertheless, the degree of improvement that you’ll see will depend on your specific project. Some projects may enjoy a considerable increase in performance, while others may only experience modest gains.

Throughout our time working on the new K2 compiler, we’ve been constantly testing its performance on our internal projects. Additionally, EAP champions and early adopters have been testing it on their own projects, providing invaluable feedback on its performance and usability. 

Now, we’re asking you to try the K2 compiler and provide your feedback. We encourage you to join the #k2-early-adopters channel in our public Slack (get your invite here) for support and discussions. If you’re facing specific problems or bugs, please don’t hesitate to create an issue in the Kotlin YouTrack project. Your feedback is invaluable in refining K2 and ensuring it meets the high standards and needs of our community.


New book! Kotlin in Action, Second Edition

Hello,

I’m sorry for blatant advertising, but we have just released this book that I wanted to share with the community. Please remove this post if you don’t think it’s for the benefit of the community.

Written by core Kotlin language developers and Kotlin team members, “Kotlin in Action, Second Edition” by Sebastian Aigner, Roman Elizarov, Svetlana Isakova, and Dmitry Jemerov, teaches you Kotlin techniques that you can use for almost any application type, from enterprise services to Android apps. This new second edition is fully updated to include the latest innovations, and it adds new chapters dedicated to coroutines, flows, and concurrency.

You can find the book here. LiveBook should have enough free content to get you started.

Hope you find it useful.

Thank you.

Cheers,

submitted by /u/ManningBooks

Have you tried to generate Kotlin code using AI? Share your feedback with us!

Have you ever used AI to generate Kotlin code? Perhaps you’ve used AI to ask questions about Kotlin? Was AI helpful in these cases?

We want to learn more about your use cases and also assess the quality and effectiveness of these interactions. Your feedback will help us improve AI tools, making your work with Kotlin simpler and more productive.

Take our survey and share your experience! ➡️ https://kotl.in/ai-tool-ux-reddit

submitted by /u/daria-voronina

Should I stay with Kotlin?

Hi guys! I work in a Supply Chain company, so I do not have a good understanding of what is going on in the IT area.

During my life I have completed a couple of programming courses and got some certificates in Python and Oracle SQL. I would like to make programming my hobby, to slowly but consistently improve my understanding of the area and try building some apps/tools. I have been trying C# recently and like it very much. I also tried Kotlin and Java with Android Studio. Though Kotlin looks simple and convenient to use, Android Studio gave me a bad experience due to its slow response.

Could you please give me some advice before I go too far in the wrong direction? If I want to try building some applications, but I am an iPhone user and am not sure whether I want to work with mobile apps only, should I continue with Java/Kotlin or should I choose something like Python or JavaScript? Is it beneficial to start with Java/Kotlin now, when there is a big chance that in a couple of weeks I will decide to move to web dev, or to something I cannot yet even tell I would need and like?

submitted by /u/SanJunipero2033

A Step-by-Step Guide to Performing Data Analysis With Kotlin DataFrame

Introduction

This is the first in a series of tutorials on how to easily manipulate and visualize your data using the Kotlin DataFrame and Kandy libraries. The tutorials are aimed at regular Kotlin developers, so no previous experience of data analysis or similar frameworks (like pandas or Apache Spark) is necessary. You should, however, be familiar with the Kotlin language and have created Kotlin-based projects in IntelliJ IDEA previously.

In this tutorial, you will learn:

  • How to create a dataframe from a CSV file.
  • How to perform common operations.
  • How to display or export your results. 

You will be working within Kotlin Notebook, both for convenience and in order to have access to the Kandy plotting library. The core dataframe capabilities you will see are available in Gradle-based projects.

If you prefer, you can jump directly to the sample project, which contains data files and notebooks for all the tutorials in this series. However, developers who are new to DataFrame may find it beneficial to build everything themselves from scratch, one step at a time. You can then compare your project against the sample project to ensure that you have built everything correctly.

All the tutorials use real-world data, which in this case contains information about the top answerers on Stack Overflow. Please note that the results shown below are correct for the data that was downloaded at the time of writing and that is included in the sample projects. If you are fetching fresh data, then naturally this may contain updated values.

The sample data

You can obtain the sample data via the StackExchange Data Explorer. This allows you to run sample queries against a range of Q&A websites, including Stack Overflow. The results of your queries can be downloaded as a CSV file.

This sample query selects the top 500 users on Stack Overflow, based on an average score calculated against their answers to questions. Use the Run Query button to execute the query, and then save the results via the Download CSV link:

StackExchange Top 500 answerers on the site: sample data

Once you have downloaded the file, save it under the name Top500Answerers.csv.

Creating your project

Open IntelliJ IDEA and install the Kotlin Notebook plugin, as described in this blog post. Then, use the New Project wizard to create a new Kotlin project, with Gradle as the build tool. 

Copy the CSV file you created in the previous section into src/main/resources. You should be able to open the file in either the text or data views, as shown below:

The Kotlin Notebook plugin: opening a CSV file in IntelliJ IDEA

Hello, DataFrames!

Right-click on the project name and choose New | Kotlin Notebook. You can save the file as whatever you like. Then, add and run the following two lines:

%useLatestDescriptors

%use dataframe

The use command will now load and initialize the Kotlin DataFrame library for you. Many popular libraries can be loaded using their name alone, while other libraries can be loaded based on their Maven coordinates. A lot can happen when a library is loaded, so it is a good idea to do this in a separate cell.

Now you can create a new cell and add the following three lines:

val path = "./src/main/resources/Top500Answerers.csv"

val topFolks = DataFrame.read(path)

topFolks.head()

When you run these, the output should be similar to the one shown below:

Congratulations! You have just successfully built a dataframe from a CSV file and printed out the top five records.

Displaying Data in Kotlin Notebooks

Now that you have some data, let’s consider how you can display it in a Kotlin notebook.

You can render content as markup via the DISPLAY and HTML functions. The example below sorts the users, takes the top five, and prints their details as an HTML list:

fun htmlLink(text: String, url: String) = "<a href=\"$url\">$text</a>"
fun soUrl(userID: String) = "https://stackoverflow.com/users/$userID"

val topFive = topFolks
   .sortBy { `Average Answer Score` }
   .tail()
   .reverse()

val content = buildString {
   append("<ul>")
   topFive.forEach {
       val userID = `User Link`.toString()
       val average = `Average Answer Score`
       val linkMarkup = htmlLink(userID, soUrl(userID))
       append("<li>User $linkMarkup with an average of $average</li>")
   }
   append("</ul>")
}

DISPLAY(HTML(content))

This is what should be displayed:

You should be able to click on any of the links to open a particular contributor’s page in your browser.

Displaying data as HTML is useful but requires effort to produce visually appealing results. As a simpler alternative, you can take advantage of the Kandy library to visualize the data. Let’s try it for our top five contributors using a bar chart.

First, load the Kandy library in a separate cell (for the reasons discussed above):

%use kandy

Then, plot the contributors, with their User IDs on the x-axis and their average answer scores on the y-axis:

plot {
   bars {
       x(topFive.map { "ID: $`User Link`" })
       y(topFive.map { `Average Answer Score` })
   }
}

This is what should be displayed:

Showing graphs using the DataFrame library

That is an impressive result for relatively little effort! Now that you have some idea of the power of the DataFrame library, let’s take a step back for a moment and review some core concepts.

What is a dataframe?

A dataframe is an abstraction for working with structured data. It is a table created by reading from a source such as a CSV file, a JSON document, or a database. The dataframe contains one or more named columns, whose content can be of different types. 

The content of a column can be any Kotlin object, including another dataframe. This feature allows you to store and manipulate hierarchical data.

The DataFrame API implements all the operations a functional programmer or database admin might require. The API is immutable, so any operation that has the potential to alter the dataframe instead produces a new instance. The underlying data is reused whenever possible for efficiency and performance.

One of the greatest strengths of the Kotlin DataFrame library is that it is typesafe. You can generate strongly typed extension properties that correspond to the columns in the dataframe. You will explore this in depth in the next section. 

Please note that when working in Kotlin Notebook, these properties are created on the fly. 

Accessing values in dataframes

The good news with dataframes is that, if you have previously used Kotlin collections or any modern data structures libraries, then you can immediately start work. All the standard operators of functional programming work as you would expect, right out of the box.

Let’s add up the total number of answers of all the contributors. Add and run a new code cell containing the expression below:

topFolks.map { Answers }.sum()

You can use the familiar map and sum operations to calculate the total. For the purposes of this demo, you could also use the general-purpose reduce operator:

topFolks.map { Answers }.reduce { a, b -> a + b }

This should give you the same result as before. As discussed previously, properties have been added to the dataframe for each of the fields in the CSV file. These extension properties make it simple to access and manipulate the data in a type-safe way.

You can manage without these extension properties if required, for example, by creating column accessor functions:

val Answers by column<Int>()

topFolks.map { Answers() }.sum()

Standard operations with dataframes

Before we explore some more of the built-in operations, we should first clean up the data a little. You can rename the last two columns for clarity and convenience:

val topFolksClean = topFolks
   .rename { `Average Answer Score` }.into("AverageScore")
   .rename { `User Link` }.into("UserID")

topFolksClean

You should see that the column names have been modified:

Now, enter and run the expression below:

topFolksClean
   .filter { Answers >= 20 }
   .sortBy { AverageScore }
   .tail(3)
   .select { UserID }

As you can see, you have a chain of operations, which:

  • Filters the records to include only contributors that answered 20 or more questions.
  • Uses sortBy to sort the remaining records in ascending order by the average score.
  • Takes the top three results, which will be at the end of the sorted dataframe.
  • Extracts the IDs of the contributors into the final dataframe.

Your output should be similar to this:

If you are familiar with SQL then the select operation will make sense to you, although now it appears at the end of the statement. You can think of the filter operation as the equivalent of the WHERE clause and the sortBy operation as the equivalent of ORDER BY.

For example, given a database table in PostgreSQL, you could create and run the following query in DataGrip:

Creating and running a query in DataGrip

Note that the API contains both select and map operators for performing transformations.
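
As a rough analogy for the chain of operations above, the same filter, sort, take, and select steps can be written against ordinary Kotlin collections. This sketch uses only the standard library, not the DataFrame API. The Contributor class and the sample rows are invented for illustration.

```kotlin
// Hypothetical record type and sample rows, invented for illustration.
data class Contributor(val userId: String, val answers: Int, val averageScore: Double)

fun topThreeIds(rows: List<Contributor>): List<String> =
    rows
        .filter { it.answers >= 20 }   // like WHERE answers >= 20
        .sortedBy { it.averageScore }  // like ORDER BY average score, ascending
        .takeLast(3)                   // the three highest scores sit at the end
        .map { it.userId }             // like SELECT user id

fun main() {
    val rows = listOf(
        Contributor("a", 25, 40.0),
        Contributor("b", 12, 900.0), // filtered out: fewer than 20 answers
        Contributor("c", 30, 75.5),
        Contributor("d", 21, 10.0),
        Contributor("e", 50, 60.0),
    )
    println(topThreeIds(rows)) // prints [a, e, c]
}
```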

distinct

From looking through the data you can see that the numbers of questions answered by contributors are not unique. For example, more than one contributor answered 15 questions. You can count how many distinct values there are as follows:

topFolksClean
   .distinctBy { Answers }
   .count()

For the data at the time of writing, this gives a result of 93, which means that the 500 contributors could be allocated into 93 groups, based on how many questions they answered. The distinctBy operation does not perform this grouping; it simply selects the first row from every group.

If you were to sort the contributors, by average answer score and in descending order, then the distinctBy operation should only select the highest-scoring contributor for each group. Let’s try to validate this, using the built-in operations.

You can first examine the original CSV file and pick a value from the Answers column, which occurs more than once. In this example, the repeated value is 15:

topFolksClean
   .filter { Answers == 15 }

This is the associated output. There are 22 rows, which the notebook will display in groups of 10.

In our results, the highest score for this group is 909.07, and the lowest is 58.33. When examining your own results, don’t forget to page through the whole dataset! 

If you sort in ascending order and use distinctBy, then there will be a single contributor with an Answers value of 15:

topFolksClean
   .sortBy { AverageScore }
   .distinctBy { Answers }
   .sortBy { Answers }

Since you are sorting in ascending order, the single contributor will have a score of 58.33. As you can see from the highlighted row, this is the case.

On the other hand, if you sort in descending order then the contributor shown in the results should have a score of 909.07. Let’s confirm this:

topFolksClean
   .sortByDesc { AverageScore }
   .distinctBy { Answers }
   .sortBy { Answers }

Once again, you get the expected score. 
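
The Kotlin standard library has a distinctBy for collections that likewise keeps the first element seen for each key, so the effect of sort order can be tried out without any data files. The sketch below is an analogy built on plain lists, not the DataFrame operation itself. The Row class is hypothetical, and the two scores for 15 answers are borrowed from the results above.

```kotlin
// Hypothetical row type; the two scores for 15 answers come from the text above.
data class Row(val answers: Int, val score: Double)

// Sorts by score, then keeps the first row seen for each distinct answer count.
fun firstPerAnswerCount(rows: List<Row>, descending: Boolean): List<Row> {
    val sorted = if (descending) rows.sortedByDescending { it.score } else rows.sortedBy { it.score }
    return sorted.distinctBy { it.answers }
}

fun main() {
    val rows = listOf(Row(15, 58.33), Row(11, 120.0), Row(15, 909.07))

    val ascending = firstPerAnswerCount(rows, descending = false)
    val descending = firstPerAnswerCount(rows, descending = true)

    println(ascending.first { it.answers == 15 }.score)  // prints 58.33
    println(descending.first { it.answers == 15 }.score) // prints 909.07
}
```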

group


Now that you have a better understanding of the data, let’s go ahead and perform the grouping using groupBy. For simplicity, you will only group the top 10 results:

topFolksClean
   .sortBy { AverageScore }
   .tail(10)
   .groupBy { Answers }

The data returned by this result is a little more complex. You should see a column containing the keys for the groups, which in this case is the number of answers. Then you will have a column containing the data itself, as a nested dataframe.

This is how the result is represented in the notebook. Note that you can click on the groups to reveal their content:

If you have used the GROUP BY clause in a SQL SELECT statement, then this operation will be familiar to you. Here’s another example from PostgreSQL and DataGrip:

You can explore the grouped data gradually, starting by viewing the keys:

val groupedData = topFolksClean
   .sortBy { AverageScore }
   .tail(10)
   .groupBy { Answers }

groupedData.keys

Then, you can print the groups:

groupedData.groups

If you expand each group to view its contents, you can see that the groups for 14 and 15 answers have two members, and all the others have one. Hence, for the top 10 results, you have eight groups in total.

filter

Let’s see if you can use the core operations to find and display the groups with more than one result:

groupedData
   .groups
   .filter { df ->
       df.rowsCount() > 1
   }.forEach { df ->
       println(df.first().Answers)
   }

This code filters the groups, to find those with more than one row. Then it iterates through each group, printing the number of answers. Each row in an individual group will have the same number of answers, so you can pick any row you like. In this example, we chose the first.

The code prints two numbers:

14

15

Because groups is returning a column of dataframes, the signatures of the filter and forEach methods are slightly different. Each time your lambda is invoked, there will be a single parameter, whose value is the current dataframe.
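
To make the grouping and filtering steps concrete, here is a comparable sketch using the standard library's groupBy on a plain list. It is an analogy rather than the DataFrame API. The Answerer class and the ten sample rows are invented, arranged so that 14 and 15 each occur twice, as in the results described above.

```kotlin
// Hypothetical row type, invented for illustration.
data class Answerer(val userId: String, val answers: Int)

// Groups by answer count and returns the counts shared by more than one row.
fun answerCountsWithSeveralMembers(rows: List<Answerer>): List<Int> =
    rows.groupBy { it.answers }                       // Map<Int, List<Answerer>>
        .filter { (_, members) -> members.size > 1 }  // keep groups with more than one row
        .keys
        .sorted()

fun main() {
    // Ten invented rows: 14 and 15 occur twice, the other six values once.
    val answers = listOf(14, 14, 15, 15, 10, 11, 12, 13, 16, 17)
    val rows = answers.mapIndexed { i, a -> Answerer("user$i", a) }

    println(rows.groupBy { it.answers }.size)      // prints 8
    println(answerCountsWithSeveralMembers(rows))  // prints [14, 15]
}
```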

Altering the DataFrame schema

When working with dataframes, you are not limited to the schema that was inferred when the dataframe was created. You can add columns, remove columns, and even change the data type of existing columns. Consider the example below:

val ratedFolks = topFolksClean
   .sortBy { AverageScore }
   .remove("Answers")
   .add("Rating") {
       when (AverageScore) {
           in 0.0 ..< 100.0 -> "Low"
           in 100.0 ..< 300.0 -> "Medium"
           else -> "High"
       }
   }

Here you take the sorted data, remove the Answers column, and add a new Rating column, which is derived from the AverageScore. This gives you a new dataframe, which you refer to as ratedFolks.
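
The rating rule itself is ordinary Kotlin and can be tried in isolation. The sketch below lifts it into a standalone function. The name ratingFor is hypothetical, and the ..< operator assumes Kotlin 1.9 or later.

```kotlin
// Hypothetical standalone version of the rating rule; assumes Kotlin 1.9+ for `..<`.
fun ratingFor(averageScore: Double): String =
    when (averageScore) {
        in 0.0..<100.0 -> "Low"       // 0.0 inclusive up to, but excluding, 100.0
        in 100.0..<300.0 -> "Medium"  // 100.0 inclusive up to, but excluding, 300.0
        else -> "High"
    }

fun main() {
    println(ratingFor(99.9))  // prints Low
    println(ratingFor(100.0)) // prints Medium
    println(ratingFor(300.0)) // prints High
}
```

One thing to keep in mind is that the else branch also catches anything below 0.0, so a negative score would be rated "High" under this rule.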

As an example, you can then view the first and last three rows by concatenating them into a new dataframe:

val topAndBottom = listOf(ratedFolks.head(3), ratedFolks.tail(3)).concat()

topAndBottom

This is what should be displayed:

Note that this is achieved via an extension function, added to the standard Iterable type. Extensions are a key feature in Kotlin for making libraries simple and convenient to use.
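
For readers new to the idea, the sketch below shows how an extension function on Iterable can be declared. It is a simplified stand-in working on plain lists, deliberately named joinAll (a hypothetical name) so that it is not mistaken for the library's own concat.

```kotlin
// Hypothetical extension on the standard Iterable type; not the library's concat.
fun <T> Iterable<List<T>>.joinAll(): List<T> {
    val result = mutableListOf<T>()
    for (part in this) result.addAll(part)
    return result
}

fun main() {
    val sorted = listOf(1, 2, 3, 4, 5, 6, 7, 8)
    // First three and last three elements, joined into one list.
    println(listOf(sorted.take(3), sorted.takeLast(3)).joinAll()) // prints [1, 2, 3, 6, 7, 8]
}
```

Because the function is declared outside the Iterable type, it can be called as if it were a member without the standard library being modified.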

Visualizing your data 

As you have already seen, you can take advantage of the Kandy library to plot the data. Let’s try to visualize which values of Answers occur most frequently in the data. 

Enter and run the following:

val answersPairedWithCounts = topFolksClean
   .groupBy { Answers }
   .count()
   .filter { column<Int>("count") >= 20 }

This code will group the records based on the number of answers and then replace each group with its size. For simplicity, let’s view only those groups with 20 or more members:

answersPairedWithCounts

Now, let’s ask Kandy to plot this dataframe as a bar chart:

answersPairedWithCounts.plot {
   bars {
       x(Answers)
       y(count)
   }
}

This is the resulting chart:

You can see that the most common number of questions answered was 11, occurring 63 times in the data.

Note that you already loaded the Kandy library in an earlier example:

%use kandy

If you skipped that example, then you will need to load the library now. As we mentioned earlier, it is best to do this in a separate cell.
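
The group, count, and filter steps used to build the chart data have a close counterpart in the standard library, which may help if the DataFrame version feels opaque. The sketch below is an analogy only. The function name and the sample values are invented, and the threshold is lowered to suit the tiny data set.

```kotlin
// Hypothetical helper: counts how often each value occurs, keeping the frequent ones.
fun frequentAnswerCounts(answers: List<Int>, minGroupSize: Int): Map<Int, Int> =
    answers.groupingBy { it }
        .eachCount()                                   // value -> number of occurrences
        .filter { (_, count) -> count >= minGroupSize }

fun main() {
    val answers = listOf(11, 11, 11, 12, 12, 15) // invented sample values
    println(frequentAnswerCounts(answers, minGroupSize = 2)) // prints {11=3, 12=2}
}
```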

Exporting your results

So far, you have only viewed your results and have not saved them. Let’s now take a look at how you can export data from Kotlin Notebook. In the two expressions below, you create a new dataframe and then export it as a CSV file and JSON document:

topFolksClean
   .sortBy { AverageScore }
   .tail(10)
   .toCsv()

topFolksClean
   .sortBy { AverageScore }
   .tail(10)
   .toJson()

Exporting a dataframe to HTML works the same way. The toStandaloneHTML method produces an HTML document containing a table, with associated CSS styles and JavaScript event handlers. This document can be opened directly in your default browser:

topFolksClean
   .sortBy { AverageScore }
   .tail(10)
   .toStandaloneHTML()
   .openInBrowser()

Conclusions

Hopefully, this tutorial has demonstrated the power and utility of the Kotlin DataFrame library. Please remember that the sample project contains data files and notebooks for all the tutorials in this series. You can clone this project and easily modify the examples, or replace the sample data files with your own.

In the next tutorial, we’ll show how to work with the Stack Exchange REST API to obtain JSON data. The information in this next installment will be both more complex and hierarchical, allowing you to see more of the power of the DataFrame API.
