2026/03/22

Understanding Kotlin Compilation: A Practical Internal Tour

A walkthrough of lexing, parsing, FIR, IR, output generation, and compiler plugins.

Sometimes compilation can feel mysterious. You write some Kotlin code like this:

fun greet(name: String) {
	println("Hello, $name")
}

and somehow it turns into something the machine can run.

This post is about that journey.

Start With The Big Picture

Kotlin compilation is not one giant step. It is a pipeline where each phase has one job.

Compilation Pipeline

Lexing

Turn text into tokens

The compiler splits raw source code into small pieces such as keywords, names, operators, and punctuation.

Parsing

Build structure

Those tokens are organized into a syntax tree so the compiler can understand the shape of the program.

FIR

Understand meaning

Kotlin resolves names, checks types, infers missing information, and reports many errors here.

IR

Prepare transformations

The program is converted into a lower-level form that is easier for compiler passes and backends to transform.

Bytecode

Generate output

For the JVM target, Kotlin emits class files that the Java Virtual Machine can execute.

At a beginner level, you can think about the pipeline like this:

  1. Read the text.
  2. Understand the structure.
  3. Understand the meaning.
  4. Transform the program into a more compiler-friendly form.
  5. Generate the final output.

First Phases: Lexical Analysis And Parsing

The earliest compiler phases are not specific to Kotlin. Most compilers do something like this.

Lexical analysis

Lexical analysis, often called lexing, takes raw text and breaks it into tokens.

For example, the compiler can split this:

fun greet(name: String)

into pieces such as:

  • fun
  • greet
  • (
  • name
  • :
  • String
  • )

At this point, the compiler has not understood the whole program yet. It only knows the building blocks.
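To make "characters in, tokens out" concrete, here is a minimal toy lexer. It is only a sketch and not the real Kotlin lexer: the names `Token`, `TokenKind`, `tokenize`, and the tiny keyword set are all invented for this example, and it handles only identifiers, a few keywords, and single-character symbols.

```kotlin
// Toy lexer: NOT the real Kotlin lexer, just the idea of "characters in, tokens out".
enum class TokenKind { KEYWORD, IDENTIFIER, SYMBOL }

data class Token(val kind: TokenKind, val text: String)

// Hypothetical, tiny keyword set used by this sketch only.
val toyKeywords = setOf("fun", "val", "var")

fun tokenize(source: String): List<Token> {
    val tokens = mutableListOf<Token>()
    var i = 0
    while (i < source.length) {
        val c = source[i]
        when {
            // Whitespace only separates tokens in this sketch, so it is dropped.
            c.isWhitespace() -> i++
            c.isLetter() || c == '_' -> {
                // Consume a run of identifier characters.
                val start = i
                while (i < source.length && (source[i].isLetterOrDigit() || source[i] == '_')) i++
                val word = source.substring(start, i)
                val kind = if (word in toyKeywords) TokenKind.KEYWORD else TokenKind.IDENTIFIER
                tokens += Token(kind, word)
            }
            else -> {
                // Everything else is treated as a one-character symbol.
                tokens += Token(TokenKind.SYMBOL, c.toString())
                i++
            }
        }
    }
    return tokens
}
```

Calling `tokenize("fun greet(name: String)")` yields the seven pieces listed above, with `fun` tagged as a keyword and `greet`, `name`, and `String` tagged as identifiers. Note that the lexer has no idea that `String` is a type; that knowledge arrives in later phases.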

Parsing

Parsing takes those tokens and organizes them into a structure.

In Kotlin tooling, one of the first structured forms built from the source is the PSI.

PSI means Program Structure Interface.

You can think of PSI as a tree representation of the source file that is very close to the code you wrote. It keeps the code organized in a way that tools and the compiler frontend can navigate.

Now the compiler can say things like:

  • this is a function declaration
  • the function is named greet
  • it has one parameter called name
  • that parameter has type String

This structure is represented as a tree.

Another term you will often hear here is AST, which means Abstract Syntax Tree.

The easiest way to understand the difference is this:

  • PSI is a source-oriented tree used heavily by the Kotlin tooling and frontend infrastructure
  • AST is the more general compiler concept of a syntax tree that represents the program structure

At a beginner level, it is enough to think of both as "tree-shaped representations of your code", with PSI being the concrete structure Kotlin tooling works with early on.
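If it helps to see "tokens in, tree out" as code, here is a small sketch of a hand-written parser for a function header. It is an illustration under simplifying assumptions, not how PSI is built: `FunctionNode`, `ParamNode`, and `parseFunctionHeader` are hypothetical names, and the input is a pre-split list of token strings.

```kotlin
// Toy syntax tree: hypothetical node types, far simpler than Kotlin's PSI.
data class ParamNode(val name: String, val type: String)
data class FunctionNode(val name: String, val params: List<ParamNode>)

// Parses a header of the form: fun <name> ( <param> : <Type> , ... )
// The input is a pre-split token list so the sketch does not depend on any lexer.
fun parseFunctionHeader(tokens: List<String>): FunctionNode {
    var pos = 0

    // Returns the current token and advances, or fails if the input has run out.
    fun next(): String = tokens.getOrNull(pos++) ?: error("Unexpected end of input")

    // Consumes one token and checks that it is the expected one.
    fun expect(text: String) {
        val actual = next()
        require(actual == text) { "Expected '$text' but found '$actual'" }
    }

    expect("fun")
    val name = next()
    expect("(")
    val params = mutableListOf<ParamNode>()
    while (tokens.getOrNull(pos) != ")") {
        if (params.isNotEmpty()) expect(",")
        val paramName = next()
        expect(":")
        val paramType = next()
        params += ParamNode(paramName, paramType)
    }
    expect(")")
    return FunctionNode(name, params)
}
```

For the tokens `fun`, `greet`, `(`, `name`, `:`, `String`, `)` this returns a `FunctionNode` named `greet` with one `ParamNode(name, String)`. That is the same information as the bullet list above, now held in a structure that later phases can walk.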

[Figure: PSI / AST visual, showing the greet source code next to its parsed tree.]

So the transformation looks roughly like this:

  1. Raw text
  2. Tokens from lexing
  3. PSI / syntax tree from parsing

After that, the compiler can move from just understanding structure to understanding meaning.

See One Small Function Move Through The Pipeline

This is the easiest way to build intuition:

Code Walkthrough

Source Code

fun greet(name: String) {
  println("Hello, $name")
}

Lexing

Transformation
fun | greet | ( | name | : | String | )

The compiler reads characters and groups them into tokens. It still does not know the program's meaning, only the pieces that exist.

Parsing

Transformation
KtFile
	KtNamedFunction name=greet
		KtParameterList
			KtParameter name=name typeReference=String
		KtBlockExpression
			KtCallExpression callee=println
				KtValueArgumentList
					KtStringTemplateExpression
						KtLiteralStringTemplateEntry("Hello, ")
						KtSimpleNameStringTemplateEntry(name)

Now the compiler knows there is a function, that it has one parameter declared as String, and that its body contains a call to println. The structure is clear, but nothing has been resolved or type-checked yet.

FIR

Transformation
FirSimpleFunction name=greet returnType=Unit
	FirValueParameter name=name type=String
	FirBlock
		FirFunctionCall callee=println returnType=Unit
			argument[0]: FirStringConcatenation type=String
				FirConstExpression("Hello, ")
				FirQualifiedAccessExpression(name) type=String

FIR keeps a tree shape but enriches it with resolved symbols and inferred/checked types.

IR

Transformation
IrFunction name=greet returnType=kotlin.Unit
	IrValueParameter name=name type=kotlin.String
	IrBlockBody
		IrCall symbol=println returnType=kotlin.Unit
			valueArgument[0]: IrStringConcatenation type=kotlin.String
				IrConst("Hello, ")
				IrGetValue(name) type=kotlin.String

IR represents a lowered, backend-friendly form that compiler passes can transform before bytecode generation.

Bytecode

Transformation
public static final void greet(java.lang.String);
	0: aload_0
	1: ldc           #10                 // String name
	3: invokestatic  #16                 // Method kotlin/jvm/internal/Intrinsics.checkNotNullParameter:(Ljava/lang/Object;Ljava/lang/String;)V
	6: getstatic     #22                 // Field java/lang/System.out:Ljava/io/PrintStream;
	9: new           #24                 // class java/lang/StringBuilder
	12: dup
	13: invokespecial #27                // Method java/lang/StringBuilder.<init>:()V
	16: ldc           #29                // String Hello, 
	18: invokevirtual #33                // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
	21: aload_0
	22: invokevirtual #33                // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
	25: invokevirtual #37                // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
	28: invokevirtual #43                // Method java/io/PrintStream.println:(Ljava/lang/String;)V
	31: return

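The backend emits JVM instructions in the generated .class file, including parameter null-checks and the println call sequence. Treat this listing as illustrative: the exact instructions depend on the compiler version and JVM target. For example, newer JVM targets may concatenate strings through invokedynamic instead of StringBuilder.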

What FIR Actually Is

FIR means Front-end Intermediate Representation.

That name sounds intimidating, but the core idea is simple:

  • Kotlin has already read the code
  • Kotlin has already built structure from it
  • Now Kotlin wants to understand what the program means

Inside FIR, the compiler answers questions like:

  • what does this name refer to?
  • what type does this expression have?
  • is this function call valid?
  • is there a type mismatch here?

So if someone asks, "what is FIR?", a good answer could be:

FIR is the compiler's internal model for understanding your Kotlin program at the semantic level.

But FIR is not only an "idea" of meaning. It is also a concrete tree-like structure.

For this code:

fun greet(name: String) {
	println("Hello, $name")
}

you can imagine a simplified FIR shape like:

FirSimpleFunction name=greet returnType=Unit
	FirValueParameter name=name type=String
	FirBlock
		FirFunctionCall callee=println returnType=Unit
			argument[0]: FirStringConcatenation type=String
				FirConstExpression("Hello, ")
				FirQualifiedAccessExpression(name) type=String

The key point is: FIR nodes carry semantic info.

  • each declaration has resolved symbols
  • each expression has a computed type
  • each call can be checked against argument and parameter types

This is where many diagnostics come from. For example:

val x: Int = "hello"

FIR is where the compiler can understand that the right side is a String, the left side expects an Int, and therefore a type error should be reported.

So FIR is both:

  • A tree representation of your program
  • A semantically enriched tree used for validation and diagnostics
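The type-mismatch example can be sketched in a few lines. This is a toy model of "nodes that carry types", not real FIR: `ToyExpr`, `ValDeclaration`, `typeOf`, and `checkDeclaration` are hypothetical names made up for this illustration, and types are plain strings.

```kotlin
// Hypothetical mini-model of "expressions with types"; these are not real FIR classes.
sealed interface ToyExpr
data class StringLiteral(val value: String) : ToyExpr
data class IntLiteral(val value: Int) : ToyExpr

// Models a declaration such as: val x: Int = "hello"
data class ValDeclaration(val name: String, val declaredType: String, val initializer: ToyExpr)

// Computes the type of an expression, the way a semantic phase attaches types to nodes.
fun typeOf(expr: ToyExpr): String = when (expr) {
    is StringLiteral -> "String"
    is IntLiteral -> "Int"
}

// Returns a diagnostic message on mismatch, or null when the declaration type-checks.
fun checkDeclaration(decl: ValDeclaration): String? {
    val actual = typeOf(decl.initializer)
    return if (actual == decl.declaredType) {
        null
    } else {
        "Type mismatch in '${decl.name}': expected ${decl.declaredType}, found $actual"
    }
}
```

Checking `ValDeclaration("x", "Int", StringLiteral("hello"))` produces a mismatch message, while an `IntLiteral` initializer produces none. The real compiler does far more (inference, subtyping, generics), but the shape of the question is the same: compute a type, compare it with what is expected, and report a diagnostic.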

What IR Actually Is

IR means Intermediate Representation.

IR comes after the compiler already understands the meaning of the code.

The goal of IR is different from FIR:

  • FIR is mainly about understanding and validating the program
  • IR is mainly about transforming the program into a lower-level, more uniform representation

Why does this help?

Because compilers often need a form that is easier to rewrite, optimize, and send to different backends.

So if someone asks, "what is IR?", a good answer could be:

IR is the compiler's internal model for transforming a valid program before generating final output.

And like FIR, IR is also a tree structure with concrete nodes.

A simplified IR shape for the same greet function might look like:

IrFunction name=greet returnType=kotlin.Unit
	IrValueParameter name=name type=kotlin.String
	IrBlockBody
		IrCall symbol=println
			valueArgument[0]: IrStringConcatenation
				IrConst("Hello, ")
				IrGetValue(name)

Then later compiler passes can lower or rewrite that tree into forms that are easier for backends.

For example, a lowering pass might conceptually turn string concatenation into a sequence of lower-level operations:

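IrCall println
	valueArgument[0]: IrCall StringBuilder.toString
		receiver: IrCall StringBuilder.append
			receiver: IrCall StringBuilder.append
				receiver: IrConstructorCall StringBuilder
				valueArgument[0]: IrConst("Hello, ")
			valueArgument[0]: IrGetValue(name)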

Exact shapes vary by pass and Kotlin version, but the idea is stable: IR is a transform-friendly tree.

In practice, IR is where the compiler can do many lowering and transformation steps before generating JVM bytecode.
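A lowering pass is easier to picture as a function from tree to tree. The sketch below uses a made-up mini-IR (`ToyNode`, `Const`, `GetValue`, `StringConcat`, `Call`) and a hypothetical `lowerStringConcat` pass. It shows the rewrite idea only and does not mirror the real Kotlin IR API.

```kotlin
// Hypothetical mini-IR for this sketch; the real Kotlin IR classes are much richer.
sealed interface ToyNode
data class Const(val value: String) : ToyNode
data class GetValue(val name: String) : ToyNode
data class StringConcat(val parts: List<ToyNode>) : ToyNode
data class Call(
    val function: String,
    val receiver: ToyNode? = null,
    val args: List<ToyNode> = emptyList()
) : ToyNode

// A lowering pass: rewrites every StringConcat into an explicit StringBuilder call chain.
fun lowerStringConcat(node: ToyNode): ToyNode = when (node) {
    // Leaves have nothing to rewrite.
    is Const, is GetValue -> node
    // Calls are rebuilt with their children lowered recursively.
    is Call -> node.copy(
        receiver = node.receiver?.let { lowerStringConcat(it) },
        args = node.args.map { lowerStringConcat(it) }
    )
    is StringConcat -> {
        val builder: ToyNode = Call("StringBuilder.<init>")
        // Each part becomes one append call whose receiver is the previous call.
        val appended = node.parts.fold(builder) { acc, part ->
            Call("StringBuilder.append", receiver = acc, args = listOf(lowerStringConcat(part)))
        }
        Call("StringBuilder.toString", receiver = appended)
    }
}
```

Applied to `Call("println", args = listOf(StringConcat(listOf(Const("Hello, "), GetValue("name")))))`, the pass returns a `println` call whose single argument is a `toString` call on a chain of two `append` calls. The input and output are both ordinary trees, which is what makes it practical to stack many such passes one after another.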

What Happens At The End

Once the program has gone through those analysis and transformation stages, the backend can generate the final output.

For Kotlin, this is where multiplatform support becomes important: the compiler can target different runtimes.

Common targets include:

  • JVM: emits .class bytecode executed by the Java Virtual Machine.
  • JavaScript: emits JavaScript output for browser or Node.js environments.
  • Native: emits native binaries for platforms like iOS, macOS, Linux, and Windows.
  • Wasm: emits WebAssembly for modern web and runtime scenarios.

In this post, most concrete output examples use the JVM as the reference point (for example, the bytecode listing in the walkthrough), because JVM output is the most familiar target for many Kotlin developers.

In Kotlin Multiplatform projects, shared code can pass through a common frontend pipeline, and then each backend emits target-specific output for its platform.

What Compiler Plugins Are

Compiler plugins are pieces of code that extend what the compiler can do.

They are not regular application code. They participate in the compilation process itself.

You can think of them as extra tools that can:

  • Inspect code during compilation
  • Report custom diagnostics
  • Generate extra code
  • Transform internal compiler representations

They can join the compilation in different places depending on what they need.

Where Plugins Join

Before deeper meaning is finished

Frontend / FIR-side extensions

Useful when a plugin wants to inspect declarations, validate rules, or produce diagnostics while the compiler is still understanding code.

When Kotlin is moving into IR

FIR -> IR bridge

Useful when a plugin needs to generate extra program pieces before the lower-level transformation stage begins.

After the program is already in IR

IR transformations

Useful when a plugin wants to rewrite behavior, inject calls, or reshape code before final output generation.

Examples of what plugins might do:

  • Validate rules in a codebase
  • Generate extra declarations from annotations
  • Transform IR before final output generation
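The "different join points" idea can be sketched with a toy compiler that exposes two extension points. Everything here is hypothetical (`FunctionDecl`, `Checker`, `Transformer`, `ToyCompiler`); the real plugin API looks different and works on FIR and IR trees rather than strings.

```kotlin
// Hypothetical extension points for this sketch; the real compiler plugin API is different.
data class FunctionDecl(val name: String, val body: List<String>)

// Frontend-style extension: inspects a declaration and may report a diagnostic.
fun interface Checker {
    fun check(decl: FunctionDecl): String?
}

// IR-style extension: returns a rewritten declaration.
fun interface Transformer {
    fun transform(decl: FunctionDecl): FunctionDecl
}

class ToyCompiler(
    private val checkers: List<Checker> = emptyList(),
    private val transformers: List<Transformer> = emptyList()
) {
    // Runs all checkers first, then applies the transformers in order.
    // Returns the collected diagnostics and the (possibly rewritten) declaration.
    fun compile(decl: FunctionDecl): Pair<List<String>, FunctionDecl> {
        val diagnostics = checkers.mapNotNull { it.check(decl) }
        val rewritten = transformers.fold(decl) { acc, t -> t.transform(acc) }
        return diagnostics to rewritten
    }
}
```

A naming-rule plugin would register a `Checker`, and a tracing plugin would register a `Transformer` that prepends a logging statement to each body. The split mirrors the list above: checkers only look and report, transformers change what reaches the backend.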

K1 vs K2 (Quick Context)

You will often see two names in Kotlin compiler discussions:

  • K1: the older compiler frontend architecture.
  • K2: the newer frontend architecture built around FIR, designed for better consistency, performance, and future evolution.

In practical terms, K2 makes the frontend pipeline more unified, which helps diagnostics quality and long-term compiler extensibility.

This post explains the K2 compilation pipeline and terminology. K2 has been the default compiler since Kotlin 2.0.

Real-World Examples

The compiler pipeline becomes easier to trust when you look at real features that rely on internal rewrites.

Suspend functions (language feature)

At source level, you write something simple:

suspend fun loadUser(id: String): User {
	val profile = api.fetchProfile(id)
	return User(profile)
}

Internally, Kotlin lowers this into a state-machine style form (conceptually):

  • Extra continuation parameter is introduced
  • Local state is stored across suspension points
  • Execution resumes by jumping to the correct state label

That transformation is what lets suspend code look sequential while running asynchronously.
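Here is a hand-written sketch of that state-machine idea. It is deliberately simplified and every name in it is an assumption: the code the compiler actually generates is built around Continuation objects and differs in detail, and this version defines its own stand-in `User` class and lets the caller play the role of the async API by calling `resume` by hand.

```kotlin
// Hand-written sketch of the state-machine idea. All names are hypothetical; the code
// the compiler really generates uses Continuation objects and differs in detail.
data class User(val profile: String)

class LoadUserStateMachine(private val id: String) {
    // Which suspension point execution should continue from.
    private var label = 0

    // Set once the function body has run to completion.
    var result: User? = null
        private set

    // Records what happened, so the control flow is visible from outside.
    val log = mutableListOf<String>()

    // Called once to start, then again each time a suspended operation completes.
    fun resume(value: Any?) {
        when (label) {
            0 -> {
                log += "requesting profile for $id"
                // Remember where to continue, then return to the caller ("suspend").
                label = 1
            }
            1 -> {
                // The value passed in is the result of the suspended call.
                val profile = value as String
                result = User(profile)
                label = 2
                log += "completed"
            }
            else -> error("Already completed")
        }
    }
}
```

Calling `resume(null)` starts the function and returns immediately with `result` still null. Calling `resume("profile-42")` later continues from label 1 and sets `result` to `User("profile-42")`. The source reads top to bottom, but the execution is split at the suspension point.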

Jetpack Compose (compiler plugin)

At source level, UI looks declarative:

@Composable
fun Greeting(name: String) {
	Text("Hello $name")
}

The Compose compiler plugin rewrites this so the runtime can track recomposition efficiently (conceptually):

  • Hidden composition parameters are threaded through calls
  • Stability/change tracking data is propagated
  • Function bodies are wrapped in restartable scopes so unchanged calls can be skipped and only invalidated parts re-run

So Compose is a strong real-world example of plugin-driven IR transformations that directly affect runtime behavior and performance.

One more common plugin example: kotlinx.serialization

You write:

@Serializable
data class User(val id: String, val age: Int)

The serialization plugin generates serializer machinery during compilation, so encoding/decoding can work without handwritten boilerplate.

This is another case where compiler plugins do meaningful internal code generation/transformation before final backend output.

Conclusion

Kotlin compilation is easier to reason about when you stop thinking of it as one opaque step.

The compiler reads text, builds structure, enriches that structure with meaning, transforms it into backend-friendly representations, and finally emits output for a specific target.

With K2, that pipeline becomes especially useful to understand because FIR and the newer frontend architecture make the flow more consistent and easier to extend over time.

You do not need to memorize every internal node type to benefit from this model. What matters is understanding the roles of PSI, FIR, IR, and the backend, and recognizing that many Kotlin features and plugins work by rewriting code internally before final output is generated.

Once that clicks, features like suspend, Jetpack Compose, and serialization stop feeling magical and start feeling like well-structured compiler transformations.