Understanding how Java Virtual Machine (JVM) works

Hasitha Subhashana
10 min read · May 9, 2021
Designed by Freepik

Most developers treat the JVM as a black box: you write the code, compile it, and get the output. But if you are a Java software engineer, it is very useful to know how the Java Virtual Machine actually works.

If you want to learn about the JVM, you first have to know what a virtual machine is.

The term “virtual machine” is made up of two words.

· Virtual: appears to exist, but doesn’t physically exist.

· Machine: something that makes work easier.

So, what exactly is a virtual machine when both words are combined?

“A virtual machine is something that does not physically exist, but simulates an environment so that it feels real.”

There are two types of virtual machines.

  1. System-based virtual machines (SVM).
  2. Application-based virtual machines (AVM).

So, what’s the difference?

SVMs are designed as a substitute for physical computers. They run on a host machine and use the hardware resources of that host. The same hardware resources can also be shared by several virtual machines that are entirely independent of one another. (Ex: virtual machines managed by a hypervisor such as Xen).

AVMs, on the other hand, run as a single process on the host machine and provide a runtime environment for one application, without virtualizing the hardware. They are also known as process-based virtual machines. The JVM, which we will discuss today, belongs to this category. If you’re familiar with C#, you can think of the Common Language Runtime as another example of an AVM.

Java Architecture

Code written in programming languages such as C and C++ is compiled into machine code for a specific OS and processor. These kinds of programming languages are called compiled languages.

But programs written in languages like JavaScript or Python are executed directly from source, without a separate compilation step that produces machine code up front. These are called interpreted languages.

What’s special about Java is that it uses a combination of both compilation and interpretation. Java source code is first compiled into a class file containing bytecode. This class file is then executed by the JVM using its interpreter or JIT compiler.

However, the JVM is not something you download and install separately. It always comes as part of the JDK or JRE.

Let’s have a look at the Java architecture to understand this clearly. There are three major components in the Java architecture:

1. Java Development Kit (JDK).

2. Java Runtime Environment (JRE).

3. Java Virtual Machine (JVM).

Figure 2: Architecture of Java (https://www.guru99.com)

What’s the difference between JDK and JRE?

1. JDK (Java Development Kit)

JDK provides the environment to develop and execute Java programs. It contains development tools (such as the javac compiler) to build Java programs and a JRE to execute them. (Java developers use the JDK).

2. JRE (Java Runtime Environment)

The Java Runtime Environment provides the environment for executing a Java application. So the JRE is used by those who only want to run Java programs (not develop them).

What is JVM?

JVM is a specification. It describes what a JVM must do, and anybody can take that specification and build their own JVM.

A JVM implementation is a Java virtual machine built according to the JVM specification.

Below is a list of JVM implementations out in the world.

  • IBM JVM / J9 Virtual Machine.
  • HotSpot VM.
  • JRockit (discontinued).
  • Dalvik (discontinued).

Is JVM platform independent?

The answer is “NO”. When you install a JRE on Windows, it deploys the code necessary to create JVM instances on Windows. If you install a JRE on a Linux-based PC, it deploys only the code necessary to create JVM instances on Linux. So the JVM itself is platform-dependent, while the bytecode it runs is platform-independent.

The below figure will give you a better understanding of this.

Figure 3: Java Compilation (source:https://www.developer.com/)
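To see which JVM implementation and platform your own program is running on, you can read a few standard system properties. This is only a small sketch: the class name JvmInfo and the describe() helper are made up for illustration, and the values printed will differ from machine to machine.

```java
class JvmInfo {
    // Collects standard system properties that describe the running JVM and its host OS.
    static String describe() {
        return "JVM: " + System.getProperty("java.vm.name")
                + " (" + System.getProperty("java.vm.vendor") + ")"
                + ", OS: " + System.getProperty("os.name")
                + " " + System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        // The same .class file prints different values on Windows and Linux,
        // because a different platform-specific JVM is executing it.
        System.out.println(describe());
    }
}
```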

How are JVM instances created and destroyed?

Think of a HelloWorld application. First you compile it with the “javac HelloWorld.java” command to get the bytecode. Then you execute it with “java HelloWorld” to get the output.

The moment you type “java HelloWorld” in your terminal/cmd, you ask the OS for a JVM instance. The JVM then looks for the entry point of your program, so your class must have a “public static void main” method. From there onwards, your application executes as written in the main method.
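For reference, a minimal HelloWorld along these lines could look like the sketch below. The greeting() helper is not required by the JVM; it is only added here so the message can be reused.

```java
// Save as HelloWorld.java, then run:
//   javac HelloWorld.java   (produces HelloWorld.class containing bytecode)
//   java HelloWorld         (starts a JVM instance and runs main)
class HelloWorld {
    static String greeting() {
        return "Hello, World!";
    }

    // The classic entry point the JVM looks for when the program starts.
    public static void main(String[] args) {
        System.out.println(greeting());
    }
}
```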

A JVM instance only exists as long as your application runs. If you are executing 3 different Java programs, you will have 3 different JVM instances running on your computer. And if you are not running any Java applications on your PC, there will be no JVM instances at all.

Now that we have talked about how JVM instances are created, let’s see how they are destroyed. There are two ways.

1. All non-daemon threads have finished. (The JVM keeps running as long as at least one non-daemon thread is alive).

2. Calling System.exit() method.
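The first rule can be tried out with a small experiment. In this sketch (DaemonDemo and backgroundThread are illustrative names, and the one-minute sleep just stands in for long-running work), the background thread is marked as a daemon, so it does not keep the JVM alive once main finishes.

```java
class DaemonDemo {
    // Builds a thread that pretends to do long-running work.
    static Thread backgroundThread(boolean daemon) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(60_000);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(daemon); // must be called before start()
        return t;
    }

    public static void main(String[] args) {
        backgroundThread(true).start();
        System.out.println("main is done");
        // main is the only non-daemon thread in this program, so the JVM exits here
        // even though the daemon thread is still sleeping.
        // With backgroundThread(false), the JVM would stay alive for about a minute
        // unless System.exit(0) is called.
    }
}
```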

We will talk about daemon threads and non-daemon threads later. For now, let’s dive deep into the Java virtual machine architecture. The JVM consists of three main components.

  1. ClassLoader.
  2. Memory Area /Runtime Memory/Data Area.
  3. Execution Engine.

Figure 4: Java Virtual Machine Architecture (source: https://www.freecodecamp.org/)

ClassLoader

When the JVM needs a class, a class loader tries to load it into the JVM using its fully qualified class name. But a single ClassLoader does not load all classes.

The ClassLoader that loads a class depends on the type of the class and where it is located. Usually, the first class loaded into memory is the class that has the main() method. If a required class cannot be found, a NoClassDefFoundError or ClassNotFoundException is thrown.
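You can trigger loading by fully qualified name yourself with Class.forName. The sketch below (LoadByName and canLoad are illustrative names, and com.example.DoesNotExist is a deliberately made-up class) shows the ClassNotFoundException case.

```java
class LoadByName {
    // Returns true if a class with this fully qualified name can be loaded.
    static boolean canLoad(String fullyQualifiedName) {
        try {
            Class.forName(fullyQualifiedName);
            return true;
        } catch (ClassNotFoundException e) {
            // Thrown when no class loader in the chain can find the class.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canLoad("java.util.ArrayList"));
        System.out.println(canLoad("com.example.DoesNotExist")); // hypothetical name
    }
}
```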

Figure 5: Structure of class loader (source: https://www.freecodecamp.org/)

You can find three main phases in the class loader.

1. Loading.

2. Linking.

3. Initialization.

Loading

When the JVM is loading a class file, it will do the following:

01. Read class information, including:

  • Fully qualified class name (FQCN).
  • Instance variable information.
  • Immediate parent information.
  • Whether it’s a class, interface, or an enum.

02. Create a class-type object.

When a class is loaded into the JVM for the first time, the JVM creates an object of type “Class” to represent it (java.lang.Class is a special type in Java). If the same class is requested again from the same class loader, the JVM does not create another “Class” object (only one object per loaded class is created). This “Class” object is stored in the heap.
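A quick way to convince yourself that there is only one “Class” object per loaded class is to obtain it in different ways and compare the references. This is a minimal sketch; ClassObjectDemo and sameClassObject are names chosen for this example.

```java
class ClassObjectDemo {
    // Obtains the Class object for String in three ways and compares the references.
    static boolean sameClassObject() {
        try {
            Class<?> fromLiteral = String.class;
            Class<?> fromInstance = "hello".getClass();
            Class<?> fromName = Class.forName("java.lang.String");
            return fromLiteral == fromInstance && fromInstance == fromName;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sameClassObject());
        System.out.println(String.class.getName()); // the fully qualified class name
    }
}
```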

JVM has three class loaders. They form a parent-child delegation chain (this is not class inheritance).

  1. Bootstrap ClassLoader / Primordial ClassLoader.
  • This is the root class loader and the parent of the Extension ClassLoader.
  • It loads the standard Java packages from the rt.jar file as well as other core libraries from the $JAVA_HOME/jre/lib directory (Ex: java.lang, java.net, java.util, java.io). This layout applies to Java 8 and earlier; since Java 9 the core classes are loaded from the modular runtime image instead of rt.jar.

  2. Extension ClassLoader.
  • This is a child of the Bootstrap ClassLoader and the parent of the Application ClassLoader.
  • It loads the extensions of standard Java libraries from the $JAVA_HOME/jre/lib/ext directory (in Java 9 and later it is replaced by the Platform ClassLoader).

  3. Application / System ClassLoader.
  • This is the final class loader and a child of the Extension ClassLoader.
  • It loads the classes found on the classpath (by default, the classpath is the current directory of the application).
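The chain of class loaders can be inspected at runtime by following getParent() from a class’s own loader. The sketch below uses made-up names (LoaderChain, chainFor); the loader class names it prints depend on the Java version, and it assumes a HotSpot-style JVM where the bootstrap loader is reported as null.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Walks from the loader of the given class up through its parents.
    static List<String> chainFor(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap (reported as null)");
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(chainFor(LoaderChain.class)); // application class
        System.out.println(chainFor(String.class));      // core library class
    }
}
```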

ClassLoader in Java works on three principles (which we will discuss later).

1. Delegation

2. Visibility

3. Uniqueness

Linking

This is the 2nd phase of the class loader. Linking has three steps to go through.

1. Verification.

2. Preparation.

3. Resolution.

Verification

  • You might have heard that “Java is safe to execute in any environment”. So how does Java do it?
  • When your .class files are loaded, a subprogram named the bytecode verifier checks that each .class file:
  1. Comes from a valid compiler.
  2. Has the correct structure.
  3. Has the correct formatting.
  • If any of these checks fail, the .class file is considered altered/corrupted and the JVM throws a java.lang.VerifyError at runtime.

Preparation

  • If your class has any static variables, this is the stage where memory is allocated for them and they are assigned their default values (not their initial values). Instance variables get their default values later, when an object is created.
  • If your code has a static variable like “static int year = 2020”, in this step “year” is set to 0, which is the default value of an int.
  • Ex: boolean = false, double = 0.0d, char = ‘\u0000’, int = 0.
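The difference between the default value and the initial value can actually be observed. In this sketch (PreparationDemo and readYear are names invented for the example), a static field declared earlier reads “year” through a method before the assignment “year = 2020” has run, so it sees the default value.

```java
class PreparationDemo {
    // Static initializers run in textual order, so this one runs before "year = 2020".
    static int seenEarly = readYear();
    static int year = 2020;

    // Never assigned in code, so these keep their default values.
    static boolean flag;
    static double ratio;
    static char letter;

    static int readYear() {
        return year;
    }

    public static void main(String[] args) {
        System.out.println(seenEarly + " " + year);                   // prints: 0 2020
        System.out.println(flag + " " + ratio + " " + (int) letter); // prints: false 0.0 0
    }
}
```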

Resolution

Compared to Assembly, Java is very programmer-friendly: programmers don’t have to deal with memory addresses. Because of this, you can create classes with names like “Student” or “Employee”. But neither the computer nor the JVM can work directly with such domain-specific names. So during resolution, the JVM replaces these symbolic references with direct references. Class names like “Student” or “Employee” are replaced by references to their actual locations in memory.

Initialization

Real values are assigned to your static variables in this stage (the values declared in your code). In addition, any static blocks in the class are executed.

JVM is very flexible. The specification leaves the exact timing of some of the above steps (Ex: resolution during linking) to the implementation, so they can be performed eagerly or lazily. But there is one limitation: “Every class must be initialized before its active use” (I’ll go over this in more detail in a later post).
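Here is a small sketch of initialization happening right before the first active use. InitDemo, Holder, and the log list are names made up for this example; reading the static field Holder.value is the active use that triggers the static block.

```java
import java.util.ArrayList;
import java.util.List;

class InitDemo {
    static final List<String> log = new ArrayList<>();

    static class Holder {
        static int value = 42;             // real value assigned during initialization
        static {
            log.add("Holder initialized"); // static block runs once, during initialization
        }
    }

    static List<String> run() {
        log.add("before use");
        int v = Holder.value;              // first active use of Holder
        log.add("after use: " + v);
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints: [before use, Holder initialized, after use: 42]
    }
}
```

Note that Holder is not initialized when InitDemo starts, only when its static field is first read.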

Memory Area

Figure 6: Structure of memory area (source: https://www.freecodecamp.org/)

This is the 2nd component of JVM. The memory area is also known as the Runtime Data Area. It’s divided into five sub-areas.

01. Method area.

02. Heap.

03. Stack.

04. PC registers.

05. Native method stack (often called the native method area).

The method area and heap are created per JVM, so each JVM has only one method area and one heap, shared by all of its threads. The stack, PC register, and native method stack, on the other hand, are created per thread.
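The per-JVM versus per-thread split can be illustrated with two threads. In this sketch (SharedHeapDemo is an illustrative name), both threads update the same AtomicInteger object, which lives on the shared heap, while each thread counts in its own local variable, which lives in that thread’s own stack.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedHeapDemo {
    // Returns {shared total, local count of thread 0, local count of thread 1}.
    static int[] run() {
        AtomicInteger shared = new AtomicInteger(); // one object on the heap, seen by both threads
        int[] locals = new int[2];
        Thread[] workers = new Thread[2];
        for (int i = 0; i < workers.length; i++) {
            final int id = i;
            workers[i] = new Thread(() -> {
                int local = 0; // lives in this thread's own stack frame
                for (int n = 0; n < 1000; n++) {
                    local++;
                    shared.incrementAndGet();
                }
                locals[id] = local;
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait until the worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return new int[] { shared.get(), locals[0], locals[1] };
    }

    public static void main(String[] args) {
        int[] result = run();
        System.out.println(result[0] + " " + result[1] + " " + result[2]); // prints: 2000 1000 1000
    }
}
```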

Method area

Stores class-level information such as field and method data, the code of methods and constructors, and the runtime constant pool.

Heap

Stores all objects. So whenever a new object is created, it is stored in the heap area.

Stack

The stack is responsible for storing frames. Each frame contains method data (Ex: local variables of a method).

When a method is called, a new frame (called a stack frame) is created and pushed onto the stack. When the method returns normally or an uncaught exception is thrown, the frame is removed/popped.
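You can watch frames being pushed by counting the entries of the current thread’s stack trace at different call depths. This is just a sketch with invented names (StackDepthDemo, framesAfter). The JVM is allowed to omit frames from a stack trace in some circumstances, so treat the exact numbers as typical rather than guaranteed.

```java
class StackDepthDemo {
    // Calls itself n times, then reports how many frames are on this thread's stack.
    static int framesAfter(int n) {
        if (n == 0) {
            return Thread.currentThread().getStackTrace().length;
        }
        return framesAfter(n - 1); // each call pushes one more frame
    }

    public static void main(String[] args) {
        int base = framesAfter(0);
        int deeper = framesAfter(5);
        // Five extra calls mean five extra frames; they are popped again as the calls return.
        System.out.println(deeper - base);
    }
}
```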

Program Counter (PC) Registers

Each thread has its own PC register. If the method currently being executed is not a native method, the PC register holds the address of the JVM instruction being executed, so the thread knows where it is in the bytecode. A native method is a Java method that is implemented in another programming language, such as C or C++.

Native method stack (native method area)

If any native methods are called, this area holds the stack frames for them, just like the Java stack does for Java methods.
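Native methods are marked with the native modifier, and you can check for it with reflection. The sketch below uses invented names (NativeCheck, isNative) and assumes an OpenJDK-based class library, where Object.hashCode() is declared native; other implementations may differ.

```java
import java.lang.reflect.Modifier;

class NativeCheck {
    // Returns true if the public no-argument method with this name is declared native.
    static boolean isNative(Class<?> type, String methodName) {
        try {
            return Modifier.isNative(type.getMethod(methodName).getModifiers());
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isNative(Object.class, "hashCode")); // native in OpenJDK (assumption)
        System.out.println(isNative(String.class, "length"));   // ordinary Java method
    }
}
```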

Execution Engine

Figure 7: Structure of execution engine (source: https://www.freecodecamp.org/)

As you can see above, the execution engine consists of three components.

1. Interpreter.

2. JIT compiler.

3. Garbage collector.

Interpreter

The interpreter reads the bytecode instruction by instruction and executes each one, translating it to machine operations. Even though an interpreter can start executing bytecode quickly, it is slow when it comes to executing the entire program.

Another disadvantage is that when the same method is called multiple times, it has to be interpreted again each time. The JIT compiler is used to overcome these disadvantages.

JIT Compiler (Just In Time compiler)

The interpreter is the execution engine’s first choice for executing the bytecode. However, if the execution engine finds that a method is executed repeatedly (a “hot” method), it uses the JIT compiler instead. The JIT compiler compiles the bytecode of that method into native machine code, which is then reused directly for later calls.
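If you want to see this on your own machine, a small method called many times is enough to become “hot”. The sketch below uses made-up names (HotLoop, square, sumOfSquares) and assumes the HotSpot VM, which offers the -XX:+PrintCompilation flag to log JIT activity and the -Xint flag to run with the interpreter only.

```java
// Try (HotSpot flags):
//   java -XX:+PrintCompilation HotLoop   (logs methods as the JIT compiles them)
//   java -Xint HotLoop                   (interpreter only, no JIT compilation)
class HotLoop {
    static long square(long x) {
        return x * x;
    }

    // Calls square() n times, which makes it a likely candidate for JIT compilation.
    static long sumOfSquares(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            total += square(i);
        }
        return total;
    }

    public static void main(String[] args) {
        long result = 0;
        for (int round = 0; round < 10; round++) {
            result = sumOfSquares(100_000);
        }
        System.out.println(result);
    }
}
```

The result is the same in both modes; only the way the bytecode is executed changes.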

The JIT Compiler is made up of the following components:

1. Intermediate Code Generator.

2. Code Optimizer.

3. Target Code Generator.

4. Profiler.

Garbage Collector(GC)

In some programming languages, the developer is responsible for destroying objects as well as creating them (Ex: C, Fortran, Pascal).

The issue here is that if unused objects are not destroyed, the system can run out of memory and the entire application can crash. In Java, you rarely have to worry about that, since the Java GC reclaims objects that are no longer referenced. (An OutOfMemoryError can still occur if your program keeps references to more objects than the heap can hold).

The garbage collector runs in the background, removes unreferenced objects, and frees space in the heap area. Since this is a complex procedure, we’ll go over the features of Java garbage collection algorithms in a future post.
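The idea of “unreferenced objects” can be made visible with a WeakReference, which does not keep its target alive. This is only a sketch with invented names (GcDemo and its two methods). System.gc() is just a request to the JVM, so the second method usually returns true on HotSpot but is not guaranteed to.

```java
import java.lang.ref.WeakReference;

class GcDemo {
    // While a strong reference exists, the object cannot be collected.
    static boolean survivesWhileReferenced() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        System.gc(); // only a request; the JVM may ignore it
        return weak.get() == strong; // "strong" is still in use here
    }

    // Once the last strong reference is dropped, the object becomes eligible for collection.
    static boolean collectedAfterRelease() {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        strong = null;
        System.gc();
        return weak.get() == null; // usually true, but depends on the JVM and the collector
    }

    public static void main(String[] args) {
        System.out.println(survivesWhileReferenced());
        System.out.println(collectedAfterRelease());
    }
}
```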

When Java was first released in 1996, most computer programs were written for a specific operating system (platform dependent), and the developer was responsible for managing program memory. With Java, developers no longer had to worry about either of these issues. As a result, Java was revolutionary at the time.

So, I hope you learned something new from this article. I was encouraged to write it by my mentor, Krish (Krishantha Dinesh). I also referred to his YouTube playlist while writing this article (which I would like to recommend to you as well).

JVM YouTube playlist by Krish

Below are the articles that I referred to for this article.
