Exploring the Enhancement of Java Bytecode

If you find this helpful, please follow github.com/hsfxuebao/j... and consider giving it a star.

1. Bytecode

1.1 What is bytecode?

Java can "compile once, run everywhere" for two reasons. First, the JVM is built separately for each operating system and platform. Second, whatever the platform, the compiler produces bytecode (.class files) in the same fixed format for the JVM to consume. This shows how central bytecode is to the Java ecosystem. It is called bytecode because a class file is a sequence of bytes, usually displayed as pairs of hexadecimal digits, and the JVM reads it byte by byte. In Java, the javac command is normally used to compile source code into bytecode files. The path of a .java file from compilation to execution is shown in Figure 1.

Figure 1 Schematic diagram of Java operation

For developers, understanding bytecode gives a more accurate and intuitive view of what happens beneath the Java language. For example, bytecode shows directly how the volatile keyword takes effect. Bytecode enhancement is also widely used in Spring AOP, in ORM frameworks and in hot deployment, so a solid grasp of its principles pays off. Moreover, because the JVM specification defines the format, any bytecode that conforms to it can run on the JVM, whatever language produced it. This is what lets languages such as Scala, Groovy and Kotlin run on the JVM while adding features Java lacks or providing various kinds of syntactic sugar. Once you understand bytecode, you can study these languages from the bottom up: looking at them at the bytecode level reveals their design ideas and makes them easier to learn.

This article focuses on bytecode enhancement. It starts from bytecode itself and works upward layer by layer: from the JVM instruction set, to the Java frameworks that manipulate bytecode, and finally to the principles and applications of several familiar frameworks.

1.2 Bytecode structure

After the .java file is compiled by javac, a .class file will be obtained. For example, write a simple ByteCodeDemo class, as shown in the left part of Figure 2 below:

Figure 2 Sample code (left side) and corresponding bytecode (right side)

Compilation produces ByteCodeDemo.class. Opened in a hex viewer, it is a stream of hexadecimal values, shown split into bytes on the right of Figure 2. As mentioned above, the JVM specifies the bytecode format, so what structure does this seemingly chaotic data follow? The JVM specification requires every class file to consist of ten parts in a fixed order; the overall structure is shown in Figure 3. The ten parts are introduced one by one below:

Figure 3 Bytecode structure specified by JVM

(1) Magic Number

The first four bytes of every .class file are the magic number, whose fixed value is 0xCAFEBABE. Because it sits at the very beginning of the file, the JVM can check these bytes to decide whether the file might be a .class file, and it continues processing only if it is.
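To make the check concrete, here is a small sketch (the class name MagicCheck is hypothetical) that tests whether a byte array starts with 0xCAFEBABE. It reads the first four bytes as a big-endian int, which is the byte order the class file format uses.

```java
import java.nio.ByteBuffer;

public class MagicCheck {
    // 0xCAFEBABE is a valid int literal in Java (negative when read as a signed int)
    static final int MAGIC = 0xCAFEBABE;

    /** Returns true if the first four bytes equal 0xCAFEBABE. */
    static boolean looksLikeClassFile(byte[] data) {
        if (data.length < 4) {
            return false;
        }
        // ByteBuffer is big-endian by default, matching the class file format
        return ByteBuffer.wrap(data, 0, 4).getInt() == MAGIC;
    }

    public static void main(String[] args) {
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        byte[] zipHeader = {0x50, 0x4B, 0x03, 0x04}; // a ZIP/JAR file starts like this instead
        System.out.println(looksLikeClassFile(header));    // true
        System.out.println(looksLikeClassFile(zipHeader)); // false
    }
}
```

A real loader reads these bytes from a file or stream; a byte array keeps the sketch self-contained.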

Interestingly, the magic number was chosen by James Gosling, the father of Java: CafeBabe ("coffee baby"), a fitting value for a language whose logo is a cup of coffee.

(2) Version number

The version number is the 4 bytes after the magic number: the first two bytes are the minor version (Minor Version) and the last two the major version (Major Version). In Figure 2 the version bytes are "00 00 00 34", so the minor version is 0 and the major version is 0x34, which is 52 in decimal. According to Oracle's documentation, major version 52 corresponds to Java 1.8, so this file was compiled for Java 1.8 (JDK 1.8.0).
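The same decoding in code: a minimal sketch (the class ClassVersion is hypothetical) that reads the two big-endian u2 fields and maps the major version to a release name. The rule "major - 44" holds for Java 5 (major 49) and later; older versions are rejected rather than guessed.

```java
public class ClassVersion {
    /** Reads an unsigned big-endian 16-bit value (a class file "u2") at the given offset. */
    static int u2(byte[] b, int offset) {
        return ((b[offset] & 0xFF) << 8) | (b[offset + 1] & 0xFF);
    }

    /** Maps a major version to a Java release name; valid for Java 5 (49) and later. */
    static String javaRelease(int major) {
        if (major < 49) {
            throw new IllegalArgumentException("pre-Java-5 class file: " + major);
        }
        int feature = major - 44; // 49 -> 5, 52 -> 8, 61 -> 17
        return feature <= 8 ? "1." + feature : String.valueOf(feature);
    }

    public static void main(String[] args) {
        // Bytes 4..7 of the file in Figure 2: minor = 00 00, major = 00 34
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0x00, 0x00, 0x00, 0x34};
        int minor = u2(header, 4);
        int major = u2(header, 6);
        System.out.println("minor=" + minor + " major=" + major + " -> Java " + javaRelease(major));
        // prints: minor=0 major=52 -> Java 1.8
    }
}
```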

(3) Constant Pool

The constant pool starts right after the major version number. It stores two kinds of constants: literals and symbolic references. Literals are constant values in the code, such as strings and values declared final; symbolic references include the fully qualified names of classes and interfaces, the names and descriptors of fields, and the names and descriptors of methods. As a whole, the constant pool has two parts, the constant pool counter and the constant pool data area, as shown in Figure 4 below.

Figure 4 The structure of the constant pool

  • Constant pool counter (constant_pool_count): since the number of constants varies, two bytes at the start hold the constant pool count. The first 10 bytes of the sample class from Figure 2 are shown in Figure 5 below. The count is 0x0024, which is 36 in decimal; index 0 is reserved and not used, so this class file contains 35 constants.

Figure 5 The first ten bytes and their meaning

  • Constant pool data area: it consists of (constant_pool_count - 1) cp_info structures, one per constant. As of Java 8 there are 14 cp_info types (shown in Figure 6 below; later Java versions added a few more), and each type has a fixed structure.
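A small sketch of the counter arithmetic (class and method names are illustrative). It reads constant_pool_count from the ten bytes of Figure 5, and also shows one detail the count hides: per the JVM specification, CONSTANT_Long and CONSTANT_Double entries occupy two index slots, so count - 1 is the number of usable indexes, which equals the number of entries only when the pool has no long or double constants.

```java
public class ConstantPoolCount {
    static final int CONSTANT_Long = 5;
    static final int CONSTANT_Double = 6;

    /** Reads an unsigned big-endian u2 value at the given offset. */
    static int u2(byte[] b, int offset) {
        return ((b[offset] & 0xFF) << 8) | (b[offset + 1] & 0xFF);
    }

    /** The constant_pool_count for a pool holding entries with these tags, in order. */
    static int countFor(int[] tags) {
        int count = 1; // index 0 is reserved and never stored
        for (int tag : tags) {
            // Long and Double constants take two index slots; every other kind takes one
            count += (tag == CONSTANT_Long || tag == CONSTANT_Double) ? 2 : 1;
        }
        return count;
    }

    public static void main(String[] args) {
        // The first ten bytes from Figure 5: magic, minor, major, constant_pool_count
        byte[] first10 = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE,
                          0x00, 0x00, 0x00, 0x34, 0x00, 0x24};
        int count = u2(first10, 8);
        System.out.println("constant_pool_count = " + count + ", valid indexes 1.." + (count - 1));
        // prints: constant_pool_count = 36, valid indexes 1..35

        // Utf8 (tag 1), Class (tag 7), Long (tag 5): 1 reserved + 1 + 1 + 2 slots
        System.out.println(countFor(new int[] {1, 7, 5})); // 5
    }
}
```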

Figure 6 Various types of cp_info

Take CONSTANT_Utf8_info as an example; its structure is shown on the left of Figure 7 below. It starts with a one-byte tag whose value comes from the Tag column of Figure 6; for the Utf8 type the value is 01. The next two bytes give the string length (Length), followed by Length bytes holding the string itself. The right side of Figure 7 shows one cp_info structure taken from the bytecode in Figure 2. Decoded, it means: the constant is a UTF-8 string, its length is 1 byte, and its content is "a".
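The Utf8_info layout maps directly onto java.io.DataInputStream: readUnsignedByte() reads the tag, and readUTF() reads a two-byte length followed by that many bytes of modified UTF-8, the same encoding class files use. A minimal sketch (the class name is hypothetical) that parses the 01 00 01 61 entry from Figure 7:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class Utf8InfoDemo {
    static final int CONSTANT_Utf8 = 1;

    /** Parses one CONSTANT_Utf8_info: u1 tag, u2 length, then length bytes of modified UTF-8. */
    static String readUtf8Info(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int tag = in.readUnsignedByte();
            if (tag != CONSTANT_Utf8) {
                throw new IllegalArgumentException("expected tag 1, got " + tag);
            }
            // readUTF consumes the u2 length and then the string bytes
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] entry = {0x01, 0x00, 0x01, 0x61}; // tag = 1, length = 1, data = 'a'
        System.out.println(readUtf8Info(entry)); // prints: a
    }
}
```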

Figure 7 The structure of CONSTANT_utf8_info (left) and example (right)

The other cp_info structures are not covered in detail here; they follow the same pattern: a tag identifies the type, followed by n bytes describing the length and/or data. Once you understand the principle, you can run javap -verbose ByteCodeDemo to view the full constant pool in decoded form, as shown in Figure 8 below. The output clearly shows the type and value of each cp_info structure.

Figure 8 Constant pool decompilation result

(4) Access flags

The two bytes after the constant pool describe whether this Class is a class or an interface, and whether it carries modifiers such as public, abstract or final. The access flags (access_flags) defined by the JVM specification are listed in Figure 9 below. Note that the JVM does not define a separate value for every combination of modifiers; instead, flags are combined with bitwise OR. For example, for a class declared public final, the access flags are ACC_PUBLIC | ACC_FINAL, that is, 0x0001 | 0x0010 = 0x0011.
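A sketch of combining and testing flags with bitwise operations (flag values are from the JVM specification's class access flags; the helper names are illustrative). One caveat: javac also sets ACC_SUPER (0x0020) on the classes it compiles, so the raw value for a real public final class is typically 0x0031 rather than 0x0011.

```java
import java.util.ArrayList;
import java.util.List;

public class AccessFlags {
    static final int ACC_PUBLIC = 0x0001;
    static final int ACC_FINAL = 0x0010;
    static final int ACC_SUPER = 0x0020;
    static final int ACC_INTERFACE = 0x0200;
    static final int ACC_ABSTRACT = 0x0400;

    /** Lists which of the flags above are set in a class's access_flags value. */
    static List<String> decode(int flags) {
        List<String> names = new ArrayList<>();
        if ((flags & ACC_PUBLIC) != 0) names.add("ACC_PUBLIC");
        if ((flags & ACC_FINAL) != 0) names.add("ACC_FINAL");
        if ((flags & ACC_SUPER) != 0) names.add("ACC_SUPER");
        if ((flags & ACC_INTERFACE) != 0) names.add("ACC_INTERFACE");
        if ((flags & ACC_ABSTRACT) != 0) names.add("ACC_ABSTRACT");
        return names;
    }

    public static void main(String[] args) {
        int flags = ACC_PUBLIC | ACC_FINAL; // public final class
        System.out.printf("0x%04X %s%n", flags, decode(flags)); // 0x0011 [ACC_PUBLIC, ACC_FINAL]
    }
}
```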

Figure 9 Access flags

(5) Current class name

The two bytes after the access flag describe the fully qualified name of the current class. The value stored in these two bytes is the index value in the constant pool, and the fully qualified name of this class can be found in the constant pool according to the index value.

(6) Parent class name

The two bytes after the current class name describe the fully qualified name of the parent class. Same as above, the index value in the constant pool is also stored.
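Both indexes are indirect: this_class and super_class point to a CONSTANT_Class_info entry, whose name_index in turn points to the CONSTANT_Utf8_info holding the internal name (with slashes, such as java/lang/Object). The sketch below models that two-step lookup with a hypothetical, simplified in-memory pool; the index values are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class ThisClassLookup {
    // Hypothetical, simplified pool entries: only the two kinds needed for this lookup
    static final class Utf8 {
        final String value;
        Utf8(String value) { this.value = value; }
    }

    static final class ClassRef {
        final int nameIndex;
        ClassRef(int nameIndex) { this.nameIndex = nameIndex; }
    }

    /** Follows this_class (or super_class) to CONSTANT_Class_info, then to CONSTANT_Utf8_info. */
    static String resolveClassName(Map<Integer, Object> pool, int classIndex) {
        ClassRef ref = (ClassRef) pool.get(classIndex);
        return ((Utf8) pool.get(ref.nameIndex)).value;
    }

    public static void main(String[] args) {
        Map<Integer, Object> pool = new HashMap<>();
        pool.put(2, new ClassRef(20));             // made-up index values
        pool.put(3, new ClassRef(21));
        pool.put(20, new Utf8("ByteCodeDemo"));
        pool.put(21, new Utf8("java/lang/Object"));
        int thisClass = 2;
        int superClass = 3;
        System.out.println(resolveClassName(pool, thisClass) + " extends " + resolveClassName(pool, superClass));
        // prints: ByteCodeDemo extends java/lang/Object
    }
}
```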

(7) Interface information

After the parent class name comes a two-byte interface counter giving the number of interfaces this class directly implements (for an interface, its direct superinterfaces). It is followed by that many two-byte constant pool indexes, one per interface, each pointing to that interface's name.

(8) Field table

The field table describes the variables declared in a class or interface, including class (static) variables and instance variables, but not local variables declared inside methods. It also has two parts: the first two bytes give the number of fields, followed by a field_info structure with the details of each field. The structure of the field table is shown in the figure below:

Figure 10 Field table structure

Figure 11 below shows the field table from the bytecode in Figure 2. Looking up the field's access flag in Figure 9, 0002 means private. Using the constant pool indexes from Figure 8, the field name is "a" and the descriptor is "I" (meaning int). Together these uniquely identify the declaration private int a in the class.
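Field descriptors use single-letter codes for primitive types, L...; for object types and [ for arrays. A small sketch (hypothetical helper) that turns one complete field descriptor back into Java source syntax:

```java
public class FieldDescriptors {
    /** Converts a field descriptor such as "I" or "[Ljava/lang/String;" to Java source syntax. */
    static String toSourceType(String desc) {
        switch (desc.charAt(0)) {
            case 'B': return "byte";
            case 'C': return "char";
            case 'D': return "double";
            case 'F': return "float";
            case 'I': return "int";
            case 'J': return "long";
            case 'S': return "short";
            case 'Z': return "boolean";
            case 'L': // object type: L<internal name>;
                return desc.substring(1, desc.length() - 1).replace('/', '.');
            case '[': // array: one '[' per dimension, followed by the component type
                return toSourceType(desc.substring(1)) + "[]";
            default:
                throw new IllegalArgumentException("bad descriptor: " + desc);
        }
    }

    public static void main(String[] args) {
        System.out.println(toSourceType("I"));                  // int
        System.out.println(toSourceType("Ljava/lang/String;")); // java.lang.String
        System.out.println(toSourceType("[[J"));                // long[][]
    }
}
```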

Figure 11 Field table example

(9) Method table

The method table follows the field table and also has two parts: two bytes giving the number of methods, then the details of each method. A method's details are more involved: they include its access flags, name, descriptor and attributes, as shown in the following figure:

Figure 12 Method table structure

A method's access modifiers can again be looked up in Figure 9. The method name and descriptor are both constant pool indexes and can be resolved through the constant pool. The "method attributes" part is more complex; javap -verbose decodes it into human-readable form, as shown in Figure 13. The attributes include the following three parts:

  • "Code": the JVM instruction opcodes compiled from the method's source code. Bytecode enhancement mainly operates on this Code area.

  • "LineNumberTable": the line number table, which maps opcode offsets in the Code area to line numbers in the source. It is used when debugging and shows how many JVM instructions each source line compiled to.

  • "LocalVariableTable": the local variable table, which includes this and the method's local variables. this is available inside every instance method because the JVM implicitly passes it as the first argument (slot 0); static methods have no this.
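The slot numbers in LocalVariableTable follow directly from the method descriptor. The sketch below (a hypothetical helper, not part of any library) counts the local-variable slots a method's parameters occupy: slot 0 holds this for instance methods, and long and double values take two slots each.

```java
public class LocalSlots {
    /** Counts the local-variable slots used by `this` (if any) and the parameters in methodDesc. */
    static int parameterSlots(String methodDesc, boolean isStatic) {
        int slots = isStatic ? 0 : 1; // instance methods reserve slot 0 for this
        int i = 1;                    // skip the opening '('
        while (methodDesc.charAt(i) != ')') {
            char c = methodDesc.charAt(i);
            if (c == 'J' || c == 'D') {          // long and double: two slots
                slots += 2;
                i++;
            } else if (c == 'L') {               // object reference: skip to the ';'
                slots += 1;
                i = methodDesc.indexOf(';', i) + 1;
            } else if (c == '[') {               // array reference: skip dimensions, then element type
                slots += 1;
                while (methodDesc.charAt(i) == '[') i++;
                i = methodDesc.charAt(i) == 'L' ? methodDesc.indexOf(';', i) + 1 : i + 1;
            } else {                             // B, C, F, I, S, Z: one slot
                slots += 1;
                i++;
            }
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(parameterSlots("()I", false));                      // 1: just this
        System.out.println(parameterSlots("(Ljava/lang/String;)V", true));     // 1: no this
        System.out.println(parameterSlots("(IJ[Ljava/lang/String;)V", false)); // 5: this, int, long x2, array
    }
}
```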

Figure 13 Method table after decompilation

(10) Additional attribute table

The final part of the class file stores attributes that belong to the class or interface as a whole, for example the SourceFile attribute, which records the name of the source file.

1.3 Bytecode operation set

In Figure 13 above, the red numbers 0 to 17 in the Code area are the offsets of the instructions the JVM actually executes for the compiled method. To make them readable, javap shows the mnemonic for each hexadecimal opcode rather than the raw value. The mapping between opcodes and mnemonics, and what each instruction does, is documented in Oracle's JVM specification and can be consulted as needed. For example, the first mnemonic in the figure is iconst_2, which corresponds to the byte 0x05 in Figure 2 and pushes the int constant 2 onto the operand stack. Reading the instructions from offset 0 through 17 in the same way gives the complete implementation of the add() method.
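A lookup table is all that separates raw opcodes from mnemonics. This sketch (hypothetical class name) maps a small subset of opcode values, taken from the JVM specification, to their mnemonics; a real disassembler such as javap covers the full instruction set and also decodes operands.

```java
import java.util.HashMap;
import java.util.Map;

public class Mnemonics {
    // A small subset of the JVM opcode table
    static final Map<Integer, String> OPCODES = new HashMap<>();

    static {
        String[] iconst = {"iconst_m1", "iconst_0", "iconst_1", "iconst_2", "iconst_3", "iconst_4", "iconst_5"};
        for (int i = 0; i < iconst.length; i++) {
            OPCODES.put(0x02 + i, iconst[i]); // iconst_m1 = 0x02 ... iconst_5 = 0x08
        }
        OPCODES.put(0x1b, "iload_1");
        OPCODES.put(0x1c, "iload_2");
        OPCODES.put(0x3c, "istore_1");
        OPCODES.put(0x3d, "istore_2");
        OPCODES.put(0x60, "iadd");
        OPCODES.put(0xac, "ireturn");
    }

    /** Accepts an opcode as an int or a (signed) byte; & 0xFF normalizes it to 0..255. */
    static String mnemonic(int opcode) {
        int op = opcode & 0xFF;
        return OPCODES.getOrDefault(op, String.format("unknown(0x%02x)", op));
    }

    public static void main(String[] args) {
        System.out.println(mnemonic(0x05));        // iconst_2
        System.out.println(mnemonic((byte) 0xac)); // ireturn
    }
}
```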

1.4 Operand stack and bytecode

The JVM's instruction set is stack-based rather than register-based. A stack-based design is portable across platforms (register instruction sets tend to be tied to specific hardware), but it needs more instructions for the same operation, because the stack is a LIFO structure and operands must be pushed and popped frequently. In addition, the operand stack lives in memory while registers are inside the CPU, so executing stack-based instructions is comparatively slower; this is the price paid for portability.

The opcodes described above all operate on the JVM's operand stack. To show more concretely how the instructions manipulate the operand stack, and what roles the constant pool and local variable table play, Figure 14 animates how the add() method drives the operand stack. Only the referenced part of the constant pool is shown. The animation starts with iconst_2 and ends with ireturn, matching instructions 0 to 17 of the Code area in Figure 13 one to one:
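The same stack discipline can be imitated in a few lines. The sketch below is a toy interpreter (hypothetical, not how HotSpot works internally) for a handful of one-byte int instructions. It runs the instruction sequence javac would typically emit for a hypothetical method static int f() { int b = 2; int c = 3 + b; return c; }, not the add() method of Figure 2, and prints the operand stack and local variables after each instruction.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

public class TinyStackMachine {
    /** Interprets a few one-byte int opcodes; a teaching model with no verification or frames. */
    static int run(int[] code, int maxLocals) {
        Deque<Integer> stack = new ArrayDeque<>();
        int[] locals = new int[maxLocals];
        for (int pc = 0; pc < code.length; pc++) { // every opcode used here is one byte long
            int op = code[pc];
            if (op >= 0x02 && op <= 0x08) {          // iconst_m1 .. iconst_5: push -1 .. 5
                stack.push(op - 0x03);
            } else if (op >= 0x1a && op <= 0x1d) {   // iload_0 .. iload_3: push a local
                stack.push(locals[op - 0x1a]);
            } else if (op >= 0x3b && op <= 0x3e) {   // istore_0 .. istore_3: pop into a local
                locals[op - 0x3b] = stack.pop();
            } else if (op == 0x60) {                 // iadd: pop two ints, push their sum
                stack.push(stack.pop() + stack.pop());
            } else if (op == 0xac) {                 // ireturn: pop the result
                return stack.pop();
            } else {
                throw new IllegalStateException("unsupported opcode 0x" + Integer.toHexString(op));
            }
            System.out.printf("%d: 0x%02x -> stack %s, locals %s%n", pc, op, stack, Arrays.toString(locals));
        }
        throw new IllegalStateException("no ireturn reached");
    }

    public static void main(String[] args) {
        // iconst_2, istore_0, iconst_3, iload_0, iadd, istore_1, iload_1, ireturn
        int[] code = {0x05, 0x3b, 0x06, 0x1a, 0x60, 0x3c, 0x1b, 0xac};
        System.out.println("result = " + run(code, 2)); // result = 5
    }
}
```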

Figure 14 Schematic diagram of control operand stack

1.5 View bytecode tool

Running javap every time to view disassembled bytecode is tedious. An IntelliJ IDEA plugin, jclasslib, is recommended here; Figure 15 shows it in use. After compiling the code, choose "Show Bytecode With jclasslib" from the "View" menu to browse the class information, constant pool, methods and other parts of the current class file.

Figure 15 jclasslib view bytecode

2. Bytecode enhancement

The previous chapter covered the structure of bytecode, which lays the groundwork for understanding how bytecode enhancement is implemented. Bytecode enhancement is a family of techniques that modify existing bytecode or dynamically generate new class files. We begin with an in-depth look at the most direct approach: manipulating bytecode instructions.

Figure 16 Bytecode enhancement technology

2.1 ASM

When bytecode must be manipulated by hand, ASM can be used. It can generate .class files directly, or modify a class's behavior dynamically before the class is loaded into the JVM (as shown in Figure 17 below). ASM is used for AOP (Cglib is built on ASM), hot deployment, and modifying classes in third-party jars. Working at this low level is, of course, laborious. The rest of this section introduces ASM's two APIs and uses ASM to implement a rough AOP. Before that, to follow ASM more easily, readers are strongly encouraged to become familiar with the visitor pattern. In short, the visitor pattern is used to process or modify data whose structure is relatively stable. As Chapter 1 showed, the structure of a class file is fixed by the JVM, which makes the visitor pattern a natural fit for modifying class files.
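For readers new to the pattern, here is a minimal visitor sketch that mirrors ASM's style without using ASM; all class names are hypothetical. The structure's accept() method drives the traversal, each visitor forwards events to the next one in a chain, and a transforming visitor changes only the events it cares about. This is the same shape as the ClassReader, MyClassVisitor, ClassWriter chain used later in this section.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class VisitorSketch {
    /** A fixed, simplified "class file": a name plus a list of method names. */
    static final class SimpleClass {
        final String name;
        final List<String> methods;

        SimpleClass(String name, List<String> methods) {
            this.name = name;
            this.methods = methods;
        }

        /** The structure drives the traversal; visitors decide what to do at each element. */
        void accept(SimpleVisitor v) {
            v.visitClass(name);
            for (String m : methods) {
                v.visitMethod(m);
            }
            v.visitEnd();
        }
    }

    /** Default behaviour: forward every event to the next visitor in the chain. */
    static class SimpleVisitor {
        protected final SimpleVisitor next;

        SimpleVisitor(SimpleVisitor next) { this.next = next; }

        void visitClass(String name) { if (next != null) next.visitClass(name); }
        void visitMethod(String name) { if (next != null) next.visitMethod(name); }
        void visitEnd() { if (next != null) next.visitEnd(); }
    }

    /** End of the chain: records the events it receives, playing the role of a writer. */
    static final class Recorder extends SimpleVisitor {
        final List<String> out = new ArrayList<>();

        Recorder() { super(null); }

        @Override void visitClass(String name) { out.add("class " + name); }
        @Override void visitMethod(String name) { out.add("method " + name); }
    }

    /** A transforming visitor: renames process and passes everything else through unchanged. */
    static final class Renamer extends SimpleVisitor {
        Renamer(SimpleVisitor next) { super(next); }

        @Override void visitMethod(String name) {
            super.visitMethod(name.equals("process") ? "process$enhanced" : name);
        }
    }

    public static void main(String[] args) {
        SimpleClass base = new SimpleClass("Base", Arrays.asList("<init>", "process"));
        Recorder writer = new Recorder();
        base.accept(new Renamer(writer));
        System.out.println(writer.out); // [class Base, method <init>, method process$enhanced]
    }
}
```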

Figure 17 ASM modified bytecode

2.1.1 ASM API

Core API

The ASM Core API is analogous to SAX for XML: instead of reading the whole class structure at once, it processes the class file as a stream. This saves memory but makes programming harder. Even so, the Core API is generally preferred for performance reasons. Its key classes are:

  • ClassReader: used to read the compiled .class files.

  • ClassWriter: rebuilds the compiled class, for example changing the class name, fields and methods; it can also generate the bytecode of an entirely new class.

  • Visitor classes: as mentioned above, the Core API processes bytecode from top to bottom, with a different visitor for each area of the class file, such as MethodVisitor for methods, FieldVisitor for fields and AnnotationVisitor for annotations. For AOP, the key one is MethodVisitor.

Tree API

The ASM Tree API is analogous to DOM for XML: it reads the whole class structure into memory. This costs more memory but makes programming simpler. Unlike the Core API, the Tree API represents each area of the bytecode as a Node class; thinking of these as DOM nodes makes this style easy to understand.

2.1.2 Directly use ASM to achieve AOP

We now use the ASM Core API to enhance a class. Rather than dwelling on AOP terminology such as aspects and advice, we simply add logic before and after a method call, which keeps things easy to follow. First, define the Base class to be enhanced: it contains a single process() method that prints "process". After enhancement, we expect it to print "start" before the method body runs and "end" afterwards.

```java
package jvm.agent.asm;

public class Base {
    public static void process() {
        System.out.println("process");
    }
}
```

Implementing AOP with ASM takes two classes: MyClassVisitor, which visits and modifies the bytecode, and Generator, which sets up a ClassReader and a ClassWriter. The ClassReader reads the bytecode and hands it to MyClassVisitor for processing; when processing finishes, the ClassWriter produces the new bytecode, which replaces the old class file. The Generator class is simple, so we look at it first and then focus on MyClassVisitor.

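```java
import java.io.File;
import java.io.FileOutputStream;
import org.objectweb.asm.ClassReader;
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.ClassWriter;

public class Generator {
    public static void main(String[] args) throws Exception {
        // Read: locates jvm/agent/asm/Base.class on the classpath
        ClassReader classReader = new ClassReader("jvm/agent/asm/Base");
        // COMPUTE_MAXS recomputes max stack/locals after instructions are inserted
        ClassWriter classWriter = new ClassWriter(ClassWriter.COMPUTE_MAXS);
        // Process: MyClassVisitor rewrites methods and forwards everything to the writer
        ClassVisitor classVisitor = new MyClassVisitor(classWriter);
        classReader.accept(classVisitor, ClassReader.SKIP_DEBUG);
        byte[] data = classWriter.toByteArray();
        // Output: overwrite the compiled class (same package path as the class that was read)
        File f = new File("target/classes/jvm/agent/asm/Base.class");
        try (FileOutputStream fout = new FileOutputStream(f)) {
            fout.write(data);
        }
        System.out.println("now generator cc success!!!!!");
    }
}
```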

MyClassVisitor extends ClassVisitor and visits the class's bytecode. It contains an inner class, MyMethodVisitor, which extends MethodVisitor and visits the methods in the class. The full code is as follows:

```java
import org.objectweb.asm.ClassVisitor;
import org.objectweb.asm.MethodVisitor;
import org.objectweb.asm.Opcodes;

public class MyClassVisitor extends ClassVisitor implements Opcodes {
    public MyClassVisitor(ClassVisitor cv) {
        super(ASM5, cv);
    }

    @Override
    public void visit(int version, int access, String name, String signature,
                      String superName, String[] interfaces) {
        cv.visit(version, access, name, signature, superName, interfaces);
    }

    @Override
    public MethodVisitor visitMethod(int access, String name, String desc,
                                     String signature, String[] exceptions) {
        MethodVisitor mv = cv.visitMethod(access, name, desc, signature, exceptions);
        // Base has two methods: the no-arg constructor and process(); the constructor is not enhanced
        if (!name.equals("<init>") && mv != null) {
            mv = new MyMethodVisitor(mv);
        }
        return mv;
    }

    class MyMethodVisitor extends MethodVisitor implements Opcodes {
        public MyMethodVisitor(MethodVisitor mv) {
            super(Opcodes.ASM5, mv);
        }

        @Override
        public void visitCode() {
            super.visitCode();
            // Before the original body: System.out.println("start");
            mv.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
            mv.visitLdcInsn("start");
            mv.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
        }

        @Override
        public void visitInsn(int opcode) {
            if ((opcode >= Opcodes.IRETURN && opcode <= Opcodes.RETURN) || opcode == Opcodes.ATHROW) {
                // The method prints "end" before returning
                mv.visitFieldInsn(GETSTATIC, "java/lang/System", "out", "Ljava/io/PrintStream;");
                mv.visitLdcInsn("end");
                mv.visitMethodInsn(INVOKEVIRTUAL, "java/io/PrintStream", "println", "(Ljava/lang/String;)V", false);
            }
            mv.visitInsn(opcode);
        }
    }
}
```

Walking through the code, the bytecode is modified in the following steps:

  • First, visitMethod in MyClassVisitor determines which method is currently being read. The constructor "<init>" is skipped, and each method to be enhanced is handed to the inner class MyMethodVisitor.

  • Next, ASM calls visitCode in MyMethodVisitor when it starts visiting the Code area of a method. Overriding visitCode is where the AOP "before" logic goes.

  • MyMethodVisitor then continues reading the method's instructions. Whenever ASM visits a zero-operand instruction, it calls visitInsn. We check whether the instruction is a return instruction (or athrow) and, if so, insert extra instructions in front of it; this is where the AOP "after" logic goes.

  • In summary, overriding these two methods of MyMethodVisitor is enough to implement AOP, but the code they insert must be written by hand as ASM instructions. Bytecode is inserted by calling the MethodVisitor's visitXXXXInsn() methods, where XXXX corresponds to the kind of opcode. For example, mv.visitLdcInsn("end") emits the instruction ldc "end", which pushes the string "end" onto the operand stack.
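The return check in visitInsn relies on the return opcodes being contiguous. The sketch below (hypothetical class name) reproduces that condition with opcode values from the JVM specification, the same numbers ASM's Opcodes interface defines. It also exposes a limitation of this simple approach: "end" is printed only before return instructions and explicit athrow instructions in the method itself. If a called method throws, control leaves without passing either, so frameworks that need guaranteed "after" logic typically add an exception handler covering the whole method.

```java
public class ReturnOpcodes {
    // Opcode values from the JVM specification
    static final int IRETURN = 172, LRETURN = 173, FRETURN = 174, DRETURN = 175, ARETURN = 176, RETURN = 177;
    static final int ATHROW = 191;

    /** The same test MyMethodVisitor.visitInsn uses to decide where to insert the "end" logic. */
    static boolean exitsMethod(int opcode) {
        return (opcode >= IRETURN && opcode <= RETURN) || opcode == ATHROW;
    }

    public static void main(String[] args) {
        int[] samples = {IRETURN, LRETURN, ARETURN, RETURN, ATHROW, 0x60 /* iadd */};
        for (int op : samples) {
            System.out.println(op + " -> " + exitsMethod(op));
        }
    }
}
```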

With both visitor classes written, run the main method of Generator to enhance the Base class. The enhanced Base.class can then be found in the compiled target folder; decompiling it shows that the code has changed (left side of Figure 18). Next, write a test class MyTest that creates a new Base() and calls base.process(); the output on the right of the figure below shows the AOP effect:

The results of the operation are:

Figure 18 ASM achieves the effect of AOP

2.1.3 ASM tool

Writing bytecode with ASM means emitting each mnemonic through a series of visitXXXXInsn() calls. So each line of source code must first be translated into mnemonics, and then into visitXXXXInsn() calls. For the first step, if you are not familiar with the instruction set, you have to compile the code and disassemble it to see the mnemonics for each line. For the second step, working out what parameters to pass to each ASM call is also a headache. The ASM community recognized both problems and provides a tool for them: ASM Bytecode Outline.

After installing it, right-click in a source file, choose "Show Bytecode Outline", and select the "ASMified" tab in the new panel, as shown in Figure 19. It shows the ASM code that generates this class. The upper and lower red boxes correspond to the AOP "before" and "after" logic respectively; these two blocks can be copied directly into the visitCode() and visitInsn() methods of the visitor.

Figure 19 ASM Bytecode Outline

2.2 Javassist

ASM manipulates bytecode at the instruction level, and as the previous section shows, instruction-level frameworks are fairly obscure to work with. So we also briefly introduce another kind of framework, Javassist, which manipulates bytecode at the source-code level.

When enhancing bytecode with Javassist, you do not need to deal with the rigid structure of bytecode, and programming is simple: you write ordinary Java code and, without knowing any JVM instructions, can change a class's structure or generate classes dynamically. The most important classes are ClassPool, CtClass, CtMethod and CtField:

  • CtClass (compile-time class): Compile-time class information, which is an abstract representation of a Class file in the code. A CtClass object can be obtained through the fully qualified name of a class to represent this class file.

  • ClassPool: From a development perspective, ClassPool is a HashTable that saves CtClass information, Key is the class name, and Value is the CtClass object corresponding to the class name. When we need to modify a class, we get the corresponding CtClass from the pool through the pool.getCtClass("className") method.

  • CtMethod, CtField: These two are easier to understand, corresponding to the methods and attributes in the class.

With these four classes in mind, a small demo shows how simple and quick Javassist is. We again enhance the process() method of Base to print "start" and "end" before and after the call. All we need to do is get the CtClass and its method from the pool and call method.insertBefore and insertAfter, passing the Java code to insert as a string. It is extremely simple.

First, add the Javassist dependency:

```xml
<dependency>
    <groupId>org.javassist</groupId>
    <artifactId>javassist</artifactId>
    <version>3.25.0-GA</version>
</dependency>
```

Then write the test class:

```java
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;
import jvm.agent.asm.Base;

public class JavassistTest {
    public static void main(String[] args) throws Exception {
        ClassPool cp = ClassPool.getDefault();
        CtClass cc = cp.get("jvm.agent.asm.Base");
        CtMethod m = cc.getDeclaredMethod("process");
        m.insertBefore("{ System.out.println(\"start\"); }");
        m.insertAfter("{ System.out.println(\"end\"); }");
        // Load the modified class; this works only because Base has not been loaded yet
        Class<?> c = cc.toClass();
        // Also write the enhanced Base.class to disk for inspection (any writable directory works)
        cc.writeFile("/Users/haoshaofei/Desktop");
        Base h = (Base) c.getDeclaredConstructor().newInstance();
        h.process();
    }
}
```

3. Reloading classes at runtime

3.1 The problem

The previous chapter introduced two different kinds of bytecode manipulation frameworks and used both to implement a rough AOP. To make bytecode enhancement easier to grasp, however, we sidestepped a key point: the ASM version of AOP was split across two main methods. The first uses MyClassVisitor to modify the compiled class file; the second creates the object and calls its method. No class is reloaded while a JVM is running: the first main method replaces the compiled class's bytecode with ASM, and the second main method simply loads the already-replaced class. Likewise, the Javassist version loads the Base class only once, so it never reloads a class at runtime either.

What happens if a class is first loaded in a JVM and then enhanced and loaded again? To simulate this, add Base b = new Base() as the first line of main() in the Javassist demo above, so that the JVM loads Base before the enhancement. Execution then fails at cc.toClass() with the error shown in Figure 20 below. Stepping into cc.toClass() shows that the error is raised when the native defineClass() method of ClassLoader is finally called. In other words, the JVM does not allow an already loaded class to be defined again this way at runtime.

Figure 20 Error message when a class is loaded again at runtime

Clearly, if a class can only be enhanced before it is loaded, bytecode enhancement has very limited uses. What we want is to replace and reload a class's behavior in a JVM that is already running and has loaded all its classes. To simulate this, we rewrite the Base class with its own main method, which calls process() every five seconds; process() prints a line "process".

Our goal is to replace process() while the JVM is running so that it prints "start" and "end" before and after the original output; that is, at runtime the output every five seconds changes from "process" to "start process end". How do we get around the JVM's refusal to reload class definitions at runtime? The Java libraries needed for this are introduced one by one below.

```java
package jvm.agent.asm;

import java.lang.management.ManagementFactory;

public class Base {
    public static void main(String[] args) {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        String s = name.split("@")[0];
        // Print the current pid
        System.out.println("pid:" + s);
        while (true) {
            try {
                Thread.sleep(5000L);
            } catch (Exception e) {
                break;
            }
            process();
        }
    }

    public static void process() {
        System.out.println("process");
    }
}
```
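Base derives its pid from the RuntimeMXBean name, which on HotSpot has the form pid@hostname; that format is implementation-specific, though. On Java 9 and later, ProcessHandle.current().pid() returns the pid directly. A small comparison sketch (hypothetical class name):

```java
import java.lang.management.ManagementFactory;

public class PidDemo {
    /** Parses the "pid@hostname" runtime name; the format is not guaranteed by the specification. */
    static String pidFromRuntimeName() {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        return name.split("@")[0];
    }

    public static void main(String[] args) {
        // Java 9+: a portable way to get the same value
        long pid = ProcessHandle.current().pid();
        System.out.println("pid (ProcessHandle): " + pid);
        System.out.println("pid (RuntimeMXBean): " + pidFromRuntimeName());
    }
}
```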

3.2 Instrument

Instrument (the java.lang.instrument package) is a library provided by the JVM for modifying classes that have already been loaded; it supports instrumentation services written in Java. It is built on JVMTI and relies on the Attach API mechanism, both introduced in the next section. Before JDK 1.6, Instrument could only take effect while the JVM was starting up and loading classes; since JDK 1.6 it also supports changing class definitions at runtime. To use it, we implement its ClassFileTransformer interface to define a class file transformer. The interface's transform() method is called when a class file is loaded (or retransformed), and inside it we can use ASM or Javassist to rewrite or replace the incoming bytecode and return the new bytecode array.

We define a class TestTransformer that implements the ClassFileTransformer interface, and still use Javassist to enhance the process() method in the Base class, and print "start" and "end" before and after the code. The code is as follows:

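```java
import java.io.ByteArrayInputStream;
import java.lang.instrument.ClassFileTransformer;
import java.security.ProtectionDomain;
import javassist.ClassPool;
import javassist.CtClass;
import javassist.CtMethod;

public class TestTransformer implements ClassFileTransformer {
    // Internal (slash-separated) name of the class to enhance
    private static final String TARGET = "jvm/agent/asm/Base";

    @Override
    public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain, byte[] classfileBuffer) {
        // transform() is called for every class loaded or retransformed; leave the others untouched
        if (!TARGET.equals(className)) {
            return null;
        }
        System.out.println("Transforming " + className);
        try {
            ClassPool cp = ClassPool.getDefault();
            // Build the CtClass from the bytes the JVM hands us
            CtClass cc = cp.makeClass(new ByteArrayInputStream(classfileBuffer));
            CtMethod m = cc.getDeclaredMethod("process");
            m.insertBefore("{ System.out.println(\"start\"); }");
            m.insertAfter("{ System.out.println(\"end\"); }");
            byte[] bytes = cc.toBytecode();
            cc.detach(); // drop the cached CtClass so a later retransform starts fresh
            return bytes;
        } catch (Exception e) {
            e.printStackTrace();
        }
        // null means "no change"
        return null;
    }
}
```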

With a transformer in hand, how do we inject it into a running JVM? We need to define an agent and use its capabilities to bring Instrument into the JVM. Agents are introduced in the next section; first, here is another class used inside the agent: Instrumentation. Since JDK 1.6, Instrumentation supports instrumentation after startup, instrumentation of native methods, and dynamic changes to the classpath. We add the transformer defined above to the Instrumentation and specify which class to retransform, as in the code below. When the agent is attached to a JVM, it replaces the class's bytecode and reloads the class in that JVM.

```java
import java.lang.instrument.Instrumentation;
import jvm.agent.asm.Base;

public class TestAgent {
    public static void agentmain(String args, Instrumentation inst) {
        // Register our own transformer, which uses Javassist to replace the bytecode;
        // true allows it to take part in retransformation
        inst.addTransformer(new TestTransformer(), true);
        try {
            // Retransform Base so the JVM picks up the new bytecode
            inst.retransformClasses(Base.class);
            System.out.println("Agent Load Done.");
        } catch (Exception e) {
            System.out.println("agent load failed!");
            e.printStackTrace();
        }
    }

    public static void main(String[] args) {
    }
}
```
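For agentmain to run after an attach, the agent jar's manifest must name the agent class, and retransformClasses() is only permitted if the manifest also declares Can-Retransform-Classes: true. The sketch below builds such a manifest with java.util.jar.Manifest. The agent class name is an assumption (use the fully qualified name if TestAgent lives in a package); in practice a build tool such as the Maven jar plugin can write these entries instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

public class AgentManifest {
    /** Builds the main manifest attributes needed for runtime attach plus retransformation. */
    static Manifest build(String agentClass) {
        Manifest mf = new Manifest();
        Attributes main = mf.getMainAttributes();
        // Manifest.write() only emits main attributes when a version header is present
        main.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        // Class whose agentmain(String, Instrumentation) runs after loadAgent()
        main.put(new Attributes.Name("Agent-Class"), agentClass);
        // Without this, retransformClasses() throws UnsupportedOperationException
        main.put(new Attributes.Name("Can-Retransform-Classes"), "true");
        return mf;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        build("TestAgent").write(out);
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```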

3.3 JVMTI & Agent & Attach API

The previous section gave the agent's code. To explain where it comes from, we first need to introduce JPDA (Java Platform Debugger Architecture). If JPDA is enabled when the JVM starts, classes are allowed to be reloaded: the old version of an already loaded class can be discarded and the new version loaded in its place. As the "Debugger" in its name suggests, JPDA is a set of standards for debugging Java programs, and every JDK must implement it.

JPDA defines a complete architecture: it divides the debugging system into three layers and specifies the communication interfaces between them. From bottom to top, the three layers are the Java Virtual Machine Tool Interface (JVMTI), the Java Debug Wire Protocol (JDWP), and the Java Debug Interface (JDI). Their relationship is shown in the following figure:
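To make the top layer a little more concrete, the sketch below (class name hypothetical) uses JDI's entry point, com.sun.jdi.Bootstrap.virtualMachineManager(), to list the connectors a debugger can use to reach a target JVM; the socket connectors talk JDWP to the target, whose JVM-side debug agent is in turn built on JVMTI. It assumes a full JDK, since JDI lives in the jdk.jdi module (tools.jar on JDK 8), and the exact set of connectors varies by JDK and platform.

```java
import com.sun.jdi.Bootstrap;
import com.sun.jdi.connect.Connector;

import java.util.ArrayList;
import java.util.List;

public class ListJdiConnectors {
    // Names of the connectors JDI offers (launching, attaching, or listening over a transport)
    public static List<String> connectorNames() {
        List<String> names = new ArrayList<>();
        for (Connector c : Bootstrap.virtualMachineManager().allConnectors()) {
            names.add(c.name());
        }
        return names;
    }

    public static void main(String[] args) {
        // Print each connector with its human-readable description
        for (Connector c : Bootstrap.virtualMachineManager().allConnectors()) {
            System.out.println(c.name() + " : " + c.description());
        }
    }
}
```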

Figure 21 JPDA

Back to the topic: we can use some of JVMTI's capabilities to help reload class information dynamically. JVMTI (JVM Tool Interface) is a set of tool interfaces that the JVM provides for operating on the JVM itself. Through JVMTI, a wide range of operations on the JVM can be performed, and various event hooks can be registered. When a JVM event fires, the predefined hook is triggered along with it, so a tool can respond to each kind of JVM event. The events include class file loading, exception throwing and catching, thread start and end, entering and exiting critical sections, member variable modification, GC start and end, method entry and exit, critical section contention and waiting, VM start and exit, and more.

An Agent is an implementation built on JVMTI. There are two ways to start an Agent. One is to load it when the Java process starts, which is what the commonly seen java -javaagent option (or -agentlib for native JVMTI agents) does; the other is to load the module (a jar package) at runtime, dynamically attaching it through the Attach API to the Java process with a given process ID.
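For the first, startup-time mode, a Java agent exposes a premain() method instead of agentmain(), and its jar names that class in a Premain-Class manifest attribute; the JVM calls premain() before the application's main() when launched with -javaagent:path/to/agent.jar. The hypothetical agent below registers a transformer that only logs each class as it is loaded and returns null so the bytes stay unchanged, which is roughly the Java-level view of JVMTI's class-file-loading event.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.security.ProtectionDomain;

public class StartupAgent {
    // Invoked before the application's main() when the JVM starts with -javaagent:<jar>;
    // the jar's MANIFEST.MF must name this class in its Premain-Class attribute
    public static void premain(String args, Instrumentation inst) {
        inst.addTransformer(new ClassLoadLogger());
    }

    // Logs every class loaded after registration, without modifying it
    public static class ClassLoadLogger implements ClassFileTransformer {
        @Override
        public byte[] transform(ClassLoader loader, String className, Class<?> classBeingRedefined,
                                ProtectionDomain protectionDomain, byte[] classfileBuffer) {
            System.out.println("Loading " + className);
            return null; // null tells the JVM to keep the original bytes
        }
    }
}
```

Usage would look like java -javaagent:startup-agent.jar -cp app.jar your.Main, where both jar names and the main class are placeholders.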

The Attach API provides communication between JVM processes. For example, to take a thread dump or heap dump of an online service from another process, we run jstack or jmap and pass it the pid of the target process; that is exactly what the Attach API makes possible. Below, we dynamically attach the packaged Agent jar to the target JVM with the Attach API's loadAgent() method. The steps are as follows:
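Before attaching, you need the target's pid. As a small sketch (class name hypothetical), the Attach API itself can enumerate the JVMs it can see on the local machine through VirtualMachine.list(), giving output similar to jps. It needs the jdk.attach module (tools.jar on JDK 8), and JVMs owned by other users may not appear.

```java
import com.sun.tools.attach.VirtualMachine;
import com.sun.tools.attach.VirtualMachineDescriptor;

import java.util.ArrayList;
import java.util.List;

public class ListJvms {
    // Returns "pid displayName" for every JVM the Attach API can see on this machine
    public static List<String> describeJvms() {
        List<String> out = new ArrayList<>();
        for (VirtualMachineDescriptor d : VirtualMachine.list()) {
            out.add(d.id() + " " + d.displayName());
        }
        return out;
    }

    public static void main(String[] args) {
        describeJvms().forEach(System.out::println);
    }
}
```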

  • Define the Agent and implement its agentmain() method, such as the TestAgent class (code block 7) defined in the previous section;

  • Then package the TestAgent class into a jar containing a MANIFEST.MF file whose Agent-Class attribute is set to the fully qualified name of TestAgent, as shown in the following figure;

    ```
    Manifest-Version: 1.0
    Agent-Class: jvm.agent.asm.TestAgent
    Created-By: hsfxuebao
    Can-Redefine-Classes: true
    Can-Retransform-Classes: true
    Boot-Class-Path: javassist-3.25.0-GA.jar
    Main-Class: jvm.agent.asm.TestAgent
    ```

Figure 22 Manifest.mf

  • Finally, use the Attach API to attach the packaged jar to the JVM with the specified pid. The code is as follows:

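    ```java
    import java.io.IOException;

    import com.sun.tools.attach.AgentInitializationException;
    import com.sun.tools.attach.AgentLoadException;
    import com.sun.tools.attach.AttachNotSupportedException;
    import com.sun.tools.attach.VirtualMachine;

    public class Attacher {
        public static void main(String[] args) throws AttachNotSupportedException, IOException,
                AgentLoadException, AgentInitializationException {
            // Pass in the target JVM pid
            VirtualMachine vm = VirtualMachine.attach("61576");
            try {
                // The target JVM loads the jar and calls agentmain() of its Agent-Class
                vm.loadAgent("/Users/hsfxuebao/IdeaProjects/java-study/out/testagent/testagent.jar");
            } finally {
                vm.detach();
            }
        }
    }
    ```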


  • Because Agent-Class is specified in MANIFEST.MF, after the attach the target JVM runs the agentmain() method defined in the TestAgent class. In that method, Instrumentation registers the class transformer TestTransformer, which uses Javassist to replace the bytecode of the Base class, and the class is then reloaded. We have thus achieved the goal of "changing a class's bytecode and reloading the class while the JVM is running."
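The packaging step is usually done with a build tool or the jar tool (for example, jar cfm testagent.jar MANIFEST.MF followed by the class files). As a hedged, stdlib-only sketch, the same manifest attributes can also be written with java.util.jar; the class name AgentManifestSketch is hypothetical, Created-By and Main-Class are left out, and a real build would also add the compiled classes as jar entries.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.jar.Attributes;
import java.util.jar.JarInputStream;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class AgentManifestSketch {
    // Builds the agent-related attributes from the MANIFEST.MF shown above
    public static Manifest agentManifest() {
        Manifest mf = new Manifest();
        Attributes a = mf.getMainAttributes();
        // Required: without Manifest-Version, Manifest.write() skips the other main attributes
        a.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        a.putValue("Agent-Class", "jvm.agent.asm.TestAgent");
        a.putValue("Can-Redefine-Classes", "true");
        a.putValue("Can-Retransform-Classes", "true");
        a.putValue("Boot-Class-Path", "javassist-3.25.0-GA.jar");
        return mf;
    }

    // Writes a jar holding only the manifest; real use would also add TestAgent.class etc.
    public static byte[] writeJar(Manifest mf) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (JarOutputStream jar = new JarOutputStream(bytes, mf)) {
            // jar.putNextEntry(new JarEntry(...)) for each compiled class would go here
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads Agent-Class back from jar bytes, e.g. to sanity-check the packaging
    public static String readAgentClass(byte[] jarBytes) {
        try (JarInputStream in = new JarInputStream(new ByteArrayInputStream(jarBytes))) {
            Manifest mf = in.getManifest();
            return mf == null ? null : mf.getMainAttributes().getValue("Agent-Class");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```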

The following shows the effect of reloading the class at runtime. First run the main() method in Base to start a JVM; the console prints "process" every five seconds. Then run the main() method in Attacher, passing in the pid of that JVM. Back in the console of the first JVM, "start" and "end" are now printed before and after each "process", which shows that the bytecode was enhanced at runtime and the class was reloaded.
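As a self-contained stand-in for the Base class used in this demo, here is a hypothetical DemoTarget that prints its own pid (the value to hand to VirtualMachine.attach()) and then calls process() every five seconds. It reads the pid from the RuntimeMXBean name, which on HotSpot has the form pid@hostname; on Java 9 and later, ProcessHandle.current().pid() is a more direct option. To try the agent against it, the class and method names in TestTransformer and TestAgent would have to point at DemoTarget.

```java
import java.lang.management.ManagementFactory;

public class DemoTarget {
    // Pid of this JVM; on HotSpot the RuntimeMXBean name looks like "12345@hostname"
    public static String currentPid() {
        String name = ManagementFactory.getRuntimeMXBean().getName();
        return name.split("@")[0];
    }

    // The method an agent would enhance; prints "process" like the demo's Base class
    public static void process() {
        System.out.println("process");
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("pid: " + currentPid());
        while (true) {
            Thread.sleep(5000L); // call process() every five seconds
            process();
        }
    }
}
```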

Figure 23 The effect of reloading the class at runtime

3.4 Usage scenarios

At this point, bytecode enhancement is no longer limited to the moment the JVM loads a class. With the class libraries above, we can modify and reload classes inside a running JVM, which makes many things possible:

  • Hot deployment: modify an online service without redeploying it, for example to add tracking points or logs.

  • Mock: Mock certain services during testing.

  • Performance diagnostic tools: for example, BTrace uses Instrumentation to trace a running JVM non-invasively and monitor class-level and method-level state.

4. Summary

Bytecode enhancement is like a key that opens up the running JVM: it can dynamically modify a running program and trace the state of programs running in the JVM. The dynamic proxies and AOP we use every day are also closely tied to bytecode enhancement; in essence, they use various means to generate bytecode that conforms to the specification. In short, once you master bytecode enhancement, you can efficiently locate and quickly fix some thorny problems (such as online performance problems, uncontrolled method inputs and outputs, or urgently adding logs), and reduce redundant code during development, greatly improving development efficiency.

5. References

Reprinted from: Exploring the Java Bytecode Enhancement


Oracle: The class File Format

Oracle: The Java Virtual Machine Instruction Set

Javassist tutorial

JVM Tool Interface, Version 1.2