Saturday, November 9, 2024

JDK, JRE, JVM, JVM Architecture

JDK (Java Development Kit) is a kit that provides the environment to develop and execute (run) Java programs. The JDK is a package that includes two things:

  • Development tools (to provide an environment to develop your Java programs)
  • JRE (to execute your Java programs)

JRE (Java Runtime Environment) is an installation package that provides an environment to only run (not develop) Java programs on your machine. The JRE is used by those who only want to run Java programs, i.e., the end users of your system.

JVM (Java Virtual Machine) is a very important part of both the JDK and the JRE because it is built into both. Whatever Java program you run using the JRE or the JDK goes into the JVM, and the JVM is responsible for executing the Java program line by line; hence it is also known as an interpreter.

  • JVM is responsible for converting bytecode to machine code.
  • JVM takes .class files and executes them while managing memory.
  • JVM loads, verifies, and executes the code, and provides the runtime environment.
  • JVM plays a major role in Java memory management.

JVM Architecture

1) Classloader

Classloader is a subsystem of the JVM which is used to load class files. Whenever we run a Java program, it is first loaded by the classloader.

The class-loading mechanism consists of three main steps:

  • Loading
  • Linking
  • Initialization

Loading

Whenever the JVM loads a class file, it reads:

  • The fully qualified class name
  • Variable information (instance variables)
  • Immediate parent information
  • Whether it is a class, an interface, or an enum

There are three built-in classloaders in Java.

  1. Bootstrap ClassLoader: This is the first classloader and the parent of the Extension classloader. It loads the rt.jar file (on Java 8 and earlier), which contains all the class files of the Java Standard Edition, such as the java.lang, java.net, java.util, java.io, and java.sql package classes.
  2. Extension ClassLoader: This is the child classloader of Bootstrap and the parent classloader of the System classloader. It loads the jar files located inside the $JAVA_HOME/jre/lib/ext directory.
  3. System/Application ClassLoader: This is the child classloader of the Extension classloader. It loads the class files from the classpath. By default, the classpath is set to the current directory; you can change it using the "-cp" or "-classpath" switch. It is also known as the Application classloader. (A small sketch of this hierarchy follows.)
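
As a rough hedged illustration of this parent hierarchy, the following sketch prints each loader. On Java 9+ the extension loader is replaced by the platform classloader, and the bootstrap loader is represented as null:

public class ClassLoaderDemo {

    public static void main(String[] args) {
        // The application/system classloader loads this class from the classpath
        ClassLoader app = ClassLoaderDemo.class.getClassLoader();
        System.out.println("Application: " + app);

        // Its parent is the extension (Java 8) or platform (Java 9+) classloader
        System.out.println("Parent: " + app.getParent());

        // Core classes such as String are loaded by the bootstrap loader, shown as null
        System.out.println("String's loader: " + String.class.getClassLoader());
    }
}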

 Linking

This is the process of linking the data in the class file into the JVM's memory areas. It consists of three steps: verification, preparation, and resolution.

Verification: This phase checks the structural correctness of the .class file by checking it against a set of constraints or rules. If verification fails for some reason, we get a VerifyException.

For example, if the code has been built using Java 11, but is being run on a system that has Java 8 installed, the verification phase will fail.

Prepare – Memory is allocated for all static variables, which are assigned their default values.

Resolve – All symbolic references are replaced with direct references from the method area.

Initialization

This is the final phase of class loading; here, all static variables are assigned their actual values, and static blocks are executed.
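
A minimal hedged sketch of prepare vs. initialization (the Config class and its field are illustrative):

class Config {

    // Prepare phase: memory for 'count' is allocated and set to the default 0.
    // Initialization phase: 'count' is assigned 10 and the static block runs.
    static int count = 10;

    static {
        System.out.println("Static block runs during initialization; count = " + count);
    }
}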

 

JVM Memory Areas

  • Method area: All the class-level data, such as the run-time constant pool, field and method data, and the code for methods and constructors, is stored here.

If the memory available in the method area is not sufficient for the program startup, the JVM throws an OutOfMemoryError.

For example, assume that you have the following class definition:

public class Employee {

    private String name;
    private int age;

    public Employee(String name, int age) {
        this.name = name;
        this.age = age;
    }
}

In this code example, the field-level data, such as name and age, and the constructor details are loaded into the method area.

The method area is created on the virtual machine start-up, and there is only one method area per JVM.

  • Heap area: Information about all objects is stored in the heap area. There is one heap area per JVM, and it is a shared resource.

For example, assume that you declare the following instance:

Employee employee = new Employee("Alice", 30); // illustrative arguments for the (String, int) constructor above

In this code example, an instance of Employee is created and allocated in the heap area.

The heap is created on the virtual machine start-up, and there is only one heap area per JVM.

  • Stack area: For every thread, the JVM creates one run-time stack, which is stored here. After a thread terminates, its run-time stack is destroyed by the JVM. It is not a shared resource.
  • PC Registers: The JVM supports multiple threads at the same time. Each thread has its own PC register to hold the address of the currently executing JVM instruction. Once the instruction is executed, the PC register is updated with the next instruction.
  • Native method stacks: The JVM contains stacks that support native methods. These methods are written in a language other than Java, such as C or C++. For every new thread, a separate native method stack is allocated.

Execution Engine 

The execution engine executes the .class file (bytecode). It reads the bytecode line by line, uses the data and information present in the various memory areas, and executes the instructions. It can be classified into three parts:

  • Interpreter: It interprets the bytecode line by line and then executes it. The disadvantage here is that when one method is called multiple times, it must be interpreted every time.
  • Just-In-Time (JIT) Compiler: It is used to increase the efficiency of the interpreter. It compiles frequently executed bytecode ("hot spots") to native code, so whenever the interpreter sees repeated method calls, the JIT provides the native code for that part directly and re-interpretation is not required, improving efficiency.
  • Garbage Collector: The Garbage Collector (GC) collects and removes unreferenced objects from the heap area, reclaiming unused runtime memory automatically by destroying them.

 

Garbage collection makes Java memory efficient because it removes the unreferenced objects from heap memory and makes free space for new objects. It involves two phases:

  1. Mark - in this step, the GC traverses object references and marks the objects that are still reachable; objects left unmarked are considered garbage
  2. Sweep - in this step, the GC removes the unreachable objects identified during the previous phase

 

Garbage collection is done automatically by the JVM at regular intervals and does not need to be handled separately. It can also be requested by calling System.gc(), but execution is not guaranteed.
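
As a small hedged sketch (reusing the Employee class from above, with illustrative arguments):

Employee e = new Employee("Bob", 25);
e = null;      // the Employee object is now unreferenced and eligible for collection
System.gc();   // requests a collection; the JVM is free to ignore the request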

 

The JVM contains 3 different types of garbage collectors:

  1. Serial GC - This is the simplest implementation of GC, and is designed for small applications running in single-threaded environments. It uses a single thread for garbage collection. When it runs, it leads to a "stop the world" event where the entire application is paused. The JVM argument to use the Serial Garbage Collector is -XX:+UseSerialGC.
  2. Parallel GC - This was the default implementation of GC in the JVM up to Java 8, and is also known as the Throughput Collector. It uses multiple threads for garbage collection, but still pauses the application when running. The JVM argument to use the Parallel Garbage Collector is -XX:+UseParallelGC.
  3. Garbage First (G1) GC - G1GC was designed for multi-threaded applications that have a large heap size available (more than 4 GB), and is the default from Java 9 onwards. It partitions the heap into a set of equal-size regions and uses multiple threads to scan them. G1GC identifies the regions with the most garbage and performs garbage collection on those regions first. The JVM argument to use the G1 Garbage Collector is -XX:+UseG1GC.

Note: There is another type of garbage collector called Concurrent Mark Sweep (CMS) GC. However, it was deprecated in Java 9 and completely removed in Java 14 in favour of G1GC.

 

Java Native Interface (JNI)

At times, it is necessary to use native (non-Java) code (for example, C/C++). This can be in cases where we need to interact with hardware, or to work around memory-management and performance constraints in Java. Java supports the execution of native code via the Java Native Interface (JNI).

JNI acts as a bridge to supporting libraries written in other programming languages such as C and C++. This is especially helpful where you need functionality that is not entirely supported by Java, like some platform-specific features that can only be written in C.

Native Method Libraries: Native Method Libraries are libraries written in other programming languages, such as C, C++, and assembly. They are usually present in the form of .dll or .so files and can be loaded through JNI, as sketched below.
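
A minimal hedged sketch of declaring a native method (the library name and method are hypothetical; the implementation would live in a native .so/.dll built against the JNI headers):

public class NativeDemo {

    static {
        // Loads libnativedemo.so (Linux) or nativedemo.dll (Windows)
        System.loadLibrary("nativedemo");
    }

    // Declared in Java, implemented in C/C++ and bound through JNI
    public native int add(int a, int b);
}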

Common JVM Errors

  • ClassNotFoundException - This occurs when the classloader tries to load a class using Class.forName(), ClassLoader.loadClass(), or ClassLoader.findSystemClass(), but no definition for the class with the specified name can be found.
  • NoClassDefFoundError - This occurs when the compiler has successfully compiled the class, but the classloader is not able to locate the class file at runtime.
  • OutOfMemoryError - This occurs when the JVM cannot allocate an object because it is out of memory, and no more memory can be made available by the garbage collector.
  • StackOverflowError - This occurs if the JVM runs out of space while creating new stack frames while processing a thread. (A short demo of two of these errors follows.)
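
A hedged sketch that triggers two of these errors (the missing class name is hypothetical):

public class JvmErrorsDemo {

    static void recurse() {
        recurse(); // no base case: each call adds a new stack frame
    }

    public static void main(String[] args) {
        try {
            Class.forName("com.example.MissingClass"); // hypothetical, not on the classpath
        } catch (ClassNotFoundException e) {
            System.out.println("ClassNotFoundException: " + e.getMessage());
        }

        try {
            recurse();
        } catch (StackOverflowError e) {
            System.out.println("Caught StackOverflowError");
        }
    }
}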

 









Java introduction, types of Java applications, Java platforms

 Java:

Java was developed by Sun Microsystems (now a subsidiary of Oracle) in the year 1995. James Gosling is known as the father of Java. Java was originally named Oak, but since Oak was already a registered trademark, James Gosling and his team changed the name from Oak to Java.

Platform: Any hardware or software environment in which a program runs is known as a platform. Since Java has a runtime environment (the JRE) and an API, it is called a platform.

Types of Java applications:

  • Standalone Applications: Single-user applications with a GUI (e.g., Swing, JavaFX), such as media players and editors.
  • Web Applications: Server-side applications accessed through browsers (e.g., JSP, Servlets).
  • Enterprise Applications: Large-scale applications for businesses (e.g., Java EE/J2EE, Spring Boot, EJB, Struts, JSF).
  • Mobile Applications: Applications running on mobile devices (e.g., Android apps).


Java Platforms

  • Java SE: General-purpose platform for desktop and standalone applications. Key features: core libraries, GUI (Swing/JavaFX), networking, I/O. Typical use cases: desktop applications, utilities, command-line tools.
  • Java EE (J2EE): Enterprise applications, scalable web apps, and services. Key features: web services, EJB, JPA, JMS, and enterprise frameworks. Typical use cases: large-scale business systems (CRM, ERP), web services.
  • Java ME: Mobile and embedded applications on resource-constrained devices. Key features: CLDC and MIDP for mobile and small embedded systems. Typical use cases: mobile apps (feature phones), IoT, smart devices.
  • JavaFX: Rich client applications with advanced UIs. Key features: scene graph, animations, rich graphics, declarative UI with FXML. Typical use cases: rich desktop apps, interactive tools, media players.


Java Topics

 

1. Java introduction, types of Java applications, Java platforms

2. JDK, JRE, JVM, JVM architecture

3. OOPS concepts

4. String concepts

5. Exception handling

6. Collections

7. Java 8, 11, 17, 21 features

8. File operations

9. Threads

10. Garbage collection

Sunday, September 15, 2024

Spring JPA interview questions

 

How can you create custom queries in Spring Data JPA?

 Custom queries can be created using:

1. Query Methods: You can define query methods directly in your repository interface. Spring Data JPA automatically generates the queries based on the method names.

public interface UserRepository extends JpaRepository<User, Long> {
    List<User> findByName(String name);
    List<User> findByEmailContaining(String emailFragment);
}

2. Query Annotation: Use the @Query annotation to define custom JPQL (Java Persistence Query Language) or native SQL queries.

public interface UserRepository extends JpaRepository<User, Long> {

    @Query("SELECT u FROM User u WHERE u.name = :name")
    List<User> findByName(@Param("name") String name);

    @Query(value = "SELECT * FROM users WHERE email = :email", nativeQuery = true)
    User findByEmail(@Param("email") String email);
}

 

3. JPQL (Java Persistence Query Language): Use the @Query annotation with JPQL for complex queries.


4. Named Queries: Named queries are predefined JPQL queries that you define using @NamedQuery or @NamedQueries in your entity class.

@Entity
@NamedQuery(name = "User.findByEmail", query = "SELECT u FROM User u WHERE u.email = :email")
public class User {

    @Id
    private Long id;
    private String name;
    private String email;

    // Getters and setters
}

 

public interface UserRepository extends JpaRepository<User, Long> {

    @Query(name = "User.findByEmail")
    User findByEmail(@Param("email") String email);
}

  

What are the different fetching strategies in JPA?

The fetching strategies are:

EAGER: The related entities are fetched immediately with the parent entity.

LAZY: The related entities are fetched only when accessed, i.e., on demand.
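
A minimal hedged sketch (the Order/Customer/OrderItem entities are hypothetical; use javax.persistence instead of jakarta.persistence on older stacks):

import jakarta.persistence.*;
import java.util.List;

@Entity
@Table(name = "orders") // ORDER is a reserved SQL keyword
public class Order {

    @Id
    private Long id;

    // EAGER (the default for @ManyToOne): the customer is loaded with the order
    @ManyToOne(fetch = FetchType.EAGER)
    private Customer customer;

    // LAZY (the default for collections): items load when first accessed;
    // assumes OrderItem has an 'order' field owning the relationship
    @OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
    private List<OrderItem> items;
}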

 

 

 

How to handle exceptions in Spring boot?

 1. Controller Advice with @ExceptionHandler

Using @ControllerAdvice in combination with @ExceptionHandler allows you to handle exceptions globally across your application or for specific controllers.

 

@ControllerAdvice
public class GlobalExceptionHandler {

    @ExceptionHandler(ResourceNotFoundException.class)
    @ResponseStatus(HttpStatus.NOT_FOUND)
    public ResponseEntity<String> handleResourceNotFound(ResourceNotFoundException ex) {
        return new ResponseEntity<>(ex.getMessage(), HttpStatus.NOT_FOUND);
    }

    @ExceptionHandler(Exception.class)
    @ResponseStatus(HttpStatus.INTERNAL_SERVER_ERROR)
    public ResponseEntity<String> handleGeneralException(Exception ex) {
        return new ResponseEntity<>(ex.getMessage(), HttpStatus.INTERNAL_SERVER_ERROR);
    }
}

 @ControllerAdvice: Designates the class as a global exception handler.

@ExceptionHandler: Specifies the type of exception to handle.

@ResponseStatus: Sets the HTTP status code.

How to close database connections/file connections without using the close() method in Java

 

1. Try-With-Resources Statement

The try-with-resources statement, introduced in Java 7, is the preferred method to ensure that resources are closed automatically. Classes that implement the AutoCloseable interface (which includes Connection, Statement, ResultSet, FileInputStream, etc.) can be used with this statement.

try (Connection connection = DriverManager.getConnection(url, user, password);
     Statement statement = connection.createStatement();
     ResultSet resultSet = statement.executeQuery(query)) {

    // Use the connection, statement, and resultSet

} catch (SQLException e) {
    e.printStackTrace();
}

 

In this example, the Connection, Statement, and ResultSet are automatically closed at the end of the try block.

Wednesday, August 21, 2024

Apache Kafka Interview Questions

 What is Apache Kafka and what are its core components?

Apache Kafka is a distributed streaming platform used for building real-time data pipelines and streaming applications. Its core components are:

  • Broker: Manages and stores messages.
  • Producer: Sends messages to topics.
  • Consumer: Reads messages from topics.
  • Topic: A category or feed name to which messages are sent.
  • Partition: A topic is divided into partitions, which allow for parallelism and scalability.
  • Zookeeper: Manages and coordinates Kafka brokers.

Explain the difference between a Kafka topic and a Kafka partition.

A Kafka topic is a logical channel to which records are sent, while a partition is a physical storage unit within a topic. Topics can have multiple partitions, which allow Kafka to handle large volumes of data and provide parallel processing and redundancy.

 

How does Kafka ensure data durability and fault tolerance?

Kafka ensures data durability and fault tolerance through:

  • Replication: Each partition is replicated across multiple brokers. The replication factor determines the number of replicas.
  • Acknowledgements: Producers can configure acknowledgment settings (acks) to ensure that data is written to multiple replicas before considering it successfully written.
  • Log Retention: Data is stored on disk and retained based on configurable policies (e.g., time or size-based retention).

Describe the role of ZooKeeper in a Kafka cluster.

ZooKeeper is used by Kafka for managing and coordinating brokers. It handles:

  • Leader Election: Elects the leader for each partition.
  • Metadata Management: Keeps track of cluster metadata, such as broker information and topic/partition configurations.
  • Configuration Management: Manages broker configurations and cluster state.

What is a consumer group and how does it work in Kafka?

A consumer group is a group of consumers that work together to consume messages from a topic. Each consumer in the group processes a subset of the partitions. Kafka ensures that each partition is consumed by only one consumer in the group at a time. This allows for load balancing and parallel processing of messages.
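
As a hedged illustration, consumers that share the same group.id split a topic's partitions between them (the broker address, group name, and topic are illustrative):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import java.time.Duration;
import java.util.List;
import java.util.Properties;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
props.put(ConsumerConfig.GROUP_ID_CONFIG, "analytics-group");         // same id = same group
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
consumer.subscribe(List.of("orders")); // illustrative topic
ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));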

 

How would you configure Kafka to handle high-throughput data streams?

To handle high-throughput data streams, you can (a producer-tuning sketch follows this list):

  • Increase the number of partitions: Distribute data and load across more partitions for better parallelism.
  • Tune producer settings: Adjust batch size, linger time, and compression settings.
  • Optimize consumer settings: Configure parallel consumers and use efficient deserialization methods.
  • Scale brokers: Add more brokers to handle increased load.
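
A hedged producer-tuning sketch (the broker address and values are illustrative starting points, not recommendations):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.BATCH_SIZE_CONFIG, 64 * 1024);   // larger batches per partition
props.put(ProducerConfig.LINGER_MS_CONFIG, 20);           // wait briefly to fill batches
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4"); // compress whole batches

KafkaProducer<String, String> producer = new KafkaProducer<>(props);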

What strategies can be used to troubleshoot Kafka performance issues?

Strategies to troubleshoot Kafka performance issues include:

  • Monitoring Metrics: Use tools like JMX, Grafana, and Prometheus to monitor broker, producer, and consumer metrics.
  • Analyzing Logs: Check Kafka logs for errors or warnings.
  • Reviewing Configuration: Ensure proper configurations for memory, disk I/O, and network settings.
  • Testing Latency and Throughput: Use Kafka's performance testing tools (kafka-producer-perf-test and kafka-consumer-perf-test) to benchmark performance.

How can you manage schema evolution in Kafka?

Schema evolution in Kafka can be managed using:

  • Schema Registry: A centralized repository for schemas. It supports schema versioning and validation.
  • Compatibility Modes: Define compatibility rules (e.g., backward, forward, full) to manage schema changes.
  • Avro or Protobuf: Use Avro or Protobuf for schema management, which integrates with Kafka's Schema Registry.

What is Kafka Streams and how does it differ from traditional stream processing frameworks?

Kafka Streams is a library for building real-time applications that process data streams within a Kafka ecosystem. It differs from traditional stream processing frameworks in that:

  • Integration with Kafka: It is tightly integrated with Kafka and uses Kafka topics for input and output.
  • Ease of Use: Provides a high-level DSL for defining stream processing logic.
  • Stateful Processing: Supports stateful operations with local state stores.

Explain the concept of exactly-once semantics in Kafka. How is it achieved?

Exactly-once semantics (EOS) ensure that records are neither lost nor processed more than once. It is achieved through the following (a transactional-producer sketch follows this list):

  • Idempotent Producers: Ensures that duplicate messages are not written to a topic.
  • Transactional Messaging: Producers and consumers can use transactions to ensure that records are processed exactly once. This involves using Kafka’s transaction APIs to commit or abort transactions atomically.
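
A hedged sketch of the transaction API (the transactional id and topic are illustrative):

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import java.util.Properties;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, true);        // no duplicate writes
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "orders-tx-1"); // illustrative id

KafkaProducer<String, String> producer = new KafkaProducer<>(props);
producer.initTransactions();
try {
    producer.beginTransaction();
    producer.send(new ProducerRecord<>("orders", "key", "value")); // illustrative topic
    producer.commitTransaction(); // records become visible atomically
} catch (Exception e) {
    producer.abortTransaction(); // simplified: fencing errors require closing the producer instead
}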

How does Kafka handle security?

Kafka provides security features including:

  • Authentication: Supports SASL (Simple Authentication and Security Layer) for authenticating clients.
  • Authorization: Uses ACLs (Access Control Lists) to manage permissions for topics and other resources.
  • Encryption: Supports encryption of data in transit using SSL/TLS and encryption at rest through disk encryption.

How would you design a Kafka-based data pipeline for a real-time analytics application?

Designing a Kafka-based data pipeline involves:

  • Data Ingestion: Use producers to send data to Kafka topics from various sources.
  • Stream Processing: Implement stream processing using Kafka Streams or another processing framework to analyze and transform data.
  • Data Storage: Store processed data in data stores or data lakes.
  • Data Visualization: Feed processed data to visualization tools or dashboards for real-time insights.

What are some common challenges you might face when deploying Kafka in a production environment, and how would you address them?

Common challenges include:

  • Data Loss: Ensure proper replication and backup strategies.
  • Performance Bottlenecks: Monitor and optimize configurations, and scale brokers and partitions as needed.
  • Broker Failures: Implement and test failover strategies and monitor for broker health.