What is Spring Batch?
Spring Batch is an open-source framework for batch processing, i.e. the execution of a series of jobs. Spring Batch provides classes and APIs to read/write resources, manage transactions, collect job processing statistics, restart jobs, and partition data so that high volumes can be processed.
What are the features of Spring Batch?
The features of Spring Batch are as follows:
· Transaction management
· Chunk-based processing
· Declarative I/O
· Start/Stop/Restart
· Retry/Skip
· Web-based administration interface
What are the disadvantages of Spring Batch?
The disadvantages of Spring Batch are as follows:
· Spring Batch code is complex. Without a good understanding of the framework, the flow can be difficult to follow.
· Performance can be poor if the framework is not properly used.
· Exception handling can be complex.
· Spring Batch logs may not surface the exception or issue we are looking for.
What are the use cases of Spring Batch?
· Complex logic can be automated and data processed efficiently without user interaction; such jobs can be run daily.
· Bulk data can be processed efficiently.
· Data transformation and other operations can be performed within a transactional operation.
What is Spring Batch Admin?
Spring Batch Admin provides a web-based user interface (UI) that allows you to manage Spring Batch jobs. Spring Cloud Data Flow is now the recommended replacement for managing and monitoring Spring Batch jobs.
What is a Job in Spring Batch?
A Job is the batch process to be executed in a Spring Batch application without interruption from start to finish. A Job is made up of one or more steps, and each step is either a READ-PROCESS-WRITE task or a single-operation task (tasklet).
What is a Step in a Job?
A Spring Batch Step is an independent part of a job. Each chunk-oriented Step consists of an ItemReader, an optional ItemProcessor, and an ItemWriter. Note: a Job can have one or more Steps.
What is ItemReader?
An ItemReader reads data into a Spring Batch application from a particular source.
What is ItemWriter?
An ItemWriter writes data from the Spring Batch application to a particular destination.
What is ItemProcessor?
After the input data is read by an ItemReader, the ItemProcessor applies business logic to it before it is written to the file or database by the ItemWriter.
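As a sketch of the idea, here is a processor that trims and upper-cases each line, filtering out blanks. To keep the snippet self-contained, the interface below is a simplified stand-in for org.springframework.batch.item.ItemProcessor (which has the same process signature); the class name and logic are illustrative.

```java
// Simplified stand-in for org.springframework.batch.item.ItemProcessor<I, O>.
interface ItemProcessor<I, O> {
    O process(I item) throws Exception;
}

public class UppercaseNoteProcessor implements ItemProcessor<String, String> {
    @Override
    public String process(String item) throws Exception {
        // Business logic: normalize the line. Returning null tells
        // Spring Batch to filter the item out (it is never written).
        String trimmed = item.trim();
        return trimmed.isEmpty() ? null : trimmed.toUpperCase();
    }
}
```

Note that returning null is the framework's convention for filtering an item rather than passing it on to the ItemWriter.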
What is the Job Repository in Spring Batch?
The Job Repository is used to persist all metadata related to the execution of the Job.
How to configure a Job in Spring Batch?
There are many ways to configure a Spring Batch Job; here we use the builder style. A Job needs a JobRepository to be configured (the JobBuilderFactory below supplies it). The Job below has three steps: load the notes, load the tasks, and process those tasks.
@Bean
public Job notesJob() {
    return this.jobBuilderFactory.get("notesJob")
            .start(loadNotes())
            .next(loadTasks())
            .next(processTasks())
            .build();
}
What is the batch processing architecture?
A Job comprises the complete batch process and contains one or more steps. In the JSR-352 standard, a job's step sequence is set up in JSL (Job Specification Language).
What is the execution context in Spring Batch?
If you want to restart a batch run after an error such as a fatal exception, Spring Batch continues from the stored ExecutionContext.
What is StepScope in Spring Batch?
For beans whose scope is StepScope, Spring Batch uses the Spring container to create a new instance of the bean for each step execution.
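A common use of StepScope is late binding of job parameters. A sketch, assuming a flat-file input whose path arrives as the job parameter inputFile (the bean and parameter names are illustrative):

```java
import org.springframework.batch.core.configuration.annotation.StepScope;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.PassThroughLineMapper;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.Resource;

@Bean
@StepScope
public FlatFileItemReader<String> notesReader(
        @Value("#{jobParameters['inputFile']}") Resource inputFile) {
    // Because the bean is step-scoped, a fresh reader is created for each
    // step execution, after the job parameters are known (late binding).
    return new FlatFileItemReaderBuilder<String>()
            .name("notesReader")
            .resource(inputFile)
            .lineMapper(new PassThroughLineMapper())
            .build();
}
```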
What is Step Partitioning in Spring Batch?
A Spring Batch job can run in a single process; to scale beyond that, we can partition a Step. In Spring Batch Step Partitioning, a Step is divided into a number of child steps, which may run either as remote instances or as local execution threads.
What is the Spring Batch JobLauncher?
JobLauncher is an interface for running jobs; its run method takes two parameters, a Job and its JobParameters.
Example:
JobLauncher jobLauncher = context.getBean(JobLauncher.class);
Job testJob = context.getBean("testJob", Job.class);
jobLauncher.run(
    testJob,
    new JobParametersBuilder()
        .addString("inputFile", "file:./notes.txt")
        .addDate("date", new Date())
        .toJobParameters()
);
What is remote chunking in Spring Batch?
In Spring Batch remote chunking, the master step reads the data and passes it over to slave steps for processing.
What is the difference between Step vs Tasklet vs Chunk?
While a Step is an independent phase of execution, Tasklet and Chunk are different processing styles used within a Step.
What are the different types of process flow for Step execution?
- Tasklet model
- Chunk model
How to choose between the Tasklet model and the Chunk model?
Typically, when the Step's task is simple, we choose the Tasklet model; if the processing is complex or high-volume, we go for the Chunk model.
How do I schedule a Spring Batch job?
- Enable scheduling with the @EnableScheduling annotation.
- Annotate a method with the @Scheduled annotation.
With this, the method executes on the schedule given in the @Scheduled annotation.
For example: @Scheduled(cron = "0 */1 * * * ?") will run every minute.
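Putting the two annotations together, a sketch of a scheduled launcher (class, bean, and parameter names are illustrative). Note that a fresh timestamp parameter makes each run a new JobInstance, since job parameters must differ between runs of the same job:

```java
import java.util.Date;
import org.springframework.batch.core.Job;
import org.springframework.batch.core.JobParametersBuilder;
import org.springframework.batch.core.launch.JobLauncher;
import org.springframework.context.annotation.Configuration;
import org.springframework.scheduling.annotation.EnableScheduling;
import org.springframework.scheduling.annotation.Scheduled;

@Configuration
@EnableScheduling
public class NotesJobScheduler {

    private final JobLauncher jobLauncher;
    private final Job notesJob;

    public NotesJobScheduler(JobLauncher jobLauncher, Job notesJob) {
        this.jobLauncher = jobLauncher;
        this.notesJob = notesJob;
    }

    // Fires at the start of every minute.
    @Scheduled(cron = "0 */1 * * * ?")
    public void launchJob() throws Exception {
        jobLauncher.run(notesJob,
                new JobParametersBuilder()
                        .addDate("runTime", new Date()) // unique per run
                        .toJobParameters());
    }
}
```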
How to implement security for Spring Batch?
We will need to implement a method to authenticate the user:
public Authentication authenticateUser(String username, String password) {
    ProviderManager providerManager =
            (ProviderManager) applicationContext.getBean("authenticationManager");
    Authentication authentication = providerManager.authenticate(
            new UsernamePasswordAuthenticationToken(username, password));
    setAuthentication(authentication);
    return authentication;
}
What is Spring Batch Partitioning?
In some scenarios a single-threaded application may not give adequate performance. Spring Batch partitioning is one way of scaling batch jobs to improve performance. In Spring Batch, "partitioning" means using multiple threads, each processing a range of the data. For example, suppose a table has 100 records with primary ids 1 to 100 and you want to process all 100 records. We can make use of Spring Batch Partitioning in such a scenario.
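The range-splitting at the heart of that example can be sketched in plain Java (in a real job this logic would live in a Partitioner implementation that puts each minId/maxId pair into a worker step's ExecutionContext; the class and method names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class RangePartitioner {

    // Splits ids 1..totalRows into gridSize roughly equal [min, max] slices,
    // one per worker step. E.g. 100 rows over 10 partitions -> {1,10}..{91,100}.
    public static List<int[]> partition(int totalRows, int gridSize) {
        List<int[]> ranges = new ArrayList<>();
        int size = (int) Math.ceil((double) totalRows / gridSize);
        for (int min = 1; min <= totalRows; min += size) {
            int max = Math.min(min + size - 1, totalRows);
            ranges.add(new int[] { min, max });
        }
        return ranges;
    }
}
```

Each worker step then reads only the rows whose primary id falls inside its own slice, so the 100 rows are processed in parallel.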
Explain the conditions processing in Spring Batch?
Spring Batch follows the traditional batch architecture, where a job repository does the work of scheduling and interacting with the job. A job can have more than one step, and every step typically follows the sequence of reading data, processing it, and writing it. Spring Batch provides reusable functions that are essential in processing large volumes of records, including logging, transaction management, job processing statistics, job restart, skip, and resource management.
For example, a step may read data from a CSV file, process it, and write it into the database. Spring Batch provides many ready-made classes to read/write CSV, XML, and databases.
Explain the Spring Batch framework architecture?
Spring Batch is a lightweight, comprehensive batch framework designed to enable the development of robust batch applications vital for the daily operations of enterprise systems. High-volume batch jobs can leverage the framework in a highly scalable manner to process significant volumes of information.
Application - This component contains all the jobs and the code we write using the Spring Batch framework.
Batch Core - This component contains all the API classes needed to control and launch a batch Job.
Batch Infrastructure - This component contains the common readers, writers, and services used by both the application and the core.
What is ExecutionContext in Spring Batch?
An ExecutionContext is a set of key-value pairs containing information scoped to either a StepExecution or a JobExecution. Spring Batch persists the ExecutionContext, which helps when you want to restart a batch run, for example after a fatal error has occurred.
What are the typical processing strategies in Spring Batch?
· Normal processing during an offline batch window
· Concurrent batch or online processing
· Parallel processing of many different batches or jobs at the same time
· Partitioning
What is a Spring Batch listener?
Spring Batch listeners are a way of intercepting the execution of a Job or a Step to perform some meaningful operation or to log progress.
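A minimal job-level listener sketch (the class name and log messages are illustrative; JobExecutionListener with its beforeJob/afterJob callbacks is the real Spring Batch interface, and the listener is attached to a job via the job builder's listener method):

```java
import org.springframework.batch.core.JobExecution;
import org.springframework.batch.core.JobExecutionListener;

public class LoggingJobListener implements JobExecutionListener {

    @Override
    public void beforeJob(JobExecution jobExecution) {
        // Called before the first step runs.
        System.out.println("Starting job: "
                + jobExecution.getJobInstance().getJobName());
    }

    @Override
    public void afterJob(JobExecution jobExecution) {
        // Called after the job finishes, whatever the outcome.
        System.out.println("Job finished with status: "
                + jobExecution.getStatus());
    }
}
```

Analogous step-level hooks exist via StepExecutionListener, and chunk-level hooks via ChunkListener.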
How can a person configure a job in Spring Batch framework?
A Job in Spring Batch contains a sequence of one or more Steps. Each Step can be configured with the parameters/attributes required to execute it:
next: the next step to execute.
tasklet: the task or chunk to execute. A chunk can be configured with an ItemReader, ItemProcessor and ItemWriter.
decision: decides which step needs to be executed next.
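As a sketch, a chunk-based Step configured with the Java builders (the step name is illustrative, and the notesReader/notesProcessor/notesWriter bean methods are assumed to be defined elsewhere in the configuration class):

```java
import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.context.annotation.Bean;

@Bean
public Step processNotesStep(StepBuilderFactory stepBuilderFactory) {
    // chunk(10): read and process 10 items, then write them in one
    // transaction (the commit interval).
    return stepBuilderFactory.get("processNotesStep")
            .<String, String>chunk(10)
            .reader(notesReader())
            .processor(notesProcessor())
            .writer(notesWriter())
            .build();
}
```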
What is the difference between Remote Partitioning and Remote Chunking in Spring Batch?
Remote Partitioning allows data to be partitioned and executed in parallel. For example, if a table has 30 rows, the first data set would be rows 1-10, the second rows 11-20, and so on. The master step holds the metadata describing all the partition data sets; the slaves execute against that metadata and send results back to the master for aggregation.
In Remote Chunking, the master reads the data and passes it to its slaves for processing. Once the slaves process the data, the result of the ItemProcessor is returned to the master for writing.
What is CommandLineJobRunner in Spring Batch?
CommandLineJobRunner is one of the ways to bootstrap your Spring Batch Job. The XML script launching the job needs a Java class main method as an entry point, and CommandLineJobRunner helps you start your job directly using the XML script.
The CommandLineJobRunner performs 4 tasks:
- Load the appropriate ApplicationContext.
- Parse command line arguments into JobParameters.
- Locate the appropriate job based on arguments.
- Use the JobLauncher provided in the application context to launch the job.
What is a Tasklet, and what is a Chunk?
A Tasklet is a simple interface with one method, execute. A tasklet can be used to perform single tasks like running queries, deleting files, etc., or to clean up or set up resources before or after a step execution.
Spring Batch uses a 'Chunk Oriented' processing style within its most common implementation. Chunk-oriented processing refers to reading data one item at a time and creating chunks that are written out within a transaction boundary.
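The chunk-oriented loop described above can be sketched in plain Java (this is a simulation of the framework's read-process-write cycle, not Spring Batch code; in the real framework each chunk write happens inside a single transaction):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ChunkLoopSketch {

    // Each inner list is one "chunk" that was written (and, in the real
    // framework, committed) together.
    static List<List<String>> writtenChunks = new ArrayList<>();

    static void runChunkLoop(Iterator<String> reader, int commitInterval) {
        List<String> chunk = new ArrayList<>();
        while (reader.hasNext()) {
            String item = reader.next();            // ItemReader.read()
            String processed = item.toUpperCase();  // ItemProcessor.process()
            chunk.add(processed);
            if (chunk.size() == commitInterval) {
                writtenChunks.add(new ArrayList<>(chunk)); // ItemWriter.write()
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writtenChunks.add(new ArrayList<>(chunk)); // final partial chunk
        }
    }
}
```

With a commit interval of 2 and items a, b, c, the loop writes two chunks: [A, B] and then the final partial chunk [C].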