a. Integrated
Integrated means combined or merged: Spark SQL queries are integrated with Spark programs. Through Spark SQL we can query structured data inside Spark programs, using either SQL or the DataFrame API, which is available in Java and Scala.
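As a minimal sketch of that integration, the snippet below mixes SQL and DataFrame code in one program; the sample data and the view name `people` are assumptions made for the example:

```scala
import org.apache.spark.sql.SparkSession

object SqlIntegrationExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SqlIntegrationExample")
      .master("local[*]")          // local mode, just for the sketch
      .getOrCreate()
    import spark.implicits._

    // Build a small DataFrame of structured data inside the program.
    val people = Seq(("Alice", 34), ("Bob", 29)).toDF("name", "age")
    people.createOrReplaceTempView("people")

    // Run a plain SQL query over it, right inside the Spark program.
    val adults = spark.sql("SELECT name FROM people WHERE age > 30")
    adults.show()

    spark.stop()
  }
}
```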
We can also run streaming computation through it. Developers write a batch-style computation against the DataFrame/Dataset API, and Spark itself runs it incrementally, in a streaming fashion. The advantage is that developers do not have to manage state or failures on their own, nor keep the application in sync with separate batch jobs; the streaming job always gives the same answer as a batch job on the same data.
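Here is a minimal Structured Streaming sketch of that idea: the transformation is written exactly like batch DataFrame code, and Spark runs it incrementally. The socket source on localhost:9999 (fed, for instance, by `nc -lk 9999`) is an assumption for the example:

```scala
import org.apache.spark.sql.SparkSession

object StreamingWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StreamingWordCount")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A streaming source; assumed to be a socket for illustration.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // Batch-style transformation; Spark manages state and failures.
    val counts = lines.as[String]
      .flatMap(_.split(" "))
      .groupBy("value")
      .count()

    counts.writeStream
      .outputMode("complete")   // emit the full updated counts each trigger
      .format("console")
      .start()
      .awaitTermination()
  }
}
```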
b. Unified Data Access
DataFrames and SQL support a common way to access a variety of data sources, such as Hive, Avro, Parquet, ORC, JSON, and JDBC. Spark SQL can even join data across these sources. This turns out to be very helpful for accommodating all existing users in Spark SQL.
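A short sketch of this unified access follows; the file paths, the choice of Parquet and JSON, and the join column `userId` are chosen purely for illustration:

```scala
import org.apache.spark.sql.SparkSession

object UnifiedAccessExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("UnifiedAccessExample")
      .getOrCreate()

    // Two different sources, read through the same DataFrame API.
    val users  = spark.read.parquet("hdfs:///data/users.parquet") // Parquet source
    val events = spark.read.json("hdfs:///data/events.json")      // JSON source

    // Join data across sources as if they were one.
    val joined = users.join(events, "userId")
    joined.show()

    spark.stop()
  }
}
```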
c. Hive Compatibility
We can run unmodified Hive queries on existing warehouses in Spark SQL. Spark SQL offers full compatibility with existing Hive data, queries, and UDFs by reusing the Hive frontend and MetaStore.
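As a sketch, running an unmodified HiveQL query from Spark looks like this; it assumes an already configured Hive MetaStore and a hypothetical `sales` table:

```scala
import org.apache.spark.sql.SparkSession

object HiveCompatExample {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() wires Spark SQL to the existing Hive MetaStore.
    val spark = SparkSession.builder()
      .appName("HiveCompatExample")
      .enableHiveSupport()
      .getOrCreate()

    // The HiveQL query runs as-is against the existing warehouse.
    spark.sql("SELECT region, SUM(amount) FROM sales GROUP BY region").show()

    spark.stop()
  }
}
```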
d. Standard Connectivity
We can easily connect to Spark SQL through JDBC or ODBC; both have become industry norms for connecting business-intelligence tools. Spark SQL provides this industry-standard JDBC and ODBC connectivity through its server mode.
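As a sketch, a plain JDBC client can talk to Spark SQL's Thrift server (started with `sbin/start-thriftserver.sh`, which speaks the HiveServer2 protocol). The default port 10000 and the `people` table are assumptions here, and the Hive JDBC driver must be on the classpath:

```scala
import java.sql.DriverManager

object JdbcClientExample {
  def main(args: Array[String]): Unit = {
    // Connect to the Spark SQL Thrift server over the HiveServer2 protocol.
    val conn = DriverManager.getConnection(
      "jdbc:hive2://localhost:10000/default", "", "")

    // Any JDBC-capable tool can now issue SQL against Spark.
    val rs = conn.createStatement()
      .executeQuery("SELECT name, age FROM people LIMIT 10")
    while (rs.next()) {
      println(s"${rs.getString(1)}\t${rs.getInt(2)}")
    }
    conn.close()
  }
}
```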
e. Scalability
It takes advantage of the RDD model to support large jobs and mid-query fault tolerance, and it uses the same engine for both interactive and long-running queries.
f. Performance Optimization
In Spark SQL, the query optimization engine converts each SQL query into a logical plan and then into multiple physical execution plans. At execution time, it selects the most optimal physical plan among them, which ensures fast execution of Hive queries.
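We can observe these plans ourselves: `explain(true)` on a Dataset prints the parsed, analyzed, and optimized logical plans along with the physical plan the engine finally selects. The sample data below is illustrative:

```scala
import org.apache.spark.sql.SparkSession

object ExplainExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("ExplainExample")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val df = Seq((1, "a"), (2, "b")).toDF("id", "tag")
    df.createOrReplaceTempView("t")

    // Prints the logical plans and the selected physical plan.
    spark.sql("SELECT tag, COUNT(*) FROM t GROUP BY tag").explain(true)

    spark.stop()
  }
}
```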
g. For batch processing of Hive tables
While working with Hive tables, we can use Spark SQL to run batch processing over them.
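A minimal batch-processing sketch over a Hive table: read, aggregate, and write the result back. The table names `orders` and `daily_order_counts` are hypothetical, and Hive support is assumed to be configured:

```scala
import org.apache.spark.sql.SparkSession

object HiveBatchExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("HiveBatchExample")
      .enableHiveSupport()
      .getOrCreate()

    val orders = spark.table("orders")                  // existing Hive table
    val daily  = orders.groupBy("order_date").count()   // batch aggregation

    // Persist the result as a new Hive table.
    daily.write.mode("overwrite").saveAsTable("daily_order_counts")

    spark.stop()
  }
}
```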