I am working on a project with Spark and Scala. I am new to both, but with a lot of help from Stack Overflow I have done all the data processing and stored the processed data in MySQL. Now I am facing a problem that I don't know how to tackle.

The first time I processed the data, the table was empty, and I stored the DataFrame using this method:

    df.write.mode("append").jdbc("dburl", "tablename", "dbproperties");

Let's say my processed data looks like this in the database:

    id  name    eid   number_of_visits  last_visit_date
    1   John    C110  12                2016-01-13 00:00:00
    2   Root    C111  24                2016-04-27 00:00:00
    3   Michel  C112  8                 2016-07-123 00:00:00
    4   Jonny   C113  45                2016-06-10 00:00:00

Now the person named 'Root' with eid 'C111' visits the office 2 times on '2016-08-30 00:00:00'. After processing this new data, I need to update only this person's record in the database. How can I do that? The updated table should look like this:

    id  name    eid   number_of_visits  last_visit_date
    1   John    C110  12                2016-01-13 00:00:00
    2   Root    C111  26                2016-08-30 00:00:00
    3   Michel  C112  8                 2016-07-123 00:00:00
    4   Jonny   C113  45                2016-06-10 00:00:00

I have millions of rows in this table. Loading the full table into a Spark DataFrame and updating the desired record would take much longer, and it makes no sense to load the whole table when I only want to update one row. I tried this code, but it added a new row to the table rather than updating the existing one:

    df.write.mode("append").jdbc("dburl", "tablename", "dbproperties");

Is there any way to do that in Spark? I have seen the following on the Internet; can I do something like this for the update?
    import java.sql.Connection
    import org.apache.spark.TaskContext

    val numParallelInserts = 10
    val batchSize = 1000

    // `sessions` is an RDD of records with `id` and `ts` fields. `connect()` and
    // `TimestampFormat` are helpers from the original post that are not shown here.
    sessions.coalesce(numParallelInserts).foreachPartition { iter =>
      val split = TaskContext.getPartitionId()
      val db: Connection = connect()
      db.setAutoCommit(false) // needed for the explicit commit() below
      val sql = "INSERT INTO sessions (id, ts) VALUES (?, ?)"
      val stmt = db.prepareStatement(sql)
      try {
        iter.grouped(batchSize).zipWithIndex.foreach { case (batch, batchIndex) =>
          batch.foreach { session =>
            stmt.setString(1, session.id)
            stmt.setString(2, TimestampFormat.print(session.ts))
            stmt.addBatch()
          }
          stmt.executeBatch()
          db.commit()
          println(s"Split ${split + 1}/$numParallelInserts inserted batch $batchIndex with ${batch.size} elements")
        }
      } finally {
        stmt.close()
        db.close()
      }
    }
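The partition-wise pattern above can be adapted to update rows instead of only inserting them. The following is a minimal sketch under several assumptions: the database is MySQL, `eid` has a UNIQUE or PRIMARY KEY index (the question does not say so), and `UpsertSketch`, `Visit`, and `upsertPartition` are hypothetical names chosen for illustration. Instead of a plain INSERT, each batch uses MySQL's `INSERT ... ON DUPLICATE KEY UPDATE`, which inserts new visitors and adds to the count of existing ones. This is a swapped-in technique, not the snippet from the question or the staging-table approach in the answer.

```scala
import java.sql.{Connection, Timestamp}

object UpsertSketch {
  // Hypothetical row type mirroring the question's table columns.
  final case class Visit(name: String, eid: String, visits: Int, lastVisit: Timestamp)

  // Assumes a UNIQUE or PRIMARY KEY index on eid; without one, MySQL finds
  // no duplicate key and every row is simply inserted.
  val upsertSql: String =
    """INSERT INTO tablename (name, eid, number_of_visits, last_visit_date)
      |VALUES (?, ?, ?, ?)
      |ON DUPLICATE KEY UPDATE
      |  number_of_visits = number_of_visits + VALUES(number_of_visits),
      |  last_visit_date = VALUES(last_visit_date)""".stripMargin

  // Upserts one partition's rows in batches and commits after each batch.
  // The caller owns the connection and is responsible for closing it.
  def upsertPartition(rows: Iterator[Visit], conn: Connection, batchSize: Int = 1000): Unit = {
    conn.setAutoCommit(false)
    val stmt = conn.prepareStatement(upsertSql)
    try {
      rows.grouped(batchSize).foreach { batch =>
        batch.foreach { v =>
          stmt.setString(1, v.name)
          stmt.setString(2, v.eid)
          stmt.setInt(3, v.visits)
          stmt.setTimestamp(4, v.lastVisit)
          stmt.addBatch()
        }
        stmt.executeBatch()
        conn.commit()
      }
    } finally {
      stmt.close()
    }
  }
}
```

In Spark, `upsertPartition` would be called from `foreachPartition`, with the connection opened inside the partition function (for example via `java.sql.DriverManager.getConnection`) so that no connection object has to be serialized from the driver. Only the rows in the new batch are sent, so the existing table is never loaded into Spark. Note that MySQL 8.0.20 and later deprecate `VALUES()` inside `ON DUPLICATE KEY UPDATE` in favor of a row alias, although it still works.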

1 Answer

You can do this with SQL. Store the updated (and even new) rows in a temporary table, then merge the temporary table into the main table. Since the question uses MySQL, whose multi-table UPDATE and DELETE use JOIN syntax, one way to do it is:

1. Update the matching records in the main table using the temporary table:

        UPDATE main_table m JOIN temp_table t ON m.eid = t.eid
        SET m.number_of_visits = m.number_of_visits + t.number_of_visits,
            m.last_visit_date = t.last_visit_date;

2. Delete from the temporary table every record that already exists in the main table (this leaves only new records in the temporary table):

        DELETE t FROM temp_table t JOIN main_table m ON m.eid = t.eid;

3. Insert the remaining records from the temporary table into the main table (listing the columns lets MySQL assign the `id`):

        INSERT INTO main_table (name, eid, number_of_visits, last_visit_date)
        SELECT name, eid, number_of_visits, last_visit_date FROM temp_table;

4. Drop the temporary table:

        DROP TABLE temp_table;
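To run these steps from the Scala driver program, the statements can be executed over a plain JDBC connection in one transaction. This is a minimal sketch, assuming MySQL, the placeholder table names `main_table` and `temp_table`, the question's column names, and a hypothetical `StagingMerge` object. It also assumes `temp_table` holds at most one row per `eid` (for example, aggregated in Spark with `groupBy("eid")` first), because a MySQL multi-table UPDATE changes each target row at most once.

```scala
import java.sql.Connection

object StagingMerge {
  // MySQL versions of the update, delete, and insert steps.
  val mergeStatements: Seq[String] = Seq(
    """UPDATE main_table m JOIN temp_table t ON m.eid = t.eid
      |SET m.number_of_visits = m.number_of_visits + t.number_of_visits,
      |    m.last_visit_date = t.last_visit_date""".stripMargin,
    "DELETE t FROM temp_table t JOIN main_table m ON m.eid = t.eid",
    """INSERT INTO main_table (name, eid, number_of_visits, last_visit_date)
      |SELECT name, eid, number_of_visits, last_visit_date FROM temp_table""".stripMargin
  )

  // Runs the three statements in one transaction, then drops the staging table.
  // DROP TABLE causes an implicit commit in MySQL, so it runs after commit().
  def merge(conn: Connection): Unit = {
    conn.setAutoCommit(false)
    val stmt = conn.createStatement()
    try {
      mergeStatements.foreach(sql => stmt.executeUpdate(sql))
      conn.commit()
      stmt.executeUpdate("DROP TABLE temp_table")
    } catch {
      case e: Exception =>
        conn.rollback()
        throw e
    } finally {
      stmt.close()
    }
  }
}
```

Before calling `merge`, the processed rows would be written to the staging table with `df.write.mode("overwrite").jdbc(url, "temp_table", props)`, where `props` is a `java.util.Properties` holding the user and password. Only the new batch travels through Spark; the join against the large table happens inside MySQL.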
