I have two data frames whose DateTime indexes are close to each other and only sometimes match exactly. The objective is to merge them, using the first index as the reference and attaching each row of the second to the closest matching timestamp (within 1 minute) in the first.

My code:

    import pandas as pd

    masterdf = pd.DataFrame({"AA": [77.368607, 77.491655, 77.425134, 76.490991]})
    masterdf.index = ['2019-10-01 07:52:07', '2019-10-01 07:53:01', '2019-10-01 07:53:54', '2019-10-01 07:54:47']
    masterdf.index.name = 'datetime'

    slavedf = pd.DataFrame({"BB": [50, 60, 70, 80]})
    slavedf.index = ['2019-10-01 07:53:00', '2019-10-01 07:53:54', '2019-10-01 10:54:47', '2019-10-01 10:00:00']
    slavedf.index.name = 'datetime'

    maindf = masterdf.merge(slavedf, left_index=True, right_index=True)

Present output:

    masterdf =
                                AA
    datetime
    2019-10-01 07:52:07  77.368607
    2019-10-01 07:53:01  77.491655
    2019-10-01 07:53:54  77.425134
    2019-10-01 07:54:47  76.490991

    slavedf =
                         BB
    datetime
    2019-10-01 07:53:00  50
    2019-10-01 07:53:54  60
    2019-10-01 10:54:47  70
    2019-10-01 10:00:00  80

    maindf =
    datetime                    AA  BB
    2019-10-01 07:53:54  77.425134  60

Expected output:

    maindf =
    datetime                    AA  BB
    2019-10-01 07:53:01  77.491655  50
    2019-10-01 07:53:54  77.425134  60

How do I achieve this?

1 Answer

Best answer
Use merge_asof here, with one adjustment. merge_asof can match the same row of the second data frame to several rows of the first, so we add a key column that identifies each slavedf row and keep a copy of the slavedf timestamp as datetime2. After the merge we compute the time difference, sort by it, and drop duplicates on key, so that each slavedf row stays attached only to its closest masterdf row.

    # merge_asof needs real datetimes and sorted keys
    masterdf.index = pd.to_datetime(masterdf.index)
    masterdf = masterdf.sort_index().reset_index()
    slavedf.index = pd.to_datetime(slavedf.index)
    slavedf = slavedf.sort_index().reset_index()

    # helper columns: original slavedf timestamp and a per-row id
    slavedf['datetime2'] = slavedf['datetime']
    slavedf['key'] = slavedf.index

    newdf = pd.merge_asof(masterdf, slavedf, on='datetime', tolerance=pd.Timedelta('60s'), direction='nearest')

    # keep only the closest masterdf row for each slavedf row
    newdf['diff'] = (newdf.datetime - newdf.datetime2).abs()
    newdf = newdf.sort_values('diff').drop_duplicates('key')
    newdf

Output (the helper key column is not shown):

    Out[35]:
                 datetime         AA  BB           datetime2     diff
    2 2019-10-01 07:53:54  77.425134  60 2019-10-01 07:53:54 00:00:00
    1 2019-10-01 07:53:01  77.491655  50 2019-10-01 07:53:00 00:00:01
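For reuse, the same merge_asof approach can be wrapped in a function. This is only a sketch under a few assumptions: the function name merge_nearest_unique and its tolerance argument are made up for illustration, both inputs are assumed to carry their timestamps in the index, and master rows that have no match within the tolerance are dropped so the result looks like the expected output in the question.

```python
import pandas as pd


def merge_nearest_unique(left, right, tolerance="60s"):
    """Attach each row of `right` to at most one row of `left`: its closest one.

    Hypothetical helper, not part of pandas.
    """
    left, right = left.copy(), right.copy()
    for df in (left, right):
        df.index = pd.to_datetime(df.index)
        df.index.name = "datetime"
    # merge_asof requires both sides to be sorted on the merge key
    left = left.sort_index().reset_index()
    right = right.sort_index().reset_index()

    # Helper columns: original right-hand timestamp and a per-row id
    right["datetime2"] = right["datetime"]
    right["key"] = right.index

    merged = pd.merge_asof(
        left, right, on="datetime",
        tolerance=pd.Timedelta(tolerance), direction="nearest",
    )
    merged["diff"] = (merged["datetime"] - merged["datetime2"]).abs()

    # Drop left rows with no match, then keep the closest left row per right row
    merged = merged.dropna(subset=["key"])
    merged = merged.sort_values("diff").drop_duplicates("key")
    return (merged.sort_values("datetime")
                  .set_index("datetime")
                  .drop(columns=["datetime2", "key", "diff"]))


masterdf = pd.DataFrame(
    {"AA": [77.368607, 77.491655, 77.425134, 76.490991]},
    index=["2019-10-01 07:52:07", "2019-10-01 07:53:01",
           "2019-10-01 07:53:54", "2019-10-01 07:54:47"],
)
slavedf = pd.DataFrame(
    {"BB": [50, 60, 70, 80]},
    index=["2019-10-01 07:53:00", "2019-10-01 07:53:54",
           "2019-10-01 10:54:47", "2019-10-01 10:00:00"],
)

result = merge_nearest_unique(masterdf, slavedf)
print(result)
```

With the data from the question, result holds the two rows at 07:53:01 (BB 50) and 07:53:54 (BB 60). The dropna step matters when some master rows have nothing within the tolerance: their key is NaN, and without it drop_duplicates would keep one of those unmatched rows.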
