Say I have a DataFrame full of positive samples and context features for a given user:

       target  user  cashtag  sector  industry
    0       1   170     4979       3        70
    1       1   170     5539       3        70
    2       1   170     7271       3        70
    3       1   170     7428       3        70
    4       1   170      686       7       139

where a positive sample is a user having interacted with a cashtag and is denoted by target = 1. What is a quick way for me to generate negative samples, denoted by target = -1, in the ratio 1:2 (+ve:-ve) for each interaction?

EDIT: Sample for clarity below (for the first two positive samples):

       target  user  cashtag  sector  industry
    0       1   170     4979       3        70
    1      -1   170     3224       7       181
    2      -1   170     4331       7       180
    3       1   170     5539       3        70
    4      -1   170     9304       4        59
    5      -1   170     3833       6       185

For instance, for each cashtag a user has interacted with, I'd like to pick at random 2 other cashtags that they haven't interacted with and add them as negative samples to the DataFrame, effectively increasing the DataFrame to 3 times its original size. It would also be helpful to check that the negative sample hasn't already been entered for that user, cashtag combination.

1 Answer

Here is my solution:

    import io

    import numpy as np
    import pandas as pd

    data = """
    target  user  cashtag  sector  industry
    1       170   4979     3       70
    1       170   5539     3       70
    1       170   7271     3       70
    1       170   7428     3       70
    1       170   686      7       139
    """
    df = pd.read_csv(io.StringIO(data), sep=r'\s+')
    df1 = pd.DataFrame(columns=df.columns)
    cashtag = df['cashtag'].values.tolist()

    # draw one random integer in [0, v)
    def randomnumber(v):
        return np.random.randint(v, size=1)

    def addNewRow(x):
        for i in range(2):  # add 2 new rows
            cash = cashtag[0]
            while cash in cashtag:  # check if cashtag already used
                cash = randomnumber(5000)[0]  # random number between 0 and 4999
            cashtag.append(cash)
            sector = randomnumber(10)[0]
            industry = randomnumber(200)[0]
            df1.loc[df1.shape[0]] = [-1, x.user, cash, sector, industry]

    df.apply(addNewRow, axis=1)
    # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
    df = pd.concat([df, df1]).reset_index()
    print(df)

Output (the random values differ on each run):

        index  target  user  cashtag  sector  industry
    0       0       1   170     4979       3        70
    1       1       1   170     5539       3        70
    2       2       1   170     7271       3        70
    3       3       1   170     7428       3        70
    4       4       1   170      686       7       139
    5       0      -1   170      544       2        59
    6       1      -1   170     3202       8       165
    7       2      -1   170     2673       0        40
    8       3      -1   170     4021       1        30
    9       4      -1   170      682       6         3
    10      5      -1   170     2446       1        80
    11      6      -1   170     4026       9       193
    12      7      -1   170     4070       9       197
    13      8      -1   170     2900       1        57
    14      9      -1   170     3287       0        21

The new random rows are appended at the end of the DataFrame. Note that the negative cashtags, sectors and industries are random integers, not values looked up from real cashtags.
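If the negatives should be real cashtags that the user has not interacted with, as the question asks, one possible variation is to sample them from a lookup table of known cashtags. The sketch below is only an illustration under assumptions: the `catalog` DataFrame (one row per cashtag with its sector and industry), the function name `add_negative_samples`, and the made-up catalog rows are all hypothetical and not part of the original question or answer.

```python
import numpy as np
import pandas as pd


def add_negative_samples(positives, catalog, ratio=2, seed=None):
    """Return the positives plus `ratio` sampled negatives per positive row.

    `catalog` is a hypothetical lookup table (cashtag, sector, industry);
    it is an assumption and does not appear in the original question.
    """
    rng = np.random.default_rng(seed)
    parts = [positives]
    for user, group in positives.groupby("user"):
        # Cashtags this user has never interacted with.
        unseen = catalog[~catalog["cashtag"].isin(group["cashtag"])]
        n_wanted = min(ratio * len(group), len(unseen))
        # Sampling without replacement rules out duplicate negatives per user.
        idx = rng.choice(len(unseen), size=n_wanted, replace=False)
        parts.append(unseen.iloc[idx].assign(target=-1, user=user))
    combined = pd.concat(parts, ignore_index=True)
    return combined[positives.columns.tolist()]


positives = pd.DataFrame({
    "target": [1, 1, 1, 1, 1],
    "user": [170, 170, 170, 170, 170],
    "cashtag": [4979, 5539, 7271, 7428, 686],
    "sector": [3, 3, 3, 3, 7],
    "industry": [70, 70, 70, 70, 139],
})

# Toy catalog: the five cashtags from the question plus 20 made-up ones.
catalog = pd.DataFrame({
    "cashtag": [4979, 5539, 7271, 7428, 686] + list(range(1000, 1020)),
    "sector": [3, 3, 3, 3, 7] + [i % 10 for i in range(20)],
    "industry": [70, 70, 70, 70, 139] + [(i * 7) % 200 for i in range(20)],
})

result = add_negative_samples(positives, catalog, ratio=2, seed=42)
print(result)
```

Here the negatives are appended after the positives rather than interleaved as in the question's sample. If there are fewer unseen cashtags than requested, the function returns as many as exist instead of raising an error.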
