How should I divide data into training and validation sets? Should I split it 50/50, is there some other criterion for choosing the split, or does it depend on the application? I am currently using 80% of the data for training and 20% for validation. Can anyone experienced in machine learning advise me on this?

1 Answer

@kavita, there are two competing concerns when dividing the data:

1. With less training data, your parameter estimates have greater variance.
2. With less testing data, your performance statistic has greater variance.

The data should be divided so that neither variance is too high. An 80/20 split is a common starting point (the ratio is often associated with the Pareto principle), but it is a convention rather than a rule, and the right split depends on how much data you have.

Assuming you have enough data for a proper split, the following is an instructive way to get a handle on both variances:

1. Split the data into training and testing sets.
2. Split the training data again into training and validation sets.
3. Subsample random selections of the training data, train the classifier on each, and record its performance on the validation set.
4. Repeat with different subsample sizes. You should see better performance with more training data.

To get a handle on the variance that comes from the size of the validation set, follow the same procedure in reverse: keep the training data fixed and vary how much validation data you score on.
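The subsampling procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data is synthetic, the "classifier" is a toy one-dimensional threshold, and the helper names (`split`, `fit_threshold`, `subsample_scores`, `validation_subsample_scores`) are made up for this example. Reading "in reverse" as "fix the training set and subsample the validation set" is also an assumption.

```python
import random
import statistics

def split(data, train_fraction=0.8, seed=0):
    """Shuffle a copy of data and cut it into (train, validation)."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

def fit_threshold(rows):
    """Toy classifier: midpoint between the two class means of a 1-D feature."""
    mean0 = statistics.fmean(x for x, y in rows if y == 0)
    mean1 = statistics.fmean(x for x, y in rows if y == 1)
    return (mean0 + mean1) / 2

def accuracy(threshold, rows):
    """Fraction of rows where 'x above threshold' matches the label."""
    return sum((x > threshold) == bool(y) for x, y in rows) / len(rows)

def subsample_scores(train, valid, fraction, runs=20, seed=0):
    """Train on random subsamples of train, score each on the full validation set."""
    rng = random.Random(seed)
    k = int(len(train) * fraction)
    return [accuracy(fit_threshold(rng.sample(train, k)), valid)
            for _ in range(runs)]

def validation_subsample_scores(train, valid, fraction, runs=20, seed=0):
    """Reverse direction: train once on all of train, score on validation subsamples."""
    rng = random.Random(seed)
    threshold = fit_threshold(train)
    k = int(len(valid) * fraction)
    return [accuracy(threshold, rng.sample(valid, k)) for _ in range(runs)]

# Synthetic data (an assumption for illustration): two overlapping Gaussian classes.
data_rng = random.Random(42)
data = [(data_rng.gauss(2.0 * label, 1.0), label)
        for label in (0, 1) for _ in range(500)]

train, valid = split(data, train_fraction=0.8, seed=1)

# Mean and spread of validation accuracy for each amount of training data.
for fraction in (0.1, 0.4, 0.8):
    scores = subsample_scores(train, valid, fraction)
    print("train fraction", fraction,
          "mean", round(statistics.fmean(scores), 3),
          "stdev", round(statistics.stdev(scores), 4))

# Mean and spread of accuracy for each amount of validation data.
for fraction in (0.1, 0.4, 0.8):
    scores = validation_subsample_scores(train, valid, fraction)
    print("validation fraction", fraction,
          "mean", round(statistics.fmean(scores), 3),
          "stdev", round(statistics.stdev(scores), 4))
```

Comparing the standard deviation across fractions shows how much each set size contributes to the noise in your estimate. In a real project you would likely replace `split` with scikit-learn's `train_test_split` and the toy threshold with your actual model.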
