Text Normalization

In text normalization, we apply several steps to reduce the text to a simpler, more uniform form.

Sentence Segmentation - The whole corpus is divided into sentences. Each sentence is treated as a separate piece of data, so the corpus is reduced to a list of sentences.

Tokenisation - After segmentation, each sentence is further divided into tokens. A token is any word, number or special character occurring in a sentence. Under tokenisation, every word, number and special character is considered separately, and each becomes its own token.

Removing Stop Words, Special Characters and Numbers - In this step, the tokens that are not needed are removed from the token list.

Converting Text to a Common Case - After stop word removal, we convert the whole text to the same case, preferably lower case. This ensures that the machine does not treat the same word as different words just because the case differs.

Stemming - In this step, the remaining words are reduced to their root words. In other words, stemming removes the affixes of words and converts them to a base form. The stem is not guaranteed to be a meaningful word; for example, "studies" may become "studi".

Lemmatization - Like stemming, lemmatization removes affixes, but the word that results (known as the lemma) is always a meaningful one; for example, "studies" becomes "study". For this reason it is typically used in place of stemming rather than after it.

With this, we have normalized our text to tokens, which are the simplest form of the words present in the corpus. Now it is time to convert the tokens into numbers. For this, we would use the Bag of Words algorithm.
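
The steps described in this answer can be sketched in a few lines of Python. This is only an illustration using the standard library, not a definitive implementation: the stop-word list, the suffix list and every function name (`segment_sentences`, `tokenize`, `remove_unwanted`, `toy_stem`, `normalize`) are assumptions made up for the example. Lemmatization is left out because it needs a dictionary of valid words, and the final `Counter` is only the counting part of a Bag of Words representation.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "of", "to", "in"}

# Suffixes stripped by the toy stemmer, longest first (also an assumption).
SUFFIXES = ("ing", "ed", "es", "s")


def segment_sentences(corpus):
    """Split the corpus into sentences after '.', '!' or '?'."""
    parts = re.split(r"(?<=[.!?])\s+", corpus.strip())
    return [p for p in parts if p]


def tokenize(sentence):
    """Split a sentence into word, number and special-character tokens."""
    return re.findall(r"[A-Za-z]+|\d+|[^\w\s]", sentence)


def remove_unwanted(tokens):
    """Drop stop words, numbers and special characters."""
    return [t for t in tokens if t.isalpha() and t.lower() not in STOP_WORDS]


def toy_stem(word):
    """Strip one known suffix; the result need not be a real word."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def normalize(corpus):
    """Return one list of normalized tokens per sentence."""
    result = []
    for sentence in segment_sentences(corpus):
        tokens = remove_unwanted(tokenize(sentence))
        tokens = [t.lower() for t in tokens]  # common case
        result.append([toy_stem(t) for t in tokens])
    return result


corpus = "The cats are chasing 2 mice! Dogs barked loudly."
normalized = normalize(corpus)
# normalized == [['cat', 'chas', 'mice'], ['dog', 'bark', 'loudly']]

# Count how often each normalized token occurs across the corpus.
bag_of_words = Counter(token for sent in normalized for token in sent)
```

Note that "chasing" is stemmed to "chas", which is not a real word. A lemmatizer would return "chase" instead, which is the difference between the two steps.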
