How can I remove emojis that start with '\x' when reading a csv file using pandas in Python?

The CSV file has lots of emojis in the text and I want to remove them. However, the usual emoji-matching regex doesn't work on them. Here is an example of the text:

    Thx WP for performing key democratic function. Trump wants to live in post truth world where words don't matter. D\xe2\x80\xa6
    |\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3\xef\xbf\xa3|\n ME LA PELAS \n DONALD TRUMP \n|\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf\xef\xbc\xbf| \n (\\__/) ||\n (\xe2\x80\xa2\xe3\x85\x85\xe2\x80\xa2) ||\n / \xe3\x80\x80 \xe3\x81\xa5

Here is the code that works on normal emojis but not on these:

    import re

    text = u'This dog \xe2\x80\x9d \xe2\x80\x9c'
    print(text)  # with emoji

    emoji_pattern = re.compile("["
        u"\U0001F600-\U0001F64F"  # emoticons
        u"\U0001F300-\U0001F5FF"  # symbols & pictographs
        u"\U0001F680-\U0001F6FF"  # transport & map symbols
        u"\U0001F1E0-\U0001F1FF"  # flags (iOS)
        "]+", flags=re.UNICODE)
    print(emoji_pattern.sub(r'', text))  # no emoji

The following piece of code works on a string literal:

    import unicodedata
    from unidecode import unidecode

    def deEmojify(inputString):
        returnString = ""
        for character in inputString:
            try:
                # Keep the character only if it is plain ASCII
                character.encode("ascii")
                returnString += character
            except UnicodeEncodeError:
                returnString += ''
        return returnString

    print(deEmojify("I'm loving all the trump hate on Twitter right now \xf0\x9f\x99\x8c"))

But when I read the text from a csv using pandas, it doesn't work and the emojis are not removed:

    import pandas as pd
    df = pd.read_csv("Trump834.csv", encoding="utf-8")

    import unicodedata
    from unidecode import unidecode

    def deEmojify(inputString):
        returnString = ""
        for character in inputString:
            try:
                character.encode("ascii")
                returnString += character
            except UnicodeEncodeError:
                returnString += ''
        return returnString

    for i in range(df.shape[0]):
        print(df.iloc[i]['Tweet'])
        print(deEmojify(df.iloc[i]['Tweet']))
        print("****************************************")
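One possible explanation, offered as an assumption since the CSV itself is not shown: the file stores the escapes as literal text (a backslash, an `x`, and two hex digits) rather than as real non-ASCII characters. In a Python string literal `"\xf0"` is a single character, but the same four characters read from a file are all plain ASCII, so an ASCII-only filter keeps them. The sketch below decodes those literal runs as UTF-8 bytes first and only then drops non-ASCII characters. The names `ESCAPED_BYTES`, `decode_escaped_bytes` and `de_emojify` are hypothetical helpers, not part of pandas or the original code.

```python
import re

# Matches a run of literal "\xNN" escapes: backslash, 'x', two hex digits.
ESCAPED_BYTES = re.compile(r"(?:\\x[0-9a-fA-F]{2})+")


def decode_escaped_bytes(text):
    """Replace each run of literal hex escapes with the UTF-8 text it encodes."""
    def _decode(match):
        # Strip the backslash-x prefixes, leaving only the hex digits.
        hex_digits = match.group(0).replace("\\x", "")
        # Undecodable byte sequences are silently dropped (an assumption of this sketch).
        return bytes.fromhex(hex_digits).decode("utf-8", errors="ignore")
    return ESCAPED_BYTES.sub(_decode, text)


def de_emojify(text):
    """Decode literal escapes, then drop every remaining non-ASCII character."""
    decoded = decode_escaped_bytes(text)
    return decoded.encode("ascii", errors="ignore").decode("ascii")


# A raw string mimics what arrives from the file: the backslashes are real characters.
sample = r"I'm loving all the trump hate on Twitter right now \xf0\x9f\x99\x8c"
print(de_emojify(sample))
```

If this assumption holds for the file, the helper could be applied to the whole column with something like `df['Tweet'].astype(str).map(de_emojify)` instead of looping over rows. Literal `\n` sequences in the text are not touched by this sketch and would need a separate replacement.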
