I want to plot statistics from my data in MongoDB using matplotlib, but the way I'm doing it now is really awkward. I query MongoDB 30 times to get day-by-day data, which is already slow and messy, especially when I fetch the results from somewhere else instead of on the server. Is there a better, cleaner way to get hour-by-hour, day-by-day, month-by-month and year-by-year statistics?

Here is the code I'm using now (day-by-day statistics):

    from datetime import datetime, date, time, timedelta
    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker
    from my_conn import my_mongodb

    t1 = []
    t2 = []
    today = datetime.combine(date.today(), time())
    with my_mongodb() as m:
        for i in range(30):
            day = today - timedelta(days = i)
            t1 = [m.data.find({"time": {"$gte": day, "$lt": day + timedelta(days = 1)}}).count()] + t1
            t2 = [m.data.find({"deleted": 0, "time": {"$gte": day, "$lt": day + timedelta(days = 1)}}).count()] + t2

    x = range(30)
    N = len(x)

    def format_date(x, pos=None):
        day = today - timedelta(days = (N - x - 1))
        return day.strftime('%m/%d')

    plt.bar(range(len(t1)), t1, align='center', color="#4788d2") #All
    plt.bar(range(len(t2)), t2, align='center', color="#0c3688") #Not-deleted
    plt.xticks(range(len(x)), [format_date(i) for i in x], size='small', rotation=30)
    plt.grid(axis = "y")
    plt.show()

1 Answer

UPDATE: I fundamentally misunderstood the problem. Felix was querying MongoDB to find out how many items fell into each range, so my approach didn't work: I was asking MongoDB for the items themselves. Felix has a lot of data, so fetching all of it is completely unreasonable. Felix, here's an updated function which should do what you want:

    def getDataFromLast(num, quantum):
        all_counts = []
        not_deleted = []
        today = datetime.combine(date.today(), time())  # midnight today
        with my_mongodb() as m:  # used as a context manager, as in your code
            for i in reversed(range(num)):  # start from oldest
                day = today - i*quantum
                time_query = {"$gte": day, "$lt": day + quantum}
                all_counts.append(m.data.find({"time": time_query}).count())
                not_deleted.append(m.data.find({"deleted": 0, "time": time_query}).count())
        return all_counts, not_deleted

Quantum is the "step" to look back by. For instance, to look at 12 hourly buckets, I'd set quantum = timedelta(hours=1) and num = 12. Note that the buckets are counted back from midnight today, as in your code, so for hourly data you would probably want to anchor on the current hour instead.

An updated example usage where we get the last 30 days would be:

    from datetime import datetime, date, time, timedelta
    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker
    from my_conn import my_mongodb

    # def getDataFromLast(num, quantum) as defined above

    def format_date(x, N, pos=None):
        """ This is your format_date function. It now takes N (I still
        don't really understand what it is, though) as an argument instead
        of assuming that it's a global."""
        day = date.today() - timedelta(days=N-x-1)
        return day.strftime('%m/%d')

    def plotBar(data, color):
        plt.bar(range(len(data)), data, align='center', color=color)

    N = 30  # define the range that we want to look at
    all_counts, valid = getDataFromLast(N, timedelta(days=1))  # get the data
    plotBar(all_counts, "#4788d2")  # plot both deleted and non-deleted data
    plotBar(valid, "#0c3688")  # plot only the valid data
    plt.xticks(range(N), [format_date(i, N) for i in range(N)], size='small', rotation=30)
    plt.grid(axis="y")
    plt.show()

Original:

Alright, this is my attempt at refactoring for you. Blubber has suggested learning JS and MapReduce. There's no need as long as you follow his other suggestions: create an index on the time field, and reduce the number of queries. This is my best attempt at that, along with a slight refactoring. I have a bunch of questions and comments though. Starting in:

    with my_mongodb() as m:
        for i in range(30):
            day = today - timedelta(days = i)
            t1 = [m.data.find({"time": {"$gte": day, "$lt": day + timedelta(days = 1)}}).count()] + t1
            t2 = [m.data.find({"deleted": 0, "time": {"$gte": day, "$lt": day + timedelta(days = 1)}}).count()] + t2

You're making a MongoDB request for each day of the past 30 days. Why don't you just use one request? And once you have all of the data, why not just filter out the deleted data?

    with my_mongodb() as m:
        # midnight today as a datetime; PyMongo cannot encode a plain date
        today = datetime.combine(date.today(), time())
        start_date = today - timedelta(days=30)
        t1 = list(m.data.find({"time": {"$gte": start_date}}))  # all data since start_date (30 days ago)
        t2 = [x for x in t1 if x['deleted'] == 0]  # all data since start_date that isn't deleted

I'm really not sure why you were making 60 requests (30 * 2, one for all the data, one for non-deleted). Is there any particular reason you built up the data day-by-day?

Then, you have:

    x = range(30)
    N = len(x)

Why not:

    N = 30
    x = range(N)

len(range(N)) is equal to N, but takes time to compute. The way you wrote it originally is just a little... weird.

Here's my crack at it, with the changes I've suggested made in a way that is as general as possible.

    from datetime import datetime, date, time, timedelta
    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker
    from my_conn import my_mongodb

    def getDataFromLast(delta):
        """ Delta is a timedelta for however long ago you want to look
        back. For instance, to find everything within the last month,
        delta should = timedelta(days=30). Last hour? timedelta(hours=1)."""
        # midnight today as a datetime; PyMongo cannot encode a plain date
        today = datetime.combine(date.today(), time())
        start_date = today - delta
        with my_mongodb() as m:  # used as a context manager, as in your code
            all_data = list(m.data.find({"time": {"$gte": start_date}}))
        valid_data = [x for x in all_data if x['deleted'] == 0]  # all data that isn't deleted
        return all_data, valid_data

    def format_date(x, N, pos=None):
        """ This is your format_date function. It now takes N (I still
        don't really understand what it is, though) as an argument instead
        of assuming that it's a global."""
        day = date.today() - timedelta(days=N-x-1)
        return day.strftime('%m/%d')

    def plotBar(data, color):
        plt.bar(range(len(data)), data, align='center', color=color)

    N = 30  # define the range that we want to look at
    all_data, valid = getDataFromLast(timedelta(days=N))
    # note: these are lists of documents, not per-day counts (see the UPDATE)
    plotBar(all_data, "#4788d2")  # plot both deleted and non-deleted data
    plotBar(valid, "#0c3688")  # plot only the valid data
    plt.xticks(range(N), [format_date(i, N) for i in range(N)], size='small', rotation=30)
    plt.grid(axis="y")
    plt.show()
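For what it's worth, a different route from the per-bucket count() queries above is to let the server do the grouping in a single request with MongoDB's aggregation framework. This is only a sketch and not what the answer itself uses: it assumes the `time` and `deleted` fields from the question, a server that supports the `$dateToString` operator, and the helper names (`build_count_pipeline`, `counts_in_order`, `day_labels`) are made up for illustration.

```python
from datetime import datetime, timedelta

# strftime-style formats accepted by MongoDB's $dateToString operator,
# keyed by the bucket size we want to group on.
BUCKET_FORMATS = {
    "hour": "%Y-%m-%d %H",
    "day": "%Y-%m-%d",
    "month": "%Y-%m",
    "year": "%Y",
}


def build_count_pipeline(start, unit, not_deleted_only=False):
    """Build a pipeline that counts documents per time bucket.

    `start` is a datetime lower bound on the "time" field and `unit` is
    one of the BUCKET_FORMATS keys. Field names follow the question.
    """
    match = {"time": {"$gte": start}}
    if not_deleted_only:
        match["deleted"] = 0
    bucket = {"$dateToString": {"format": BUCKET_FORMATS[unit], "date": "$time"}}
    return [
        {"$match": match},  # this stage can use an index on "time"
        {"$group": {"_id": bucket, "count": {"$sum": 1}}},
        {"$sort": {"_id": 1}},
    ]


def counts_in_order(rows, labels):
    """Align aggregation rows with `labels`, using 0 for empty buckets."""
    found = {row["_id"]: row["count"] for row in rows}
    return [found.get(label, 0) for label in labels]


def day_labels(today, num_days):
    """Bucket labels for the last `num_days` days, oldest first."""
    return [(today - timedelta(days=i)).strftime("%Y-%m-%d")
            for i in reversed(range(num_days))]
```

With a live connection this would be used roughly as `rows = m.data.aggregate(build_count_pipeline(start, "day"))` followed by `counts_in_order(rows, day_labels(today, 30))`, assuming `m.data` is a PyMongo collection. `$dateToString` formats dates in UTC unless told otherwise, so the labels may need adjusting if the stored times are meant to be read in local time.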
