UPDATE:
I fundamentally misunderstood the problem. Felix was querying mongoDB to figure out how many items fell into each range; therefore, my approach didn't work, because I was trying to ask mongoDB for the items. Felix has a lot of data, so this is completely unreasonable.
Felix, here's an updated function which should do what you want:
def getDataFromLast(num, quantum):
m = my_mongodb()
all = []
not_deleted = []
today = datetime.combine(date.today(), time())
for i in range(num+1)[-1]: # start from oldest
day = today - i*quantum
time_query = {"$gte":day, "$lt": day+quantum}
all.extend(m.data.find({"time":time_query}).count())
not_deleted.extend(m.data.find({"deleted":0, "time":time_query}).count())
return all, not_deleted
Quantum is the "step" to look back by. For instance, if we wanted to look at the last 12 hours, I'd set quantum = timedelta(hours=1) and num = 12. An updated example usage where we get the last 30 days would be:
from datetime import datetime, date, time, timedelta
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from my_conn import my_mongodb
#def getDataFromLast(num, quantum) as defined above
def format_date(x, N, pos=None):
""" This is your format_date function. It now takes N
(I still don't really understand what it is, though)
as an argument instead of assuming that it's a global."""
day = date.today() - timedelta(days=N-x-1)
return day.strftime('%m%d')
def plotBar(data, color):
plt.bar(range(len(data)), data, align='center', color=color)
N = 30 # define the range that we want to look at
all, valid = getDataFromLast(N, timedelta(days=1)) # get the data
plotBar(all, "#4788d2") # plot both deleted and non-deleted data
plotBar(valid, "#0c3688") # plot only the valid data
plt.xticks(range(N), [format_date(i) for i in range(N)], size='small', rotation=30)
plt.grid(axis="y")
plt.show()
Original:
Alright, this is my attempt at refactoring for you. Blubber has suggested learning JS and MapReduce. There's no need as long as you follow his other suggestions: create an index on the time field, and reduce the number of queries. This is my best attempt at that, along with a slight refactoring. I have a bunch of questions and comments though.
Starting in:
with my_mongodb() as m:
for i in range(30):
day = today - timedelta(days = i)
t1 = [m.data.find({"time": {"$gte": day, "$lt": day + timedelta(days = 1)}}).count()] + t1
t2 = [m.data.find({"deleted": 0, "time": {"$gte": day, "$lt": day + timedelta(days = 1)}}).count()] + t2
You're making a mongoDB request to find all the data from each day from the past 30 days. Why don't you just use one request? And once you have all of the data, why not just filter out the deleted data?
with my_mongodb() as m:
today = date.today() # not sure why you were combining this with time(). It's the datetime representation of the current time.time()
start_date = today -timedelta(days=30)
t1 = m.find({"time": {"$gte":start_date}}) # all data since start_date (30 days ago)
t2 = filter(lambda x: x['deleted'] == 0, all_data) # all data since start_date that isn't deleted
I'm really not sure why you were making 60 requests (30 * 2, one for all the data, one for non-deleted). Is there any particular reason you built up the data day-by-day?
Then, you have:
x = range(30)
N = len(x)
Why not:
N = 30
x = range(N)
len(range(x) is equal to x, but takes up time to compute. The way you wrote it originally is just a little... weird.
Here's my crack at it, with the changes I've suggested made in a way that is as general as possible.
from datetime import datetime, date, time, timedelta
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from my_conn import my_mongodb
def getDataFromLast(delta):
""" Delta is a timedelta for however long ago you want to look
back. For instance, to find everything within the last month,
delta should = timedelta(days=30). Last hour? timedelta(hours=1)."""
m = my_mongodb() # what exactly is this? hopefully I'm using it correctly.
today = date.today() # was there a reason you didn't use this originally?
start_date = today - delta
all_data = m.data.find({"time": {"$gte": start_date}})
valid_data = filter(lambda x: x['deleted'] == 0, all) # all data that isn't deleted
return all_data, valid_data
def format_date(x, N, pos=None):
""" This is your format_date function. It now takes N
(I still don't really understand what it is, though)
as an argument instead of assuming that it's a global."""
day = date.today() - timedelta(days=N-x-1)
return day.strftime('%m%d')
def plotBar(data, color):
plt.bar(range(len(data)), data, align='center', color=color)
N = 30 # define the range that we want to look at
all, valid = getDataFromLast(timedelta(days=N))
plotBar(all, "#4788d2") # plot both deleted and non-deleted data
plotBar(valid, "#0c3688") # plot only the valid data
plt.xticks(range(N), [format_date(i) for i in range(N)], size='small', rotation=30)
plt.grid(axis="y")
plt.show()