Twython: simple Python script using too much CPU
I have been told off by my VPS provider because my Python script is using too much CPU (apparently the script has been utilising an entire core for a few hours).
My script uses the Twython library to stream tweets:
    def on_success(self, data):
        if 'text' in data:
            self.counter += 1
            self.tweetdatabase.save(tweet(data))
            # we want to commit when we have a batch
            if self.counter >= 1000:
                print("{0}: committing {1} tweets".format(datetime.now(), self.counter))
                self.counter = 0
                self.tweetdatabase.commit()
The tweet class's job is to throw away the tweet metadata that I do not need:
    class tweet():
        def __init__(self, json):
            self.user = {"id": json.get('user').get('id_str'),
                         "name": json.get('user').get('name')}
            # strptime directives are case-sensitive: %H:%M:%S is the time, %Y the 4-digit year
            self.timestamp = datetime.datetime.strptime(json.get('created_at'),
                                                        '%a %b %d %H:%M:%S %z %Y')
            self.coordinates = json.get('coordinates')
            self.tweet = {
                "id": json.get('id_str'),
                "text": json.get('text').split('#')[0],
                "entities": json.get('entities'),
                "place": json.get('place')
            }
            self.favourite = json.get('favorite_count')
            self.retweet = json.get('retweet_count')
It has a __str__ method that returns a super compact string representation of the object.
tweetdatabase.commit() saves the tweets to a file, while tweetdatabase.save() saves a tweet to a list:
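    def save(self, tweet):
        # buffer the compact string form until the next commit
        self.tweets.append(tweet.__str__())

    def commit(self):
        # append the buffered batch to the file, then clear the buffer
        with open(self.path, mode='a', encoding='utf-8') as f:
            f.write('\n'.join(self.tweets))
        self.tweets = []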
What's the best way to keep the CPU usage low? If I sleep, I will lose tweets for the time the program spends not listening to Twitter's API. Despite that, I tried sleeping for a second after the program writes to the file, and it did nothing to bring the CPU down. For the record, it saves to the file every 1000 tweets, which is about once a minute.
Many thanks.
Try checking whether you need to commit first in on_success(). Then check whether the tweet has the data you want to save. You might also want to consider race conditions on the self.counter variable; the update to self.counter should be wrapped in a mutex or similar.
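As a rough sketch of the mutex suggestion, assuming on_success() may be called from more than one thread, the counter could live behind a threading.Lock. BatchCounter, increment() and demo() are hypothetical names made up for illustration; they are not part of Twython or of the code in the question.

```python
import threading


class BatchCounter:
    """Hypothetical helper: counts saved tweets and signals when a batch is full."""

    def __init__(self, batch_size=1000):
        self.batch_size = batch_size
        self._count = 0
        self._lock = threading.Lock()

    def increment(self):
        """Add one to the count; return True exactly once per full batch."""
        with self._lock:
            # The read-modify-write and the reset happen under one lock,
            # so two threads can never both see the batch as full.
            self._count += 1
            if self._count >= self.batch_size:
                self._count = 0
                return True
            return False


def demo(threads=4, per_thread=2500, batch_size=1000):
    """Hammer the counter from several threads and count the commit signals."""
    counter = BatchCounter(batch_size)
    signals = []
    signals_lock = threading.Lock()

    def worker():
        for _ in range(per_thread):
            if counter.increment():
                with signals_lock:
                    signals.append(1)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return len(signals)
```

In on_success() this would replace the bare `self.counter += 1` and the threshold check: call increment() after save(), and call commit() only when it returns True. Note that this only protects the counter; if several threads really do call save() and commit(), the tweets list would need the same kind of protection.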