25 Commits

Author SHA1 Message Date
Noah Levitt
f1d07ad921 use urlcanon library for canonicalization, surtification, scope match rules 2017-03-15 09:33:50 -07:00
Noah Levitt
842bfd651c rethinkstuff -> doublethink 2017-03-02 15:06:26 -08:00
Noah Levitt
c9e403585b switching from host limits to domain limits, which apply in aggregate to the host and subdomains 2016-06-29 14:56:14 -05:00
Noah Levitt
fabd732b7f couple of fixes for host limits 2016-06-24 21:58:37 -05:00
Noah Levitt
2fe0c2f25b support for tallying substats of a configured bucket by host, and enforcing host limits using those stats, with tests 2016-06-24 20:04:27 -05:00
Noah Levitt
d48e2c462d add a start() method to the two classes that save data to rethinkdb periodically in batches, instead of starting the timer in __init__ 2016-06-16 00:04:59 +00:00
Noah Levitt
2c65ff89fa add license headers 2016-04-06 19:37:55 -07:00
Noah Levitt
1e0a3f0135 import dbm only if used 2016-01-27 21:18:02 +00:00
Noah Levitt
fb58244c4f update stats in rethinkdb only every 2.0 seconds instead of every 0.5 2016-01-26 18:47:08 -08:00
Noah Levitt
e3a5717446 hidden --profile option to enable profiling of warc writer thread and periodic logging of memory usage info; at shutdown, close stats db and unregister from service registry; logging improvements 2016-01-26 18:47:08 -08:00
Noah Levitt
9af17ba7c3 update stats batch every 0.5 seconds, since rethinkdb updates were falling way behind sometimes 2016-01-26 18:47:08 -08:00
Noah Levitt
afdb6cf557 log status in close() 2016-01-26 18:47:08 -08:00
Noah Levitt
f1362e4da0 use only one worker thread for asynchronous rethinkdb stats updates, to fix race condition causing some numbers to be lost 2016-01-26 18:47:08 -08:00
Noah Levitt
95e611a5d0 update stats in RethinkDb asynchronously, since profiling shows this to be a bottleneck in WarcWriterThread (which in turn makes it a bottleneck for the whole app) 2016-01-26 18:47:08 -08:00
Noah Levitt
f806cd3e4a use Rethinker.dbname to avoid conflict with rethinkdb.db 2016-01-26 18:47:08 -08:00
Noah Levitt
69d641cd50 avoid attempting to create tables with more shards or replicas than the number of servers 2016-01-26 18:47:08 -08:00
Noah Levitt
0171cdd01d fixes for python 2.7 2016-01-26 18:47:08 -08:00
Noah Levitt
3b9345e7d7 use nicer rethinkdbstuff.Rethinker api 2016-01-26 18:47:08 -08:00
Noah Levitt
f90c3a6403 Rethinker class moved to its own pyrethink project 2016-01-26 18:47:08 -08:00
Noah Levitt
a9986e4ce3 fix NameError, quiet logging 2016-01-26 18:47:08 -08:00
Noah Levitt
022f6e7215 wrap rethinkdb operations and retry if appropriate (as best as we can tell) 2016-01-26 18:47:08 -08:00
Noah Levitt
c430f81883 some refactoring to prep for big rethinkdb capture table 2016-01-26 18:47:08 -08:00
Noah Levitt
f000d413a2 quiet stats logging 2016-01-26 18:46:13 -08:00
Noah Levitt
df38cf856d rethinkdb for stats 2016-01-26 18:46:13 -08:00
Noah Levitt
4ce89e6d03 basic limits enforcement is working 2016-01-26 18:46:13 -08:00