| Noah Levitt | f1d07ad921 | use urlcanon library for canonicalization, surtification, scope match rules | 2017-03-15 09:33:50 -07:00 |
| Noah Levitt | 842bfd651c | rethinkstuff -> doublethink | 2017-03-02 15:06:26 -08:00 |
| Noah Levitt | c9e403585b | switching from host limits to domain limits, which apply in aggregate to the host and subdomains | 2016-06-29 14:56:14 -05:00 |
| Noah Levitt | fabd732b7f | couple of fixes for host limits | 2016-06-24 21:58:37 -05:00 |
| Noah Levitt | 2fe0c2f25b | support for tallying substats of a configured bucket by host, and enforcing limits host limits using those stats, with tests | 2016-06-24 20:04:27 -05:00 |
| Noah Levitt | d48e2c462d | add a start() method to the two classes that save data to rethinkdb periodically in batches, instead of starting the timer in __init__ | 2016-06-16 00:04:59 +00:00 |
| Noah Levitt | 2c65ff89fa | add license headers | 2016-04-06 19:37:55 -07:00 |
| Noah Levitt | 1e0a3f0135 | import dbm only if used | 2016-01-27 21:18:02 +00:00 |
| Noah Levitt | fb58244c4f | update stats in rethinkdb only every 2.0 seconds instead of every 0.5 | 2016-01-26 18:47:08 -08:00 |
| Noah Levitt | e3a5717446 | hidden --profile option to enable profiling of warc writer thread and periodic logging of memory usage info; at shutdown, close stats db and unregister from service registry; logging improvements | 2016-01-26 18:47:08 -08:00 |
| Noah Levitt | 9af17ba7c3 | update stats batch every 0.5 seconds, since rethinkdb updates were falling way behind sometimes | 2016-01-26 18:47:08 -08:00 |
| Noah Levitt | afdb6cf557 | log status in close() | 2016-01-26 18:47:08 -08:00 |
| Noah Levitt | f1362e4da0 | use only one worker thread for asynchronous rethinkdb stats updates, to fix race condition causing some numbers to be lost | 2016-01-26 18:47:08 -08:00 |
| Noah Levitt | 95e611a5d0 | update stats in RethinkDb asynchronously, since profiling shows this to be a bottleneck in WarcWriterThread (which in turn makes it a bottleneck for the whole app) | 2016-01-26 18:47:08 -08:00 |
| Noah Levitt | f806cd3e4a | use Rethinker.dbname to avoid conflict with rethinkdb.db | 2016-01-26 18:47:08 -08:00 |
| Noah Levitt | 69d641cd50 | avoid attempting to create tables with more shards or replicas than the number of servers | 2016-01-26 18:47:08 -08:00 |
| Noah Levitt | 0171cdd01d | fixes for python 2.7 | 2016-01-26 18:47:08 -08:00 |
| Noah Levitt | 3b9345e7d7 | use nicer rethinkdbstuff.Rethinker api | 2016-01-26 18:47:08 -08:00 |
| Noah Levitt | f90c3a6403 | Rethinker class moved to its own pyrethink project | 2016-01-26 18:47:08 -08:00 |
| Noah Levitt | a9986e4ce3 | fix NameError, quiet logging | 2016-01-26 18:47:08 -08:00 |
| Noah Levitt | 022f6e7215 | wrap rethinkdb operations and retry if appropriate (as best as we can tell) | 2016-01-26 18:47:08 -08:00 |
| Noah Levitt | c430f81883 | some refactoring to prep for big rethinkdb capture table | 2016-01-26 18:47:08 -08:00 |
| Noah Levitt | f000d413a2 | quiet stats logging | 2016-01-26 18:46:13 -08:00 |
| Noah Levitt | df38cf856d | rethinkdb for stats | 2016-01-26 18:46:13 -08:00 |
| Noah Levitt | 4ce89e6d03 | basic limits enforcement is working | 2016-01-26 18:46:13 -08:00 |