Noah Levitt
|
e3a5717446
|
hidden --profile option to enable profiling of warc writer thread and periodic logging of memory usage info; at shutdown, close stats db and unregister from service registry; logging improvements
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
7eb82ab8a2
|
adding missing import, remove unused method, logging tweaks, avoid exception at shutdown joining unstarted timer thread
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
f38ce708bf
|
set PYTHONDONTWRITEBYTECODE in one place
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
18cc818cf0
|
more timing tweaks to make sure tests pass, improved logging etc
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
fcaaa7b09b
|
include tid in thread name for more threads (linux only) for correlation with top -H
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
a9fc550453
|
oops, argparse.SUPPRESS isn't supposed to be in quotes
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
95ef8b80b0
|
make sure load score for service registry is a float; comment out memory debugging call; close dedup db after warc writer thread finishes
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
9af17ba7c3
|
update stats batch every 0.5 seconds, since rethinkdb updates were falling way behind sometimes
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
783e730e52
|
insert captures entries in batch every 0.5 seconds, since rethinkdb updates were falling way behind sometimes
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
afdb6cf557
|
log status in close()
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
93a2e4ff85
|
.travis.yml - disable pypy (not working because of cryptography library), require docker service
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
9f84c20274
|
test with rethinkdb flags too
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
3e2696525b
|
make sure svcreg is set
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
248d110f81
|
add port to service registry, fix bug with service heartbeat
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
e67c7be5bc
|
service registry init
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
2ecd2facd9
|
surt 0.3b2 is in pypi now, no need for devpi
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
4dcaedb5d9
|
py.test the right thing
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
d7d992731c
|
register self for service discovery
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
7e731d40bc
|
try new travis docker-based infrastructure, more versions of python
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
465cf1ef45
|
./tests/run-tests.sh is better than tox
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
ca4c62fc6d
|
don't load dedup info for empty payload
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
3363b2ec95
|
continue after unexpected error
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
fd847f01cd
|
log error but don't give up if there is >1 record with same digest
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
3e1566cd6f
|
update big captures table asynchronously
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
f1362e4da0
|
use only one worker thread for asynchronous rethinkdb stats updates, to fix race condition causing some numbers to be lost
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
4930cc2d24
|
try to avoid conflicts with *.pyc files from outside of the docker tests
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
818bdda687
|
fix NameError, twiddles
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
6476262f11
|
run warc writer thread with profiling enabled, dump results when shutting down
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
e0fe06c891
|
make warcprox finish writing all urls in the queue before shutting down
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
1b8d83203c
|
tweaks to memory debugging
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
03c506dade
|
stop after first failing test, use py.test -s
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
2169369dab
|
working on benchmarking code... so far they seem to reveal that warcprox behaves poorly under load (perhaps timeouts are configured too short?)
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
95e611a5d0
|
update stats in RethinkDb asynchronously, since profiling shows this to be a bottleneck in WarcWriterThread (which in turn makes it a bottleneck for the whole app)
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
6b3cd9de2e
|
make note of extra packages needed on ubuntu
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
a41c426b0a
|
giving up on using git revision in version number :( latest issue is when installing a package that calls git to compute a version number, but cwd is some other git project, you get the wrong thing
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
97a30eb319
|
back to setup.py now that we have devpi
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
f806cd3e4a
|
use Rethinker.dbname to avoid conflict with rethinkdb.db
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
28d213fb18
|
spin up rethinkdb in docker, run tests in there
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
69d641cd50
|
avoid attempting to create tables with more shards or replicas than the number of servers
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
0171cdd01d
|
fixes for python 2.7
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
abc2d28787
|
report actual exception, avoid incomprehensible error message "TypeError: NoneType object is not callable" in python2 (apparently due to the fact that BaseHTTPServer.BaseHTTPRequestHandler is an old-style class)
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
4c380dcc41
|
move tests out of installed package dir
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
dd1c7b5f7d
|
don't implement __del__, maybe it can cause mem leaks; bunch of logging to try to detect leaks
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
3b9345e7d7
|
use nicer rethinkdbstuff.Rethinker api
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
f90c3a6403
|
Rethinker class moved to its own pyrethink project
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
2e482d67cc
|
more patience waiting for warc writer thread
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
12432b23ae
|
for captures table generate canonical surt with scheme://
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
686a297f98
|
fixes to let screenshot records be saved in big capture tables for wayback playback
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
c02c98e369
|
make sure warc headers are bytes
|
2016-01-26 18:47:08 -08:00 |
|
Noah Levitt
|
6da3dd50ac
|
include thread pid in thread name (linux-specific, not sure what happens on other systems)
|
2016-01-26 18:47:08 -08:00 |
|