Noah Levitt | 2c65ff89fa | add license headers | 2016-04-06 19:37:55 -07:00
Noah Levitt | 46887f7594 | better handle exceptions from listeners | 2016-03-03 18:59:13 +00:00
Noah Levitt | 927419645b | use rethinkdb native time type for captures table timestamp | 2016-01-26 18:47:08 -08:00
Noah Levitt | e3a5717446 | hidden --profile option to enable profiling of warc writer thread and periodic logging of memory usage info; at shutdown, close stats db and unregister from service registry; logging improvements | 2016-01-26 18:47:08 -08:00
Noah Levitt | fcaaa7b09b | include tid in thread name for more threads (linux only) for correlation with top -H | 2016-01-26 18:47:08 -08:00
Noah Levitt | 3363b2ec95 | continue after unexpected error | 2016-01-26 18:47:08 -08:00
Noah Levitt | 6476262f11 | run warc writer thread with profiling enabled, dump results when shutting down | 2016-01-26 18:47:08 -08:00
Noah Levitt | e0fe06c891 | make warcprox finish writing all urls in the queue before shutting down | 2016-01-26 18:47:08 -08:00
Noah Levitt | dd1c7b5f7d | don't implement __del__, maybe it can cause mem leaks; bunch of logging to try to detect leaks | 2016-01-26 18:47:08 -08:00
Noah Levitt | 12432b23ae | for captures table generate canonical surt with scheme:// | 2016-01-26 18:47:08 -08:00
Noah Levitt | 6da3dd50ac | include thread pid in thread name (linux-specific, not sure what happens on other systems) | 2016-01-26 18:47:08 -08:00
Noah Levitt | fee200c72c | get rid of silly _decode because we know which fields are bytes and which str | 2016-01-26 18:47:08 -08:00
Noah Levitt | 44a62111fb | support for deduplication buckets specified in warcprox-meta header {"captures-bucket":...,...} | 2016-01-26 18:47:08 -08:00
Noah Levitt | c430f81883 | some refactoring to prep for big rethinkdb capture table | 2016-01-26 18:47:08 -08:00
Noah Levitt | e66dc3a9fb | rethinkdb dedup | 2016-01-26 18:46:13 -08:00
Noah Levitt | 4ce89e6d03 | basic limits enforcement is working | 2016-01-26 18:46:13 -08:00
Noah Levitt | 274a2f6b1d | refactor warc writing, deduplication for somewhat cleaner separation of concerns | 2016-01-26 18:45:36 -08:00