17 Commits

Author  SHA1  Message  Date
Noah Levitt  2c65ff89fa  add license headers  2016-04-06 19:37:55 -07:00
Noah Levitt  46887f7594  better handle exceptions from listeners  2016-03-03 18:59:13 +00:00
Noah Levitt  927419645b  use rethinkdb native time type for captures table timestamp  2016-01-26 18:47:08 -08:00
Noah Levitt  e3a5717446  hidden --profile option to enable profiling of warc writer thread and periodic logging of memory usage info; at shutdown, close stats db and unregister from service registry; logging improvements  2016-01-26 18:47:08 -08:00
Noah Levitt  fcaaa7b09b  include tid in thread name for more threads (linux only) for correlation with top -H  2016-01-26 18:47:08 -08:00
Noah Levitt  3363b2ec95  continue after unexpected error  2016-01-26 18:47:08 -08:00
Noah Levitt  6476262f11  run warc writer thread with profiling enabled, dump results when shutting down  2016-01-26 18:47:08 -08:00
Noah Levitt  e0fe06c891  make warcprox finish writing all urls in the queue before shutting down  2016-01-26 18:47:08 -08:00
Noah Levitt  dd1c7b5f7d  don't implement __del__, maybe it can cause mem leaks; bunch of logging to try to detect leaks  2016-01-26 18:47:08 -08:00
Noah Levitt  12432b23ae  for captures table generate canonical surt with scheme://  2016-01-26 18:47:08 -08:00
Noah Levitt  6da3dd50ac  include thread pid in thread name (linux-specific, not sure what happens on other systems)  2016-01-26 18:47:08 -08:00
Noah Levitt  fee200c72c  get rid of silly _decode because we know which fields are bytes and which str  2016-01-26 18:47:08 -08:00
Noah Levitt  44a62111fb  support for deduplication buckets specified in warcprox-meta header {"captures-bucket":...,...}  2016-01-26 18:47:08 -08:00
Noah Levitt  c430f81883  some refactoring to prep for big rethinkdb capture table  2016-01-26 18:47:08 -08:00
Noah Levitt  e66dc3a9fb  rethinkdb dedup  2016-01-26 18:46:13 -08:00
Noah Levitt  4ce89e6d03  basic limits enforcement is working  2016-01-26 18:46:13 -08:00
Noah Levitt  274a2f6b1d  refactor warc writing, deduplication for somewhat cleaner separation of concerns  2016-01-26 18:45:36 -08:00