22 Commits

Author SHA1 Message Date
Noah Levitt
734b2f5396 limit max number of threads to 500; make sure connection with proxy client has a timeout; log errors from connection with proxy client 2016-01-26 18:47:08 -08:00
Noah Levitt
e3a5717446 hidden --profile option to enable profiling of warc writer thread and periodic logging of memory usage info; at shutdown, close stats db and unregister from service registry; logging improvements 2016-01-26 18:47:08 -08:00
Noah Levitt
a9fc550453 oops, argparse.SUPPRESS isn't supposed to be in quotes 2016-01-26 18:47:08 -08:00
Noah Levitt
3e2696525b make sure svcreg is set 2016-01-26 18:47:08 -08:00
Noah Levitt
d7d992731c register self for service discovery 2016-01-26 18:47:08 -08:00
Noah Levitt
2169369dab working on benchmarking code... so far they seem to reveal that warcprox behaves poorly under load (perhaps timeouts are configured too short?) 2016-01-26 18:47:08 -08:00
Noah Levitt
a41c426b0a giving up on using git revision in version number :( latest issue is when installing a package that calls git to compute a version number, but cwd is some other git project, you get the wrong thing 2016-01-26 18:47:08 -08:00
Noah Levitt
3b9345e7d7 use nicer rethinkdbstuff.Rethinker api 2016-01-26 18:47:08 -08:00
Noah Levitt
d98f03012b kafka capture feed, for druid 2016-01-26 18:47:08 -08:00
Noah Levitt
44a62111fb support for deduplication buckets specified in warcprox-meta header {"captures-bucket":...,...} 2016-01-26 18:47:08 -08:00
Noah Levitt
6d673ee35f tests pass with big rethinkdb captures table 2016-01-26 18:47:08 -08:00
Noah Levitt
c430f81883 some refactoring to prep for big rethinkdb capture table 2016-01-26 18:47:08 -08:00
Noah Levitt
df38cf856d rethinkdb for stats 2016-01-26 18:46:13 -08:00
Noah Levitt
e66dc3a9fb rethinkdb dedup 2016-01-26 18:46:13 -08:00
Noah Levitt
a876152026 fix exception, make some tweaks 2016-01-26 18:46:13 -08:00
Noah Levitt
4ce89e6d03 basic limits enforcement is working 2016-01-26 18:46:13 -08:00
Noah Levitt
274a2f6b1d refactor warc writing, deduplication for somewhat cleaner separation of concerns 2016-01-26 18:45:36 -08:00
Noah Levitt
771383d0a6 refactor proxy handler to use do_* methods for custom http verbs; refactor warc writer thread to use new WarcWriterPool class 2016-01-26 18:45:36 -08:00
Noah Levitt
084bd75ed6 dump thread tracebacks on sigquit, more logging and exception handling tweaks 2016-01-26 18:45:12 -08:00
Noah Levitt
0647c0c76d support for writing to different warcs based on Warcprox-Meta http request header warc-prefix setting 2016-01-26 18:44:16 -08:00
Ilya Kreymer
574f1f3f52 remove certauth.py and use the separate certauth package release 2015-03-30 09:32:10 -07:00
Noah Levitt
a2c25d4242 split into even more source files 2014-11-20 00:04:43 -08:00