Noah Levitt
|
1bca9d0324
|
don't use http.client.HTTPResponse.getheader() to get the content-type header, because it can return a comma-delimited string
|
2017-09-18 14:45:16 -07:00 |
|
Noah Levitt
|
a8adaaf527
|
Merge pull request #30 from trifle/master
allow zero warc_writer_threads
|
2017-09-12 13:46:12 -07:00 |
|
Noah Levitt
|
a3f84097ee
|
Merge branch 'master' into crawl-log
* master:
no SIGQUIT on windows, so no SIGQUIT handler
https://github.com/internetarchive/warcprox/pull/32 warrants a version bump
fix --size option (https://github.com/internetarchive/warcprox/issues/31)
fix --playback-port option (https://github.com/internetarchive/warcprox/issues/29)
|
2017-09-07 12:28:07 -07:00 |
|
Noah Levitt
|
b89f834ce3
|
no SIGQUIT on windows, so no SIGQUIT handler
|
2017-09-07 12:01:51 -07:00 |
|
Noah Levitt
|
3003c46c10
|
https://github.com/internetarchive/warcprox/pull/32 warrants a version bump
|
2017-09-07 10:33:21 -07:00 |
|
Noah Levitt
|
c73fdd91f8
|
Merge pull request #32 from internetarchive/trough
hello --plugin, goodbye kafka feed
|
2017-09-07 10:31:42 -07:00 |
|
Noah Levitt
|
db0f36c745
|
fix --size option (https://github.com/internetarchive/warcprox/issues/31)
|
2017-09-05 12:43:55 -07:00 |
|
Noah Levitt
|
7e55568851
|
fix --playback-port option (https://github.com/internetarchive/warcprox/issues/29)
|
2017-09-05 12:20:22 -07:00 |
|
Pascal Jürgens
|
940af4e888
|
fix zero-indexing of warc_writer_threads so they can be disabled via empty list
|
2017-08-18 15:52:34 +02:00 |
|
Noah Levitt
|
bac45a9df2
|
create crawl log dir at startup if it doesn't exist
|
2017-08-08 11:54:57 -07:00 |
|
Noah Levitt
|
30b69c5838
|
make test pass with py27
|
2017-08-07 16:21:08 -07:00 |
|
Noah Levitt
|
8a768dcd44
|
fix crawl log test to avoid any dedup collisions
|
2017-08-07 14:06:53 -07:00 |
|
Noah Levitt
|
edcc2cc296
|
fix crawl log test
|
2017-08-07 13:23:51 -07:00 |
|
Noah Levitt
|
ecb07fc9cd
|
heritrix-style crawl log support
|
2017-08-07 13:07:54 -07:00 |
|
Noah Levitt
|
7aed867c90
|
disallow slash and backslash in warc-prefix
|
2017-08-07 11:30:52 -07:00 |
|
Noah Levitt
|
0cf283f058
|
can't see any reason to split the main() like this (anymore?)
|
2017-08-03 15:19:57 -07:00 |
|
Noah Levitt
|
027a242e19
|
add missing dependency warcio to tests_require
|
2017-08-03 15:18:20 -07:00 |
|
Noah Levitt
|
c0cb59e5af
|
Merge branch 'master' into trough
* master:
hidden argument --rethinkdb-big-table-name
try to fix https://github.com/internetarchive/warcprox/issues/27
|
2017-08-03 11:22:27 -07:00 |
|
Noah Levitt
|
13ee68ce4a
|
hidden argument --rethinkdb-big-table-name
|
2017-07-20 12:53:59 -07:00 |
|
Noah Levitt
|
b1a8fecd9d
|
try to fix https://github.com/internetarchive/warcprox/issues/27
|
2017-07-07 14:54:55 -07:00 |
|
Noah Levitt
|
ad3e6f405d
|
call stop() at shutdown if present on plugins
|
2017-06-28 16:40:20 -07:00 |
|
Noah Levitt
|
9ea3540d63
|
fix misuse of +=
|
2017-06-28 14:19:06 -07:00 |
|
Noah Levitt
|
2c95a1f2ee
|
remove kafka feed code
|
2017-06-28 13:12:30 -07:00 |
|
Noah Levitt
|
4c32394256
|
new option --plugin
|
2017-06-28 12:53:34 -07:00 |
|
Noah Levitt
|
e31302a6e3
|
hide kafka options as first step toward removing them
|
2017-06-28 12:03:48 -07:00 |
|
Noah Levitt
|
5a8d1610e6
|
try to work around stupid travis build error, see https://blog.travis-ci.com/2017-06-21-trusty-updates-2017-Q2-launch
|
2017-06-23 14:12:04 -07:00 |
|
Noah Levitt
|
b23e485898
|
simplify recovery of stats batch in case of exception saving them (not sure what was wrong with summy_merge, but this is simpler)
|
2017-06-22 16:54:04 -07:00 |
|
Noah Levitt
|
c0ee9c6093
|
avoid holding the lock, which makes all warc writer threads block, while doing rethinkdb operations, in RethinkStatsDb
|
2017-06-22 16:17:25 -07:00 |
|
Noah Levitt
|
24082c2e8c
|
don't wait for queue to be empty to do idle rollovers, because sometimes warcprox can stay busy for a long, long time
|
2017-06-22 15:04:01 -07:00 |
|
Noah Levitt
|
2f0c4454ac
|
try not to let problems responding to kill -QUIT (which prints stack trace of each thread) kill the whole process
|
2017-06-12 16:51:50 -07:00 |
|
Noah Levitt
|
808950abb4
|
recover properly from exception updating stats in rethinkdb
|
2017-06-12 16:51:45 -07:00 |
|
Noah Levitt
|
1500341875
|
use %r instead of calling repr()
|
2017-06-07 16:05:47 -07:00 |
|
Noah Levitt
|
2f93cdcad9
|
use locking to ensure consistency and avoid this kind of test failure https://travis-ci.org/internetarchive/warcprox/jobs/235819316
|
2017-05-25 17:38:20 +00:00 |
|
Noah Levitt
|
00b982aa24
|
Merge pull request #25 from nlevitt/sqlite
get rid of dbm, switch to sqlite, for easier portability, clarity aro…
|
2017-05-24 14:25:45 -07:00 |
|
Noah Levitt
|
95dfa54968
|
get rid of dbm, switch to sqlite, for easier portability, clarity around threading
|
2017-05-24 13:57:09 -07:00 |
|
Noah Levitt
|
99dd840d20
|
use "ttl" for updated doublethink svc reg api
|
2017-05-23 10:37:39 -07:00 |
|
Noah Levitt
|
aca0b881c6
|
make sure records are written to warc in a predictable order to make tests pass consistently
|
2017-05-19 16:34:27 -07:00 |
|
Noah Levitt
|
ef5dd2e4ae
|
multiple warc writer threads (hacked in with little thought to code organization)
|
2017-05-19 16:10:44 -07:00 |
|
Noah Levitt
|
515dd84aed
|
lock to certauth < 1.2 until we port
|
2017-05-19 15:44:00 -07:00 |
|
Noah Levitt
|
a3dde3d97f
|
fix mistake (incorrect interpretation of concurrent.futures.ThreadPoolExecutor internals) that caused unnecessary waits, and unnecessarily long waits, before calling socket.accept()
|
2017-05-12 14:18:35 -07:00 |
|
Noah Levitt
|
fd770b71bc
|
revert stuff accidentally committed as part of eea582c6db9ed6d :(
|
2017-05-11 11:56:01 -07:00 |
|
Noah Levitt
|
621ebb91ea
|
use request count and payload size to specify length of benchmark run
|
2017-05-10 18:58:19 +00:00 |
|
Noah Levitt
|
2a0c8c28c9
|
improvements to run-benchmark.py, primarily to actually make multiple requests in parallel
|
2017-05-10 18:01:56 +00:00 |
|
Noah Levitt
|
eea582c6db
|
rewrite run-benchmarks.py for aiohttp2
|
2017-05-08 20:56:32 -07:00 |
|
Noah Levitt
|
c87ff90bc1
|
move more stuff in do_COMMAND inside the try block so that exceptions result in a 500 response
|
2017-05-05 13:44:46 -07:00 |
|
Noah Levitt
|
c642565ad8
|
bump up the socket backlog argument to try to stop kernel closing attempted connections on linux
|
2017-05-05 18:49:56 +00:00 |
|
Noah Levitt
|
b2f08535ae
|
set method when creating ProxyingRecordingHTTPResponse so that it knows when to close the connection, and HEAD requests don't sit around trying to read more data until socket timeout
|
2017-05-04 12:54:04 -07:00 |
|
Noah Levitt
|
11e11f4e68
|
early trace-level logging of the requestline
|
2017-05-03 18:39:57 -07:00 |
|
Noah Levitt
|
c0e6c219ca
|
python2 fixes
|
2017-04-28 11:12:17 -07:00 |
|
Noah Levitt
|
338e5cd878
|
comment out debug logging thing
|
2017-04-28 11:08:41 -07:00 |
|
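Note on commit 1bca9d0324 (top of this log): the message says http.client.HTTPResponse.getheader() can return a comma-delimited string. The sketch below illustrates that standard-library behavior with a response that carries two Content-Type headers. It is not the code from the commit; FakeSocket and the sample response bytes are made up for illustration, and reading the first value from resp.headers is only one possible way around the problem.

```python
import http.client
import io


class FakeSocket:
    """Hypothetical stand-in for a socket; HTTPResponse only calls makefile()."""

    def __init__(self, data):
        self._file = io.BytesIO(data)

    def makefile(self, *args, **kwargs):
        return self._file


# Made-up response with a repeated Content-Type header.
raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 0\r\n"
    b"\r\n"
)

resp = http.client.HTTPResponse(FakeSocket(raw))
resp.begin()  # parse the status line and headers

# getheader() joins repeated headers with ", ".
joined = resp.getheader("Content-Type")

# The underlying message object exposes the individual values.
values = resp.headers.get_all("Content-Type")

# Message.get() returns only the first matching header.
first = resp.headers.get("Content-Type")

print(joined)
print(values)
print(first)
```

With the joined form, code that expects a single media type would see "text/html, text/plain" rather than one value, which is the kind of surprise the commit message describes.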