253 Commits

Author SHA1 Message Date
Noah Levitt
e993b0c28c fix shutdown
at shutdown, abort active connections, but allow completed fetches to
finish processing

this should fix race condition issue at shutdown, where postfetch
processor B would shut down, then postfetch processor A would try to
enqueue more urls, filling up the queue to the point where it blocks
forever, since B is no longer pulling urls off the queue
2018-10-26 13:21:15 -07:00
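The blocking scenario this commit message describes can be reproduced with a bounded `queue.Queue`. The sketch below is only an illustration of the failure mode and of one generic way to avoid blocking forever; it is not warcprox's code and not the fix made in this commit. The names `put_until_stopped`, `downstream`, and `stop` are hypothetical.

```python
import queue
import threading

def put_until_stopped(q, item, stop, poll_interval=0.1):
    """Try to enqueue item; give up once stop is set. Returns True if enqueued."""
    while not stop.is_set():
        try:
            q.put(item, timeout=poll_interval)
            return True
        except queue.Full:
            continue  # queue still full; re-check the stop flag
    return False

# hypothetical bounded queue between processor A (producer) and B (consumer)
downstream = queue.Queue(maxsize=2)
stop = threading.Event()

# B has already shut down, so nothing drains the queue once it fills up
downstream.put("url1")
downstream.put("url2")

# a plain downstream.put("url3") would block forever at this point;
# the polling variant returns as soon as the stop flag is set
stop.set()
enqueued = put_until_stopped(downstream, "url3", stop)
```

Here `enqueued` is `False` and the queue still holds the two earlier items, so the producer thread can exit instead of hanging.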
Noah Levitt
4f01772782 enforce limits on WARCPROX_WRITE_RECORD requests
should make test from previous commit pass
2018-10-10 18:24:54 -07:00
Noah Levitt
57e1b82e3d bump version after merge 2018-09-19 13:03:59 -07:00
Noah Levitt
8f51ba4ab9 bump dev version number after merge 2018-08-16 17:09:35 -07:00
Noah Levitt
f8b86a0122 update cryptography dep version
github tells me there's a vulnerability <2.3
2018-08-16 12:54:30 -07:00
Noah Levitt
17a5fabb75 use SpooledTemporaryFile for WARCPROX_WRITE_RECORD
payloads, because as of https://github.com/internetarchive/brozzler/pull/115
brozzler will be sending big videos via WARCPROX_WRITE_RECORD
2018-08-16 11:08:36 -07:00
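For context, `tempfile.SpooledTemporaryFile` from the Python standard library keeps data in memory until it exceeds `max_size`, then rolls over to a file on disk. A minimal sketch of buffering a payload this way follows; the helper `buffer_payload` and the 512 KiB threshold are assumptions for illustration, not values taken from warcprox.

```python
import tempfile

# hypothetical threshold: payloads up to 512 KiB stay in memory
MAX_IN_MEMORY = 512 * 1024

def buffer_payload(chunks, max_in_memory=MAX_IN_MEMORY):
    """Write chunks to a spooled temp file and rewind it for reading."""
    buf = tempfile.SpooledTemporaryFile(max_size=max_in_memory)
    for chunk in chunks:
        buf.write(chunk)  # spills to disk once max_in_memory is exceeded
    buf.seek(0)
    return buf

# small payload: stays in memory
small = buffer_payload([b"hello ", b"world"])

# larger payload with a low threshold: transparently rolls over to disk
big = buffer_payload([b"x" * 1024] * 4, max_in_memory=1024)
```

Either way the caller reads the result through the same file-like interface, which is what makes this approach suit payloads of unpredictable size such as large videos.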
Noah Levitt
fbce243787 bump dev version after pull request 2018-07-19 11:18:31 -05:00
Noah Levitt
2df82bd403 record request method in crawl log if not GET 2018-07-17 13:47:52 -05:00
Noah Levitt
8c22c55955 back to dev version number 2018-07-17 12:04:08 -05:00
Noah Levitt
6786a668b1 2.4b2 for pypi 2018-07-17 12:03:26 -05:00
Noah Levitt
8022257a57 setuptools likes README.rst not readme.rst 2018-07-17 16:35:05 +00:00
Noah Levitt
ec7a0bf569 log exception and continue 🤞 if schema reg fails
at trough dedup startup
2018-05-31 16:57:37 -07:00
Noah Levitt
e8cb3afa71 bump dev version after merge 2018-05-31 16:52:37 -07:00
Noah Levitt
b7ebc38491 rename README.rst -> readme.rst 2018-05-21 22:18:28 +00:00
Noah Levitt
997d4341fe add some debug logging in BatchTroughLoader 2018-05-18 17:29:38 -07:00
Noah Levitt
b762d6468b just one should_dedup() for trough dedup
fixes failing test and clarifies things
2018-05-16 14:25:01 -07:00
Noah Levitt
e23af32e94 we want to save all captures to the big "captures"
table, even if we don't want to dedup against them
2018-05-15 15:33:52 -07:00
Noah Levitt
af863c6dba default values for dedup_min_text_size et al
because they may be missing in case warcprox is used as a library
2018-05-15 11:22:10 -07:00
Noah Levitt
15830fc5a2 support "captures-bucket" for backward compatibility 2018-05-09 15:43:39 -07:00
Noah Levitt
6f6a88fc0b bump dev version number after #86 2018-05-03 12:36:16 -07:00
Noah Levitt
a1930495af default to 100 proxy threads, 1 warc writer thread
see https://github.com/internetarchive/warcprox/wiki/benchmarking-number-of-threads
2018-04-12 12:31:04 -07:00
Noah Levitt
ea4fc0f10a include warc writer worker threads in profiling 2018-04-11 22:35:37 +00:00
Noah Levitt
cc8fb4c608 cap the number of urls queued for warc writing 2018-04-11 22:29:50 +00:00
Noah Levitt
cb0dea3739 oops! /status has been lying about queued urls 2018-04-11 22:05:31 +00:00
Noah Levitt
ebf5453c2f bump dev version number after PR 2018-04-06 13:26:56 -07:00
Noah Levitt
cff8423bef bump dev version number after PR 2018-04-06 12:09:33 -07:00
Noah Levitt
385014c322 always call socket.shutdown() to close connections 2018-04-04 17:49:08 -07:00
Noah Levitt
ab52e81019 bump dev version number 2018-04-04 15:45:50 -07:00
Noah Levitt
e989b2f667 work around odd problem (see comment in code) 2018-04-03 11:12:25 -07:00
Noah Levitt
7f1c7f532e stop swallowing exception on _proxy_request() 2018-03-28 18:04:54 -07:00
Noah Levitt
41486f5f82 logging tweaks 2018-03-27 12:51:37 -07:00
Noah Levitt
c79b89108a bump version number after PR #72 2018-03-20 10:53:04 -07:00
Noah Levitt
9bb2018fd2 bump dev version after PR #75 2018-03-12 11:22:05 -07:00
Noah Levitt
45c06eab58 bump dev version number 2018-03-08 16:35:25 -08:00
Noah Levitt
c2172c6b5b make sure to roll over idle warcs
even when warcprox is idle itself
2018-02-28 13:02:03 -08:00
Noah Levitt
8a7ed0cf57 bump dev version number after merge 2018-02-28 11:45:10 -08:00
Noah Levitt
d316569196 bump dev version after revert 2018-02-27 17:28:44 -08:00
Noah Levitt
d29a367db6 bump dev version number after PR merge 2018-02-27 10:33:02 -08:00
Noah Levitt
f3e270b796 make test_method_filter() pass by waiting
in test_limit_large_resource() for url processing to finish, to prevent
stats from affecting the subsequent test
2018-02-20 14:54:58 -08:00
Noah Levitt
6d6f2c9aa0 fix sqlite3 string escaping 2018-02-12 11:42:35 -08:00
Noah Levitt
b2a1f15bf6 clean up test infrastructure
- fix crufty, broken test in setup.py
- include tests in sdist tarball for pypi
2018-02-07 16:06:46 -08:00
Noah Levitt
688e53d889 bump version number after pull request 2018-02-07 15:49:35 -08:00
Noah Levitt
e68be9354d back to dev version number 2018-02-07 15:48:42 -08:00
Noah Levitt
2ceedd3fd2 2.4b1 for pypi 2018-02-07 15:48:42 -08:00
Noah Levitt
322512dab6 bump version number after latest pull request 2018-02-07 15:48:42 -08:00
Noah Levitt
824c194142 make plugin api more flexible 2018-01-24 16:07:45 -08:00
Noah Levitt
5b414102ba respect CA-related command line options 2018-01-24 10:27:40 -08:00
Noah Levitt
1cfb4d46c6 bump version number after pull request 2018-01-22 12:50:16 -08:00
Noah Levitt
41b531e398 use trick to avoid dns looking up local ip 2018-01-21 19:47:15 -08:00
Noah Levitt
de327450ea close open warcs at shutdown 2018-01-21 19:46:31 -08:00