302 Commits

Author SHA1 Message Date
Noah Levitt
4f01772782 enforce limits on WARCPROX_WRITE_RECORD requests
should make test from previous commit pass
2018-10-10 18:24:54 -07:00
Noah Levitt
57e1b82e3d bump version after merge 2018-09-19 13:03:59 -07:00
Noah Levitt
8f51ba4ab9 bump dev version number after merge 2018-08-16 17:09:35 -07:00
Noah Levitt
f8b86a0122 update cryptography dep version
github tells me there's a vulnerability <2.3
2018-08-16 12:54:30 -07:00
Noah Levitt
17a5fabb75 use SpooledTemporaryFile for WARCPROX_WRITE_RECORD
payloads. because as of https://github.com/internetarchive/brozzler/pull/115
brozzler will be sending big videos via WARCPROX_WRITE_RECORD
2018-08-16 11:08:36 -07:00
Noah Levitt
fbce243787 bump dev version after pull request 2018-07-19 11:18:31 -05:00
Noah Levitt
2df82bd403 record request method in crawl log if not GET 2018-07-17 13:47:52 -05:00
Noah Levitt
8c22c55955 back to dev version number 2018-07-17 12:04:08 -05:00
Noah Levitt
6786a668b1 2.4b2 for pypi 2018-07-17 12:03:26 -05:00
Noah Levitt
8022257a57 setuptools likes README.rst not readme.rst 2018-07-17 16:35:05 +00:00
Noah Levitt
ec7a0bf569 log exception and continue 🤞 if schema reg fails
at trough dedup startup
2018-05-31 16:57:37 -07:00
Noah Levitt
e8cb3afa71 bump dev version after merge 2018-05-31 16:52:37 -07:00
Noah Levitt
b7ebc38491 rename README.rst -> readme.rst 2018-05-21 22:18:28 +00:00
Noah Levitt
997d4341fe add some debug logging in BatchTroughLoader 2018-05-18 17:29:38 -07:00
Noah Levitt
b762d6468b just one should_dedup() for trough dedup
fixes failing test and clarifies things
2018-05-16 14:25:01 -07:00
Noah Levitt
e23af32e94 we want to save all captures to the big "captures"
table, even if we don't want to dedup against them
2018-05-15 15:33:52 -07:00
Noah Levitt
af863c6dba default values for dedup_min_text_size et al
because they may be missing in case warcprox is used as a library
2018-05-15 11:22:10 -07:00
Noah Levitt
15830fc5a2 support "captures-bucket" for backward compatibility 2018-05-09 15:43:39 -07:00
Noah Levitt
6f6a88fc0b bump dev version number after #86 2018-05-03 12:36:16 -07:00
Noah Levitt
a1930495af default to 100 proxy threads, 1 warc writer thread
see https://github.com/internetarchive/warcprox/wiki/benchmarking-number-of-threads
2018-04-12 12:31:04 -07:00
Noah Levitt
ea4fc0f10a include warc writer worker threads in profiling 2018-04-11 22:35:37 +00:00
Noah Levitt
cc8fb4c608 cap the number of urls queued for warc writing 2018-04-11 22:29:50 +00:00
Noah Levitt
cb0dea3739 oops! /status has been lying about queued urls 2018-04-11 22:05:31 +00:00
Noah Levitt
ebf5453c2f bump dev version number after PR 2018-04-06 13:26:56 -07:00
Noah Levitt
cff8423bef bump dev version number after PR 2018-04-06 12:09:33 -07:00
Noah Levitt
385014c322 always call socket.shutdown() to close connections 2018-04-04 17:49:08 -07:00
Noah Levitt
ab52e81019 bump dev version number 2018-04-04 15:45:50 -07:00
Noah Levitt
e989b2f667 work around odd problem (see comment in code) 2018-04-03 11:12:25 -07:00
Noah Levitt
7f1c7f532e stop swallowing exception on _proxy_request() 2018-03-28 18:04:54 -07:00
Noah Levitt
41486f5f82 logging tweaks 2018-03-27 12:51:37 -07:00
Noah Levitt
c79b89108a bump version number after PR #72 2018-03-20 10:53:04 -07:00
Noah Levitt
9bb2018fd2 bump dev version after PR #75 2018-03-12 11:22:05 -07:00
Noah Levitt
45c06eab58 bump dev version number 2018-03-08 16:35:25 -08:00
Noah Levitt
c2172c6b5b make sure to roll over idle warcs
even when warcprox is idle itself
2018-02-28 13:02:03 -08:00
Noah Levitt
8a7ed0cf57 bump dev version number after merge 2018-02-28 11:45:10 -08:00
Noah Levitt
d316569196 bump dev version after revert 2018-02-27 17:28:44 -08:00
Noah Levitt
d29a367db6 bump dev version number after PR merge 2018-02-27 10:33:02 -08:00
Noah Levitt
f3e270b796 make test_method_filter() pass by waiting
in test_limit_large_resource() for url processing to finish, to prevent
stats from affecting the subsequent test
2018-02-20 14:54:58 -08:00
Noah Levitt
6d6f2c9aa0 fix sqlite3 string escaping 2018-02-12 11:42:35 -08:00
Noah Levitt
b2a1f15bf6 clean up test infrastructure
- fix crufty, broken test in setup.py
- include tests in sdist tarball for pypi
2018-02-07 16:06:46 -08:00
Noah Levitt
688e53d889 bump version number after pull request 2018-02-07 15:49:35 -08:00
Noah Levitt
e68be9354d back to dev version number 2018-02-07 15:48:42 -08:00
Noah Levitt
2ceedd3fd2 2.4b1 for pypi 2018-02-07 15:48:42 -08:00
Noah Levitt
322512dab6 bump version number after latest pull request 2018-02-07 15:48:42 -08:00
Noah Levitt
824c194142 make plugin api more flexible 2018-01-24 16:07:45 -08:00
Noah Levitt
5b414102ba respect CA-related command line options 2018-01-24 10:27:40 -08:00
Noah Levitt
1cfb4d46c6 bump version number after pull request 2018-01-22 12:50:16 -08:00
Noah Levitt
41b531e398 use trick to avoid dns looking up local ip 2018-01-21 19:47:15 -08:00
Noah Levitt
de327450ea close open warcs at shutdown 2018-01-21 19:46:31 -08:00
Noah Levitt
4b53c10132 bump minor version after these big changes 2018-01-19 14:37:53 -08:00