1
0
mirror of https://github.com/internetarchive/warcprox.git synced 2025-01-18 13:22:09 +01:00

320 Commits

Author SHA1 Message Date
Barbara Miller
3023484cfc Merge branch 'limit_revisits' into qa 2023-08-03 14:10:29 -07:00
Barbara Miller
a86169c56c pep0440 version id (I think) 2023-08-03 14:09:23 -07:00
Barbara Miller
b5ca3b6db6 Merge branch 'idna_unpeg' into qa 2023-07-11 13:28:43 -07:00
Vangelis Banos
6eb2bd1265 Drop idna==2.10 version lock
There is no need to use such an old `idna` version.
The latest works with py35+ and all tests pass.
Newer `idna` supports the latest Unicode standard and latest python
versions.
https://github.com/kjd/idna/blob/master/HISTORY.rst
2023-07-09 10:02:13 +00:00
Vangelis Banos
83c109bc9b Change cryptography version limit to >=2.3,<40 2023-06-22 12:22:24 +00:00
Barbara Miller
ccb67e03b4 bump qa version 2023-06-19 14:30:33 -07:00
Barbara Miller
98f7032419 cryptography <40 2023-06-19 14:30:33 -07:00
Vangelis Banos
aa8948e612 Limit dependency version cryptography>=2.3,<=39.0.0
cryptography 41.0.0 crashes warcprox with the following exception:
```
File "/opt/spn2/lib/python3.8/site-packages/warcprox/main.py", line 317, in main
  cryptography.hazmat.backends.openssl.backend.activate_builtin_random()
AttributeError: 'Backend' object has no attribute 'activate_builtin_random'
```

Also, cryptography==40.0.0 isn't OK because when I try to use it I get:
```
pyopenssl 23.2.0 requires cryptography!=40.0.0,!=40.0.1,<42,>=38.0.0, but you have cryptography 40.0.0 which is incompatible.
```

So, the version should be <=39.0.0
2023-06-19 14:30:33 -07:00
Vangelis Banos
1cc08233d6 Limit dependency version cryptography>=2.3,<=39.0.0
cryptography 41.0.0 crashes warcprox with the following exception:
```
File "/opt/spn2/lib/python3.8/site-packages/warcprox/main.py", line 317, in main
  cryptography.hazmat.backends.openssl.backend.activate_builtin_random()
AttributeError: 'Backend' object has no attribute 'activate_builtin_random'
```

Also, cryptography==40.0.0 isn't OK because when I try to use it I get:
```
pyopenssl 23.2.0 requires cryptography!=40.0.0,!=40.0.1,<42,>=38.0.0, but you have cryptography 40.0.0 which is incompatible.
```

So, the version should be <=39.0.0
2023-06-18 09:09:07 +00:00
Barbara Miller
d98896094a bump qa version 2023-06-05 13:47:14 -07:00
Barbara Miller
9973d28de9 bump version 2022-08-04 17:28:33 -07:00
Barbara Miller
ee9e375560 zlib decompression 2022-08-04 11:14:33 -07:00
Barbara Miller
c008c2eca7
bump version 2022-07-01 14:18:17 -07:00
Adam Miller
daa925db17
Bump version 2022-04-26 09:55:48 -07:00
Adam Miller
d96dd5d842 Adjust rfc3986 package version for deployment across more versions 2022-04-21 18:37:27 +00:00
Adam Miller
1e3d22aba4 Better handle non-ascii urls for crawl log hop info 2022-04-20 22:48:28 +00:00
Barbara Miller
a969430b37
Merge pull request from internetarchive/idna2_10
idna==2.10
2021-12-28 13:50:23 -08:00
Barbara Miller
aeecb6515f
bump version 2021-12-28 11:58:30 -08:00
Barbara Miller
e61099ff5f idna==2.10 2021-04-27 10:26:45 -07:00
Barbara Miller
ae11daedc1
bump version 2020-08-18 09:29:57 -07:00
Christian Clauss
c649355285
setup.py: Add Python 3.8 2020-08-06 17:58:00 +02:00
Noah Levitt
de9219e646 require more recent urllib3
to avoid this error: https://github.com/internetarchive/warcprox/issues/148

2020-01-28 14:42:44,851 2023 ERROR MitmProxyHandler(tid=2037,started=2020-01-28T20:42:44.834551,client=127.0.0.1:49100) warcprox.warcprox.WarcProxyHandler.do_COMMAND(mitmproxy.py:442) problem processing request 'GET / HTTP/1.1': TypeError("connection_from_host() got an unexpected keyword argument 'pool_kwargs'",)
Traceback (most recent call last):
  File "/usr/local/lib/python3.5/dist-packages/warcprox/mitmproxy.py", line 413, in do_COMMAND
    self._connect_to_remote_server()
  File "/usr/local/lib/python3.5/dist-packages/warcprox/warcproxy.py", line 189, in _connect_to_remote_server
    return warcprox.mitmproxy.MitmProxyHandler._connect_to_remote_server(self)
  File "/usr/local/lib/python3.5/dist-packages/warcprox/mitmproxy.py", line 277, in _connect_to_remote_server
    pool_kwargs={'maxsize': 12, 'timeout': self._socket_timeout})
TypeError: connection_from_host() got an unexpected keyword argument 'pool_kwargs'
2020-02-06 10:10:53 -08:00
Noah Levitt
90fba01514 make trough dependency optional 2020-01-08 13:37:01 -08:00
Noah Levitt
a8cd53bfe4 bump version, trough dep version 2020-01-08 13:24:00 -08:00
Noah Levitt
469b41773a fix logging config which trough interfered with 2020-01-07 15:19:03 -08:00
Noah Levitt
91fcc054c4 bump version after merge 2020-01-07 14:42:40 -08:00
Noah Levitt
f54e1b37c7 bump version after merge 2020-01-07 14:40:58 -08:00
Noah Levitt
fe19bb268f use trough.client instead of warcprox.trough
less redundant code!
trough.client was based off of warcprox.trough but has been improved
since then
2019-11-19 11:45:14 -08:00
Noah Levitt
f77c152037 bump version after merge 2019-09-26 11:49:07 -07:00
Noah Levitt
1f852f5f36 bump version after merges 2019-09-23 11:55:00 -07:00
Noah Levitt
88a7f79a7e bump version 2019-09-13 10:58:16 -07:00
Noah Levitt
1aa6b0c5d6 log remote host/ip/port on SSLError 2019-08-16 18:31:35 +00:00
Noah Levitt
fce1c3d722 requests/urllib3 version conflict from april must
be obsolete by now...
2019-07-26 14:03:36 -07:00
Noah Levitt
932001c921 bump version after merge 2019-06-20 14:57:36 -07:00
Noah Levitt
81a945e840 bump version after a few small PRs 2019-06-11 10:58:52 -07:00
Noah Levitt
8c31ec2916 bigger connection pool, for Vangelis 2019-05-15 16:06:42 -07:00
Noah Levitt
bbf3fad1dc avoid using warcproxy.py stuff in mitmproxy.py 2019-05-15 15:58:47 -07:00
Noah Levitt
f51f2ec225 some tweaks to error responses
use 502, 504 when appropriate, and don't send `str(e)` as in the http
status line, because that is often an ugly jumble
2019-05-14 15:51:11 -07:00
Noah Levitt
2772b80fab bump version after merge 2019-05-14 15:50:59 -07:00
Vangelis Banos
89d987a181 Cache bad target hostname:port to avoid reconnection attempts
If connection to a hostname:port fails, add it to a `TTLCache` with
60 sec expiration time. Subsequent requests to the same hostname:port
return really quickly as we check the cache and avoid trying a new
network connection.

The short expiration time guarantees that if a host becomes OK again,
we'll be able to connect to it quickly.

Adding `cachetools` dependency was necessary as there isn't any other
way to have an expiring in-memory cache using stdlib. The library
doesn't have any other dependencies, it has good test coverage and seems
maintained. It also supports Python 3.7.
2019-05-09 10:03:16 +00:00
Noah Levitt
41d7f0be53 bump version after merges 2019-05-06 16:49:35 -07:00
Noah Levitt
dfc081fff8 do not write incorrect warc-payload-digest to...
... request records

see https://github.com/webrecorder/warcio/issues/74#issuecomment-487816378
2019-05-02 14:25:29 -07:00
Noah Levitt
38d6e4337d handle graceful shutdown failure
print stack trace and kill myself -9
2019-04-24 13:14:12 -07:00
Noah Levitt
de01d498cb requests/urllib3 version conflict 2019-04-24 12:11:20 -07:00
Noah Levitt
3298128e0c deal with bad content-type header
we had bad stuff get into a crawl log because of a url that returned a
Content-Type header value with spaces in it (but no semicolon)
2019-04-24 10:40:22 -07:00
Noah Levitt
f207e32f50 followup on IncompleteRead 2019-04-15 00:17:50 -07:00
Noah Levitt
5de2569430 bump version after merge 2019-04-13 18:11:02 -07:00
Noah Levitt
98b3c1f80b bump version after merge 2019-04-09 21:52:31 +00:00
Noah Levitt
2ca84ae023 bump version after merge 2019-04-08 11:50:27 -07:00
Noah Levitt
794cc29c80 bump version 2019-03-21 16:04:05 -07:00