1015 Commits

Author SHA1 Message Date
vbanos
bfe18aeaf1 Do not generate an RSA private key for every https connection
We can reuse the RSA private key we create or load on
`CertificateAuthority.__init__`. There is no need to create another one
for each host we connect to.

`rsa.generate_private_key` is a very slow function.
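
As a rough illustration of the idea above (not the actual warcprox code), the sketch below pays the key-generation cost once in `__init__` and reuses that key for every host. `HostCertFactory`, `cert_for_host`, and `fake_generate_key` are hypothetical names; the cheap placeholder generator stands in for a slow call such as `rsa.generate_private_key`.

```python
class HostCertFactory:
    """Hypothetical stand-in for a certificate authority (not warcprox's class)."""

    def __init__(self, generate_key):
        # Generate (or load) the private key once, at construction time.
        self.key = generate_key()

    def cert_for_host(self, host):
        # Every per-host certificate reuses self.key; only the subject differs.
        return {"subject": host, "key": self.key}


calls = []  # records how many times the generator runs


def fake_generate_key():
    # Placeholder for an expensive generator; returns an opaque key object.
    calls.append(1)
    return object()


factory = HostCertFactory(fake_generate_key)
certs = [factory.cert_for_host(h) for h in ("example.com", "example.org")]
print(len(calls))  # number of key generations, regardless of host count
```

With this shape, the number of key generations stays constant no matter how many hosts are contacted.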
2024-12-05 16:28:08 +01:00
Barbara Miller
6028e523f3
Merge pull request #206 from internetarchive/trough_dep
update extras trough dependency for pypi
2024-11-05 19:15:07 -08:00
Barbara Miller
7ce00f001c update extras trough dependency for pypi 2024-11-05 19:11:55 -08:00
Barbara Miller
0e565889e1
Merge pull request #205 from internetarchive/for_pypi
updates for pypi update v.2.6.0
2024-11-05 18:11:37 -08:00
Barbara Miller
01832c3cc5 for pypi v.2.6.0 2024-11-05 18:05:51 -08:00
Barbara Miller
ef774f5f29
Merge pull request #204 from galgeek/doublethink_up
update doublethink dependency
2024-10-31 11:29:36 -07:00
Barbara Miller
c3ce3b160a update doublethink dependency 2024-10-31 11:10:47 -07:00
Barbara Miller
14d2a0c005
Merge pull request #201 from vbanos/pyopenssl-cryptography
Upgrade cryptography dependency to >=39,<40
2024-07-28 10:15:35 -07:00
Vangelis Banos
aef8ca7012 Upgrade cryptography dependency to >=39,<40
warcprox crashes with the following error when using
`cryptography==35.0.0`.

```
ValueError: Valid PEM but no BEGIN CERTIFICATE/END CERTIFICATE delimiters. Are you sure this is a certificate?
Traceback (most recent call last):
  File "/opt/spn2/bin/warcprox", line 8, in <module>
    sys.exit(main())
  File "/opt/spn2/lib/python3.8/site-packages/warcprox/main.py", line 330, in main
    controller = warcprox.controller.WarcproxController(options)
  File "/opt/spn2/lib/python3.8/site-packages/warcprox/controller.py", line 145, in __init__
    self.proxy = warcprox.warcproxy.WarcProxy(
  File "/opt/spn2/lib/python3.8/site-packages/warcprox/warcproxy.py", line 561, in __init__
    SingleThreadedWarcProxy.__init__(
  File "/opt/spn2/lib/python3.8/site-packages/warcprox/warcproxy.py", line 509, in __init__
    warcprox.mitmproxy.SingleThreadedMitmProxy.__init__(
  File "/opt/spn2/lib/python3.8/site-packages/warcprox/mitmproxy.py", line 861, in __init__
    self.ca = CertificateAuthority(
  File "/opt/spn2/lib/python3.8/site-packages/warcprox/certauth.py", line 69, in __init__
    self.cert, self.key = self.read_pem(ca_file)
  File "/opt/spn2/lib/python3.8/site-packages/warcprox/certauth.py", line 210, in read_pem
    cert = x509.load_pem_x509_certificate(f.read(), default_backend())
  File "/opt/spn2/lib/python3.8/site-packages/cryptography/x509/base.py", line 436, in load_pem_x509_certificate
    return rust_x509.load_pem_x509_certificate(data)
ValueError: Valid PEM but no BEGIN CERTIFICATE/END CERTIFICATE delimiters. Are you sure this is a certificate?
```
2024-07-28 10:01:01 +00:00
Barbara Miller
701b659510
Merge pull request #200 from vbanos/pyopenssl-cryptography
Thank you, @vbanos!

Replace PyOpenSSL with cryptography
2024-07-27 09:09:29 -07:00
Vangelis Banos
10d36cc943 Replace PyOpenSSL with cryptography
PyOpenSSL is deprecated. We replace it with `cryptography` following
their recommendation at: https://pypi.org/project/pyOpenSSL/

We drop the `pyopenssl` dependency.
2024-07-26 13:04:15 +00:00
Barbara Miller
a65b8b82b9
bump version 2024-07-24 17:10:27 -07:00
Barbara Miller
6756ba60fa
Merge pull request #199 from vbanos/add-certauth
Create warcprox.certauth and drop certauth dependency
2024-07-24 17:09:19 -07:00
Vangelis Banos
2068c037ea Create warcprox.certauth and drop certauth dependency
Copy certauth.py and tests_certauth.py from `certauth==1.1.6`
b526eb2bfd

Change only imports.

Drop unused imports.

Update setup.py: drop `certauth` and add `pyopenssl`.
2024-07-09 11:56:06 +00:00
Barbara Miller
f00ca5c336
Update copyright 2024-06-04 11:48:25 -07:00
Barbara Miller
c0ea6ef00f
bump version 2024-06-04 11:46:59 -07:00
Barbara Miller
f7d4286b54
Merge pull request #198 from vbanos/subdir-prefix
New option --subdir-prefix
2024-06-04 11:46:07 -07:00
Vangelis Banos
56e0b17dc9 New option --subdir-prefix
Save WARCs in subdirectories equal to the current value of Warcprox-Meta['warc-prefix'].
E.g. if warc-prefix=='spn2' and --dir=/warcs, save them in /warcs/spn2/.
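
A minimal sketch of the path logic described above, assuming the option simply appends the prefix to the base directory. The helper name `warc_directory` and its parameters are hypothetical, not taken from warcprox.

```python
import os


def warc_directory(base_dir, warc_prefix, subdir_prefix=False):
    """Return the directory a WARC would be written to (illustrative only)."""
    # With the option enabled and a prefix present, use a per-prefix subdirectory.
    if subdir_prefix and warc_prefix:
        return os.path.join(base_dir, warc_prefix)
    # Otherwise all WARCs share the base directory.
    return base_dir


with_option = warc_directory("/warcs", "spn2", subdir_prefix=True)
without_option = warc_directory("/warcs", "spn2")
print(with_option, without_option)
```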
2024-06-03 21:21:19 +00:00
Barbara Miller
af52dec469
bump version 2023-10-17 09:19:56 -07:00
Barbara Miller
848c089afa
Merge pull request #194 from vbanos/socksproxy
Thank you, @vbanos!
2023-10-17 09:18:11 -07:00
Vangelis Banos
9fd5a22502 fix typo 2023-10-17 06:12:28 +00:00
Vangelis Banos
3d653e023c Add SOCKS proxy options
Add options `--socks-proxy`, `--socks-proxy-username`,
`--socks-proxy-password`.

If enabled, all traffic is routed through the SOCKS proxy.
2023-10-16 18:33:42 +00:00
Barbara Miller
4cb8e0d5dc
Merge pull request #192 from internetarchive/Py311
updates for 3.11 (and back to 3.8)
@vbanos and @avdempsey have agreed this PR is ok to merge
2023-09-27 12:03:26 -07:00
Barbara Miller
a20ad226cb
update version to 2.5, for Python version updates 2023-09-27 11:58:39 -07:00
Barbara Miller
bc0da12c48
bump version for Py311 2023-09-20 10:57:54 -07:00
Barbara Miller
8f0039de02 internetarchive/doublethink.git@Py311 2023-09-19 13:57:34 -07:00
Barbara Miller
c620d7dd19 use galgeek for now 2023-09-13 18:03:38 -07:00
Barbara Miller
4fbf523a3e get doublethink from github.com/internetarchive 2023-09-12 16:05:23 -07:00
Barbara Miller
3b5d9d8ef0 update rethinkdb import 2023-09-12 14:39:09 -07:00
Barbara Miller
5e779af2e9 trough and doublethink updates 2023-09-11 17:38:10 -07:00
Barbara Miller
a90c9c3dd4 trough 0.20 maybe 2023-09-11 17:01:02 -07:00
Barbara Miller
99a825c055 initial commit, trying trough branch jammy+focal 2023-09-11 16:40:39 -07:00
Barbara Miller
c01d58df78
Merge pull request #189 from vbanos/idna-update
Thank you, @vbanos!
2023-07-11 14:13:47 -07:00
Vangelis Banos
6eb2bd1265 Drop idna==2.10 version lock
There is no need to use such an old `idna` version.
The latest works with py35+ and all tests pass.
Newer `idna` supports the latest Unicode standard and latest python
versions.
https://github.com/kjd/idna/blob/master/HISTORY.rst
2023-07-09 10:02:13 +00:00
Barbara Miller
d864ea91ee
Merge pull request #187 from vbanos/cryptography-limit
Thanks, @vbanos!
2023-06-22 08:55:33 -07:00
Vangelis Banos
83c109bc9b Change cryptography version limit to >=2.3,<40 2023-06-22 12:22:24 +00:00
Vangelis Banos
1cc08233d6 Limit dependency version cryptography>=2.3,<=39.0.0
cryptography 41.0.0 crashes warcprox with the following exception:
```
File "/opt/spn2/lib/python3.8/site-packages/warcprox/main.py", line 317, in main
  cryptography.hazmat.backends.openssl.backend.activate_builtin_random()
AttributeError: 'Backend' object has no attribute 'activate_builtin_random'
```

Also, cryptography==40.0.0 isn't OK because when I try to use it I get:
```
pyopenssl 23.2.0 requires cryptography!=40.0.0,!=40.0.1,<42,>=38.0.0, but you have cryptography 40.0.0 which is incompatible.
```

So, the version should be <=39.0.0
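
The resulting bound can be sketched as a simple check, assuming plain dotted numeric version strings (pre-release suffixes are not handled). `cryptography_version_ok` is a hypothetical helper for illustration, not part of warcprox or `cryptography`.

```python
def cryptography_version_ok(version, lower=(2, 3), upper=(39, 0, 0)):
    """Check lower <= version <= upper for a dotted numeric version string."""
    parts = tuple(int(p) for p in version.split("."))
    # Pad to three components so "2.3" compares equal to "2.3.0".
    parts = parts + (0,) * (3 - len(parts))
    lower = lower + (0,) * (3 - len(lower))
    return lower <= parts <= upper


for candidate in ("2.3", "39.0.0", "40.0.0", "41.0.0"):
    print(candidate, cryptography_version_ok(candidate))
```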
2023-06-18 09:09:07 +00:00
Barbara Miller
ca02c22ff7
Merge pull request #180 from cclauss/patch-1
Thanks, @cclauss!
2023-04-12 11:45:41 -07:00
Barbara Miller
1fd3b2c7a1
update readme — rm travis 2023-04-12 11:44:01 -07:00
Christian Clauss
ba14480a2d
Delete .travis.yml 2023-04-12 11:37:56 +02:00
Barbara Miller
50a4f35e5f
Merge pull request #177 from internetarchive/blocks-shrink
@adam-miller ok'd this elsewhere
2022-08-05 15:44:05 -07:00
Barbara Miller
9973d28de9 bump version 2022-08-04 17:28:33 -07:00
Barbara Miller
ee9e375560 zlib decompression 2022-08-04 11:14:33 -07:00
Barbara Miller
c008c2eca7
bump version 2022-07-01 14:18:17 -07:00
Barbara Miller
7958921053
Merge pull request #175 from vbanos/random-tls-fingerprint
Thanks, @vbanos!
2022-07-01 14:16:05 -07:00
Vangelis Banos
329fef31a8 Randomize TLS fingerprint
Create a random TLS fingerprint per HTTPS connection to avoid TLS
fingerprinting.
2022-07-01 17:39:49 +00:00
Barbara Miller
d253ea85c3
Merge pull request #173 from internetarchive/increase_batch_sec
tune MIN_BATCH_SEC, MAX_BATCH_SEC for fewer dedup errors
2022-06-24 11:13:18 -07:00
Barbara Miller
8418fe10ba add explanatory comment 2022-06-24 11:07:35 -07:00
Adam Miller
fcd9b2b3bd
Merge pull request #172 from internetarchive/adds-canonicalization-tests
Adding url canonicalization tests and handling of edge cases to reduc…
2022-04-27 09:57:03 -07:00
Adam Miller
731cfe80cc Adding url canonicalization tests and handling of edge cases to reduce log noise 2022-04-26 23:48:54 +00:00