1103 Commits

Author SHA1 Message Date
Barbara Miller
bc143a57fe Merge branch 'qa' into qa 2023-06-05 13:51:19 -07:00
Barbara Miller
d98896094a bump qa version 2023-06-05 13:47:14 -07:00
Barbara Miller
1dc7de7dd8 skip duplicate revisits, per ait-job-id 2023-06-05 13:40:21 -07:00
Barbara Miller
50a4f35e5f
Merge pull request #177 from internetarchive/blocks-shrink
@adam-miller ok'd this elsewhere
2022-08-05 15:44:05 -07:00
Barbara Miller
9973d28de9 bump version 2022-08-04 17:28:33 -07:00
Barbara Miller
58f3e58531 bump qa version 2022-08-04 15:13:43 -07:00
Barbara Miller
460773a4a6 Merge branch 'blocks-shrink' into qa 2022-08-04 15:12:58 -07:00
Barbara Miller
ee9e375560 zlib decompression 2022-08-04 11:14:33 -07:00
Barbara Miller
8f10fce93a resetting to Jul 1 updates 2022-08-03 19:58:54 -07:00
Barbara Miller
20789e4edb bump qa version 2022-08-03 16:53:52 -07:00
Barbara Miller
5ed59a82d2 Merge branch 'qa' of github.com:internetarchive/warcprox into qa 2022-08-03 16:53:17 -07:00
Barbara Miller
e7fb1474ad Merge branch 'blocks-shrink' into qa 2022-08-03 16:52:44 -07:00
Barbara Miller
1205589fae decompress and json.loads 2022-08-03 16:52:31 -07:00
Barbara Miller
053a42a371 bump qa version 2022-08-03 16:30:49 -07:00
Barbara Miller
babbb8ca74 Merge branch 'blocks-shrink' into qa 2022-08-03 15:50:53 -07:00
Barbara Miller
2cdfceade1 decompress and split 2022-08-03 15:49:54 -07:00
Barbara Miller
09347c903e Merge branch 'master' of github.com:internetarchive/warcprox into blocks-shrink 2022-08-03 15:46:36 -07:00
Barbara Miller
a232ffc6ba bump qa version 2022-08-03 15:04:39 -07:00
Barbara Miller
ea5ba9b007 Merge branch 'qa' of github.com:internetarchive/warcprox into qa 2022-08-03 15:01:13 -07:00
Barbara Miller
d617b4850c Merge branch 'blocks-shrink' into qa 2022-08-03 15:01:01 -07:00
Barbara Miller
3e8102221d use 'compressed_blocks' 2022-08-03 14:59:36 -07:00
Barbara Miller
106cb905db zlib decompression 2022-08-03 11:15:25 -07:00
Barbara Miller
c008c2eca7 bump version 2022-07-01 14:18:17 -07:00
Barbara Miller
7958921053
Merge pull request #175 from vbanos/random-tls-fingerprint
Thanks, @vbanos!
2022-07-01 14:16:05 -07:00
Barbara Miller
d3fdcbe152 bump qa version 2022-07-01 11:28:28 -07:00
Barbara Miller
6ad9a3e448 Merge branch 'qa' of github.com:internetarchive/warcprox into qa 2022-07-01 11:13:24 -07:00
Barbara Miller
ab172189fd Merge branch 'tls-fingerprint' into qa 2022-07-01 11:11:54 -07:00
Vangelis Banos
329fef31a8 Randomize TLS fingerprint
Create a random TLS fingerprint per HTTPS connection to avoid TLS
fingerprinting.
2022-07-01 17:39:49 +00:00
Barbara Miller
d253ea85c3
Merge pull request #173 from internetarchive/increase_batch_sec
tune MIN_BATCH_SEC, MAX_BATCH_SEC for fewer dedup errors
2022-06-24 11:13:18 -07:00
Barbara Miller
8418fe10ba add explanatory comment 2022-06-24 11:07:35 -07:00
Adam Miller
fcd9b2b3bd
Merge pull request #172 from internetarchive/adds-canonicalization-tests
Adding url canonicalization tests and handling of edge cases to reduc…
2022-04-27 09:57:03 -07:00
Adam Miller
aa4a550b12 Merge branch 'adds-canonicalization-tests' into qa 2022-04-26 23:49:12 +00:00
Adam Miller
731cfe80cc Adding url canonicalization tests and handling of edge cases to reduce log noise 2022-04-26 23:48:54 +00:00
Adam Miller
9521042a23
Merge pull request #171 from internetarchive/adds-hop-path-logging
Adds hop path logging
2022-04-26 12:11:11 -07:00
Adam Miller
daa925db17 Bump version 2022-04-26 09:55:48 -07:00
Adam Miller
31693c5472 Merge branch 'adds-hop-path-logging' into qa 2022-04-21 18:37:39 +00:00
Adam Miller
d96dd5d842 Adjust rfc3986 package version for deployment across more versions 2022-04-21 18:37:27 +00:00
Adam Miller
0f2c94ab9e Merge branch 'adds-hop-path-logging' into qa 2022-04-20 22:50:08 +00:00
Adam Miller
1e3d22aba4 Better handle non-ascii urls for crawl log hop info 2022-04-20 22:48:28 +00:00
Adam Miller
caf7f3b30f Merge branch 'qa' of github.com:internetarchive/warcprox into qa 2022-03-24 21:41:19 +00:00
Adam Miller
28bec1bb14 Merge branch 'adds-hop-path-logging' into qa 2022-03-24 21:41:11 +00:00
Adam Miller
5ae1291e37 Refactor of hop path referer logic 2022-03-24 21:40:55 +00:00
Barbara Miller
a614df69fd Merge branch 'qa' of github.com:internetarchive/warcprox into qa 2022-03-03 18:49:24 -08:00
Barbara Miller
e48a8dda05 Merge branch 'increase_batch_sec' into qa 2022-03-03 18:47:04 -08:00
Barbara Miller
05daafa19e increase MIN_BATCH_SEC, MAX_BATCH_SEC 2022-03-03 18:46:20 -08:00
Adam Miller
c8563b9407 Merge branch 'adds-hop-path-logging' into qa 2022-03-04 02:02:18 +00:00
Adam Miller
ade2373711 Fixing referer on request with null hop path 2022-03-04 02:01:55 +00:00
Adam Miller
60bd2ea2bd Merge branch 'adds-hop-path-logging' into qa 2022-03-03 00:19:00 +00:00
Adam Miller
3a234d0cec Refactor hop_path metadata 2022-03-03 00:18:16 +00:00
Adam Miller
dea2d1c8fa
Merge pull request #168 from internetarchive/adds-hop-path-logging
Adds hop path logging
2022-02-09 10:55:12 -08:00