984 Commits

Author SHA1 Message Date
Barbara Miller
64a152ee8c lru_cache 2023-07-11 16:35:38 -07:00
Barbara Miller
b91a7d1d89 more updates qa prototyping 2023-06-28 17:34:26 -07:00
Barbara Miller
ef75164f8b fixes for qa prototyping 2023-06-27 17:19:40 -07:00
Barbara Miller
d9145eefb5 LimitRecords, more LimitRevisitsPGMixin 2023-06-26 22:49:33 -07:00
Barbara Miller
08f2903f14 LimitRevisitsPGMixin 2023-06-22 19:29:53 -07:00
Barbara Miller
5075920415 limit revisits mixin 2023-06-21 17:25:41 -07:00
Barbara Miller
ca02c22ff7 Merge pull request #180 from cclauss/patch-1 2023-04-12 11:45:41 -07:00
    Thanks, @cclauss!
Barbara Miller
1fd3b2c7a1 update readme — rm travis 2023-04-12 11:44:01 -07:00
Christian Clauss
ba14480a2d Delete .travis.yml 2023-04-12 11:37:56 +02:00
Barbara Miller
50a4f35e5f Merge pull request #177 from internetarchive/blocks-shrink 2022-08-05 15:44:05 -07:00
    @adam-miller ok'd this elsewhere
Barbara Miller
9973d28de9 bump version 2022-08-04 17:28:33 -07:00
Barbara Miller
ee9e375560 zlib decompression 2022-08-04 11:14:33 -07:00
Barbara Miller
c008c2eca7 bump version 2022-07-01 14:18:17 -07:00
Barbara Miller
7958921053 Merge pull request #175 from vbanos/random-tls-fingerprint 2022-07-01 14:16:05 -07:00
    Thanks, @vbanos!
Vangelis Banos
329fef31a8 Randomize TLS fingerprint 2022-07-01 17:39:49 +00:00
    Create a random TLS fingerprint per HTTPS connection to avoid TLS fingerprinting.
Barbara Miller
d253ea85c3 Merge pull request #173 from internetarchive/increase_batch_sec 2022-06-24 11:13:18 -07:00
    tune MIN_BATCH_SEC, MAX_BATCH_SEC for fewer dedup errors
Barbara Miller
8418fe10ba add explanatory comment 2022-06-24 11:07:35 -07:00
Adam Miller
fcd9b2b3bd Merge pull request #172 from internetarchive/adds-canonicalization-tests 2022-04-27 09:57:03 -07:00
    Adding url canonicalization tests and handling of edge cases to reduc…
Adam Miller
731cfe80cc Adding url canonicalization tests and handling of edge cases to reduce log noise 2022-04-26 23:48:54 +00:00
Adam Miller
9521042a23 Merge pull request #171 from internetarchive/adds-hop-path-logging 2022-04-26 12:11:11 -07:00
    Adds hop path logging
Adam Miller
daa925db17 Bump version 2022-04-26 09:55:48 -07:00
Adam Miller
d96dd5d842 Adjust rfc3986 package version for deployment across more versions 2022-04-21 18:37:27 +00:00
Adam Miller
1e3d22aba4 Better handle non-ascii urls for crawl log hop info 2022-04-20 22:48:28 +00:00
Adam Miller
5ae1291e37 Refactor of hop path referer logic 2022-03-24 21:40:55 +00:00
Barbara Miller
05daafa19e increase MIN_BATCH_SEC, MAX_BATCH_SEC 2022-03-03 18:46:20 -08:00
Adam Miller
ade2373711 Fixing referer on request with null hop path 2022-03-04 02:01:55 +00:00
Adam Miller
3a234d0cec Refactor hop_path metadata 2022-03-03 00:18:16 +00:00
Adam Miller
366ed5155f Merge branch 'master' into adds-hop-path-logging 2022-02-09 18:18:32 +00:00
Barbara Miller
c027659001 Merge pull request #167 from galgeek/WT-31 2021-12-29 12:14:56 -08:00
    fix logging buglet iii
Barbara Miller
9e8ea5bb45 fix logging buglet iii 2021-12-29 12:06:18 -08:00
Barbara Miller
bc3d1e6d00 fix logging buglet ii 2021-12-29 11:55:39 -08:00
Barbara Miller
6b372e2f3f Merge pull request #166 from galgeek/WT-31 2021-12-29 11:04:03 -08:00
    fix logging buglet
Barbara Miller
5d8fbf7038 fix logging buglet 2021-12-29 10:25:04 -08:00
Barbara Miller
a969430b37 Merge pull request #163 from internetarchive/idna2_10 2021-12-28 13:50:23 -08:00
    idna==2.10
Barbara Miller
aeecb6515f bump version 2021-12-28 11:58:30 -08:00
Adam Miller
e1eddb8fa7 Merge pull request #165 from galgeek/WT-31 2021-12-28 11:52:41 -08:00
    in-batch dedup
Barbara Miller
d7aec77597 faster, likely 2021-12-16 18:36:00 -08:00
Barbara Miller
bcaf293081 better logging 2021-12-09 12:19:45 -08:00
Barbara Miller
7d4c8dcb4e recorded_url.do_not_archive = True 2021-12-08 11:04:09 -08:00
Barbara Miller
da089e0a92 bytes not str 2021-12-06 20:33:16 -08:00
Barbara Miller
3eeccd0016 more hash_plus_url 2021-12-06 19:43:27 -08:00
Barbara Miller
5e5a74f204 str, not object 2021-12-06 19:33:10 -08:00
Barbara Miller
b67f1ad0f3 add logging 2021-12-06 17:29:27 -08:00
Barbara Miller
e6a1a7dd7e increase trough dedup batch window 2021-12-06 17:29:02 -08:00
Barbara Miller
e744075913 python 3.5 version, mostly 2021-12-02 11:46:39 -08:00
Barbara Miller
1476bfec8c discard batch hash+url match 2021-12-02 11:17:59 -08:00
Adam Miller
b57ec9c589 Check warcprox meta headers for hop information necessary to record a hop path if provided 2021-08-31 17:09:06 +00:00
Barbara Miller
e61099ff5f idna==2.10 2021-04-27 10:26:45 -07:00
Barbara Miller
0e23a31a31 Merge pull request #161 from internetarchive/fixes-malformed-crawl-log-lines 2021-04-21 15:31:17 -07:00
    Checking for content type header consiting of only empty spaces and r…
Adam Miller
7f406b7942 Trying to fix tests that only fail during ci 2021-04-01 00:01:47 +00:00