573 Commits

Author SHA1 Message Date
Barbara Miller
68dd6dbb78 Merge branch 'Py311' into qa 2023-09-12 14:39:28 -07:00
Barbara Miller
3b5d9d8ef0 update rethinkdb import 2023-09-12 14:39:09 -07:00
Barbara Miller
e0b92bc901 Merge branch 'limit_revisits' into qa 2023-09-11 16:03:16 -07:00
Barbara Miller
f82eb1f6d5 minor edits post-deploy 2023-09-11 16:02:52 -07:00
Barbara Miller
93667c7f7b Merge branch 'limit_revisits' into qa 2023-08-29 13:29:24 -07:00
Barbara Miller
976ff1b20d wrapper cache_true 2023-08-20 13:11:13 -04:00
Barbara Miller
f0b69dd74e Merge branch 'limit_revisits' into qa 2023-08-15 16:12:07 -07:00
Barbara Miller
15271835f6 format in limit_revisits 2023-08-15 16:11:45 -07:00
Barbara Miller
da69503ed1 Merge branch 'limit_revisits' into qa 2023-08-15 15:50:29 -07:00
Barbara Miller
887680b0ec try iso-8859-1 2023-08-15 15:50:02 -07:00
Barbara Miller
4e88c90f4d Merge branch 'limit_revisits' into qa 2023-08-15 14:27:01 -07:00
Barbara Miller
533f5c0af2 limit_revisits wants str, not bytes 2023-08-15 14:26:14 -07:00
Barbara Miller
3c64ee1529 Merge branch 'limit_revisits' into qa 2023-07-21 13:37:38 -07:00
Barbara Miller
f83e82c900 limit_revisit check before dedup 2023-07-21 13:37:08 -07:00
Barbara Miller
0dc80c6044 Merge branch 'limit_revisits' into qa 2023-07-13 10:34:02 -07:00
Barbara Miller
e5b2561821 disable prepared statements: prepare_threshold=None 2023-07-13 10:19:01 -07:00
Barbara Miller
8d684b7e12 Merge branch 'limit_revisits' into qa 2023-07-12 17:57:43 -07:00
Barbara Miller
548c4e5cab initial deploy fixes 2023-07-12 17:56:39 -07:00
Barbara Miller
af4c8b071a lru_cache skip_revisit 2023-07-12 17:05:29 -07:00
Barbara Miller
47811977ef lru_cache skip_revisit 2023-07-12 17:04:07 -07:00
Barbara Miller
3de580e352 Merge branch 'limit_revisits' into qa 2023-07-11 16:37:09 -07:00
Barbara Miller
64a152ee8c lru_cache 2023-07-11 16:35:38 -07:00
Barbara Miller
8563b95ff6 Merge branch 'limit_revisits' into qa 2023-06-28 17:35:58 -07:00
Barbara Miller
b91a7d1d89 more updates qa prototyping 2023-06-28 17:34:26 -07:00
Barbara Miller
40ef6fc186 Merge branch 'limit_revisits' into qa 2023-06-28 16:26:49 -07:00
Barbara Miller
702afbd098 more updates qa prototyping 2023-06-28 16:23:32 -07:00
Barbara Miller
876a113470 Merge branch 'limit_revisits' into qa 2023-06-28 11:48:48 -07:00
Barbara Miller
dfc34e7561 more updates qa prototyping 2023-06-28 11:43:53 -07:00
Barbara Miller
ef75164f8b fixes for qa prototyping 2023-06-27 17:19:40 -07:00
Barbara Miller
ad458ddb6a backout skip_revisits 2023-06-27 11:50:18 -07:00
Barbara Miller
65d7776ec4 Merge branch 'limit_revisits' into qa 2023-06-27 11:48:28 -07:00
Barbara Miller
b3f7b09298 fixes for qa prototyping 2023-06-27 11:47:55 -07:00
Barbara Miller
d9145eefb5 LimitRecords, more LimitRevisitsPGMixin 2023-06-26 22:49:33 -07:00
Barbara Miller
0da822a555 Merge branch 'skip_revisits' into qa 2023-06-23 11:30:01 -07:00
Barbara Miller
08f2903f14 LimitRevisitsPGMixin 2023-06-22 19:29:53 -07:00
Barbara Miller
5075920415 limit revisits mixin 2023-06-21 17:25:41 -07:00
Barbara Miller
4f0644727d get bytes from payload_digest obj 2023-06-08 17:08:30 -07:00
Barbara Miller
2755a10ebc fix logging 2023-06-06 12:31:46 -07:00
Barbara Miller
1ea069ae32 fix typos 2023-06-06 12:31:46 -07:00
Barbara Miller
2765942421 fix logging 2023-06-06 12:27:13 -07:00
Barbara Miller
419e5bc536 fix typos 2023-06-05 17:57:02 -07:00
Barbara Miller
1dc7de7dd8 skip duplicate revisits, per ait-job-id 2023-06-05 13:40:21 -07:00
Barbara Miller
ee9e375560 zlib decompression 2022-08-04 11:14:33 -07:00
Vangelis Banos
329fef31a8 Randomize TLS fingerprint: Create a random TLS fingerprint per HTTPS connection to avoid TLS fingerprinting. 2022-07-01 17:39:49 +00:00
Barbara Miller
d253ea85c3 Merge pull request #173 from internetarchive/increase_batch_sec: tune MIN_BATCH_SEC, MAX_BATCH_SEC for fewer dedup errors 2022-06-24 11:13:18 -07:00
Barbara Miller
8418fe10ba add explanatory comment 2022-06-24 11:07:35 -07:00
Adam Miller
731cfe80cc Adding url canonicalization tests and handling of edge cases to reduce log noise 2022-04-26 23:48:54 +00:00
Adam Miller
1e3d22aba4 Better handle non-ascii urls for crawl log hop info 2022-04-20 22:48:28 +00:00
Adam Miller
5ae1291e37 Refactor of hop path referer logic 2022-03-24 21:40:55 +00:00
Barbara Miller
05daafa19e increase MIN_BATCH_SEC, MAX_BATCH_SEC 2022-03-03 18:46:20 -08:00