Barbara Miller | 68dd6dbb78 | Merge branch 'Py311' into qa | 2023-09-12 14:39:28 -07:00
Barbara Miller | 3b5d9d8ef0 | update rethinkdb import | 2023-09-12 14:39:09 -07:00
Barbara Miller | e0b92bc901 | Merge branch 'limit_revisits' into qa | 2023-09-11 16:03:16 -07:00
Barbara Miller | f82eb1f6d5 | minor edits post-deploy | 2023-09-11 16:02:52 -07:00
Barbara Miller | 93667c7f7b | Merge branch 'limit_revisits' into qa | 2023-08-29 13:29:24 -07:00
Barbara Miller | 976ff1b20d | wrapper cache_true | 2023-08-20 13:11:13 -04:00
Barbara Miller | f0b69dd74e | Merge branch 'limit_revisits' into qa | 2023-08-15 16:12:07 -07:00
Barbara Miller | 15271835f6 | format in limit_revisits | 2023-08-15 16:11:45 -07:00
Barbara Miller | da69503ed1 | Merge branch 'limit_revisits' into qa | 2023-08-15 15:50:29 -07:00
Barbara Miller | 887680b0ec | try iso-8859-1 | 2023-08-15 15:50:02 -07:00
Barbara Miller | 4e88c90f4d | Merge branch 'limit_revisits' into qa | 2023-08-15 14:27:01 -07:00
Barbara Miller | 533f5c0af2 | limit_revisits wants str, not bytes | 2023-08-15 14:26:14 -07:00
Barbara Miller | 3c64ee1529 | Merge branch 'limit_revisits' into qa | 2023-07-21 13:37:38 -07:00
Barbara Miller | f83e82c900 | limit_revisit check before dedup | 2023-07-21 13:37:08 -07:00
Barbara Miller | 0dc80c6044 | Merge branch 'limit_revisits' into qa | 2023-07-13 10:34:02 -07:00
Barbara Miller | e5b2561821 | disable prepared statements: prepare_threshold=None | 2023-07-13 10:19:01 -07:00
Barbara Miller | 8d684b7e12 | Merge branch 'limit_revisits' into qa | 2023-07-12 17:57:43 -07:00
Barbara Miller | 548c4e5cab | initial deploy fixes | 2023-07-12 17:56:39 -07:00
Barbara Miller | af4c8b071a | lru_cache skip_revisit | 2023-07-12 17:05:29 -07:00
Barbara Miller | 47811977ef | lru_cache skip_revisit | 2023-07-12 17:04:07 -07:00
Barbara Miller | 3de580e352 | Merge branch 'limit_revisits' into qa | 2023-07-11 16:37:09 -07:00
Barbara Miller | 64a152ee8c | lru_cache | 2023-07-11 16:35:38 -07:00
Barbara Miller | 8563b95ff6 | Merge branch 'limit_revisits' into qa | 2023-06-28 17:35:58 -07:00
Barbara Miller | b91a7d1d89 | more updates qa prototyping | 2023-06-28 17:34:26 -07:00
Barbara Miller | 40ef6fc186 | Merge branch 'limit_revisits' into qa | 2023-06-28 16:26:49 -07:00
Barbara Miller | 702afbd098 | more updates qa prototyping | 2023-06-28 16:23:32 -07:00
Barbara Miller | 876a113470 | Merge branch 'limit_revisits' into qa | 2023-06-28 11:48:48 -07:00
Barbara Miller | dfc34e7561 | more updates qa prototyping | 2023-06-28 11:43:53 -07:00
Barbara Miller | ef75164f8b | fixes for qa prototyping | 2023-06-27 17:19:40 -07:00
Barbara Miller | ad458ddb6a | backout skip_revisits | 2023-06-27 11:50:18 -07:00
Barbara Miller | 65d7776ec4 | Merge branch 'limit_revisits' into qa | 2023-06-27 11:48:28 -07:00
Barbara Miller | b3f7b09298 | fixes for qa prototyping | 2023-06-27 11:47:55 -07:00
Barbara Miller | d9145eefb5 | LimitRecords, more LimitRevisitsPGMixin | 2023-06-26 22:49:33 -07:00
Barbara Miller | 0da822a555 | Merge branch 'skip_revisits' into qa | 2023-06-23 11:30:01 -07:00
Barbara Miller | 08f2903f14 | LimitRevisitsPGMixin | 2023-06-22 19:29:53 -07:00
Barbara Miller | 5075920415 | limit revisits mixin | 2023-06-21 17:25:41 -07:00
Barbara Miller | 4f0644727d | get bytes from payload_digest obj | 2023-06-08 17:08:30 -07:00
Barbara Miller | 2755a10ebc | fix logging | 2023-06-06 12:31:46 -07:00
Barbara Miller | 1ea069ae32 | fix typos | 2023-06-06 12:31:46 -07:00
Barbara Miller | 2765942421 | fix logging | 2023-06-06 12:27:13 -07:00
Barbara Miller | 419e5bc536 | fix typos | 2023-06-05 17:57:02 -07:00
Barbara Miller | 1dc7de7dd8 | skip duplicate revisits, per ait-job-id | 2023-06-05 13:40:21 -07:00
Barbara Miller | ee9e375560 | zlib decompression | 2022-08-04 11:14:33 -07:00
Vangelis Banos | 329fef31a8 | Randomize TLS fingerprint. Create a random TLS fingerprint per HTTPS connection to avoid TLS fingerprinting. | 2022-07-01 17:39:49 +00:00
Barbara Miller | d253ea85c3 | Merge pull request #173 from internetarchive/increase_batch_sec: tune MIN_BATCH_SEC, MAX_BATCH_SEC for fewer dedup errors | 2022-06-24 11:13:18 -07:00
Barbara Miller | 8418fe10ba | add explanatory comment | 2022-06-24 11:07:35 -07:00
Adam Miller | 731cfe80cc | Adding url canonicalization tests and handling of edge cases to reduce log noise | 2022-04-26 23:48:54 +00:00
Adam Miller | 1e3d22aba4 | Better handle non-ascii urls for crawl log hop info | 2022-04-20 22:48:28 +00:00
Adam Miller | 5ae1291e37 | Refactor of hop path referer logic | 2022-03-24 21:40:55 +00:00
Barbara Miller | 05daafa19e | increase MIN_BATCH_SEC, MAX_BATCH_SEC | 2022-03-03 18:46:20 -08:00