| Author | Commit | Message | Date |
|---|---|---|---|
| Barbara Miller | 533f5c0af2 | limit_revisits wants str, not bytes | 2023-08-15 14:26:14 -07:00 |
| Barbara Miller | a86169c56c | pep0440 version id (I think) | 2023-08-03 14:09:23 -07:00 |
| Barbara Miller | f83e82c900 | limit_revisit check before dedup | 2023-07-21 13:37:08 -07:00 |
| Barbara Miller | e5b2561821 | disable prepared statements: prepare_threshold=None | 2023-07-13 10:19:01 -07:00 |
| Barbara Miller | 548c4e5cab | initial deploy fixes | 2023-07-12 17:56:39 -07:00 |
| Barbara Miller | af4c8b071a | lru_cache skip_revisit | 2023-07-12 17:05:29 -07:00 |
| Barbara Miller | 64a152ee8c | lru_cache | 2023-07-11 16:35:38 -07:00 |
| Barbara Miller | b91a7d1d89 | more updates qa prototyping | 2023-06-28 17:34:26 -07:00 |
| Barbara Miller | ef75164f8b | fixes for qa prototyping | 2023-06-27 17:19:40 -07:00 |
| Barbara Miller | d9145eefb5 | LimitRecords, more LimitRevisitsPGMixin | 2023-06-26 22:49:33 -07:00 |
| Barbara Miller | 08f2903f14 | LimitRevisitsPGMixin | 2023-06-22 19:29:53 -07:00 |
| Barbara Miller | 5075920415 | limit revisits mixin | 2023-06-21 17:25:41 -07:00 |
| Barbara Miller | ca02c22ff7 | Merge pull request #180 from cclauss/patch-1: Thanks, @cclauss! | 2023-04-12 11:45:41 -07:00 |
| Barbara Miller | 1fd3b2c7a1 | update readme — rm travis | 2023-04-12 11:44:01 -07:00 |
| Christian Clauss | ba14480a2d | Delete .travis.yml | 2023-04-12 11:37:56 +02:00 |
| Barbara Miller | 50a4f35e5f | Merge pull request #177 from internetarchive/blocks-shrink: @adam-miller ok'd this elsewhere | 2022-08-05 15:44:05 -07:00 |
| Barbara Miller | 9973d28de9 | bump version | 2022-08-04 17:28:33 -07:00 |
| Barbara Miller | ee9e375560 | zlib decompression | 2022-08-04 11:14:33 -07:00 |
| Barbara Miller | c008c2eca7 | bump version | 2022-07-01 14:18:17 -07:00 |
| Barbara Miller | 7958921053 | Merge pull request #175 from vbanos/random-tls-fingerprint: Thanks, @vbanos! | 2022-07-01 14:16:05 -07:00 |
| Vangelis Banos | 329fef31a8 | Randomize TLS fingerprint: Create a random TLS fingerprint per HTTPS connection to avoid TLS fingerprinting. | 2022-07-01 17:39:49 +00:00 |
| Barbara Miller | d253ea85c3 | Merge pull request #173 from internetarchive/increase_batch_sec: tune MIN_BATCH_SEC, MAX_BATCH_SEC for fewer dedup errors | 2022-06-24 11:13:18 -07:00 |
| Barbara Miller | 8418fe10ba | add explanatory comment | 2022-06-24 11:07:35 -07:00 |
| Adam Miller | fcd9b2b3bd | Merge pull request #172 from internetarchive/adds-canonicalization-tests: Adding url canonicalization tests and handling of edge cases to reduc… | 2022-04-27 09:57:03 -07:00 |
| Adam Miller | 731cfe80cc | Adding url canonicalization tests and handling of edge cases to reduce log noise | 2022-04-26 23:48:54 +00:00 |
| Adam Miller | 9521042a23 | Merge pull request #171 from internetarchive/adds-hop-path-logging: Adds hop path logging | 2022-04-26 12:11:11 -07:00 |
| Adam Miller | daa925db17 | Bump version | 2022-04-26 09:55:48 -07:00 |
| Adam Miller | d96dd5d842 | Adjust rfc3986 package version for deployment across more versions | 2022-04-21 18:37:27 +00:00 |
| Adam Miller | 1e3d22aba4 | Better handle non-ascii urls for crawl log hop info | 2022-04-20 22:48:28 +00:00 |
| Adam Miller | 5ae1291e37 | Refactor of hop path referer logic | 2022-03-24 21:40:55 +00:00 |
| Barbara Miller | 05daafa19e | increase MIN_BATCH_SEC, MAX_BATCH_SEC | 2022-03-03 18:46:20 -08:00 |
| Adam Miller | ade2373711 | Fixing referer on request with null hop path | 2022-03-04 02:01:55 +00:00 |
| Adam Miller | 3a234d0cec | Refactor hop_path metadata | 2022-03-03 00:18:16 +00:00 |
| Adam Miller | 366ed5155f | Merge branch 'master' into adds-hop-path-logging | 2022-02-09 18:18:32 +00:00 |
| Barbara Miller | c027659001 | Merge pull request #167 from galgeek/WT-31: fix logging buglet iii | 2021-12-29 12:14:56 -08:00 |
| Barbara Miller | 9e8ea5bb45 | fix logging buglet iii | 2021-12-29 12:06:18 -08:00 |
| Barbara Miller | bc3d1e6d00 | fix logging buglet ii | 2021-12-29 11:55:39 -08:00 |
| Barbara Miller | 6b372e2f3f | Merge pull request #166 from galgeek/WT-31: fix logging buglet | 2021-12-29 11:04:03 -08:00 |
| Barbara Miller | 5d8fbf7038 | fix logging buglet | 2021-12-29 10:25:04 -08:00 |
| Barbara Miller | a969430b37 | Merge pull request #163 from internetarchive/idna2_10: idna==2.10 | 2021-12-28 13:50:23 -08:00 |
| Barbara Miller | aeecb6515f | bump version | 2021-12-28 11:58:30 -08:00 |
| Adam Miller | e1eddb8fa7 | Merge pull request #165 from galgeek/WT-31: in-batch dedup | 2021-12-28 11:52:41 -08:00 |
| Barbara Miller | d7aec77597 | faster, likely | 2021-12-16 18:36:00 -08:00 |
| Barbara Miller | bcaf293081 | better logging | 2021-12-09 12:19:45 -08:00 |
| Barbara Miller | 7d4c8dcb4e | recorded_url.do_not_archive = True | 2021-12-08 11:04:09 -08:00 |
| Barbara Miller | da089e0a92 | bytes not str | 2021-12-06 20:33:16 -08:00 |
| Barbara Miller | 3eeccd0016 | more hash_plus_url | 2021-12-06 19:43:27 -08:00 |
| Barbara Miller | 5e5a74f204 | str, not object | 2021-12-06 19:33:10 -08:00 |
| Barbara Miller | b67f1ad0f3 | add logging | 2021-12-06 17:29:27 -08:00 |
| Barbara Miller | e6a1a7dd7e | increase trough dedup batch window | 2021-12-06 17:29:02 -08:00 |