mirror of
https://github.com/internetarchive/warcprox.git
synced 2025-01-18 13:22:09 +01:00
update readme (and trigger travis ci build?)
parent 20c25da48d
commit 235e0dce45
60
README.rst
@@ -82,33 +82,33 @@ Usage
 To do
 ~~~~~
 
-- integration tests, unit tests
-- [STRIKEOUT:url-agnostic deduplication]
-- unchunk and/or ungzip before storing payload, or alter request to
+* (partly done) integration tests, unit tests
+* (done) url-agnostic deduplication
+* unchunk and/or ungzip before storing payload, or alter request to
 discourage server from chunking/gzipping
-- check certs from proxied website, like browser does, and present
+* check certs from proxied website, like browser does, and present
 browser-like warning if appropriate
-- keep statistics, produce reports
-- write cdx while crawling?
-- performance testing
-- [STRIKEOUT:base32 sha1 like heritrix?]
-- configurable timeouts and stuff
-- evaluate ipv6 support
-- [STRIKEOUT:more explicit handling of connection closed exception
-during transfer? other error cases?]
-- dns cache?? the system already does a fine job I'm thinking
-- keepalive with remote servers?
-- python3
-- special handling for 304 not-modified (write nothing or write revisit
+* keep statistics, produce reports
+* write cdx while crawling?
+* performance testing
+* (done) base32 sha1 like heritrix?
+* configurable timeouts and stuff
+* evaluate ipv6 support
+* (done) more explicit handling of connection closed exception
+during transfer
+* dns cache?? the system already does a fine job I'm thinking
+* keepalive with remote servers?
+* (done) python3
+* special handling for 304 not-modified (write nothing or write revisit
 record... and/or modify request so server never responds with 304)
-- [STRIKEOUT:instant playback on a second proxy port]
-- special url for downloading ca cert e.g. http(s)://warcprox./ca.pem
-- special url for other stuff, some status info or something?
-- browser plugin for warcprox mode
+* (done) instant playback on a second proxy port
+* special url for downloading ca cert e.g. http(s)://warcprox./ca.pem
+* special url for other stuff, some status info or something?
+* browser plugin for warcprox mode
 - accept warcprox CA cert only when in warcprox mode
 - separate temporary cookie store, like incognito
 - "careful! your activity is being archived" banner
 - easy switch between archiving and instant playback proxy port
 
 To not do
 ^^^^^^^^^
@@ -118,8 +118,8 @@ belong here, since this is a proxy, not a crawler/robot. It can be used
 by a human with a browser, or by something automated, i.e. a robot. My
 feeling is that it's more appropriate to implement these in the robot.
 
-- politeness, i.e. throttle requests per server
-- fetch and obey robots.txt
-- alter user-agent, maybe insert something like "warcprox mitm
+* politeness, i.e. throttle requests per server
+* fetch and obey robots.txt
+* alter user-agent, maybe insert something like "warcprox mitm
 archiving proxy; +http://archive.org/details/archive.org\_bot"
 