finish switch from README.md to README.rst

2025-01-18 13:22:09 +01:00 · 2013-11-28 01:28:59 -08:00 · 2013-11-28 01:28:59 -08:00 · 8ae164f8ca
commit 8ae164f8ca
parent b0dc399392
2 changed files with 1 additions and 114 deletions
--- a/README.md
+++ b/README.md
@ -1,113 +0,0 @@
-##warcprox - WARC writing MITM HTTP/S proxy
-
-Based on the excellent and simple pymiproxy by Nadeem Douba.
-https://github.com/allfro/pymiproxy
-
-License: because pymiproxy is GPL and warcprox is a derivative work of
-pymiproxy, warcprox is also GPL.
-
-###Trusting the CA cert
-
-For best results while browsing through warcprox, you need to add the CA cert
-as a trusted cert in your browser. If you don't do that, you will get the
-warning when you visit each new site. But worse, any embedded https content on
-a different server will simply fail to load, because the browser will reject
-the certificate without telling you. 
-
-###Dependencies
-
-Currently depends on tweaks branch of my fork of warctools.
-https://github.com/nlevitt/warctools/tree/tweaks
-Hopefully the changes in that branch, or something equivalent, will be
-incorporated into warctools mainline.
-
-###Usage
-
-    usage: warcprox.py [-h] [-p PORT] [-b ADDRESS] [-c CACERT]
-                       [--certs-dir CERTS_DIR] [-d DIRECTORY] [-z] [-n PREFIX]
-                       [-s SIZE] [--rollover-idle-time ROLLOVER_IDLE_TIME]
-                       [-g DIGEST_ALGORITHM] [--base32] [-j DEDUP_DB_FILE]
-                       [-P PLAYBACK_PORT]
-                       [--playback-index-db-file PLAYBACK_INDEX_DB_FILE] [-v] [-q]
-    
-    warcprox - WARC writing MITM HTTP/S proxy
-    
-    optional arguments:
-      -h, --help            show this help message and exit
-      -p PORT, --port PORT  port to listen on (default: 8000)
-      -b ADDRESS, --address ADDRESS
-                            address to listen on (default: localhost)
-      -c CACERT, --cacert CACERT
-                            CA certificate file; if file does not exist, it will
-                            be created (default: ./desktop-nlevitt-warcprox-
-                            ca.pem)
-      --certs-dir CERTS_DIR
-                            where to store and load generated certificates
-                            (default: ./desktop-nlevitt-warcprox-ca)
-      -d DIRECTORY, --dir DIRECTORY
-                            where to write warcs (default: ./warcs)
-      -z, --gzip            write gzip-compressed warc records (default: False)
-      -n PREFIX, --prefix PREFIX
-                            WARC filename prefix (default: WARCPROX)
-      -s SIZE, --size SIZE  WARC file rollover size threshold in bytes (default:
-                            1000000000)
-      --rollover-idle-time ROLLOVER_IDLE_TIME
-                            WARC file rollover idle time threshold in seconds (so
-                            that Friday's last open WARC doesn't sit there all
-                            weekend waiting for more data) (default: None)
-      -g DIGEST_ALGORITHM, --digest-algorithm DIGEST_ALGORITHM
-                            digest algorithm, one of md5, sha1, sha224, sha256,
-                            sha384, sha512 (default: sha1)
-      --base32              write digests in Base32 instead of hex (default:
-                            False)
-      -j DEDUP_DB_FILE, --dedup-db-file DEDUP_DB_FILE
-                            persistent deduplication database file; empty string
-                            or /dev/null disables deduplication (default:
-                            ./warcprox-dedup.db)
-      -P PLAYBACK_PORT, --playback-port PLAYBACK_PORT
-                            port to listen on for instant playback (default: None)
-      --playback-index-db-file PLAYBACK_INDEX_DB_FILE
-                            playback index database file (only used if --playback-
-                            port is specified) (default: ./warcprox-playback-
-                            index.db)
-      -v, --verbose
-      -q, --quiet
-
-###To do
-
- integration tests, unit tests
- ~~url-agnostic deduplication~~
- unchunk and/or ungzip before storing payload, or alter request to discourage server from chunking/gzipping
- check certs from proxied website, like browser does, and present browser-like warning if appropriate
- keep statistics, produce reports
- write cdx while crawling?
- performance testing
- ~~base32 sha1 like heritrix?~~
- configurable timeouts and stuff
- evaluate ipv6 support
- ~~more explicit handling of connection closed exception during transfer? other error cases?~~
- dns cache?? the system already does a fine job I'm thinking
- keepalive with remote servers?
- python3
- special handling for 304 not-modified (write nothing or write revisit
-  record... and/or modify request so server never responds with 304)
- ~~instant playback on a second proxy port~~
- special url for downloading ca cert e.g. http(s)://warcprox./ca.pem
- special url for other stuff, some status info or something?
- browser plugin for warcprox mode
-  * accept warcprox CA cert only when in warcprox mode
-  * separate temporary cookie store, like incognito
-  * "careful! your activity is being archived" banner
-  * easy switch between archiving and instant playback proxy port
-
-#### To not do
-
-The features below could also be part of warcprox. But maybe they don't belong
-here, since this is a proxy, not a crawler/robot. It can be used by a human
-with a browser, or by something automated, i.e. a robot. My feeling is that
-it's more appropriate to implement these in the robot.
-
- politeness, i.e. throttle requests per server
- fetch and obey robots.txt
- alter user-agent, maybe insert something like "warcprox mitm archiving proxy; +http://archive.org/details/archive.org_bot"
-
--- a/setup.py
+++ b/setup.py
@ -9,7 +9,7 @@ setuptools.setup(name='warcprox',
        url='https://github.com/internetarchive/warcprox',
        author='Noah Levitt',
        author_email='nlevitt@archive.org',
-        long_description=open('README.md').read(),
+        long_description=open('README.rst').read(),
        license='GPL',
        packages=['warcprox'],
        install_requires=['pyopenssl', 'warctools>=4.8.2'],  # gdbm/dbhash?