2023-02-01
Here is a simple way to make a fully navigable archive of an interesting static website, for instance for offline consumption, or because you are afraid that the Internet will cease to exist soon. We will be using redbean to make this archive easy to view: the archive becomes a single executable that runs a simple static web server when launched. For this example, we will be archiving redbean’s website itself.
Step 1. Download all of the website’s files using recursive wget:
wget --recursive \
     --page-requisites \
     --adjust-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains redbean.dev \
     --no-parent \
     http://redbean.dev
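Before packing anything up, it is worth a quick sanity check: wget should have created a directory named after the domain, containing the downloaded pages and assets. Something like this will do:
ls redbean.dev
find redbean.dev -type f | head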
Step 2. Download a small redbean executable suited to static websites:
wget https://redbean.dev/redbean-static-2.2.com -O redbean.dev.com
Step 3. Put all the stuff in there:
(cd redbean.dev; zip -r ../redbean.dev.com *)
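Because a redbean executable is also a valid zip archive, you can inspect the result with ordinary zip tools to confirm the site’s files are now embedded (assuming unzip is available):
unzip -l redbean.dev.com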
Step 4. Make it executable and serve it right here and now:
chmod +x redbean.dev.com
./redbean.dev.com
The archived website can now be viewed in your browser at http://localhost:8080.
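If port 8080 is already in use, redbean can be told to listen elsewhere; the -p flag below is documented on redbean.dev, but check your build’s help output if it is not recognized:
./redbean.dev.com -p 9000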
Step 5. Clean up the mess; you don’t need it anymore:
rm -rv redbean.dev
Here it is in the form of a generic script for any web domain starting at any URL:
#!/usr/bin/env bash
if [ -z "$1" ]; then
    echo "Usage: $0 <domain>[/path]"
    exit 1
fi

URL=$1
# Everything before the first slash is the domain; the full argument is the start URL
DOMAIN=$(echo "${URL}" | cut -d '/' -f 1)

echo "Archiving web domain: $DOMAIN, starting at url: http://$URL"
echo "Is this ok? Press enter for yes, ^C for no"
read -r

# Mirror the site into a directory named after the domain
wget --recursive \
     --page-requisites \
     --adjust-extension \
     --convert-links \
     --restrict-file-names=windows \
     --domains "$DOMAIN" \
     --no-parent \
     --user-agent "curl" \
     "http://$URL"

if [ ! -d "$DOMAIN" ]; then
    echo "Could not retrieve website content"
    exit 1
fi

# Fetch a static redbean and pack the mirrored files into it
wget https://redbean.dev/redbean-static-2.2.com -O "${DOMAIN}.com"
# The || exit must sit outside the subshell, otherwise a failed zip only aborts the subshell
(cd "$DOMAIN" && zip -r "../${DOMAIN}.com" *) || exit 1

chmod +x "${DOMAIN}.com"
rm -rv "${DOMAIN}"
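Assuming the script is saved as archive.sh (the name is arbitrary) and made executable, usage looks like this; each run leaves behind a single <domain>.com executable that serves the archived site:
chmod +x archive.sh
./archive.sh redbean.dev
./archive.sh example.org/docs/index.html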
I might start a small project of doing this for websites I find useful.
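A minimal sketch of that, assuming a hypothetical sites.txt with one domain or start URL per line and the archive.sh script from above:
# Read the site list on fd 3 so the script’s confirmation prompt can still use stdin
while read -r site <&3; do
    ./archive.sh "$site"
done 3< sites.txt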