Script to scrape a given list of websites, filter for given start/end lines, then commit to a git repository.
Rob Pearce 2825e35152 Add -L option tk use lynx to render html 2024-06-30 09:40:51 +10:00

Overview

Script that scrapes a configured list of websites, keeps only the text between a given start and end regexp for each site, and optionally commits the results to a git repository.

Intended for keeping track of updates to terms of service, privacy policies, etc.

Usage

rob@crom:tostracker$ ./tostracker.sh  -h
usage:  ./tostracker.sh OPTIONS

OPTIONS:
       -c  filename      Use given config file instead of default (./config)
       -F  char          Use given character as a field separator in config file instead of default (@)
       -gc               After site scrapes, run 'git add' on all files, then 'git commit'
       -gp               After site scrapes, run 'git add' on all files, then 'git commit', then 'git push'
       -l                List configured sites then exit
       -L                Use lynx to render html
       -o  dirname       Use given output dir instead of default (.)
       -t  sitename      Just output raw content of given site, useful for finding start/end regexps.
       -T  sitename      Just output content of given site between re_start and re_end regexps.
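
The -T option prints only the part of a page between the re_start and re_end regexps. A minimal sketch of that kind of filter follows, assuming a sed line range is an acceptable stand-in; the function name filter_between is hypothetical and tostracker.sh itself may do this differently.

```shell
#!/bin/sh
# Hypothetical helper, not taken from tostracker.sh.
# Reads text on stdin and prints each run of lines that starts at a line
# matching $1 (re_start) and ends at the next line matching $2 (re_end).
# Limitation: the regexps must not contain "/", because they are pasted
# into a sed address.
filter_between() {
    sed -n "/$1/,/$2/p"
}

# Sample input standing in for a rendered page.
printf '%s\n' 'Menu' 'Terms of Service' 'You agree to things.' 'Contact' 'Footer' |
    filter_between '^Terms of Service$' '^Contact$'
```

Running the sample prints the three lines from "Terms of Service" through "Contact" and drops the menu and footer lines.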

Example Config file

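Based on online sites used by preschools. Each line holds four fields separated by the field separator (@ by default): site name, URL, start regexp (re_start), and end regexp (re_end).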

seesaw_tos@https://web.seesaw.me/terms-of-service@^Terms of Service$@^Contact$
seesaw_privacy@https://web.seesaw.me/privacy-policy@^1.*Introduction$@^Last Updated
languagenut_tos@https://www.languagenut.com/en-au/terms/@^Terms of Service$@^FAQs$
languagenut_privacy@https://www.languagenut.com/en-au/privacy-policy/@^PRIVACY POLICY$@^FAQs$
acer_tos@https://www.acer.org/online-terms-of-use@^Legal agreement$@^Contact us$
acer_privacy@https://www.acer.org/privacy@^Privacy Policies and Legal Disclaimers$@^Contact us$
3plearning_tos@https://www.3plearning.com/terms?locate=en-AU@^Last updated@^Skip to$
3plearning_privacy@https://www.3plearning.com/privacy?locate=en-AU@^Last updated@^Skip to$
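
To make the config layout concrete, here is a small sketch that splits each line into its four fields. The function name parse_config and the variable names are assumptions chosen for illustration, based on the help text above; they are not taken from tostracker.sh.

```shell
#!/bin/sh
# Hypothetical sketch: read a config file ($1) and split each line on the
# separator ($2, default "@") into name, url, re_start and re_end.
parse_config() {
    sep=${2:-@}
    while IFS=$sep read -r name url re_start re_end; do
        # Skip blank lines.
        [ -n "$name" ] || continue
        printf 'name=%s url=%s start=%s end=%s\n' \
            "$name" "$url" "$re_start" "$re_end"
    done < "$1"
}

# Demo with one line from the example config, written to a temporary file.
cfg=$(mktemp)
printf '%s\n' 'seesaw_tos@https://web.seesaw.me/terms-of-service@^Terms of Service$@^Contact$' > "$cfg"
parse_config "$cfg"
rm -f "$cfg"
```

Using read -r keeps any backslashes in the regexps intact, and the single quotes in the demo stop the shell from expanding the "$" anchors.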

Examples

Basic usage - scrape only

rob@crom:tostracker$ ./tostracker.sh
Scraped 'seesaw_tos' to '/Users/rob/.tostracker/output/seesaw_tos.txt' (no change)
Scraped 'seesaw_privacy' to '/Users/rob/.tostracker/output/seesaw_privacy.txt' (no change)
Scraped 'languagenut_tos' to '/Users/rob/.tostracker/output/languagenut_tos.txt' (no change)
Scraped 'languagenut_privacy' to '/Users/rob/.tostracker/output/languagenut_privacy.txt' (no change)
Scraped 'acer_tos' to '/Users/rob/.tostracker/output/acer_tos.txt' (no change)
Scraped 'acer_privacy' to '/Users/rob/.tostracker/output/acer_privacy.txt' (no change)
Scraped '3plearning_tos' to '/Users/rob/.tostracker/output/3plearning_tos.txt' (no change)
Scraped '3plearning_privacy' to '/Users/rob/.tostracker/output/3plearning_privacy.txt' (no change)
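
Each line of the output above ends with a status such as "(no change)". One way such a status could be derived is by comparing a fresh scrape with the previously saved file. This is only a sketch under that assumption; the helper name save_if_changed is hypothetical and the real script may detect changes another way (for example via git).

```shell
#!/bin/sh
# Hypothetical helper, not taken from tostracker.sh.
# $1 = freshly scraped temporary file, $2 = saved output file.
# Keeps the saved file when the content is identical, otherwise replaces it.
save_if_changed() {
    if [ -f "$2" ] && cmp -s "$1" "$2"; then
        rm -f "$1"
        echo "no change"
    else
        mv "$1" "$2"
        echo "UPDATES FOUND"
    fi
}

# Demo in a throwaway directory.
workdir=$(mktemp -d)
printf 'version 1\n' > "$workdir/scrape.tmp"
save_if_changed "$workdir/scrape.tmp" "$workdir/site.txt"   # no previous copy exists yet
printf 'version 1\n' > "$workdir/scrape.tmp"
save_if_changed "$workdir/scrape.tmp" "$workdir/site.txt"   # identical content
rm -rf "$workdir"
```

Note that in this sketch a first-time scrape is also reported as "UPDATES FOUND", since there is nothing to compare against.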

Scrape and commit results to git repo

rob@crom:tostracker$ ./tostracker.sh -gc
Scraped 'seesaw_tos' to '/Users/rob/.tostracker/output/seesaw_tos.txt' (no change)
Scraped 'seesaw_privacy' to '/Users/rob/.tostracker/output/seesaw_privacy.txt' (no change)
Scraped 'languagenut_tos' to '/Users/rob/.tostracker/output/languagenut_tos.txt' (no change)
Scraped 'languagenut_privacy' to '/Users/rob/.tostracker/output/languagenut_privacy.txt' (no change)
Scraped 'acer_tos' to '/Users/rob/.tostracker/output/acer_tos.txt' (no change)
Scraped 'acer_privacy' to '/Users/rob/.tostracker/output/acer_privacy.txt' (no change)
Scraped '3plearning_tos' to '/Users/rob/.tostracker/output/3plearning_tos.txt' (no change)
Scraped '3plearning_privacy' to '/Users/rob/.tostracker/output/3plearning_privacy.txt' (UPDATES FOUND)
Doing git add...ok
Doing git commit...ok
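
The add and commit steps reported above can be approximated with plain git commands. The sketch below is an assumption about one reasonable way to do it, not the script's actual code; the function name commit_if_changed and the commit message are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper, not taken from tostracker.sh.
# Stages everything in the repository at $1 and commits only when the
# index differs from the last commit.
commit_if_changed() {
    git -C "$1" add -A || return 1
    if git -C "$1" diff --cached --quiet; then
        echo "nothing to commit"
    else
        # The commit message is an illustrative placeholder.
        git -C "$1" commit -q -m "tostracker: update scraped pages" &&
            echo "committed"
    fi
}

# Usage (the path is a placeholder for the output directory):
#   commit_if_changed /path/to/output
```

Checking the staged diff first avoids the non-zero exit status that git commit returns when there is nothing to commit.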

Scrape, commit results to git repo, then git push to upstream

rob@crom:tostracker$ ./tostracker.sh -gp
Scraped 'seesaw_tos' to '/Users/rob/.tostracker/output/seesaw_tos.txt' (no change)
Scraped 'seesaw_privacy' to '/Users/rob/.tostracker/output/seesaw_privacy.txt' (no change)
Scraped 'languagenut_tos' to '/Users/rob/.tostracker/output/languagenut_tos.txt' (no change)
Scraped 'languagenut_privacy' to '/Users/rob/.tostracker/output/languagenut_privacy.txt' (no change)
Scraped 'acer_tos' to '/Users/rob/.tostracker/output/acer_tos.txt' (no change)
Scraped 'acer_privacy' to '/Users/rob/.tostracker/output/acer_privacy.txt' (no change)
Scraped '3plearning_tos' to '/Users/rob/.tostracker/output/3plearning_tos.txt' (no change)
Scraped '3plearning_privacy' to '/Users/rob/.tostracker/output/3plearning_privacy.txt' (UPDATES FOUND)
Doing git add...ok
Doing git commit...ok
Doing git push...ok