Rob Pearce 799ba547ff
README.md
Overview
A shell script that scrapes a given list of websites, keeps only the text between each site's start and end lines, and can then commit the results to a git repository.
Intended for keeping track of updates to terms of service, privacy policies, etc.
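The start/end filtering step can be illustrated with a minimal sketch. This is not the code in tostracker.sh: the function name `filter_between` is hypothetical, and the sketch assumes that both the start line and the end line are kept in the output, which the real script may or may not do. Fetching the page is left out entirely.

```shell
#!/bin/sh
# Hypothetical helper, not taken from tostracker.sh.
# Reads text on stdin and prints from the first line matching re_start
# through the next line matching re_end (both inclusive, by assumption).
filter_between() {
    awk -v s="$1" -v e="$2" '
        !inside && $0 ~ s { inside = 1; print; next }  # start line found
        inside            { print }                    # copy lines inside the range
        inside && $0 ~ e  { exit }                     # stop after the end line
    '
}
```

For example, piping a saved page through `filter_between '^Terms of Service$' '^Contact$'` would drop navigation text before the heading and footer text after the contact line, so that unrelated page changes do not show up as policy updates.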
Usage
rob@crom:tostracker$ ./tostracker.sh -h
usage: ./tostracker.sh OPTIONS
OPTIONS:
-c filename Use given config file instead of default (./config)
-F char Use given character as a field separator in config file instead of default (@)
-gc After site scrapes, run 'git add' on all files, then 'git commit'
-gp After site scrapes, run 'git add' on all files, then 'git commit', then 'git push'
-o dirname Use given output dir instead of default (.)
-t sitename Just output raw content of given site, useful for finding start/end regexps.
-T sitename Just output content of given site between re_start and re_end regexps.
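Example config file
Based on online sites used by preschools. Each line holds four fields separated by the field separator (@ by default): site name, URL, start regexp (re_start), and end regexp (re_end).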
seesaw_tos@https://web.seesaw.me/terms-of-service@^Terms of Service$@^Contact$
seesaw_privacy@https://web.seesaw.me/privacy-policy@^1.*Introduction$@^Last Updated
languagenut_tos@https://www.languagenut.com/en-au/terms/@^Terms of Service$@^FAQs$
languagenut_privacy@https://www.languagenut.com/en-au/privacy-policy/@^PRIVACY POLICY$@^FAQs$
acer_tos@https://www.acer.org/online-terms-of-use@^Legal agreement$@^Contact us$
acer_privacy@https://www.acer.org/privacy@^Privacy Policies and Legal Disclaimers$@^Contact us$
3plearning_tos@https://www.3plearning.com/terms?locate=en-AU@^Last updated@^Skip to$
3plearning_privacy@https://www.3plearning.com/privacy?locate=en-AU@^Last updated@^Skip to$
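As a rough illustration of how lines like these might be read, the sketch below splits each one on the separator with the shell's `read` builtin. The function name `parse_config` and the field variable names are assumptions for illustration, and skipping blank lines is a guess at sensible behaviour rather than something the script documents.

```shell
#!/bin/sh
# Hypothetical parser, not taken from tostracker.sh.
# $1 = config file, $2 = field separator (defaults to @, like the -F option).
parse_config() {
    sep=${2:-@}
    # The "|| [ -n ... ]" part handles a final line with no trailing newline.
    while IFS="$sep" read -r name url re_start re_end || [ -n "$name" ]; do
        [ -n "$name" ] || continue   # assumption: blank lines are ignored
        printf 'name=%s url=%s start=%s end=%s\n' \
            "$name" "$url" "$re_start" "$re_end"
    done < "$1"
}
```

Because the separator is a single character, a URL or regexp that itself contains @ would be split in the wrong place, which is presumably why the -F option exists.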
Examples
Basic usage - scrape only
rob@crom:tostracker$ ./tostracker.sh
Scraped 'seesaw_tos' to '/Users/rob/.tostracker/output/seesaw_tos.txt' (no change)
Scraped 'seesaw_privacy' to '/Users/rob/.tostracker/output/seesaw_privacy.txt' (no change)
Scraped 'languagenut_tos' to '/Users/rob/.tostracker/output/languagenut_tos.txt' (no change)
Scraped 'languagenut_privacy' to '/Users/rob/.tostracker/output/languagenut_privacy.txt' (no change)
Scraped 'acer_tos' to '/Users/rob/.tostracker/output/acer_tos.txt' (no change)
Scraped 'acer_privacy' to '/Users/rob/.tostracker/output/acer_privacy.txt' (no change)
Scraped '3plearning_tos' to '/Users/rob/.tostracker/output/3plearning_tos.txt' (no change)
Scraped '3plearning_privacy' to '/Users/rob/.tostracker/output/3plearning_privacy.txt' (no change)
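The "(no change)" and "(UPDATES FOUND)" markers imply some comparison between the fresh scrape and the previously saved file. One way this could be done is sketched below; the function name `save_if_changed` is hypothetical, and the real script may detect changes differently, for example by asking git.

```shell
#!/bin/sh
# Hypothetical helper, not taken from tostracker.sh.
# $1 = freshly scraped file, $2 = previously saved output file.
# Prints "no change" or "UPDATES FOUND" and updates $2 when they differ.
save_if_changed() {
    new=$1
    target=$2
    # cmp -s is silent and exits 0 only when both files are byte-identical.
    if [ -f "$target" ] && cmp -s "$new" "$target"; then
        echo "no change"
    else
        cp "$new" "$target"
        echo "UPDATES FOUND"
    fi
}
```

In this sketch a site scraped for the first time is reported as "UPDATES FOUND", since there is no earlier copy to compare against.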
Scrape and commit results to git repo
rob@crom:tostracker$ ./tostracker.sh -gc
Scraped 'seesaw_tos' to '/Users/rob/.tostracker/output/seesaw_tos.txt' (no change)
Scraped 'seesaw_privacy' to '/Users/rob/.tostracker/output/seesaw_privacy.txt' (no change)
Scraped 'languagenut_tos' to '/Users/rob/.tostracker/output/languagenut_tos.txt' (no change)
Scraped 'languagenut_privacy' to '/Users/rob/.tostracker/output/languagenut_privacy.txt' (no change)
Scraped 'acer_tos' to '/Users/rob/.tostracker/output/acer_tos.txt' (no change)
Scraped 'acer_privacy' to '/Users/rob/.tostracker/output/acer_privacy.txt' (no change)
Scraped '3plearning_tos' to '/Users/rob/.tostracker/output/3plearning_tos.txt' (no change)
Scraped '3plearning_privacy' to '/Users/rob/.tostracker/output/3plearning_privacy.txt' (UPDATES FOUND)
Doing git add...ok
Doing git commit...ok
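The add-then-commit step could look roughly like the sketch below. The function name `commit_if_changed` and the commit message are hypothetical, and skipping the commit when nothing is staged is an assumption; the README only says that -gc runs 'git add' on all files and then 'git commit'.

```shell
#!/bin/sh
# Hypothetical helper, not taken from tostracker.sh.
# $1 = output directory (a git working tree), $2 = commit message.
commit_if_changed() {
    dir=$1
    msg=$2
    git -C "$dir" add -A || return 1
    # "git diff --cached --quiet" exits 0 when the index has no staged changes.
    if git -C "$dir" diff --cached --quiet; then
        echo "nothing to commit"
    else
        git -C "$dir" commit -q -m "$msg" >/dev/null && echo "committed"
    fi
}
```

Checking the index first avoids a failing `git commit` on runs where every site reports no change.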
Scrape, commit results to git repo, then git push to upstream
rob@crom:tostracker$ ./tostracker.sh -gp
Scraped 'seesaw_tos' to '/Users/rob/.tostracker/output/seesaw_tos.txt' (no change)
Scraped 'seesaw_privacy' to '/Users/rob/.tostracker/output/seesaw_privacy.txt' (no change)
Scraped 'languagenut_tos' to '/Users/rob/.tostracker/output/languagenut_tos.txt' (no change)
Scraped 'languagenut_privacy' to '/Users/rob/.tostracker/output/languagenut_privacy.txt' (no change)
Scraped 'acer_tos' to '/Users/rob/.tostracker/output/acer_tos.txt' (no change)
Scraped 'acer_privacy' to '/Users/rob/.tostracker/output/acer_privacy.txt' (no change)
Scraped '3plearning_tos' to '/Users/rob/.tostracker/output/3plearning_tos.txt' (no change)
Scraped '3plearning_privacy' to '/Users/rob/.tostracker/output/3plearning_privacy.txt' (UPDATES FOUND)
Doing git add...ok
Doing git commit...ok
Doing git push...ok