# Overview

Script to scrape a given list of websites, filter for given start/end lines, then commit to a git repository.

Intended for keeping track of updates to terms of service, privacy policies, etc.

# Usage

```
rob@crom:tostracker$ ./tostracker.sh -h
usage: ./tostracker.sh OPTIONS

OPTIONS:
  -c filename   Use given config file instead of default (./config)
  -F char       Use given character as a field separator in config file instead of default (@)
  -gc           After site scrapes, run 'git add' on all files, then 'git commit'
  -gp           After site scrapes, run 'git add' on all files, then 'git commit', then 'git push'
  -l            List configured sites then exit
  -L            Use lynx to render html
  -o dirname    Use given output dir instead of default (.)
  -t sitename   Just output raw content of given site, useful for finding start/end regexps.
  -T sitename   Just output content of given site between re_start and re_end regexps.
```

# Example Config file

Based on online sites used by preschools.

```
seesaw_tos@https://web.seesaw.me/terms-of-service@^Terms of Service$@^Contact$
seesaw_privacy@https://web.seesaw.me/privacy-policy@^1.*Introduction$@^Last Updated
languagenut_tos@https://www.languagenut.com/en-au/terms/@^Terms of Service$@^FAQs$
languagenut_privacy@https://www.languagenut.com/en-au/privacy-policy/@^PRIVACY POLICY$@^FAQs$
acer_tos@https://www.acer.org/online-terms-of-use@^Legal agreement$@^Contact us$
acer_privacy@https://www.acer.org/privacy@^Privacy Policies and Legal Disclaimers$@^Contact us$
3plearning_tos@https://www.3plearning.com/terms?locate=en-AU@^Last updated@^Skip to$
3plearning_privacy@https://www.3plearning.com/privacy?locate=en-AU@^Last updated@^Skip to$
```

# Examples

## Basic usage - scrape only

```
rob@crom:tostracker$ ./tostracker.sh
Scraped 'seesaw_tos' to '/Users/rob/.tostracker/output/seesaw_tos.txt' (no change)
Scraped 'seesaw_privacy' to '/Users/rob/.tostracker/output/seesaw_privacy.txt' (no change)
Scraped 'languagenut_tos' to '/Users/rob/.tostracker/output/languagenut_tos.txt' (no change)
Scraped 'languagenut_privacy' to '/Users/rob/.tostracker/output/languagenut_privacy.txt' (no change)
Scraped 'acer_tos' to '/Users/rob/.tostracker/output/acer_tos.txt' (no change)
Scraped 'acer_privacy' to '/Users/rob/.tostracker/output/acer_privacy.txt' (no change)
Scraped '3plearning_tos' to '/Users/rob/.tostracker/output/3plearning_tos.txt' (no change)
Scraped '3plearning_privacy' to '/Users/rob/.tostracker/output/3plearning_privacy.txt' (no change)
```

## Scrape and commit results to git repo

```
rob@crom:tostracker$ ./tostracker.sh -gc
Scraped 'seesaw_tos' to '/Users/rob/.tostracker/output/seesaw_tos.txt' (no change)
Scraped 'seesaw_privacy' to '/Users/rob/.tostracker/output/seesaw_privacy.txt' (no change)
Scraped 'languagenut_tos' to '/Users/rob/.tostracker/output/languagenut_tos.txt' (no change)
Scraped 'languagenut_privacy' to '/Users/rob/.tostracker/output/languagenut_privacy.txt' (no change)
Scraped 'acer_tos' to '/Users/rob/.tostracker/output/acer_tos.txt' (no change)
Scraped 'acer_privacy' to '/Users/rob/.tostracker/output/acer_privacy.txt' (no change)
Scraped '3plearning_tos' to '/Users/rob/.tostracker/output/3plearning_tos.txt' (no change)
Scraped '3plearning_privacy' to '/Users/rob/.tostracker/output/3plearning_privacy.txt' (UPDATES FOUND)
Doing git add...ok
Doing git commit...ok
```

## Scrape, commit results to git repo, then git push to upstream

```
rob@crom:tostracker$ ./tostracker.sh -gp
Scraped 'seesaw_tos' to '/Users/rob/.tostracker/output/seesaw_tos.txt' (no change)
Scraped 'seesaw_privacy' to '/Users/rob/.tostracker/output/seesaw_privacy.txt' (no change)
Scraped 'languagenut_tos' to '/Users/rob/.tostracker/output/languagenut_tos.txt' (no change)
Scraped 'languagenut_privacy' to '/Users/rob/.tostracker/output/languagenut_privacy.txt' (no change)
Scraped 'acer_tos' to '/Users/rob/.tostracker/output/acer_tos.txt' (no change)
Scraped 'acer_privacy' to '/Users/rob/.tostracker/output/acer_privacy.txt' (no change)
Scraped '3plearning_tos' to '/Users/rob/.tostracker/output/3plearning_tos.txt' (no change)
Scraped '3plearning_privacy' to '/Users/rob/.tostracker/output/3plearning_privacy.txt' (UPDATES FOUND)
Doing git add...ok
Doing git commit...ok
Doing git push...ok
```
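
## Sketch: how a config line might be interpreted

When writing new config entries, it can help to see how a line breaks down into fields and how the start/end regexps could select content. The following is a minimal sketch, not the actual `tostracker.sh` code. It assumes the four fields are name, URL, `re_start` and `re_end` in that order (inferred from the example config and the `-T` help text), and that the end line is included in the output. The function names `parse_config_line` and `filter_between` are hypothetical, and the page text is supplied inline rather than fetched.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; tostracker.sh itself may work differently.

# Split one config line into four fields (assumed order:
# name, url, re_start, re_end) using the given separator, '@' by default.
# Sets the global variables name, url, re_start and re_end.
parse_config_line() {
    local line=$1 sep=${2:-@}
    IFS="$sep" read -r name url re_start re_end <<< "$line"
}

# Read text on stdin and print from the first line matching the start
# regexp up to and including the next line matching the end regexp.
# (Including the end line is an assumption.)
filter_between() {
    awk -v s="$1" -v e="$2" '
        !on && $0 ~ s { on = 1; print; next }
        on            { print; if ($0 ~ e) exit }
    '
}

# Usage with one line from the example config and some made-up page text.
parse_config_line 'seesaw_tos@https://web.seesaw.me/terms-of-service@^Terms of Service$@^Contact$'
printf '%s\n' 'Menu' 'Terms of Service' 'Some clause.' 'Contact' 'Footer' |
    filter_between "$re_start" "$re_end"
# Prints:
#   Terms of Service
#   Some clause.
#   Contact
```

Because the line is split on a single separator character, a URL or regexp that itself contains `@` would be cut in the wrong place; that is the situation the `-F char` option addresses.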