Find RSS feeds on a newspaper's website
Usage
pb_find_rss(x, use = c("main", "suffixes", "feedly"))
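The default `use = c("main", "suffixes", "feedly")` suggests that a subset of the three steps can be selected. As a minimal sketch of how such an argument is commonly validated in R (whether `pb_find_rss` does this internally is an assumption, and `select_steps` is a hypothetical helper, not part of the package):

```r
# Hypothetical helper illustrating a multi-choice argument like `use`.
# Assumption: this mirrors a common R idiom, not the package's actual code.
select_steps <- function(use = c("main", "suffixes", "feedly")) {
  # several.ok = TRUE lets the caller pass one or more of the allowed values;
  # anything outside the allowed set raises an error.
  match.arg(use, several.ok = TRUE)
}

select_steps()                     # all three steps
select_steps(c("main", "feedly"))  # skip the common-path check
```

Under that assumption, a call such as `pb_find_rss(x, use = "main")` would restrict the search to the first step only.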
Details
Uses a three-step heuristic to find RSS feeds:
Scrapes the main page (without any paths) to see if the RSS feed is advertised
Checks a number of common paths where sites put their RSS feeds
Queries the feedly.com API for feeds associated with the page
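The first two steps can be sketched in base R roughly as follows. This is an illustration only, not the package's implementation: `extract_advertised_feeds` and `candidate_feed_urls` are hypothetical helpers, the list of suffixes is an assumed example, and the feedly step is left out because it needs network access.

```r
# Step 1 (sketch): find feeds advertised in the page source via <link> tags.
# Hypothetical helper; takes the HTML of the main page as a single string.
extract_advertised_feeds <- function(html) {
  # Collect every <link ...> tag
  tags <- regmatches(html, gregexpr("<link[^>]*>", html, ignore.case = TRUE))[[1]]
  # Keep only tags that declare an RSS or Atom media type
  tags <- tags[grepl("application/(rss|atom)\\+xml", tags, ignore.case = TRUE)]
  # Pull out the href attribute and strip the attribute name and quotes
  href <- regmatches(tags, regexpr("href=[\"'][^\"']*[\"']", tags, ignore.case = TRUE))
  gsub("^href=[\"']|[\"']$", "", href, ignore.case = TRUE)
}

# Step 2 (sketch): build candidate feed URLs from common paths.
# The default suffixes are assumptions chosen for illustration.
candidate_feed_urls <- function(base_url, suffixes = c("rss", "feed", "index.xml")) {
  # Reduce the URL to scheme + host, i.e. the main page without any paths
  root <- sub("^(https?://[^/]+).*$", "\\1", base_url)
  paste0(root, "/", suffixes)
}

html <- '<head><link rel="alternate" type="application/rss+xml" href="https://example.com/rss"></head>'
extract_advertised_feeds(html)
candidate_feed_urls("https://example.com/news/today")
```

In a real run, each candidate URL from the second step would still need to be requested and checked for feed content before being reported.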
References
Approach inspired by https://github.com/mediacloud/feed_seeker
Examples
pb_find_rss("https://www.buzzfeed.com/")
#> ℹ Looking through links on the main page
#> ✔ Looking through links on the main page [343ms]
#>
#> ℹ Looking through common paths on the site
#> ✔ Looking through common paths on the site [571ms]
#>
#> ℹ Querying feedly API
#> ✔ Querying feedly API [273ms]
#>
#> ℹ Discovered 7 URLs
#> Check manually to see which ones fit
#> # A tibble: 7 × 2
#>   source           url
#>   <chr>            <chr>
#> 1 landing page     https://www.buzzfeed.com/rss
#> 2 common locations https://buzzfeed.com/index.xml
#> 3 feedly API       https://www.buzzfeed.com/index
#> 4 feedly API       https://www.buzzfeed.com/food
#> 5 feedly API       https://www.buzzfeed.com/celebrity
#> 6 feedly API       https://www.buzzfeed.com/badge/collection
#> 7 feedly API       https://www.buzzfeed.com/animals