Find RSS feeds on a newspaper's website

Usage

pb_find_rss(x, use = c("main", "suffixes", "feedly"))

Arguments

x

main domain of the newspaper site to check for RSS feeds.

use

which steps to include in the search (see Details). Default is to include all.

Value

A URL to the RSS feed(s) or NULL if nothing is found

Details

Uses a three-step heuristic to find RSS feeds:

  1. Scrapes the main page (without any paths) to see if the RSS feed is advertised

  2. Checks a number of common paths where sites put their RSS feeds

  3. Queries the feedly.com API for feeds associated with a page
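
The first two steps can be illustrated with a minimal base R sketch. This is not the package's implementation. The helper names `find_advertised_feeds()` and `candidate_feed_urls()` are hypothetical, and so is the suffix list: "rss" and "index.xml" match the example output on this page, while "feed" is an assumption. The sketch works on an HTML string, so it runs without network access.

```r
# Step 1 (sketch): find feeds advertised in a page's <link> tags.
# Hypothetical helper, not part of the package.
find_advertised_feeds <- function(html) {
  # Collect every <link ...> tag from the HTML string
  tags <- regmatches(html, gregexpr("<link[^>]*>", html, ignore.case = TRUE))[[1]]
  # Keep only tags that declare an RSS or Atom feed type
  tags <- tags[grepl("application/(rss|atom)\\+xml", tags, ignore.case = TRUE)]
  # Extract the href attribute (tags without an href are dropped)
  href_pat <- "href[[:space:]]*=[[:space:]]*[\"'][^\"']*[\"']"
  hrefs <- regmatches(tags, regexpr(href_pat, tags, ignore.case = TRUE))
  # Strip the attribute name and surrounding quotes, leaving the URL
  gsub("^href[[:space:]]*=[[:space:]]*[\"']|[\"']$", "", hrefs, ignore.case = TRUE)
}

# Step 2 (sketch): build candidate feed URLs from common suffixes.
# The suffix list is an assumption made for illustration.
candidate_feed_urls <- function(domain, suffixes = c("rss", "feed", "index.xml")) {
  base <- sub("/+$", "", domain)  # drop trailing slashes
  paste0(base, "/", suffixes)
}

html <- paste0(
  "<html><head>",
  "<link rel=\"alternate\" type=\"application/rss+xml\" href=\"https://example.com/rss\">",
  "<link rel=\"stylesheet\" href=\"/style.css\">",
  "</head></html>"
)
find_advertised_feeds(html)
candidate_feed_urls("https://example.com/")
```

In practice each candidate URL would still need to be requested and checked to confirm it serves a feed. Relative `href` values are returned unchanged here and would need resolving against the domain.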

References

Approach inspired by https://github.com/mediacloud/feed_seeker

Examples

pb_find_rss("https://www.buzzfeed.com/")
#>  Looking through links on the main page
#>  Looking through links on the main page [343ms]
#> 
#>  Looking through common paths on the site
#>  Looking through common paths on the site [571ms]
#> 
#>  Querying feedly API
#>  Querying feedly API [273ms]
#> 
#>  Discovered 7 URLs
#>  Check manually to see which ones fit
#> # A tibble: 7 × 2
#>   source           url                                      
#>   <chr>            <chr>                                    
#> 1 landing page     https://www.buzzfeed.com/rss             
#> 2 common locations https://buzzfeed.com/index.xml           
#> 3 feedly API       https://www.buzzfeed.com/index           
#> 4 feedly API       https://www.buzzfeed.com/food            
#> 5 feedly API       https://www.buzzfeed.com/celebrity       
#> 6 feedly API       https://www.buzzfeed.com/badge/collection
#> 7 feedly API       https://www.buzzfeed.com/animals