Collect data from supplied URLs
Usage
pb_collect(
urls,
collect_rss = TRUE,
timeout = 30,
ignore_fails = FALSE,
connections = 100L,
host_con = 6L,
use_cookies = FALSE,
useragent = "paperboy",
save_dir = NULL,
verbose = NULL,
...
)
Arguments
- urls
Character vector of URLs.
- collect_rss
If one of the URLs points to an RSS feed, should it be parsed and its linked articles collected?
- timeout
How long should the function wait for the connection (in seconds). If the query finishes earlier, results are returned immediately.
- ignore_fails
Normally the function errors when a URL can't be reached due to connection issues. Setting this to TRUE ignores such failures.
- connections
Maximum total number of concurrent connections.
- host_con
Maximum number of concurrent connections per host.
- use_cookies
If TRUE, use the cookiemonster package to handle cookies. See add_cookies for details on how to store cookies. Cookies are used to access articles behind a paywall or consent form.
- useragent
String to be sent in the User-Agent header.
- save_dir
Store raw HTML data on disk instead of in memory by providing a path to a directory.
- verbose
A logical flag indicating whether information should be printed to the screen. If NULL, the value is determined from getOption("paperboy_verbose").
- ...
Currently not used.
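Examples

A minimal sketch of how these arguments fit together. The URLs below are placeholders, not real endpoints, and the call assumes paperboy is installed and a network connection is available:

```r
library(paperboy)

# Hypothetical URLs for illustration only
urls <- c(
  "https://www.example.com/news/article-1",
  "https://www.example.com/feed/rss"   # feed entries are expanded when collect_rss = TRUE
)

pages <- pb_collect(
  urls,
  collect_rss  = TRUE,    # parse RSS feeds and collect the linked articles
  timeout      = 30,      # wait up to 30 seconds for each connection
  ignore_fails = TRUE,    # skip unreachable URLs instead of erroring
  host_con     = 6L,      # limit concurrent connections per host
  save_dir     = "html"   # write raw HTML to ./html instead of keeping it in memory
)
```

Setting ignore_fails = TRUE is useful for large, messy link lists where a few dead URLs should not abort the whole collection run.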