Unofficial API • traktok

The unofficial or hidden API is essentially what the TikTok website uses to display you content. Partly based on Deen Freelon’s Pyktok Python module, traktok contains functions to simulate a browser accessing some of these API endpoints. How these endpoints work was discovered through reverse engineering and TikTok might change how these endpoints operate at any moment. As of writing this (2023-11-28), there are functions that can:

search videos using a search term
get video details and the video files from a given video URL
get who follows a user
get who a user is following

To use these functions, you have to log into <tiktok.com> first and then give R the cookies the browser uses to identify itself.

Authentication

The easiest way to get the cookies needed for authentication is to export the necessary cookies from your browser using a browser extension (after logging in at TikTok.com at least once). I can recommend “Get cookies.txt” for Chromium based browsers or “cookies.txt” for Firefox (note that almost all browsers used today are based on one of these).

Save the cookies.txt file, which will look something like this:

# Netscape HTTP Cookie File
# https://curl.haxx.se/rfc/cookie_spec.html
# This is a generated file! Do not edit.

.tiktok.com TRUE    /   TRUE    1728810805  cookie-consent  {%22ga%22:true%2C%22af%...
.tiktok.com TRUE    /   TRUE    1700471788  passport_csrf_token e07d3487c11ce5258a3...
.tiktok.com TRUE    /   FALSE   1700471788  passport_csrf_token_default e07d3487c11...
#HttpOnly_.tiktok.com   TRUE    /   TRUE    1700493610  multi_sids  71573310862246389...
#HttpOnly_.tiktok.com   TRUE    /   TRUE    1700493610  cmpl_token  AgQQAPORF-RO0rNtH...
...

It does not matter if you download all cookies or just the ones specific to TikTok, as we use the cookiemonster package to deal with that. To read the cookies into a specific encrypted file, simply use:

cookiemonster::add_cookies("tiktok.com_cookies.txt")

And that’s it! traktok will access these cookies whenever necessary.

Usage

Search videos

To search for videos, you can use either tt_search or tt_search_hidden, which do the same, as long as you do not have a token for the Research API. To get the first two pages of search results (one page has 12 videos), you can use this command:

rstats_df <- tt_search_hidden("#rstats", max_pages = 2)
#> 
ℹ Getting page 1
⏲ waiting 0.5 seconds
ℹ Getting page 1
✔ Got page 1. Found 12 videos. [1.9s]
#> 
ℹ Getting page 2
✔ Got page 2. Found 12 videos. [690ms]
rstats_df
#> # A tibble: 24 × 20
#>    video_id  video_timestamp     video_url video_length video_title
#>    <chr>     <dttm>              <glue>           <int> <chr>      
#>  1 71151144… 2022-06-30 19:17:53 https://…          135 "R for Beg…
#>  2 72522261… 2023-07-05 07:01:45 https://…           36 "Wow!!! TH…
#>  3 72420686… 2023-06-07 22:05:16 https://…           34 "R GRAPHIC…
#>  4 72134135… 2023-03-22 16:49:12 https://…            6 "R and me …
#>  5 72576898… 2023-07-20 00:23:40 https://…           56 "Pie chart…
#>  6 72999870… 2023-11-10 23:58:21 https://…           51 "Quick R Q…
#>  7 72783048… 2023-09-13 13:40:21 https://…           36 "Quick R Q…
#>  8 73029706… 2023-11-19 00:56:09 https://…          163 "What is c…
#>  9 71670108… 2022-11-17 15:42:56 https://…           58 "Here’s an…
#> 10 72933174… 2023-10-24 00:36:48 https://…            9 "#CapCut #…
#> # ℹ 14 more rows
#> # ℹ 15 more variables: video_diggcount <int>,
#> #   video_sharecount <int>, video_commentcount <int>,
#> #   video_playcount <int>, video_is_ad <lgl>, author_name <chr>,
#> #   author_nickname <chr>, author_followercount <int>,
#> #   author_followingcount <int>, author_heartcount <int>,
#> #   author_videocount <int>, author_diggcount <int>, …

This already gives you pretty much all information you could want about the videos that were found.

Get metadata and download videos

However, you can obtain some more information, and importantly the video file, using tt_videos:

rstats_df2 <- tt_videos(rstats_df$video_url[1:2], save_video = TRUE)
#> 
ℹ Getting video 7115114419314560298
⏲ waiting 0.2 seconds              
ℹ Getting video 7115114419314560298
✔ Got video 7115114419314560298 (1/2). File size: 2.5 Mb. [2.5s]
#> 
ℹ Getting video 7252226153828584731
✔ Got video 7252226153828584731 (2/2). File size: 1.7 Mb. [999ms]
rstats_df2
#> # A tibble: 2 × 19
#>   video_id   video_url video_timestamp     video_length video_title
#>   <glue>     <chr>     <dttm>                     <int> <chr>      
#> 1 711511441… https://… 2022-06-30 19:17:53          135 R for Begi…
#> 2 725222615… https://… 2023-07-05 07:01:45           36 Wow!!! THI…
#> # ℹ 14 more variables: video_locationcreated <chr>,
#> #   video_diggcount <int>, video_sharecount <int>,
#> #   video_commentcount <int>, video_playcount <int>,
#> #   author_username <chr>, author_nickname <chr>,
#> #   author_bio <chr>, download_url <chr>, html_status <int>,
#> #   music <list>, challenges <list>, is_classified <lgl>,
#> #   video_fn <chr>

Per default, the function waits between one and ten seconds (chosen at random) between making two calls, to not make it too obvious that data is scraped from TikTok. You can speed up the process (at your own risk), by changing the sleep_pool argument, which controls the minimum and maximum number of seconds to wait:

rstats_df3 <- tt_videos(rstats_df$video_url[3:4], save_video = TRUE, sleep_pool = 0.1)
#> 
ℹ Getting video 7242068680484408581
⏲ waiting 0.1 seconds              
ℹ Getting video 7242068680484408581
✔ Got video 7242068680484408581 (1/2). File size: 1.8 Mb. [2.6s]
#> 
ℹ Getting video 7213413598998056234
✔ Got video 7213413598998056234 (2/2). File size: 598.1 Kb. [1.7s]
rstats_df3
#> # A tibble: 2 × 19
#>   video_id   video_url video_timestamp     video_length video_title
#>   <glue>     <chr>     <dttm>                     <int> <chr>      
#> 1 724206868… https://… 2023-06-07 22:05:16           34 "R GRAPHIC…
#> 2 721341359… https://… 2023-03-22 16:49:12            6 "R and me …
#> # ℹ 14 more variables: video_locationcreated <chr>,
#> #   video_diggcount <int>, video_sharecount <int>,
#> #   video_commentcount <int>, video_playcount <int>,
#> #   author_username <chr>, author_nickname <chr>,
#> #   author_bio <chr>, download_url <chr>, html_status <int>,
#> #   music <list>, challenges <list>, is_classified <lgl>,
#> #   video_fn <chr>

When you are scraping a lot of URLs, the function might fail eventually, due to a poor connection or because TikTok is blocking your requests. It therefore usually makes sense to save your progress in a cache directory:

rstats_df3 <- tt_videos(rstats_df$video_url[5:6], cache_dir = "rstats")
#> 
ℹ Getting video 7257689890245201153
⏲ waiting 1.7 seconds
ℹ Getting video 7257689890245201153
✔ Got video 7257689890245201153 (1/2). File size: 1.7 Mb. [2.6s]
#> 
ℹ Getting video 7299987059417042209
✔ Got video 7299987059417042209 (2/2). File size: 1.2 Mb. [1.8s]
list.files("rstats")
#> [1] "7257689890245201153.json" "7299987059417042209.json"

Note that the video files are downloaded into the dir directory (your working directory by default), independently from your cache directory.

If there are information that you feel are missing from the data.frame tt_videos returns, you can look at the raw, unparsed json data using:

rstats_list1 <- tt_request_hidden(rstats_df$video_url[1]) |>
  jsonlite::fromJSON()

Parsing the result into a list using fromJSON, results in a rather complex nested list. You can look through this and see for yourself if the data you are interested in is there

Get followers and who a user is following

Getting followers and who a user is following is (at the moment?) a little tricky to use, since TikTok blocks requests to a users profile page with anti-scraping measures. To circumvent that, you can open a users page in your browser and then right-click to show the source code:¹

You can then search for and copy the authorSecId value:

Once you have this authorSecId you can look up a maximum of 5,000 followers per account:

tt_get_follower(secuid = "MS4wLjABAAAAwiH32UMb5RenqEN7duyfLIeGQgSIx9WtgtOILt55q6ueUXgz4gHqZC5HFx4nabPi",
                verbose = FALSE)
#> 
#> # A tibble: 1,116 × 27
#>    avatarLarger             avatarMedium avatarThumb commentSetting
#>    <chr>                    <chr>        <chr>                <int>
#>  1 https://p16-sign-sg.tik… https://p16… https://p1…              0
#>  2 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  3 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  4 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  5 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  6 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  7 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  8 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  9 https://p16-sign-va.tik… https://p16… https://p1…              0
#> 10 https://p16-sign-va.tik… https://p16… https://p1…              0
#> # ℹ 1,106 more rows
#> # ℹ 23 more variables: downloadSetting <int>, duetSetting <int>,
#> #   ftc <lgl>, id <chr>, isADVirtual <lgl>, nickname <chr>,
#> #   openFavorite <lgl>, privateAccount <lgl>, relation <int>,
#> #   secUid <chr>, secret <lgl>, signature <chr>,
#> #   stitchSetting <int>, ttSeller <lgl>, uniqueId <chr>,
#> #   verified <lgl>, diggCount <int>, followerCount <int>, …

Likewise, you can also check who this account follows:

tt_get_following(secuid = "MS4wLjABAAAAwiH32UMb5RenqEN7duyfLIeGQgSIx9WtgtOILt55q6ueUXgz4gHqZC5HFx4nabPi",
                 verbose = FALSE)
#> 
#> # A tibble: 489 × 28
#>    avatarLarger             avatarMedium avatarThumb commentSetting
#>    <chr>                    <chr>        <chr>                <int>
#>  1 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  2 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  3 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  4 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  5 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  6 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  7 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  8 https://p16-sign-va.tik… https://p16… https://p1…              0
#>  9 https://p16-sign-va.tik… https://p16… https://p1…              0
#> 10 https://p16-sign-va.tik… https://p16… https://p1…              0
#> # ℹ 479 more rows
#> # ℹ 24 more variables: downloadSetting <int>, duetSetting <int>,
#> #   ftc <lgl>, id <chr>, isADVirtual <lgl>, nickname <chr>,
#> #   openFavorite <lgl>, privateAccount <lgl>, relation <int>,
#> #   secUid <chr>, secret <lgl>, signature <chr>,
#> #   stitchSetting <int>, ttSeller <lgl>, uniqueId <chr>,
#> #   verified <lgl>, diggCount <int>, followerCount <int>, …

list.files(pattern = ".mp4") |>
  unlink()
unlink("rstats", recursive = TRUE)