The Washington Post is collecting TikTok user data from its followers for the public good
I wrote about communities collecting Twitter user data so that we can collectively learn about ourselves, and now the Washington Post is doing the same thing!!
Two days ago I laid out my crazy vision: the Twitter API is locked down, which has killed a lot of community applications, but they can’t stop us from exporting our own data and sharing it with each other.
There was some good (and skeptical) discussion of this on Hacker News.
Yesterday I found out that the Washington Post is actively doing something very similar on TikTok with their followers! According to this video, they’ve collected data from 800 users so far, including watch history on 55 million videos.
The data they’re collecting is private, but the analysis they’re doing is open. They’re telling you exactly what they’re doing, and users are sharing their data because they’re curious about the answers to these questions. This is just as I predicted: users share their data when it benefits them. If companies won’t share what they know about us and about society & culture, we’ll figure it out ourselves.
It’s working.
I love this question, and I’m extremely curious to see the results: how much time do people THINK they spend on TikTok vs. how much time do they actually spend on it? (This one is a fun game that anyone can play locally, without giving their data to anyone, and we could get global results for it, on other apps too!)
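The "actual" side of that comparison can be computed entirely on your own machine. Below is a minimal sketch, assuming the watch history gives one timestamp per video in the format `%Y-%m-%d %H:%M:%S` but not how long each video played (both are assumptions; check your own export). It groups watch events into sessions; the 10-minute session gap and the 30 seconds credited to each session's last video are hypothetical values, not anything TikTok defines.

```python
from datetime import datetime, timedelta

# Assumed export format: one timestamp string per watched video.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"
# Hypothetical heuristics, not values defined by TikTok:
SESSION_GAP = timedelta(minutes=10)  # a longer gap starts a new session
LAST_VIDEO = timedelta(seconds=30)   # time credited to a session's final video


def estimate_watch_time(timestamps: list[str]) -> timedelta:
    """Estimate total viewing time by grouping watch events into sessions."""
    times = sorted(datetime.strptime(t, TIMESTAMP_FORMAT) for t in timestamps)
    if not times:
        return timedelta(0)
    total = timedelta(0)
    session_start = previous = times[0]
    for current in times[1:]:
        if current - previous > SESSION_GAP:
            # Close the current session and open a new one.
            total += (previous - session_start) + LAST_VIDEO
            session_start = current
        previous = current
    # Close the final session.
    total += (previous - session_start) + LAST_VIDEO
    return total


def compare_guess(guessed_hours: float, timestamps: list[str]) -> float:
    """Ratio of estimated to guessed hours; above 1 means more than you think."""
    if guessed_hours <= 0:
        raise ValueError("guessed_hours must be positive")
    actual_hours = estimate_watch_time(timestamps).total_seconds() / 3600
    return actual_hours / guessed_hours
```

To play, write down your guess for a given period first, then run `compare_guess` on that period's history. Only the ratio would ever need to be shared to build global results.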
Involving the public when studying the public
Here is a data journalist from WaPo, asking on Twitter for ideas for the analysis they’re doing. This is incredible, I love to see it!!
I don’t use TikTok very much, but if you do, you should share your thoughts!! Or leave them in a comment here and I’ll pass them along on Twitter.
We were just discussing something very similar in the Twitter community archive Discord: maybe we could make the data private. I didn’t like this idea at first, because it feels like it defeats the purpose. I don’t want user data to end up in just another silo. The whole point is that I am curious, I have questions about society & culture, and big tech won’t let me analyze this data (and even if they did, it wouldn’t be right, because users haven’t consented).
But what if the data were private but the analysis open source? What if you could open a PR with a new aggregated view or query on the Twitter archive, and get your answers if others in the community find it interesting too? If you really want to do something that this specific community doesn’t seem to like, you can always implement the analysis tool yourself and ask users to collect their data (or someone with a bigger following can take your tool, ask their followers to submit to it, and publish the results).
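One way the "private data, open analysis" model could work: contributors' raw rows stay private, and a merged PR adds an aggregate that only reports groups backed by enough distinct people. The sketch below uses a simple minimum-group-size rule (the threshold of 5 is arbitrary). It is not a formal privacy guarantee like differential privacy, and `aggregated_view` is a hypothetical name, not part of any existing archive tool.

```python
from collections import defaultdict

# Hypothetical threshold: groups backed by fewer contributors are suppressed.
MIN_CONTRIBUTORS = 5


def aggregated_view(records, min_contributors=MIN_CONTRIBUTORS):
    """Count distinct contributors per key, hiding small groups.

    `records` is an iterable of (contributor_id, key) pairs drawn from the
    private dataset; only the returned dict would be published.
    """
    contributors_per_key = defaultdict(set)
    for contributor_id, key in records:
        # A set ensures each contributor is counted once per key.
        contributors_per_key[key].add(contributor_id)
    return {
        key: len(ids)
        for key, ids in contributors_per_key.items()
        if len(ids) >= min_contributors
    }
```

Counting distinct contributors rather than raw rows keeps one very active user from pushing a niche topic over the threshold on their own.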
Basically, if people really want something to happen, they’ll make it happen. Things that not enough people are excited about, or are against, will be much more difficult. All of this is a feature, not a bug!
Some ideas for TikTok data analysis
What’s actually in the TikTok archive? I scanned through mine (which is very sparse).
The browsing history and like list only contain links to the videos, with no other semantic content.
The list of search terms does contain data that might be interesting to visualize! (Interestingly, I’ve only ever searched for ONE thing on TikTok, and it’s “LLMs” 😄)
They DO show you “ad interests”, which I think are super important for users to see, because they show that companies can infer things about what you like. I’d love to make a game out of this one with your own personal data: does TikTok know MORE or LESS than you *think* it knows about you? Imagine it gives you a series of categories, some real and some fake, and you vote on whether you think TikTok knows each one about you.
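The game could be as simple as mixing real categories with decoys and scoring the votes. This is a minimal sketch: it assumes you have already parsed the "ad interests" from your export into a list of strings (the file format isn't covered here), and the decoy categories and function names are hypothetical.

```python
import random


def build_round(real_interests, decoys, seed=None):
    """Mix real ad interests with decoys and shuffle them into one quiz."""
    rng = random.Random(seed)
    categories = sorted(set(real_interests) | set(decoys))
    rng.shuffle(categories)
    return categories


def score_round(real_interests, votes):
    """Compare the player's votes against what the export actually lists.

    `votes` maps category -> True if the player thinks TikTok knows it.
    """
    real = set(real_interests)
    # Real categories the player didn't expect TikTok to know.
    underestimated = sorted(c for c, v in votes.items() if c in real and not v)
    # Decoys the player assumed TikTok knows.
    overestimated = sorted(c for c, v in votes.items() if c not in real and v)
    return {
        "correct": len(votes) - len(underestimated) - len(overestimated),
        "knows_more_than_you_think": underestimated,
        "knows_less_than_you_think": overestimated,
    }
```

Everything runs locally; the only thing worth sharing for a global leaderboard would be the score.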
The best ideas I can come up with:
Cluster users by the videos they watched. Is everyone watching the same videos? Or is everyone watching very different things, with a few videos that almost everyone has seen?
What does the “viral rate” look like for videos? If you find the most common video across the dataset, can you chart “its path”, i.e. the timeline of it appearing in people’s watch histories?
How well does “ad category” predict what people watch? If you cluster people who have the same ad categories, do they watch the same videos?
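All three ideas can be prototyped on nothing but links and timestamps. The sketch assumes each contributor's watch history has been parsed into `(video_url, datetime)` pairs (a hypothetical format). It measures overlap between users with Jaccard similarity, which a clustering algorithm could then consume (the clustering step itself isn't shown); traces a video's path as the first time each user watched it; and compares average similarity for users who share an ad category against those who don't, as a rough, non-causal signal.

```python
from collections import Counter
from datetime import datetime
from itertools import combinations
from statistics import mean

# Hypothetical input shape: {user_id: [(video_url, watched_at), ...]}
Histories = dict[str, list[tuple[str, datetime]]]


def jaccard(a: set, b: set) -> float:
    """Overlap between two sets of watched videos (0 = none, 1 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pairwise_similarity(histories: Histories) -> dict:
    """Jaccard similarity for every pair of users; input to a clustering step."""
    videos = {user: {url for url, _ in events} for user, events in histories.items()}
    return {(u, v): jaccard(videos[u], videos[v]) for u, v in combinations(sorted(videos), 2)}


def most_common_video(histories: Histories) -> str:
    """Video watched by the most distinct users."""
    counts = Counter(url for events in histories.values() for url in {v for v, _ in events})
    return counts.most_common(1)[0][0]


def viral_path(histories: Histories, video_url: str) -> list:
    """First time each user watched `video_url`, in chronological order."""
    first_seen = {}
    for user, events in histories.items():
        times = [t for url, t in events if url == video_url]
        if times:
            first_seen[user] = min(times)
    return sorted(first_seen.items(), key=lambda item: item[1])


def ad_category_signal(similarity: dict, ad_categories: dict) -> tuple:
    """Mean similarity for pairs sharing an ad category vs. pairs that don't."""
    same, different = [], []
    for (u, v), score in similarity.items():
        shares = bool(ad_categories.get(u, set()) & ad_categories.get(v, set()))
        (same if shares else different).append(score)
    return (mean(same) if same else None, mean(different) if different else None)
```

If the first number is clearly higher than the second across a large dataset, ad categories carry some information about viewing habits; with a handful of users it says very little.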
If we had ANY metadata about the videos themselves (a general category, or a generated description of the content), I think we could do much more interesting analysis.
There’s other stuff like “order history”, “DMs”, posts, etc. There’s also a mysterious “Off TikTok Activity.txt” that I’m super interested in: what do they know?? What are they tracking???
I think some of this can freak users out, and the company may feel it’s unfair to be called out, but I think the correct response is:
Have TikTok call out other companies: “FB and Google do this too! Don’t believe us? Go export your data from them and check for yourself!”
This is great: now users are extra curious, and in collaboration with data journalists they can repeat this analysis on other platforms. Decentralized, bottom-up data journalism!
People realize that both things are true: (1) companies collect a TON of data about us, and (2) it’s not the end of the world. It DOES let them create very good maps of society & culture. I think those maps are genuinely useful, and I think they should be public.
The next big frontier I’d love to see is user archival of ads. The ads I see make sense to me, because they’re heavily targeted at me. I never get to see ads targeted at [the other political side], or at groups & subcultures I don’t even know exist. What do they look like? What does the messaging say? Are my parents getting more manipulative ads because they’re not as tech-savvy as I am?
I’ll leave you with this beautiful piece of fiction:
Apparently Jeremy B. Merrill has been doing this for a long time!
- In 2018 they collected 100k targeted Facebook ads, contributed by 16,000 users: https://www.propublica.org/article/facebook-political-ad-collector-targeted-ads-what-we-learned
- The scale of this is incredible to me. They made a plugin to anonymously collect the ads people see, and partnered with Mozilla for outreach. Then a follow-up plugin tied people to YouGov data (demographics, political leaning, race, state of residence, but not name or address).
- Highly suspicious: a “mysterious Facebook page called ‘America Progress Now’ urging liberals to vote for Green Party candidates. The candidates themselves had never heard of the group, and we couldn’t find any address or legal registration for it.”