For our project tracking image censorship on Sina Weibo, a popular social networking service in China, ProPublica wrote software to monitor a set of 100 Weibo user accounts to detect censored images.
In collaboration with outside researchers, we began collecting the posts and reposts made by the accounts on July 3, 2013. We have assembled a database of nearly 80,000 posts. Of those, 524 contained images that were censored by Sina Weibo during an observation period we established between July 24 and August 4, 2013.
Our goal in assembling our selection of accounts was to increase our chances of observing image deletion. The users and posts were not chosen at random, and you should not generalize our findings to larger populations.
What is a Weibo Post?
Like Twitter posts, Weibo posts have a 140-character limit and allow users to attach an image. For our app, ProPublica analyzed only posts that included an image, though we cannot be certain whether a post was deleted because of its text, its image, or both.
Like many social media services, Weibo provides an Application Programming Interface, or API, that gives other software, such as mobile apps and websites that include Weibo content, programmatic access to Weibo posts. The API returns a JSON object, much as the Twitter API does, containing the message text as well as a host of metadata about the message. This is the API we used to collect data.
Two weeks after we began collecting, the Weibo API changed, and we had to deploy code fixes before we could restart our collection scripts.
Our Detection Technique
Starting in July, our scripts checked the Weibo API every six minutes and collected any new posts or reposts from the users we were observing. To determine if a post had been deleted, a separate, hourly script checked whether those posts still existed.
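The collection step described above can be sketched roughly as follows. This is an illustration only, not ProPublica's actual script: the `fetch_recent` callable, the `id` field, and the in-memory `seen` store are all hypothetical stand-ins, and only the two intervals come from the text.

```python
from typing import Callable, Dict, Iterable, List

# Intervals taken from the description above.
POLL_INTERVAL_SECONDS = 6 * 60       # collection pass every six minutes
RECHECK_INTERVAL_SECONDS = 60 * 60   # separate existence check every hour


def collect_new_posts(
    fetch_recent: Callable[[str], Iterable[dict]],
    user_ids: Iterable[str],
    seen: Dict[str, dict],
) -> List[str]:
    """Run one collection pass and return the ids of posts not seen before.

    `fetch_recent` is a hypothetical function that returns the recent posts
    for one user as dictionaries with an "id" key. `seen` maps post id to
    the stored post and is updated in place.
    """
    new_ids = []
    for user_id in user_ids:
        for post in fetch_recent(user_id):
            post_id = post["id"]
            if post_id not in seen:
                seen[post_id] = post
                new_ids.append(post_id)
    return new_ids
```

A scheduler would call `collect_new_posts` once per `POLL_INTERVAL_SECONDS` and, in a separate job, revisit every id in `seen` once per `RECHECK_INTERVAL_SECONDS`.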
A deleted post resulted in one of two responses from the Weibo API. The first was error code 20101, "Target Weibo does not exist!" This was the error our script received after we uploaded a test image using a Weibo account and then manually deleted it. We counted posts that returned error code 20101 as deleted by the user.
The second error code was 20112, with the message "Permission Denied!" Researchers have concluded that this is an error code that cannot be the result of user activity and have used it as an indication of censorship. We counted posts returning error code 20112 as censored.
It is possible that some posts returning 20101 were deleted by a censor and not the user. Further research may prove that to be the case. Our choice reflects current research by others and is, we believe, the most straightforward and conservative method available to us.
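The classification rule in this section can be written as a short function. This is a minimal sketch, assuming the hourly check yields a parsed JSON dictionary and that a failed lookup carries its code in a field named `error_code`; the field name and the label strings are assumptions, while the two codes and their meanings come from the text.

```python
# Error codes and their interpretation, as described in the text.
ERROR_TARGET_DOES_NOT_EXIST = 20101  # counted as deleted by the user
ERROR_PERMISSION_DENIED = 20112      # counted as censored


def classify_recheck(response: dict) -> str:
    """Label one re-check response for a previously collected post.

    Assumes (hypothetically) that an error response includes an
    "error_code" field and that a successful response does not.
    """
    raw_code = response.get("error_code")
    if raw_code is None:
        return "present"
    try:
        code = int(raw_code)  # tolerate codes delivered as strings
    except (TypeError, ValueError):
        return "unknown"
    if code == ERROR_TARGET_DOES_NOT_EXIST:
        return "deleted_by_user"
    if code == ERROR_PERMISSION_DENIED:
        return "censored"
    return "unknown"
```

Any other error code is labeled `unknown` rather than forced into one of the two categories, which keeps the censored count conservative.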
The 100 Users
Since the goal was not to collect a representative sample of all Sina Weibo posts, but to collect as many censored posts as possible, researchers assembled a subset of users who had previously posted content that was later censored. They picked 50 users from accounts found on WeiboScope, a University of Hong Kong project that archives deleted content on Weibo. They then added another 50 users, finding members of the first cohort’s Weibo circle who posted similar content.
All 100 users have a minimum of 2,000 Weibo followers. Researchers sought out users who said they were journalists or lawyers. Members of these professions, in the judgment of our researchers, are more likely to exhibit behaviors that would cause them to be censored, such as voicing social criticism and posting messages about human rights violations online.
The 100 users did not know about our project, and the only posts that we collected were ones the users made public.
Between July 24 and August 4, ProPublica collected a total of 7,972 posts, 1,710 of which were deleted by the time we published. Of the deleted posts, ProPublica identified at least 557 posts that were removed by censors.
We selected these two weeks for our observation period because during this time frame, many topics forbidden by the Chinese government gained traction on Weibo, such as the indictment of former politician Bo Xilai, the arrests of dissident Xu Zhiyong and singer Wu Hongfei, a scandal involving celebrity faith healer Wang Lin, and demonstrations in Chongqing.
We removed 33 images from our final presentation because they contained sexually explicit material, leaving us with a final count of 524 images in our published collection.
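The counts reported for the observation period fit together as shown in this short check. The figures come from the text; the variable names are only illustrative, and the shares are computed for convenience rather than quoted from the original analysis.

```python
# Figures reported for the July 24 to August 4 observation period.
collected = 7972         # posts collected
deleted = 1710           # posts deleted by publication time
censored = 557           # deleted posts identified as removed by censors
explicit_removed = 33    # censored images withheld as sexually explicit

# Images remaining in the published collection.
published = censored - explicit_removed

# Derived shares (not stated in the text).
deleted_share = deleted / collected
censored_share_of_deleted = censored / deleted

print(f"published images: {published}")
print(f"share of collected posts deleted: {deleted_share:.1%}")
print(f"share of deleted posts attributed to censors: {censored_share_of_deleted:.1%}")
```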