How Algorithms Decide What is Newsworthy and What is Not

February 8, 2016

By Afef Abrougui

New media is often credited with democratizing news production and distribution thanks to the Internet’s decentralized nature, which empowers users to do their own reporting on a variety of topics such as corruption, anti-government protests or natural and environmental disasters.

New media not only allows users to produce and distribute content, but also empowers them to access diverse viewpoints. In his book Public Parts, journalism professor Jeff Jarvis describes the net as “everyone’s printing press”, adding that “all of us no longer watch the same, shared news with the same, one-size-fits-all viewpoint”. Even under oppressive regimes that extensively filter dissenting voices and those who cross imposed red lines, the most sophisticated of users can still access banned content and anti-government opinions thanks to circumvention technologies, despite the limitations of such tools (Morozov 2011).

New media is not only empowering those living under dictatorships, but also citizens of western democracies. In the United States, Americans have been using their mobile phones to document and widely disseminate footage of police abuse.

The internet may have empowered bloggers, citizen journalists, video makers, photographers and artists to produce and share their own work without the interference of editors and publishers, but it did not completely eliminate intermediaries. As the popularly recognized media critic Eli Pariser notes, the “middlemen” have been replaced by “algorithmic gatekeepers”.

As we are increasingly relying on the virtual networks and communities we are part of to stay updated with current events, algorithms of online platforms and services such as Facebook and Google News are increasingly governing our access to news and deciding for us what is newsworthy and what is not, a role once played by editors and traditional media (Gillespie 2014, Tufekci 2015).

This essay explores how algorithmic gatekeepers and personalised news services are increasingly deciding which news we pay attention to, and the implications of such algorithmic filtering for news consumption.

Online News Consumption and the Rise of Algorithmic Curation

News sites and online newspapers get most of their traffic from search engines and social networking sites (Hindman 2008, Jarvis 2009). According to a study conducted by the Pew Research Center (PRC) and the Knight Foundation, 30% of US adults get their news on Facebook. This percentage is even higher among younger generations, with 61% of American millennials reporting that they get political news on Facebook in a given week.

The increasing role search engines and networking sites play in directing their users to news content means algorithms are increasingly deciding what is newsworthy and what is not (Tufekci 2015, Pariser 2012). These services and platforms track their users’ behaviors and habits to make profits from targeted advertising. In the process, they deliver personalized results and filtered feeds (Turow 2011, Pariser 2012).

For instance, stories that appear on Facebook’s News Feed are influenced by a user’s connections and activity on the site. Posts that receive high numbers of comments and likes, as well as videos and photos, are more likely to be prioritized by the site’s EdgeRank algorithm. Yahoo News relies on a combination of algorithmic gatekeepers and human editors to keep its users updated with the latest events, while stories and results on Google News are generated solely by computers.
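The engagement-driven prioritization described above can be illustrated with a toy ranking function. This is only a sketch under stated assumptions, not Facebook's actual algorithm: the `Post` fields, the weights and the sample numbers are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class Post:
    title: str
    likes: int
    comments: int
    has_media: bool  # True if a photo or video is attached


# Hypothetical weights, invented for this example only.
LIKE_WEIGHT = 1.0
COMMENT_WEIGHT = 2.0
MEDIA_BONUS = 1.5


def engagement_score(post: Post) -> float:
    """Score a post by its likes and comments, boosted if it carries media."""
    score = LIKE_WEIGHT * post.likes + COMMENT_WEIGHT * post.comments
    return score * MEDIA_BONUS if post.has_media else score


# Invented sample posts.
posts = [
    Post("Text-only news update", likes=40, comments=60, has_media=False),
    Post("Viral video", likes=900, comments=150, has_media=True),
]

# Highest score first, as a feed that rewards engagement would order them.
ranked = sorted(posts, key=engagement_score, reverse=True)
for post in ranked:
    print(post.title, engagement_score(post))
```

Under these assumed weights, a post that is already popular and carries media outranks a text-only update regardless of its news value, which is the dynamic the critics cited in this essay are concerned about.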

A number of media critics and theorists have expressed concerns that relying on the algorithms of online platforms and personalised news services to govern our access to content deemed “relevant” to our preferences and interests could have implications for democracy by excluding noteworthy and general-interest news (Gillespie 2014, Morozov 2013, Pariser 2012, Sunstein 2009, Tufekci 2015).

When algorithms filter #ferguson and #wikileaks

In August 2014, Zeynep Tufekci, a professor at the University of North Carolina, documented how “Facebook’s News Feed was algorithmically suppressing” news updates about the Ferguson protests against racial discrimination and inequalities in the wake of the deadly shooting of 18-year-old African-American teenager Michael Brown by a police officer. Tufekci was not the only one to notice this filtering. Several users also took to Twitter and Facebook to complain about the “de facto censorship” of Ferguson on the most popular social networking site in the world.

On the other hand, the #ferguson hashtag was trending and rising in popularity on Twitter thanks to its unfiltered reverse chronological stream. Twitter’s uncurated feed allows human editors to make decisions about the newsworthiness of content and to amplify it “without strong algorithmic biases”, while Facebook’s algorithmic gatekeepers tend to reward what is already popular, as the site’s News Feed prioritizes viral content like videos and photos that get a high number of shares, comments and likes. In fact, when the unrest in Ferguson started, Facebook was brimming with videos of the ice-bucket challenge, which could explain why it took hours before updates about Ferguson made it to the feeds of Tufekci and other users.

Twitter’s Trends algorithm has also previously come under criticism and been accused of preventing major topics and political events, such as the release of the #wikileaks cables in 2010 and the Occupy Wall Street protests in 2011, from trending. In response to criticism and censorship accusations, the company has since sought to clarify how its Trends algorithm works:

“Trends are determined by an algorithm and, by default, are tailored for you based on who you follow and your location. This algorithm identifies topics that are popular now, rather than topics that have been popular for a while or on a daily basis, to help you discover the hottest emerging topics of discussion on Twitter that matter most to you”.

The company also explained why the #wikileaks hashtag in late 2010 did not trend:

“Sometimes, popular terms don’t make the Trends list because the velocity of conversation isn’t increasing quickly enough, relative to the baseline level of conversation happening on an average day; this is what happened with #wikileaks this week”.

Though discussions about the release of the US diplomatic cables were happening every day, there was no significant, rapid spike in the volume of tweets at any given moment to turn the topic into a Twitter trend. In addition, the Trends algorithm favors topics that have not trended before.
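The idea of measuring velocity "relative to the baseline level of conversation" can be sketched in a few lines. This is a minimal illustration and not Twitter's actual Trends algorithm: the function name `trend_score`, the 24-hour window and the tweet counts are assumptions made up for the example.

```python
from statistics import mean


def trend_score(hourly_counts, window=24):
    """Ratio of the latest hour's volume to the average of the preceding hours.

    `hourly_counts` must contain at least one value; the last value is
    treated as the current hour and the earlier ones as history.
    """
    *history, current = hourly_counts
    baseline = mean(history[-window:]) if history else 0.0
    # Adding 1 avoids division by zero for topics with no prior volume.
    return current / (baseline + 1.0)


# Invented tweet counts per hour, for illustration only.
steady_topic = [5000] * 24 + [5200]  # heavily discussed, but at a steady rate
sudden_topic = [10] * 24 + [900]     # small topic that suddenly surges

steady_score = trend_score(steady_topic)
sudden_score = trend_score(sudden_topic)
print(f"steady: {steady_score:.2f}, sudden: {sudden_score:.2f}")
```

Under this assumed scoring, the steadily discussed topic scores close to 1 despite its far larger volume, while the sudden surge scores many times higher, which mirrors the explanation the company gave for #wikileaks.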

According to communication professor Tarleton Gillespie, Twitter’s Trends algorithm “prefers novelty in public discourse over phenomena with a longer shelf-life”, “foster[ing] a public more attuned to the “new” than to the discussion of persistent problems, to viral memes more than to slow-building political movements”.

That is, while protest movements like Occupy Wall Street or Ferguson may trend at a given moment, they will soon be replaced by newer topics, even when these movements continue for days or weeks and people are still tweeting about them. This helps newer topics get noticed, but it also means that complicated political events and news stories that are discussed and develop over longer periods of time are excluded from trends because they have trended before.

The Occupy Wall Street and Ferguson cases are indicative of how social networking sites have developed algorithms that favor popular and viral content, sometimes at the expense of major political news.

These algorithms “make meme manufacturing easier” (Morozov 2013, 236), “help us navigate online platforms and social networks, based not on what we want, but on what all of their users do” (Gillespie 2012), or “attempt to turn the whole internet into a most popular list” (Pariser 2012, 71).

No absolute free will for users

With the unsophisticated ways people seek information online, it could be problematic when viral or personalised content is prioritized over depressing but important news such as Ferguson or refugees drowning in the Mediterranean.

Journalism and media studies professor at the Rutgers School of Communication and Information Philip Napoli argues that when they consume news and information online, users “engage in less purposeful, directed information-seeking and rely instead on the operation of their social media platforms, and the behaviors of the individuals and organizations within their social networks, to place relevant news and information in front of them”.

According to him, what differentiates news production and distribution on the internet from the traditional models is the absence of “institutionalized representations of the public interest”. Though general-interest news organizations upload and publish content online, it is up to the users to judge the worth of the content, to view it and disseminate it on social media. Napoli then concludes: “Users’ tastes, preferences, and inclinations to disseminate are much more directly determinative of the news and informational character of the platform as a whole (and thus its service to the broader public interest) than was ever the case in traditional news media”.

On the surface, it may seem that social media users have absolute free will on the platforms they are subscribed to. However, these platforms are governed by algorithms, which are the result of the decisions and choices of the programmers and institutions that develop them (Pariser 2012, 175). Since code is law, algorithms regulate cyberspace, thus determining what users can or cannot do online, how they interact with others online and which content is “relevant” to them (Gillespie 2014). For instance, Facebook users can “like”, “comment” or “share”, but they cannot dislike.

Having noticed that a video she posted on Facebook about child refugees wading among floating bodies off the coast of Greece was not gaining visibility because it was not and could not be “likable”, Tufekci wrote that a “dislike” icon or a similar button is needed to undermine what she describes as the “tyranny of the like”. The more likes a Facebook post receives, the more visible it gets. A counterpart to the “like” icon is thus needed to signal to the algorithm that a video about refugees may not be “likable” but is still important.

Despite the demand for it, Facebook has so far refrained from adding a “dislike” icon for economic reasons. In the “Like economy”, “users are constantly prompted to like, enjoy, recommend and buy as opposed to discuss or critique – making all forms of engagement more comparable but also more sellable to webmasters, brands and advertisers” (Gerlitz and Helmond 2013). In other words, adding a “dislike” icon could be disruptive to the business model of Facebook and similar companies, which could result in loss of profits.

The company has recently announced “a more expressive Like button”. When users click on the “like” button, they will have the option to react to a post with a heart, a sad face, an angry face, or an astonished face. However, it remains unclear how these reactions would affect the placement of a post in a user’s news feed. Will Facebook’s algorithm, for instance, prioritize Tufekci’s post if it receives a high number of sad and angry faces, or will it place it lower in the timeline?

Personalized news and the decline of general interest news

With the online content deluge, filtering helps users quickly find “relevant” information. For American legal scholar Cass Sunstein, “filtering is inevitable, a fact of life. It is as old as humanity itself”. He explains in his book Republic.com 2.0: “No one can see, hear, or read everything. In the course of any hour, let alone any day, every one of us engages in massive filtering, simply in order to make life manageable and coherent”.

Personalized recommendation systems and algorithmic gatekeepers are therefore needed to help us navigate through the information overload. However, when it comes to the “democratic domain”, and in particular general-interest news, the stakes are higher than a mere shopping recommendation on Amazon or a book suggestion on Goodreads.

While personalized news services and customization options are on the rise, the role of intermediaries that provide general-interest news is in decline (Sunstein 2009, 21-21). This trend will very likely continue in the years to come as more services announce or hint at more algorithmic filtering.

Speaking at a 2014 technology conference in New York, Twitter CFO Anthony Noto said the service’s reverse chronological order “isn’t the most relevant experience for a user”, suggesting that the unfiltered feed may soon be replaced by an algorithmically-filtered one.

In June, Apple announced its News application which “conveniently collects all the stories you want to read in one place, in a customized News Feed called For You”. Apple tells its customers that the more they read, the better the application “gets at understanding [their] interests, refining the selection of stories delivered to [their] screen[s] so they are relevant to [them]”.

In addition, a number of major news publications and broadcasters have also started allowing their audiences to customize the type of content that appears to them when they visit their sites’ homepages. The Wall Street Journal has a ‘my journal’ page to which the paper’s readers can log on and customize the news they receive. The British Broadcasting Corporation (BBC) allows its audiences inside the UK to customize the bbc.co.uk homepage by adding and removing topics that interest them, getting recommended TV shows, and adding their locations to get local news. The site’s users, however, cannot hide top stories and the three latest news headlines that appear on the homepage.

Joseph Turow, a professor at the Annenberg School for Communication at the University of Pennsylvania and author of The Daily You: How the New Advertising Industry is Defining Your Identity and Your Worth, warns that in the future users of one news site or service may not even get the same headlines, as we “[enter] a world of intensively customized content”. In his book, Turow argues that under increased economic pressures, more online news providers, including traditional and established media institutions, will consider or start delivering customized news stories to keep readers clicking and make as much profit as possible from targeted advertising.

Access to general-interest news is fundamental to democracy, as it “shapes our sense of the world, of what’s important, of the scale and color and character of our problems” (Pariser 2012, 50).

For Sunstein, under a well-functioning system of free expression, citizens are exposed to topics and viewpoints they were not anticipating or even find “irritating”, and to a range of shared experiences (17-18). Such exposure is enhanced by general-interest intermediaries, papers and magazines that hold editorial meetings every day or week to decide which topics and news the public should know about. To a large number of readers, these topics may seem dull and complicated, and in today’s online news industry they do not drive many clicks. Yet, in democracies, publishing articles about topics such as corruption, police abuse or domestic violence is necessary to attract public attention and drive citizens to hold their governments accountable by protesting or through the ballot box.

Case study: Diversity across Arabic-language editions of Google News

Launched in 2002, Google News has grown to become one of the largest news sites, drawing an estimated 150 million unique visitors per month. The service exclusively deploys computers to aggregate stories from more than 50,000 news sources. The technology company states that diversity is among the factors it takes into consideration when ranking news headlines. Stories are “ranked by computers that evaluate, among other factors, how often and on what sites a story appears online”, and “based on certain characteristics of news content such as freshness, location, relevance and diversity”.

The service provides five different editions in Arabic: one for the entire Arabic-speaking region, and one each for the United Arab Emirates (UAE), Saudi Arabia (KSA), Egypt and Lebanon. The study aims to identify the extent to which the different Arabic-language editions of Google News provide results from diverse sources covering a major political event, in this case the recent Saudi Arabia-Iran row.

Relations between the two rival countries deteriorated after Saudi Arabia’s execution of Shiite cleric Nimr al-Nimr in early January. A critic of the Saudi royal family, Nimr was arrested during anti-government protests in 2012. In reaction to his execution, protesters stormed and set fire to the Saudi embassy in Tehran, prompting Saudi Arabia and a number of its allies in the region, including Bahrain, the UAE, and Sudan, to sever or downgrade ties with Iran. In Lebanon, the Shiite Lebanese militia group Hezbollah denounced ‘criminal’ Saudi Arabia for the execution, while in Iraq thousands took to the streets to protest against the conservative kingdom.

The methodology consists of collecting the news headlines related to this diplomatic crisis as they appeared on the different front pages, accessed between 18:00 and 18:04 Netherlands time on Monday, 4 January. I then researched the location and ownership of each media source to identify how diverse the sources are.

A total of 33 headlines from 13 different sources were collected across all editions, amounting to 22 distinct stories (a number of headlines that appeared on one page also appeared on other pages). Most importantly, 18 of the headlines are by Saudi news sites that are pro-government or government-owned. In addition, five more headlines are by media owned by the Egyptian and UAE governments, which have backed Saudi Arabia. Only two headlines are by sources that are critical of Saudi Arabia: the Iran-owned Al-Alam news network and the Lebanese daily Assafir. Five of the seven headlines that appeared on the ‘Arab World’ front page are by Saudi sources close to the government. The remaining two headlines are from Egyptian sources. The table below shows the representation of sources.
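The proportions implied by these counts can be worked out with a short calculation. The sketch below uses only the figures reported above and assumes the three named groups do not overlap; the remainder is labelled as not classified because the text does not say which sources those headlines came from.

```python
# Headline counts as reported in the text for the 4 January sample.
total_headlines = 33
counts = {
    "Saudi, pro-government or government-owned": 18,
    "Egyptian or UAE government-owned": 5,
    "Critical of Saudi Arabia": 2,
}

# Assumption: the groups above are disjoint, so the rest is whatever
# the text leaves unclassified.
counts["Not classified in the text"] = total_headlines - sum(counts.values())

# Share of each group as a percentage of all collected headlines.
shares = {
    group: round(100 * n / total_headlines, 1) for group, n in counts.items()
}

for group, pct in shares.items():
    print(f"{group}: {counts[group]} headlines ({pct}%)")
```

On these figures, more than half of the collected headlines come from Saudi sources aligned with the government, against roughly 6% from sources critical of Saudi Arabia.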

The dominance of Saudi sources is unsurprising since the row with Iran has been widely covered by the local media and discussed on social media by Saudi users. In addition, the fact that these sources are close to the government reflects the restrictive media environment in the country. However, the lack of diversity, particularly on the “Arab World” front page, raises questions about Google News’ diversity promise and the effectiveness of its algorithm in surfacing diverse viewpoints and sources.

In fact, though several Arab governments have expressed their support for Saudi Arabia, the media coverage has taken different directions. The Lebanese daily Assafir, for instance, has a special coverage page on the execution of Nimr that is critical of Saudi Arabia. In Tunisia, meanwhile, local media have focused on the official reaction from the Ministry of Foreign Affairs denouncing the attack on the Saudi embassy, and on the condemnation of Nimr’s execution by Tunisian political parties and human rights groups. Finally, there is the more objective coverage by the Arabic-language services of international news agencies and channels such as the BBC and Reuters.

On its news help forum, Google explains how stories on the front page are selected:

“Our headlines are selected by computer algorithms, based on factors such as the timeliness of a story, its newsworthiness, and its originality. Google News has no human editors selecting stories or deciding which ones deserve top placement. The front page of Google News is always changing, and we’re working to make sure that it reflects a diversity of articles and sources”.

Since stories on the Google News front page are “always changing”, it is possible that content from pro-Iran sources has appeared at other times. So, I checked the “Arab World” front page again on 6 January at 11:00, 16:00, 18:00 and 20:00 Netherlands time. Out of 24 headlines, seven are by Saudi sources that are pro-government, while eight are by Arabic-language services of international media organizations such as Reuters and the BBC. Only one source, a Syrian news site, is pro-Iran.

Google News provides the most recent content in its different editions through an algorithmic crawl process that takes into consideration a number of characteristics such as “freshness, location, relevance and diversity”. When it comes to the recency and relevance of a story, Google’s algorithm may make sound decisions most of the time, though not always (in 2008, it treated a story from 2002 as recent).

Google says that stories are “sorted without regard to political viewpoint or ideology”. But how can the algorithm ensure diversity without judging the political viewpoint or ideology of a story or a media organization?

In addition, even if the algorithm is able to make judgements about established media organizations such as the Washington Post, Fox News, the BBC or CNN, can it make judgements about bias or the viewpoint in each story published?

“When it comes to news, however, while the Algorithm can make judgments about news organizations, it cannot gather the judgments that millions of others have rendered on each news story as they read it”, writes business professor Randall Stross in his 2008 book Planet Google.

As one of the most popular news sites, Google News is a gateway to local and global news, and though the service promises to deliver diversity, this study has shown a lack of representation of sources that have spoken out against Saudi Arabia in its recent row with Iran. This may reflect the limitations of Google’s algorithm in delivering a diversity of sources and views, or at least indicate that this diversity cannot be guaranteed at all times.