This post is based on the Searsia lightning talk for the 2018 New Year meeting of the Dutch Internet Society, for which the slides were appropriately made in HTML.

(top picture by @bitsoffreedom@twitter.com)

The following poem, which, according to the poet @lifning@cybre.space, has been “floating around the fediverse”, perfectly describes what’s wrong with the World Wide Web today:

    roses are red
    violets are blue
    in surveillance capitalism
    a poem reads you

    and shows you ads
    for flower shops
    and tracks your clicks
    and never stops

    it cares not about
    if privacy's harmed
    the money is green
    when people are farmed

    twitter is cyan
    facebook is blue
    your friends are the product
    and so are you

Surveillance capitalism

In surveillance capitalism, companies compete by collecting as much data as possible through surveillance. They sell that data, or they use the data to predict who is most likely to buy a product or service.

Why do companies use surveillance? They have to explain this in their terms of service: you know, the lengthy disclaimer that you accepted without reading. Typically, the terms of service of companies like Google and Facebook claim that collecting user data is necessary for at least two reasons: 1) to improve the service and 2) to show targeted advertisements.

1. To improve the service

An example of a search service that is improved with user data is Google’s query autocompletion service. Google stores the queries of all its users to predict what you are searching for. Autocompletions are an effective feature. They help users formulate better queries, and they save users keystrokes, which is very helpful if you are searching on your mobile phone. The image below shows autocompletions for the prefix “surveillance”.

Query autocompletions for 'surveillance'

2. To show targeted advertisements

The main reason for collecting user data is, of course, to show targeted advertisements. Why are targeted advertisements so profitable? This vintage newspaper advertisement for a painkiller is only effective for people who are in pain. In the old days, the only way to target such an advertisement was to choose a newspaper that is read more by people in pain, and to target the pages in that newspaper that are read most by the same people, for instance the health pages. Today, Facebook and Google are able to find the individuals among their users who are likely in pain, and show the advertisement only to them. This makes targeted advertisements much more profitable.

Vintage painkiller advertisement (1908)

Companies like Facebook and Google would like you to think that they must collect data to improve services and show targeted advertisements. We work at Searsia to debunk this.

Query autocompletions without tracking users

Let’s have a closer look at query autocompletions. Using user data to predict query autocompletions causes several problems. For instance, Google actively promoted Nazi propaganda for innocent queries like “did the h”, and even worse, the first results for those queries would be pages from neo-Nazi sites like Stormfront that deny that the Holocaust happened.

Google autocompletions suggesting 'did the holocaust happen'

Many small organizations and individuals were damaged by autocompletions. There have been lawsuits in France, Italy, Germany, and Japan, where individuals could, for instance, not get a job because Google suggested offensive completions for their name. Google started to actively filter completions for person names after these lawsuits, but its autocompletions continue to suggest terms which could be viewed as racist, sexist or homophobic.

At Searsia, we do not track user queries for autocompletions. Instead, we use the anchor texts of web pages, that is, the text of hyperlinks. We compared the performance of this approach with autocompletions based on user queries for a general web search task. Our approach that uses hyperlinks is as effective as the approach that uses user data. The results below show that for short queries, we need one more keystroke to predict the full user query. For long queries, our approach outperforms the approach based on user data.

Experiment: Autocompletions from web content vs. Autocompletions from user data
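To make the idea concrete, here is a minimal sketch in Python of autocompletions built from anchor texts instead of user queries. It is not Searsia's actual implementation: the function names (`build_completions`, `complete`, `keystrokes_needed`), the frequency-based ranking, and the keystroke measure are assumptions made for illustration only.

```python
from collections import Counter


def build_completions(anchor_texts):
    """Count how often each normalized anchor text (hyperlink text) occurs."""
    counts = Counter()
    for text in anchor_texts:
        normalized = " ".join(text.lower().split())
        if normalized:
            counts[normalized] += 1
    return counts


def complete(counts, prefix, k=5):
    """Return up to k anchor texts that start with prefix, most frequent first.

    Assumption: ranking is by raw frequency, with ties broken alphabetically.
    """
    prefix = prefix.lower().lstrip()
    matches = [(text, n) for text, n in counts.items() if text.startswith(prefix)]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [text for text, _ in matches[:k]]


def keystrokes_needed(counts, query, k=5):
    """Length of the shortest prefix for which query is among the top-k completions."""
    for length in range(1, len(query) + 1):
        if query in complete(counts, query[:length], k):
            return length
    return len(query)  # never suggested: the user types the whole query


# Toy data standing in for anchor texts harvested from crawled web pages.
anchors = [
    "Surveillance capitalism",
    "surveillance capitalism",
    "Surveillance camera",
    "search engine",
]
counts = build_completions(anchors)
print(complete(counts, "surv"))
print(keystrokes_needed(counts, "surveillance camera", k=1))
```

No query log is involved at any point: the only input is text that is already public on the web. A measure like `keystrokes_needed` is one simple way to compare such a system with one based on user data, by counting how many characters a user must type before the intended query is suggested.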

Targeted advertisements without tracking

Now, you might think: Sure, you can improve services without tracking people, but it is impossible to target advertisements without tracking people. Not quite. Remember Searsia is a search engine. Users tell a search engine what they are looking for. If you are looking for a car, Searsia can show you advertisements for cars. The advertisements are based on the search terms, not on the person.
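As a rough illustration of targeting on the query rather than on the person, the following Python sketch ranks advertisements by the keywords they share with the search terms. The data layout (`title`, `keywords`) and the function `select_ads` are hypothetical and only serve the example; they do not describe how Searsia matches advertisements internally.

```python
def tokenize(text):
    """Split text into a set of lowercase terms."""
    return set(text.lower().split())


def select_ads(query, ads, k=3):
    """Rank ads by the number of keywords they share with the query.

    The only input is the query itself: no user id, cookie, or search
    history is read or stored.
    """
    terms = tokenize(query)
    scored = []
    for ad in ads:
        overlap = len(terms & tokenize(ad["keywords"]))
        if overlap > 0:
            scored.append((overlap, ad["title"]))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [title for _, title in scored[:k]]


# Hypothetical advertisement inventory.
ads = [
    {"title": "Used cars Ltd", "keywords": "car cars used"},
    {"title": "Flower shop", "keywords": "flowers roses"},
]
print(select_ads("used car", ads))
```

Two users who type the same query get the same advertisements, which is exactly the point: the function has no parameter through which a profile could enter.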

Targeted advertisements by Searsia are based on the query, not on the person, so there is no need to track individual users. Images and banners are provided via a local proxy, so the merchant or advertisement network will not know the user’s IP address unless the user clicks on the advertisement. To prevent further privacy leakage, advertisements do not use third-party JavaScript, third-party cookies, or web beacons.

Advertisements for the query 'Amazon'
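The local proxy for images and banners mentioned above could work roughly as follows: before an advertisement is shown, its image URL is rewritten so that the browser fetches the image from the search engine's own server, which in turn fetches it from the merchant. The Python sketch below only shows the URL rewriting step. The path `/proxy` and the parameter name `url` are assumptions for illustration, not Searsia's actual endpoint.

```python
from urllib.parse import quote, urlsplit

PROXY_PATH = "/proxy"  # hypothetical path on the search engine's own server


def proxied_image_url(image_url):
    """Rewrite a third-party image URL so that it is fetched via the local proxy.

    The browser then only contacts the search engine; the merchant sees the
    proxy's IP address instead of the user's.
    """
    parts = urlsplit(image_url)
    if parts.scheme not in ("http", "https"):
        raise ValueError("only http(s) images can be proxied")
    # Percent-encode the whole URL so that it survives as one query parameter.
    return PROXY_PATH + "?url=" + quote(image_url, safe="")


print(proxied_image_url("https://ads.example.com/banner.png?id=1"))
```

Rejecting schemes other than `http` and `https` is a small safeguard in this sketch, so that values such as `javascript:` URLs never end up in an image tag.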

The requirements for JavaScript and cookies above do not allow Searsia to display ads from, for instance, Google’s DoubleClick. However, they are perfect for displaying advertisements from affiliate programs and networks. Searsia connects to the APIs of big affiliate programs, such as that of Amazon. It provides local search over the datafeeds of affiliate networks like ShareASale. The image above shows example advertisements provided by Dr. SheetMusic for the query amazon. Dr. SheetMusic provides advertisements for charities if there are no targeted affiliate results.

Searsia provides free open source software for federated search. Federated systems are the answer to global corporations like Twitter, Facebook and Google that profit from tracking their users. In a federated system, no one owns the complete service, so no one is able to profit from tracking all user interactions. The poem that started this post was taken from Mastodon, a federated alternative to Twitter and Facebook. Just as Mastodon is a federated alternative to Twitter, Searsia’s technology might one day be an alternative to Google. If that happens, we know that there is no reason to track users, not even for providing autocompletions or targeted advertisements.

Acknowledgments

Many thanks to the Vietsch Foundation and NLnet Foundation for funding our work on query recommendations and search advertising without tracking users.