Taming Big Tech – Hacker Noon

The Nielsen Company, founded by the American Arthur C. Nielsen in 1923, keeps track of the number of viewers that television shows attract in 47 countries. In the U.S., those numbers — the all-powerful Nielsen ratings — determine whether shows stay on the air and how much money companies must pay to advertise on them. Nielsen started collecting TV data in the U.S. in 1950 by convincing families around the country to hook up a device to their televisions that tracked their viewing. Today, the company relies on data obtained from thousands of such families to tabulate its ratings.

For this tracking system to produce valid numbers, secrecy is essential. With hundreds of millions of dollars in production costs and advertising revenues on the line, imagine the lengths to which interested parties might go to influence a Nielsen family’s viewing habits — or, for that matter, to tamper with those set-top devices.

Could we, I wondered, set up a Nielsen-type network of anonymous field agents, and could we develop the equivalent of a set-top device to monitor what these people see when they conduct searches using major search engines?

Late in 2015, my associates and I began to sketch out plans for setting up a nationwide system to monitor election-related search results in the months leading up to the November 2016 election. This is where that old dictum — “There is a fine line between paranoia and caution” — came into play. It seemed obvious to me that this new system had to be secret in all its aspects, although I wasn’t yet sure what those aspects were.

To get started, we needed funding, but how does one approach potential donors about a system that needs to be secret and that hasn’t yet been devised?

Here I will need to leave out some details, but let’s just say I got lucky. I explained the situation to one of my political contacts, and he referred me to a mysterious man in Central America. He, in turn, spoke to… well, I have no idea, really — and a few weeks later a significant donation was made to the nonprofit, nonpartisan research institute where I conduct my research.

Those funds allowed us to get going, and one of the first things we did was to form an anonymous LLC in New Mexico to oversee the new project. The head of the LLC was a reanimated “Robin Williams,” and the address was that of a house that was listed for rent in Santa Fe. In other words, the LLC was truly a fiction, and it could not easily be traced to me or any of my staff members, each of whom had to sign nondisclosure agreements (NDAs) to have any involvement in the project.

We called the new corporation “Able Path,” and this deserves some explanation. In 2015, Eric Schmidt, formerly head of Google and then head of Alphabet, Google’s parent company, set up a secretive tech company called The Groundwork, the sole purpose of which was to put Hillary Clinton into office. Staffed mainly by members of the tech team that successfully guided Obama’s reelection in 2012 — a team that received regular guidance from Eric Schmidt — The Groundwork was a brilliant regulations dodge. It allowed Schmidt to provide unlimited support for Clinton’s campaign (which he had previously offered to supervise as an outside adviser) without having to disclose a single penny of his financial largesse. Had he donated large sums to a Super PAC, the PAC would have been prohibited from working directly with the Clinton campaign. The Groundwork solved that problem.

Before the November 2016 election, if you visited TheGroundwork.com, all you got was a creepy Illuminati-type symbol, as follows:

You couldn’t get into the website itself, and there was no text on the page at all — just that creepy symbol. Our own new organization, Able Path, used a transformed version of The Groundwork’s inscrutable symbol for its own uninformative landing page:

At this writing, our Groundwork-style landing page is still accessible at AblePath.org, and, in case you haven’t figured it out by now, “Able Path” is an anagram of the name of Google’s parent company, Alphabet.

We conducted Able Path activities from California through proxy computers in Santa Fe, sometimes shifting to proxies in other locations, taking multiple precautions every day to conceal our actual identities and locations. As you will see, the precautions we took proved necessary. Without them, our new tracking system would probably have turned up nothing — and nothing is definitely not what we found.

HOW TO SPY ON SEARCH ENGINES

To find programmers, we networked through friends and colleagues, and we ultimately had two coding projects running secretly and in parallel. One project — the real one — was run by an outstanding coder who had served time in federal prison for hacking. It was his job to oversee the creation of a passive, undetectable add-on for the Firefox browser that would allow us to track election-related search results on the computers on which it was installed.

The second project involved an independent software group whose assignment was to create a similar add-on for the Chrome browser, which is a Google product. Its main purpose was to give us something to talk about when we felt we needed to tell someone about the project. It was, in other words, our cover story. Neither coding group knew about the other, all of the coders signed NDAs, and payments were made with money orders.

By early 2016, we were testing and refining our new add-ons, so the time had come to recruit a diverse group of field agents willing to install our new browser add-ons and to keep their mouths shut. Unfortunately, every method we used to try to recruit people failed — sometimes dismally. Ultimately, we had to make use of the services of a black-hat marketing group that specialized in serving as a buffer between Facebook and rather sketchy businesses.

That is a world I know little about, but the general idea is that Facebook is picky about the ads they run, which made it tough for the fledgling Able Path LLC to place ads that would run for more than a few minutes before Facebook took them down. Our paid ads were small, innocuous, and honest (although a bit vague), but we could never get them through Facebook’s human or algorithmic censors.

We focused on Facebook because it offers the most precise demographic targeting these days, and we needed to reach a diverse group of eligible voters in a variety of U.S. states who used the Firefox browser. By serving as a middleman between us and Facebook, the black-hat group was able to place and refine ads quickly, eventually getting us just the people we needed. Because our field agents had to sign two NDAs, and because we were asking them to give up sensitive information about their online activity, the daily give-and-take between our staff members and our recruits was intense. Our difficulties notwithstanding, over a period of several months, we successfully recruited 95 field agents in 24 states.

One issue that slowed us down was the email addresses people were using. Most of the people who responded to our ads used Gmail, Google’s email service, but because Google analyzes and stores all Gmail messages — including the incoming emails from non-Gmail email services — we were reluctant to recruit them. We ultimately decided to recruit just a few to serve as a control. Google could easily identify our field agents who used Gmail. Would this control group get different search results than our non-Gmail users? The company takes pride in customizing search results for individuals, after all, so anything was possible.

On May 19th, 2016, we began to get our first trickle of data. Here is how it worked:

When any of our field agents conducted an online search with the Google, Bing, or Yahoo search engines using any one of the 500 election-related search terms we had provided, three things happened almost instantly. First, an HTML copy of the first page of search results they saw was transmitted to one of our online servers. Second, that server used the 10 search results on that page of search results to look up and preserve HTML versions of the 10 web pages to which the search results linked. And third, all of this information was downloaded to one of our local servers, preserving a code number that was associated with the field agent, the date and time of the search, the 10 search results and their search positions (that was important!), and the corresponding 10 web pages. We deliberately preserved HTML versions of everything (rather than image versions) to make it easier to analyze content.
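The three steps above amount to building one record per search. What follows is only a minimal sketch of what such a record might look like, not the code the project actually ran. Every name in it (`SearchCapture`, `RankedResult`, `build_capture`, and the `fetch_html` callback) is hypothetical, and the page fetcher is passed in as a parameter so that the sketch makes no claim about how the real servers retrieved pages.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, List, Optional


@dataclass
class RankedResult:
    position: int   # 1-based rank on the first page of results
    url: str        # link target of the search result
    page_html: str  # preserved HTML of the linked web page


@dataclass
class SearchCapture:
    agent_code: str          # code number standing in for the field agent
    engine: str              # e.g. "google", "bing", "yahoo"
    query: str               # one of the monitored search terms
    captured_at: datetime    # date and time of the search
    results_page_html: str   # HTML copy of the results page itself
    results: List[RankedResult] = field(default_factory=list)


def build_capture(
    agent_code: str,
    engine: str,
    query: str,
    results_page_html: str,
    result_urls: List[str],
    fetch_html: Callable[[str], str],
    captured_at: Optional[datetime] = None,
) -> SearchCapture:
    """Assemble one record, keeping each result's search position."""
    if captured_at is None:
        captured_at = datetime.now(timezone.utc)
    # Only the first 10 results are kept, numbered from 1 in page order.
    results = [
        RankedResult(position=rank, url=url, page_html=fetch_html(url))
        for rank, url in enumerate(result_urls[:10], start=1)
    ]
    return SearchCapture(
        agent_code=agent_code,
        engine=engine,
        query=query,
        captured_at=captured_at,
        results_page_html=results_page_html,
        results=results,
    )
```

Storing the position next to each URL and its page HTML is the point of the sketch: any later question about whether higher-ranked results favored a candidate depends on knowing where each result appeared.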

We could adjust the list of search terms as we pleased, and we eventually reduced the list from 500 to 250, removing most of the search terms that were, according to independent raters, inherently biased toward Hillary Clinton or Donald Trump. We did this to reduce the likelihood that we would get search rankings favoring one candidate simply because our field agents were choosing to use biased search terms.
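One way to picture that pruning step is as a filter over rater scores. The sketch below rests on assumptions that are not in our actual procedure as described here: it supposes each rater scored a term from -1.0 (leans toward one candidate) to +1.0 (leans toward the other), and it uses a hypothetical cutoff of 0.5 on the average score. The function name and the sample terms are likewise made up for illustration.

```python
from statistics import mean
from typing import Dict, List


def filter_neutral_terms(
    ratings: Dict[str, List[float]], threshold: float = 0.5
) -> List[str]:
    """Keep terms whose average rater score stays within the threshold.

    `ratings` maps each search term to the scores given by independent
    raters, on an assumed scale from -1.0 to +1.0, where 0.0 is neutral.
    Terms with no ratings are dropped rather than guessed at.
    """
    return [
        term
        for term, scores in ratings.items()
        if scores and abs(mean(scores)) <= threshold
    ]


# Hypothetical ratings for three placeholder terms.
sample_ratings = {
    "term leaning one way": [0.9, 1.0, 0.8],
    "neutral term": [0.0, 0.1, -0.1],
    "term leaning the other way": [-0.7, -0.9, -0.8],
}
kept_terms = filter_neutral_terms(sample_ratings)
```

A hard cutoff like this would remove every term above the line, whereas we removed most, not all, of the biased terms, so treat the threshold as a simplification.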

We knew various demographic characteristics of our field agents, the search terms they were using, what search results they were seeing, and what the web pages those search results linked to looked like. In other words, we were indeed now looking over the shoulders of real internet users as they conducted election-related searches.

So what were they seeing? Did the search results favor Clinton, Trump or neither one? If search results favored one candidate — that is, if higher ranked results connected to web pages that made one candidate look better than the other — did the favoritism vary by demographic group, and did it differ among the three search engines we were monitoring? Were our Gmail users seeing anything different than our non-Gmail users?

We now had the ability to answer these questions. At some point, I realized that we also had far more than a modest system for monitoring a presidential election.
