21 Sep 2018

Advertisers want their ads to appear next to quality content that has direct relevance to their target audience (the concept of “endemic advertising”). It sounds simple, but most marketers (64% according to one survey) say they have problems ensuring brand safety.

BulletinHealthcare sells ad space in publications written by our in-house editorial team. As an independent publisher, we control when and where an ad appears, so we can easily guarantee a brand-safe environment. However, our burgeoning audience extension program requires a different approach, since it involves running ad campaigns with third-party publishers.

Many advertisers employ whitelists: explicit lists of every website where their ads are allowed to appear. Finding reputable websites for whitelisting is a challenge. Ad tech companies offer off-the-shelf whitelists for many topics, but these sometimes contain low-quality or out-of-date sites. Recently, we’ve added Wikipedia as a source in our whitelisting process for audience extension.

Wikipedia isn’t just a repository of human knowledge; it’s also a treasure trove for researchers. Google Scholar records more than 1.6 million references to Wikipedia. Wikipedia editors have a strict policy about the kinds of websites that can be used as references, and an especially strict policy for medical references.

While the system isn’t perfect, websites used to perpetrate ad fraud or spread misinformation are removed and banned. Additionally, Wikipedia articles are meticulously categorized by topic. For example, the article for the landmark 1950 Wynder and Graham Study that exposed the cancer risks of smoking has categories such as “Cancer research,” “Lung cancer,” and “Health effects of tobacco.”

Taken together, the strict approach to sourcing and the careful article categorization mean we can use Wikipedia to build a list of reputable websites on virtually any topic. I wrote a parser to identify the most commonly cited websites on Wikipedia for any category. You can check out my source code on GitHub.

For example, here are the 15 most commonly cited websites for oncology-related articles on Wikipedia:

  1. progenetix.org
  2. web.archive.org
  3. cancer.gov
  4. cancer.org
  5. ncbi.nlm.nih.gov
  6. books.google.com
  7. nature.com
  8. nytimes.com
  9. linkinghub.elsevier.com
  10. cancerresearchuk.org
  11. onlinelibrary.wiley.com
  12. fda.gov
  13. who.int
  14. uspreventiveservicestaskforce.org
  15. clinicaltrials.gov
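
The counting step behind a ranking like the one above could be sketched roughly as follows. This is not the parser linked earlier, just a minimal illustration that assumes the reference URLs for a category have already been collected (for instance through the MediaWiki API). The function names `citation_domain` and `top_cited_domains` and the sample URLs are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def citation_domain(url):
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def top_cited_domains(reference_urls, n=15):
    """Rank domains by how many reference URLs point at them."""
    counts = Counter(citation_domain(u) for u in reference_urls)
    counts.pop("", None)  # drop entries with no parseable host
    return counts.most_common(n)


# Hypothetical sample input; a real run would use the external links
# gathered from every article in a Wikipedia category.
sample_refs = [
    "https://www.cancer.gov/types/lung",
    "https://www.cancer.gov/about-cancer",
    "https://www.cancer.org/cancer/lung-cancer.html",
    "https://www.ncbi.nlm.nih.gov/pubmed/12345",
    "not a url",
]
print(top_cited_domains(sample_refs, n=1))  # [('cancer.gov', 2)]
```

Counting by host rather than by full URL is what lets many different article pages roll up into a single site. Note that subdomains such as ncbi.nlm.nih.gov stay separate in this sketch.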

Realistically, this is just the first step. Once we have a list of top sites, creating a whitelist involves finding publishers that support our ad networks and performing the all-important human review to make sure every site meets our standards.
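
As a rough sketch of that narrowing step, the ranked domains could be split into candidates for human review and domains to skip. Everything here is an assumption for illustration: the `split_candidates` helper, the `supported` set (standing in for whatever list of publishers an ad network exposes), and the `excluded` set are placeholders, not real ad network data.

```python
def split_candidates(ranked_domains, supported_publishers, excluded=frozenset()):
    """Split ranked domains into review candidates and skipped domains.

    A domain is queued for human review only if it appears in the
    supported set and is not explicitly excluded. Input order is kept.
    """
    for_review, skipped = [], []
    for domain in ranked_domains:
        if domain in supported_publishers and domain not in excluded:
            for_review.append(domain)
        else:
            skipped.append(domain)
    return for_review, skipped


# Placeholder inputs: the supported set below is invented for the example.
ranked = ["progenetix.org", "web.archive.org", "cancer.gov", "nature.com"]
supported = {"web.archive.org", "nature.com"}
excluded = {"web.archive.org"}  # an archive rather than a publisher

for_review, skipped = split_candidates(ranked, supported, excluded)
print(for_review)  # ['nature.com']
print(skipped)     # ['progenetix.org', 'web.archive.org', 'cancer.gov']
```

The output is only a queue for reviewers; nothing in this sketch replaces the human check.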