In this modern world information travels faster than the speed of reason, so at The Daily Edit we go to great lengths to make our analyses as unambiguous and unbiased as possible. We want you to feel confident that you’re seeing the full story. This ethic permeates every part of our operation, from how we train machine learning models to whom we hire. So, given that we tell you how trustworthy an article is compared to its peers, why should you trust us?

This post explains how our whole pipeline works, from selecting articles to be crawled, to finding the story’s details, to scoring each article. We’ll cover the parts that are completely objective and the parts that have some subjective elements to them, with an explanation of our rationale. We’ll even show you where we don’t perform so well. We’ll do our best here to explain it all in layman’s terms and will follow up with several other blog posts going into the raw, unadulterated technical detail.

Overview

Analyzing the news takes a lot of work from a number of different pieces of software. A high-level overview of what happens can be seen below.

Before we dive in, it’s best to list a few terms that will come up frequently and how we interpret them:

  1. Source – a publication we crawl, such as a newspaper or news website.
  2. Article – a single piece published by a source.
  3. Story – a group of articles from different sources covering the same thing.
  4. Detail – a specific event or statement within a story, reported by several sources.

How do we choose articles?

First, before we can do any kind of analysis we need to know that there is even a story. To do this we maintain a database of over 13,000 news sources, which is updated weekly with new sources as we encounter them. Our crawler operates on a schedule: it periodically wakes up and starts looking at the sources to find any new articles they have published. When it encounters a new article it puts it in a scratchpad with all the other new articles that have been found. At the end of a crawling run we gather articles with similar content and group them together, calling this grouping a ‘story’.
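
To make the shape of that concrete, here’s a minimal sketch of one crawl run. The type names and the two stubbed functions are ours for illustration, not the production code:

```rust
// A high-level sketch of one crawl run. The struct fields and the two stubbed
// functions are placeholders for the real crawler and grouping model.
struct Source {
    feed_url: String,
}

struct Article {
    url: String,
    text: String,
}

struct Story {
    articles: Vec<Article>,
}

fn fetch_new_articles(_source: &Source) -> Vec<Article> {
    unimplemented!("fetch and parse any articles published since the last run")
}

fn same_story(_a: &Article, _b: &Article) -> bool {
    unimplemented!("compare article content; the clustering section below covers this")
}

fn crawl_run(sources: &[Source]) -> Vec<Story> {
    // Gather every new article into a scratchpad first...
    let mut scratchpad: Vec<Article> = Vec::new();
    for source in sources {
        scratchpad.extend(fetch_new_articles(source));
    }
    // ...then group articles with similar content into stories.
    let mut stories: Vec<Story> = Vec::new();
    for article in scratchpad {
        match stories.iter().position(|s| same_story(&s.articles[0], &article)) {
            Some(i) => stories[i].articles.push(article),
            None => stories.push(Story { articles: vec![article] }),
        }
    }
    stories
}
```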

Stories evolve over time. More sources appear, existing sources edit their articles, and some even remove their articles altogether. To cover all these cases we have logic around when we reprocess stories. For starters, we refresh a story at most once every six hours. We feel this is frequent enough to provide real value with our analyses without overburdening our servers with redundant work. During one of these refreshes, if we encounter an article we already have, we’ll only re-fetch it if at least 12 hours have passed since we last did. This means we could miss some frequent edits on a breaking story, but by the time things have settled down we’ll have covered the changes. Keeping this relatively infrequent also reduces the burden we place on our sources’ websites.
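
As a rough sketch, that gating logic boils down to two checks like these (the field names are illustrative; the real scheduler tracks this state in a database):

```rust
use std::time::{Duration, SystemTime};

// Hypothetical timestamps; the real scheduler keeps these alongside each record.
struct Story {
    last_refreshed: SystemTime,
}

struct Article {
    last_fetched: SystemTime,
}

const STORY_REFRESH_INTERVAL: Duration = Duration::from_secs(6 * 60 * 60);
const ARTICLE_REFRESH_INTERVAL: Duration = Duration::from_secs(12 * 60 * 60);

fn story_due_for_refresh(story: &Story, now: SystemTime) -> bool {
    now.duration_since(story.last_refreshed)
        .map_or(false, |age| age >= STORY_REFRESH_INTERVAL)
}

fn article_due_for_refetch(article: &Article, now: SystemTime) -> bool {
    now.duration_since(article.last_fetched)
        .map_or(false, |age| age >= ARTICLE_REFRESH_INTERVAL)
}
```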

While we don’t filter or discriminate against sources in any way, we do have one technical limitation that causes us to remove some: poorly structured HTML, or websites that load more articles infinitely. When we crawl a news article we’re crawling the HTML their website serves us. There are recommendations and some poorly-followed standards, but for the most part HTML is the Wild West; the number of ways it can be organized is infinite. Most of the time we encounter reasonably well-structured HTML and can extract the text content with ease; sometimes it’s a little more difficult and requires a sophisticated model to parse; other times it’s just plain diabolical. When we encounter a pathological source which our application can’t work with, we remove it from our database. This means that we might miss a detail or two, particularly if that source had the scoop, but trying to analyze text that might not be the article content would pollute all the other articles we cover with things like advertising text or image captions.

Reading the articles

At the end of this crawling process we have a collection of articles grouped into a ‘story’, ready for analysis. Quite a few things happen during analysis, starting with extracting metadata. An article’s metadata includes items like its title, author(s), publisher, the time it was published and whether or not it’s an opinion piece.
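
In code, the metadata we keep per article looks roughly like this (the field names are illustrative, not our exact schema):

```rust
// A sketch of per-article metadata; field names and types are simplified.
struct ArticleMetadata {
    title: String,
    authors: Vec<String>,
    publisher: String,
    published_at: String, // e.g. an RFC 3339 timestamp, simplified here
    is_opinion: bool,
}
```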

We then extract the article’s text content. We’re not interested in menus, advertising or images. What we want is the raw text content that makes up the piece. This process is rather technical and has several components itself so we’ll cover that in a post of its own. Worth mentioning, however, is that from time to time our model might leak some text that wasn’t part of the content into the analysis. Most often these leaks are image captions from the article. The ultimate effect of this is that we sometimes show a ‘more detail’ item which isn’t really relevant. We’re always working to improve this and are regularly reducing the occurrence rate.

How do we find details?

Once we have the article’s raw text content we split it up into sentences. At first this might seem really simple: just split on the period, right? However, it’s one of those things that sounds easy but has labyrinthine complexity when you dig a little deeper. For example, what about a prefix like ‘Ms.’? Or an acronym like ‘U.S.A.’? Or what about an acronym that someone just made up and placed right at the end of a sentence? Despite the challenges, we do eventually get nicely split sentences out of the article.
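
To see why naive splitting falls over, here’s a deliberately simplified splitter that special-cases a handful of known abbreviations. Our real splitter handles far more than this:

```rust
// A deliberately naive splitter that shows why "split on the period" is not
// enough: it only special-cases a few known abbreviations.
const ABBREVIATIONS: &[&str] = &["Ms.", "Mr.", "Mrs.", "Dr.", "U.S.A.", "e.g.", "i.e."];

fn split_sentences(text: &str) -> Vec<String> {
    let mut sentences = Vec::new();
    let mut current = String::new();
    for word in text.split_whitespace() {
        if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(word);
        let ends_sentence = word.ends_with('.') || word.ends_with('?') || word.ends_with('!');
        let is_abbreviation = ABBREVIATIONS.contains(&word);
        if ends_sentence && !is_abbreviation {
            sentences.push(std::mem::take(&mut current));
        }
    }
    if !current.is_empty() {
        sentences.push(current);
    }
    sentences
}

fn main() {
    let text = "Ms. Smith flew to the U.S.A. yesterday. She returned today!";
    for sentence in split_sentences(text) {
        println!("{sentence}");
    }
}
```

Note that a made-up acronym at the end of a sentence would still trip this version up, which is exactly the kind of case the real splitter has to handle.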

Why sentences, though? When considering what makes up a ‘detail’ in any news story, we scoured thousands of articles to see how journalists present information. A detail is an event or something that was said. Ideally it would contain context like who said the thing, to whom it was said, where, and why. The typical presentation for an entire detail like this is a sentence. Sometimes the context is added in adjacent sentences, forming a paragraph. We were faced with a choice: should sentences or paragraphs be the ‘atom’ when considering details? We went with sentences for a simple reason: the majority of paragraphs we researched contained more than one detail across their sentences. If we tried to analyze details at the paragraph level we’d end up with all kinds of strange behavior, since the semantic meaning of each detail would be mixed.

So, sentences it is! We now have a collection of them for every article in the story. Next we cluster them together across the sources in order to find consensus on their semantic meaning. That’s a mouthful, so what does it mean? 

In a news story the different sources are all reporting on the same thing; some might have fewer or more details than others, but there will be a lot of commonality. We want to find all the details that have several sources covering them; that’s the clustering and consensus part. Additionally, we want to find these clusters regardless of the exact wording each source chose for its sentence. For example, let’s pretend there’s a story covering a new scientific paper on the effect of a Nutella-only diet. One detail may be that participants reported a marked increase in happiness in their daily lives. One source may write “survey respondents consistently showed an improvement in happiness” while another source may write “participants demonstrated a 10-20% increase in happiness when surveyed”. Despite the difference in words, these describe the same thing and we want to capture that. That’s the semantic part.
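
As a rough illustration of the clustering step, here’s a greedy pass over precomputed sentence embeddings. The embedding model, the threshold and the algorithm shown here are stand-ins, not our production pipeline:

```rust
// An illustrative greedy clustering pass over precomputed sentence embeddings.
struct Sentence {
    source_id: u32,
    text: String,
    embedding: Vec<f32>, // produced by a sentence-embedding model (not shown)
}

fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 { 0.0 } else { dot / (norm_a * norm_b) }
}

fn cluster(sentences: Vec<Sentence>, threshold: f32) -> Vec<Vec<Sentence>> {
    let mut clusters: Vec<Vec<Sentence>> = Vec::new();
    for sentence in sentences {
        // Join the first cluster whose representative is close enough,
        // otherwise start a new cluster.
        let found = clusters.iter().position(|c| {
            cosine_similarity(&c[0].embedding, &sentence.embedding) >= threshold
        });
        match found {
            Some(i) => clusters[i].push(sentence),
            None => clusters.push(vec![sentence]),
        }
    }
    clusters
}
```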

How we actually do this is horrendously technical and will be saved for our next post covering all those nitty-gritty details (pun intended). The level of consensus we need in order to call a cluster of sentences a detail depends on how many articles we have. Not every story is as earth-shattering as the Nutella diet one; some only get covered by a handful of sources. When a story has fewer than 10 articles we only need 2 articles to form consensus with matching details. If we have up to 50 articles then that threshold is increased to 7 articles containing a shared detail. Beyond 50 we require at least 15 articles to present a detail for that detail to be considered.
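
Expressed directly in code, those thresholds look like this (a sketch; we pass the story’s article count straight in):

```rust
// The consensus thresholds described above, written out.
fn consensus_threshold(articles_in_story: usize) -> usize {
    match articles_in_story {
        0..=9 => 2,    // small stories: two matching articles are enough
        10..=50 => 7,  // mid-sized stories
        _ => 15,       // large stories
    }
}

fn is_detail(cluster_size: usize, articles_in_story: usize) -> bool {
    cluster_size >= consensus_threshold(articles_in_story)
}
```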

There’s no side-stepping the fact that our choice of consensus levels is subjective. Every month we revisit these numbers and try to do better; what we have so far was chosen through trial and error on typical news stories.

You might be asking: but what about that one source which has something special the others didn’t cover? Unfortunately that will be left out of our analysis. There is no way for us to verify whether that detail is valid or relevant to the story. During a breaking story this might cause us to miss things; however, after just one hour of a story’s life we usually have enough coverage to form consensus, since sources tend to copy each other.

There’s more to how we form consensus though. Here’s something fun a clever news conglomerate could do. Let’s say our conglomerate (we’ll call it Shoes Corp) has several dozen publishers in their organization. Shoes Corp could instruct each of these publishers to write the same superfluous details in order to trick our analysis software into thinking that they’ve covered some special detail. This would lead to these organizations receiving a higher score than others (more to come on that) and would unfairly favor Shoes Corp. To combat this twisted gamification, we adjust the scoring weight of each detail based on the number of unique sources that covered it. We maintain a database of correlated sources to do this.
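Here’s a small sketch of that weighting, using the Shoes Corp example with made-up publisher names and an illustrative ownership map:

```rust
use std::collections::{HashMap, HashSet};

// A sketch of the anti-gaming weight: publishers owned by the same
// organization only count once. Publisher names here are made up.
fn detail_weight(
    publishers_covering_detail: &[&str],
    owner_of: &HashMap<&str, &str>, // publisher -> parent organization
) -> usize {
    let unique_owners: HashSet<&str> = publishers_covering_detail
        .iter()
        .map(|p| owner_of.get(p).copied().unwrap_or(*p)) // independents own themselves
        .collect();
    unique_owners.len()
}

fn main() {
    let owner_of: HashMap<&str, &str> =
        [("Shoes Daily", "Shoes Corp"), ("Shoes Tribune", "Shoes Corp")].into();
    // Three publishers covered the detail, but two belong to Shoes Corp,
    // so the detail's weight is 2, not 3.
    let weight = detail_weight(&["Shoes Daily", "Shoes Tribune", "Indie Gazette"], &owner_of);
    assert_eq!(weight, 2);
}
```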

At the end of this process we have the text content from every article and all of the details we found in the whole story. Now we go through each article and look for the details it did not contain. For each of these we then try to find a sentence within the article that is at least somewhat related to the missing detail. With that sentence we can highlight it in the app and give the reader a place to find the missing information with the right context. This is a fun problem, since we’re trying to connect the missing piece to something that might not have anything close to it in the article at all. Despite this we get it right most of the time, but we’re always working to improve this feature in particular.
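
A rough sketch of that matching step, with a crude word-overlap function standing in for the semantic similarity model:

```rust
// Placeholder relatedness: crude word overlap instead of the real semantic model.
fn relatedness(a: &str, b: &str) -> f32 {
    let words_a: std::collections::HashSet<&str> = a.split_whitespace().collect();
    b.split_whitespace().filter(|w| words_a.contains(*w)).count() as f32
}

/// Pick the sentence in `article_sentences` most related to `missing_detail`,
/// so the app has somewhere sensible to attach the highlight.
fn anchor_for_missing_detail<'a>(
    article_sentences: &'a [String],
    missing_detail: &str,
) -> Option<&'a String> {
    article_sentences.iter().max_by(|a, b| {
        relatedness(a, missing_detail)
            .partial_cmp(&relatedness(b, missing_detail))
            .unwrap_or(std::cmp::Ordering::Equal)
    })
}
```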

How do we find misleading text?

Next we look for potentially misleading pieces of text in each article. This can be a slippery slope, the bottom of which terminates with a sheer cliff. One person’s idea of misleading text might not be the same as another’s. Much discussion at The Daily Edit centers around this point but ultimately our plan of attack is to never consider anything misleading unless it can objectively be shown by the actual text we highlight.

This means that we do not highlight hyperbole, false dichotomies or straw man arguments. Instead, we highlight things like missing data references (“a recent study shows” – without a reference to the study), missing sources (“according to an anonymous source”) and scare quotes. Each of these can be verified by the reader just by looking at the text we highlighted. Either the data was referenced or it was not. Either the source was named or it was not (and it’s OK not to name a source; we just want to increase awareness).

We achieve this by matching grammatical patterns against each sentence. Each time we find a match we add it to a scratchpad for further review. Later in the article we might find another piece of text that does in fact clear a previous match. For example, an article might have a data reference in its first paragraph but only mention where it came from in the last paragraph. In this case the data reference is valid and we shouldn’t highlight it.
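
As a simplified sketch, the flow looks like this; the two matcher functions stand in for our real grammatical-pattern engine:

```rust
// A sketch of the two-pass flow: collect candidate highlights while scanning
// the article, then drop any that a later sentence clears.
struct Candidate {
    kind: &'static str, // e.g. "missing data reference"
    sentence_index: usize,
}

fn matches_unreferenced_data(sentence: &str) -> bool {
    // Placeholder for a grammatical pattern like: "a recent study shows ..."
    sentence.contains("a recent study")
}

fn names_a_data_source(sentence: &str) -> bool {
    // Placeholder for a pattern that spots an actual citation of the study.
    sentence.contains("published in")
}

fn find_misleading(sentences: &[&str]) -> Vec<Candidate> {
    let mut scratchpad = Vec::new();
    for (i, sentence) in sentences.iter().enumerate() {
        if matches_unreferenced_data(sentence) {
            scratchpad.push(Candidate { kind: "missing data reference", sentence_index: i });
        }
    }
    // Second pass: a later sentence naming the source clears earlier candidates.
    if sentences.iter().any(|s| names_a_data_source(s)) {
        scratchpad.retain(|c| c.kind != "missing data reference");
    }
    scratchpad
}
```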

Making this happen led us to create a small programming language which allows us to describe complex grammatical patterns in a very concise way. It supports 60 languages so far! Since it’s a complex tool in its own right we’ll leave its description for a post of its own.

Alas, we do not always get this perfect. Languages are tricky and there are myriad ways to construct a sentence, so from time to time we’ll highlight something erroneously. If that happens then please report it in the app so we can improve things further.

How does this lead to a score?

OK, so now we have all the articles, details, missing details and misleading text. The poor computer is exhausted and just wants to go home and sleep. However, it has just one more thing to do before it can clock off. It has to produce a score for the reader to compare sources. 

Heads-up: this is the most subjective part of what we do, so please send us any and all feedback you may have so that we can make something that works for everybody.

Article scores are made up of three components:

  1. The coverage score – this is the percentage of all details found in the story set that a particular source covered. More is better.
  2. The misleading score – this is a percentage derived from the number of potentially misleading pieces of text we found in an article. More is worse.
  3. The trust index – this is just a simple arithmetic combination of the above two scores.

We compute the coverage score by creating a ‘weight’ for each detail we found. The weight is just the number of unique sources that cover that detail. As per the Shoes Corp example above, their publishers would all count as a single source when calculating the weight. Adding all the weights up gives the maximum possible coverage score. We then compute each source’s coverage score by dividing the summed weights of the details it contained by that maximum. So if you see a source with a coverage score of 100%, it did a bang-up job of covering the story; give that journalist a Pulitzer.

The misleading score is much simpler. Each highlighted region of potentially misleading text adds 20% to the misleading score with a maximum penalty of 100%. This means that five highlights gives that source the worst possible misleading score. This sounds bad but most journalists are pretty good so it’s rare to see more than 40%.

Now we come to the trust index. Choosing how to combine the two previous scores to form this is an ongoing discussion and has seen several iterations so far. One question tends to drive it, however:

What’s better, an article that covers the whole story but is a little misleading or an article that is pristine but misses a few details?

Over time we’ve settled on favoring articles with more coverage, since more coverage tends to lead to a more balanced view of the story. If they include a couple of scare quotes then so be it; the reader is still better off than seeing only half the story, and we highlight those scare quotes in the article so they can make their own informed judgment. Based on this, the computation is very simple: the coverage score makes up 80% of the trust index and the misleading score determines the remaining 20%. If an article covers every detail and has no misleading text it gets a perfect trust index. If it has five or more misleading pieces of text but perfect detail coverage it gets 80% (since 20% is lost due to misleading text). And so on.
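
In code, that arithmetic looks like this (a simplified sketch; the real system works on our internal types):

```rust
// The scoring arithmetic described above. Detail weights come from the
// unique-source counting shown earlier.
fn coverage_score(covered_detail_weights: &[u32], all_detail_weights: &[u32]) -> f64 {
    let covered: u32 = covered_detail_weights.iter().sum();
    let maximum: u32 = all_detail_weights.iter().sum();
    if maximum == 0 { 0.0 } else { covered as f64 / maximum as f64 }
}

fn misleading_score(highlight_count: usize) -> f64 {
    // Each highlight costs 20%, capped at 100% (five or more highlights).
    (highlight_count as f64 * 0.20).min(1.0)
}

fn trust_index(coverage: f64, misleading: f64) -> f64 {
    // Coverage makes up 80% of the index; the misleading score sets the rest.
    0.80 * coverage + 0.20 * (1.0 - misleading)
}

fn main() {
    // Perfect coverage with five misleading highlights lands at 80%.
    let trust = trust_index(1.0, misleading_score(5));
    assert!((trust - 0.80).abs() < 1e-9);
}
```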

What about machine learning bias?

Much of what I’ve covered so far depends on the output of machine learning models and no discussion of these is complete without covering bias. We’ve all read about machine learning bias ruining models (Google did it, so did Facebook) so how does this apply here?

Our models are trained on very large corpora of text. These are selected to give the broadest possible coverage of news stories in the wild. Despite this, the inherent structure of news stories can lead to bias seeping through the model. 

For example, what about an entirely new technology covered in an article with a truly eccentric style of writing? This would never have been encountered during training. In our application the worst this leads to is a source’s sentence not making it into a cluster. This will cause us to show that detail as ‘more detail’ in the app on that source’s article despite it already being there in some funky way. 

This has two effects. First, it means that we waste a few seconds of the reader’s time by showing them a detail that they can already see. Second, it means that we penalize the article and give it a lower score than it would otherwise have had. This isn’t optimal and it’s not easy for us to know when it happens, so if you encounter this case in the wild please let us know so that we can improve our model’s training data in the future.

Compared to the two examples linked above, however, the effects of bias in our machine learning models don’t pose a serious risk; they’re a minor annoyance more than anything else.

Conclusion

Whew! Almost 3000 words and here we are, finally, at the end. In an industry as thorny as news there is no trust without transparency, so we hope that this post has helped show you at least some of the lengths we go to at The Daily Edit to give you a better news reading experience and more media insight.

This will forever be a work in progress as the news itself changes so please send any feedback and questions you might have. We’re always open to discussion and debate on any topic.

Over the coming weeks we’ll publish more posts explaining each of these components with all the technical detail.

At The Daily Edit we have a small, sharp team that ships something new every single week. While there are many reasons for this cadence, one technical choice has helped immensely: using Rust wherever we can. Rust enables us to achieve our company’s mission with speed and confidence.

When we started this project we were just three pilots. I was the only engineer of the group, so I had complete freedom of choice in languages and tooling. This sounds nice in theory but it’s very daunting! Apart from the obvious use of Python for the machine learning parts, everything else was open. Should we just choose Java since it’s boring? Should we choose Python because of the great community? Or should we do it all in JavaScript, since everything ends up in JavaScript?

Every option has trade-offs, so I was going in circles. What broke the infinite loop was one simple question: which choice would attract excellent hackers and allow us to ship frequently and with confidence?

So here’s how we started using Rust and how today, even as the project matures, it continues to be a driver for our growth and success.

Background

Throughout my life I’ve worked and dabbled with a plethora of programming languages, editors, tools and methodologies. I got started back in 1994 messing around with QBasic and the Gorillas game that came with MS-DOS. Soon after I got into C and worked almost entirely on Unix platforms. Then came C++, Python, PHP, Ruby and, of course, JavaScript a bit later. Some of this was for fun, especially when I was younger; some was for profit during my later teen years and early 20s. Then I started flying planes and the programming was all just for fun.

In 2015 I started messing around with Rust after having heard about it from a PLT friend. She kept saying that it brought the first truly ‘new’ thing to PLT in a long time, that thing being lifetimes in the syntax. I was fairly skeptical and, like many others, accepted that C was the best systems language since it was so simple and easy to reason about.

Until I started using Rust.

My skepticism faded very quickly when I realized that, despite my experience and best intentions, I was in fact making mistakes with C. Subtle leaks and use-after-free bugs don’t really happen when everything’s nice and small and self-contained, but when you start making and using libraries and passing pointers to opaque structs around, it gets messy. Rust made it very clear that I was not the programmer I thought I was. The compiler was like a crusty old simulator instructor: no matter how well you did something, it was never enough. I hit the ‘fighting the borrow checker’ phase like hitting my toe on the corner of a wall: persistent pain and a feeling of being broken.

But that passed, reasonably quickly.

It took about three months of using Rust before I really became comfortable with it. Toy projects didn’t cut it. Once I started working on more complex systems, such as a distributed job queue with asynchronous behavior or an embedded system interfacing with an FPGA, the gains started to come. It’s like working out: you really have to push to get the benefit. After these projects I didn’t run into the nagging compiler as much, and the organization of the code started to become clear very early on in any project. That’s the thing with Rust: it can be annoying, but it really guides you towards an excellent architecture.

But that’s not the best part.

The best part is that once you’ve built this big, complex system and pleased the compiler enough for it to give you a binary, it just works. And continues working. The amount of debugging required for Rust projects is an order of magnitude less than I’ve seen anywhere else. With features like tagged unions (Rust’s enum) the exact workings of the code can be specified clearly and inviolably. This gives extreme confidence when deploying anything. Then comes the regular change and maintenance that any project has. Often, an engineer new to the company will have to dive into some code they haven’t seen before. With Rust, that on-ramp has guard rails: if they interact with the existing code in an unexpected way, the compiler tells them.
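
For those who haven’t met tagged unions before, here’s a small, hypothetical example of the kind of guarantee they give:

```rust
// An illustrative enum: the compiler forces every state to be handled, so
// adding a new variant breaks the build until every consumer is updated.
// The type here is hypothetical, not our actual pipeline code.
enum AnalysisOutcome {
    Completed { details_found: usize },
    SkippedPaywalled,
    Failed(String),
}

fn describe(outcome: &AnalysisOutcome) -> String {
    // No default arm: add a variant and this match stops compiling until
    // the new case is handled everywhere it matters.
    match outcome {
        AnalysisOutcome::Completed { details_found } => {
            format!("analysis finished with {details_found} details")
        }
        AnalysisOutcome::SkippedPaywalled => "article skipped: paywalled".to_string(),
        AnalysisOutcome::Failed(reason) => format!("analysis failed: {reason}"),
    }
}
```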

Attracting talent

Rust has entered its popular-use phase; there are almost as many blog posts singing its praises as there are exposing its shortcomings. That’s a great sign: it means it’s being used. It’s no surprise that I like Rust, but what about other engineers? One of our main concerns with selecting a relatively niche language like Rust was that the size of the talent pool might be too small. The thing is, the really world-class talent out there cares about the ideas they get to work on *and* the tools they get to work with.

The time came to reach out and test the water. We contacted the This Week in Rust newsletter and had our job listings placed there. We’re a remote-first company and don’t care where our engineers choose to live. By a great stroke of luck Twitter’s Jack Dorsey tweeted that “rust is a perfect programming language”.

Over the next eight weeks I received emails from almost four thousand applicants. Four thousand! Many didn’t actually have Rust experience at all and that’s fine; they were just interested in the idea. We were spoiled for choice and have found ourselves with nothing short of a world-class engineering team.

Our concerns about the talent pool for Rust were unnecessary. It’s a language that attracts experienced programmers who want to deliver.

Getting things done

We use Rust everywhere we can. Right now our web backend is built in Rust, our background task processors are in Rust, and the scheduler for our analysis engine? You guessed it: Rust. The only places we don’t use it are those where we rely on an incredible library that doesn’t yet exist in Rust’s ecosystem. That, and the mobile app, where we use Flutter.

One of the shortcomings Rust is often reported to have is a rather verbose syntax that takes a long time to both write and read. While parts of this are true (it’s a lot more verbose than Python), much of it is exaggerated. For example, across our entire codebase we have manually annotated lifetimes fewer than a dozen times. It just doesn’t come up that often in application code.

With Rust there is more work required up-front. You do need to please the compiler across a large number of constraints. However, the cognitive overhead of this, and the time required to actually write it, diminishes very rapidly with a little experience. Over the course of just a few weeks, with the myriad changes that a startup goes through in that time, it ends up requiring less work. Far less. Our team all agree that Rust code is simpler to read than code in many other languages. There’s no doubt or ambiguity of any kind; you know exactly what the program is going to do just by reading it.

We operate with a very small team of talented engineers and iterate on code at rapid speed. With Rust we can decide to rewrite an entire, complicated module and have confidence that once it compiles it will work just fine. Sometimes we make errors in our logic, or our understanding of the problem isn’t quite right; no language will prevent that. But for every other task that’s required to get something into production, Rust lets us do it faster.

With Rust you can go very far with very few engineers.

Performance

No post about Rust would be complete without this, but it’s well known so I’ll keep it brief. Rust is fast. Most of the pain people write about with lifetimes goes away if you just use a `clone` or an `Arc` here or there, and guess what: it’s still orders of magnitude faster than Python and Ruby, and well ahead of JavaScript and Java. Then, if it really becomes necessary to squeeze that last drop of performance out, you can write out those complex lifetimes. It’s nice to write something easily and have truly excellent performance, but know that the ceiling on that performance is still higher.
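
To make the `Arc` point concrete, here’s a tiny, illustrative example of sharing data across threads with reference counting instead of explicit lifetimes:

```rust
use std::sync::Arc;
use std::thread;

// Share one parsed article across worker threads with cheap reference-counted
// clones rather than threading explicit lifetimes through every signature.
// The type here is illustrative.
struct ParsedArticle {
    sentences: Vec<String>,
}

fn main() {
    let article = Arc::new(ParsedArticle {
        sentences: vec!["First sentence.".into(), "Second sentence.".into()],
    });

    let handles: Vec<_> = (0..4)
        .map(|worker| {
            let article = Arc::clone(&article); // bump the refcount, no deep copy
            thread::spawn(move || {
                // Each worker reads the shared article without any lifetime annotations.
                println!("worker {worker} sees {} sentences", article.sentences.len());
            })
        })
        .collect();

    for handle in handles {
        handle.join().unwrap();
    }
}
```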

What does this have to do with startups though? Well, high performance means fewer servers, fewer servers means less operational overhead. As a startup your runway burns up pretty fast if you start spending it on web servers that can only support a few hundred requests per second each.

The usual trope brought out against this argument is that “engineer time is more expensive than computer time”, and that is true. But you really don’t need much more engineer time than you would with another language, and you get the lower overhead every time you actually run your program.

It pays itself off very quickly.

Should your startup use it?

I’m not a startup adviser.

I began by writing that choosing a language for a greenfield startup can be daunting, more so when the language you think best fits is a bit niche and new. From our team’s experience with Rust so far, we don’t want to use anything else. It may be hard to learn but the return on that investment is incredible. 

In the hands of an experienced team Rust is a superpower.

In the next post we’ll go into some of the downsides of using Rust.