Importance of Prestige and Popularity for a News Source in Google's Algorithm
News sites are the fastest-growing section of the web. Large news sites have millions of URLs and publish stories about events every second. Crawling, indexing, evaluating, associating and ranking all of these news posts requires powerful servers and fast, responsive algorithms with the right features, samples and decision trees.
According to Chartbeat, people published 92,000 new articles every day (2016); according to MarketingProfs, more than 2,000,000 articles are posted every day (2017).
Today, Worldometer also tracks the number of posts written each day, and the daily count usually ends above 10,000,000.
In this article, I will explain possible methodologies a search engine can use to rank news sources and news articles, along with summarizing news articles. Understanding the obstacles a search engine faces in this fast-growing ocean of news can help an SEO see things from the search engine's point of view.
- Breaking News Score for News Articles
- Why is Breaking News Score Important and Useful for Search Engines?
- How might Google Calculate Breaking News Score for News Sources?
- How Google Might Check Related Entities, Phrases and Topics to Determine a Breaking News Score
- How might Google Prioritize the News Sources according to their Prestige and Coverage?
- How Can Google Determine Source Rank for a News Site?
- Importance of Article Count for a Period of Time
- Possible Test Periods and Originality Metrics for Article Publishing Activity
- Importance of Average Length of an Article
- How to Understand Which Article Length is Better?
- Importance of Coverage by the News Sources for a Topic
- Importance of Value Proportion of News Source
- Importance of Human Opinion for the News Sources
- Importance of News Reading Statistics for News Sources
- Size and Quality of the Staff for News Sources
- Importance of News Bureaus for News Sources
- Importance of Named Entities within News Articles for News Sources
- Importance of Topicality for the News Sources
- Importance of Diversity of Traffic for News Sources
- Importance of Writing Style of the News Articles for News Sources
- How Might Google Generate a Source Rank Score?
- Last Thoughts on Breaking News Score and Source Rank in the Context of Google Search Engine’s Methodologies
Before continuing further, let me introduce myself briefly. Koray Tuğberk GÜBÜR is the owner and founder of Holistic SEO. I believe SEO is the intersection of coding and marketing, along with a strong analytical thinking capacity.
One of the most explanatory concepts in Google's patents for news articles is the "Breaking News Score". Google tries to determine whether a news article is about a recent event. Based on the article's freshness and its importance at that specific time, Google assigns it a Breaking News Score.
Why would Google need to filter news articles, or ordinary articles, by their relevance to breaking news? Because of cost and speed.
Throughout this article, I will support the theoretical information with concrete, practical results. Above, you can see that Natural News lost its authority (Source Rank), prominence and organic traffic because of the unreliable news it published about “aliens, climate change and vaccines”. Google can assess a news story's reliability and accuracy if the topic is controversial enough.
Most SEOs focus only on the "quality of the search results" and don't pay enough attention to the "server needs, computation costs and time" of search engines. An algorithm might improve search result quality by 1%. If it also increases cost by 10%, the improvement may not be worth it.
A simple representative summary of how Source Rank might be determined for a news source and website.
Suppose a search engine tried to calculate a quality score for every article on the web for every possible query. Imagine the cost and the time required! Filtering news articles by their breaking news signals helps a search engine consolidate its crawling, indexing, evaluating, associating and ranking resources. That's why news sites can be indexed faster and might have more tolerance for index bloat or cannibalization issues.
The Breaking News Score can be used to identify the important news articles from specific news sources within a specific timeline. According to the patent, the Breaking News Score and a news source's characteristics are connected. The methods below can be used to determine the Breaking News Score for specific news articles.
If a news source publishes a story after an event, the time between the rise in search demand and the article's publication might be used to determine the Breaking News Score. If the news source consistently publishes an original story right after search demand increases, Google might consider it a quality candidate for satisfying sudden information-need queries.
Google might determine the “time of the event” from the publication dates of the news articles after clustering them. In other words, it can check all of the dates in a cluster and treat the earliest one as the time of the event. For this approach, the patent gives the following formulas:
- If T > N1, then breaking score = 0;
- If 0 < T ≤ N1, then breaking score = log(N1/T); and
- If T = 0, then breaking score = log(N1).
T is the time difference between the first article in the cluster and the current article. N1 is the breaking news threshold. It can be determined according to the search demand, cluster size and article publication frequency for the specific event. The patent also gives a possible formula for the event's importance, as below.
- factor = 1 + log(cluster size).
- breaking news score(A) = 30 - rank within cluster(A).
Also, Google might choose a constant N2 value such as 3 hours for specific entities or specific types of events as the Breaking News Score Threshold.
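The piecewise formulas above can be sketched in Python. This is a minimal illustration, not Google's implementation. It rests on several assumptions:

- T and N1 are measured in hours.
- The natural logarithm is used.
- The earliest timestamp in a cluster is the event time.
- The breaking score is multiplied by the importance factor, which is only one plausible way to combine them.

The function names are hypothetical.

```python
import math
from datetime import datetime


def breaking_score(t_hours: float, n1_hours: float) -> float:
    """Piecewise breaking score: 0 after the threshold, log(N1/T) inside it, log(N1) at T = 0."""
    if t_hours > n1_hours:
        return 0.0
    if t_hours == 0:
        return math.log(n1_hours)
    return math.log(n1_hours / t_hours)


def importance_factor(cluster_size: int) -> float:
    """Event importance weight: 1 + log(cluster size)."""
    return 1 + math.log(cluster_size)


def score_cluster(published: dict[str, datetime], n1_hours: float) -> dict[str, float]:
    """Score every article in one event cluster (article id -> publication time)."""
    event_time = min(published.values())  # earliest article = assumed time of the event
    factor = importance_factor(len(published))
    scores = {}
    for article_id, ts in published.items():
        t_hours = (ts - event_time).total_seconds() / 3600
        # Combining by multiplication is an assumption made for this sketch.
        scores[article_id] = breaking_score(t_hours, n1_hours) * factor
    return scores


cluster = {
    "a1": datetime(2021, 3, 1, 9, 0),   # first report
    "a2": datetime(2021, 3, 1, 12, 0),  # 3 hours later
    "a3": datetime(2021, 3, 2, 15, 0),  # 30 hours later, past a 24-hour threshold
}
print(score_cluster(cluster, n1_hours=24))
```

With fractional hours, an article published less than one hour after the first one (0 < T < 1) scores above log(N1). Rounding T to whole units would avoid that quirk if it matters.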
According to the patent, Google might also base the Breaking News Score on the number of named entities an article includes and the size of the related topics. Suppose a topic includes an important named entity, and that entity attracts a wide variety of search queries about its attributes and related entities. The Breaking News Score for such articles might then be higher, so they can be prioritized for crawling, evaluating, associating, ranking and indexing.
In this context, news sources can signal to the search engine in two ways: by covering all of the related entities, topics and phrases, and by responding to sudden search spikes and new search demand. Doing so might help them get prioritized in its crawling and indexing.
With the Breaking News Score, we have seen that Google can prioritize its resources by filtering news articles based on their freshness and importance. But after prioritizing these articles, a search engine also needs to prioritize the sources.
Taking every news article whose Breaking News Score exceeds the threshold still wouldn't reduce the search engine's costs enough. Millions of articles from different sources on different topics can pass that filter every hour.
A search engine might want to filter news sources for the reasons below.
- Most of the articles will be too similar.
- Most of the articles will be syndicated from different sources.
- Some of the articles will include inaccurate information.
- Some of the articles will include biased propaganda against specific entities.
- Some of these news articles might be unnecessarily long or complicated to understand for algorithms.
- Some of the articles won’t have internal links to related content for likely follow-up searches.
- Some of the articles won’t come from the expert and reliable sources.
- Some of the articles will have misspellings and poor writing style.
- Some of the articles might have unique information.
- Some of the articles might have famous and expert authors.
Out of 10,000,000 articles a day, a search engine can first filter news articles by their Breaking News Score and then prioritize the news sources. This lets it focus its limited resources on the narrowest, highest-quality section of the web for news readers.
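The two-stage funnel described above (articles first, sources second) can be sketched as follows. It is a toy illustration under stated assumptions. The `Article` type, the threshold value and the "keep the top N sources" rule are hypothetical, and the scores are made up.

```python
from dataclasses import dataclass


@dataclass
class Article:
    url: str
    source: str
    breaking_score: float


def two_stage_filter(articles, source_rank, breaking_threshold, top_sources):
    """Stage 1 filters articles, stage 2 filters sources (hypothetical rules)."""
    # Stage 1: keep only articles whose Breaking News Score reaches the threshold.
    fresh = [a for a in articles if a.breaking_score >= breaking_threshold]
    # Stage 2: keep only articles from the N sources with the highest Source Rank.
    ranked = sorted(source_rank, key=source_rank.get, reverse=True)
    allowed = set(ranked[:top_sources])
    return [a for a in fresh if a.source in allowed]


articles = [
    Article("https://s1.example/a", "s1", 2.5),
    Article("https://s2.example/b", "s2", 3.0),
    Article("https://s3.example/c", "s3", 0.1),
    Article("https://s1.example/d", "s1", 0.0),
]
source_rank = {"s1": 0.9, "s2": 0.2, "s3": 0.8}
kept = two_stage_filter(articles, source_rank, breaking_threshold=1.0, top_sources=2)
print([a.url for a in kept])
```

Running the cheap article-level filter first means the source-level check only touches a small fraction of the day's articles. That is the cost argument made above.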
RealClearPolitics is another example, with fanatical headlines and conspiratorial news articles lacking solid evidence.
This brings us to another term from Google's patents: “Source Rank”. We have heard about Trust Rank, Author Rank, PageRank, Deep Rank (BERT) and many more. The SEO industry, however, has probably not talked about this possible metric.
Breaking News Score and Source Rank concepts are connected to each other.
- Breaking News Score is for filtering the news articles.
- And, Source Rank is for filtering the news sources.
So we need to go through the possible methodologies Google might use to determine the Source Rank of news sites.
Google might assign a Source Rank by auditing twenty different metrics. In this section, we will focus on these metrics and their definitions. We will also look at how they can help both news sources and the search engine satisfy news readers on the web.
A representation of Source Ranks and Source Names within a server.
A search engine can define a timeline for auditing and testing news sources based on their publishing frequency and intensity. A news source might publish 50 or 500 articles on a given number of topics and sub-topics. Google can use these numbers to gauge the source's activity level.
Imagine a news source that is very active for a period of time, and you assign it a high Source Rank. Two weeks later, it stops publishing new articles at the rate it did during the test period. How would you feel if you were a search engine algorithm engineer?
That's why performing these tests repeatedly and choosing the most consistent sources is important. Article count is not the only factor, though. According to the patents, Google might also check the originality of the articles.
Breitbart is another news source that lost its organic and generic visibility. It publishes aggressive content without authenticity, and its news titles are fanatically conspiratorial. Google has hit the site repeatedly with algorithm updates, but it survives thanks to its brand value.
Possible Test Periods and Originality Metrics for Article Publishing Activity
According to the patent, a search engine can test a news source over a week, two weeks or a month. For originality, Google might check entire articles, but this wouldn't be very effective. Most news articles share “quotes”, “entity names”, “similar definitions for events” and similar sentence patterns. Counting the original sentences from a news source can therefore be more useful. This method is also mentioned in the News Ranking Algorithm patent.
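Counting original sentences per test period could look roughly like the sketch below. It assumes exact matching after lowercasing and a naive regular-expression sentence splitter. A production system would more likely use fuzzy matching such as shingling. The function names and sample texts are hypothetical.

```python
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip().lower() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def original_sentence_counts(articles: list[tuple[str, str]]) -> Counter:
    """articles: (source, text) pairs published within one test period.

    A sentence is original for a source if no other source used it in the period.
    """
    owners: dict[str, set[str]] = {}
    for source, text in articles:
        for sentence in split_sentences(text):
            owners.setdefault(sentence, set()).add(source)
    counts = Counter()
    for sentence, sources in owners.items():
        if len(sources) == 1:
            counts[next(iter(sources))] += 1
    return counts


week = [
    ("wire", "The minister resigned on Monday. She cited health reasons."),
    ("siteA", "The minister resigned on Monday. Our reporter spoke to her staff."),
    ("siteB", "The minister resigned on Monday. She cited health reasons."),
]
print(original_sentence_counts(week))
```

The wire service and siteB share every sentence, so neither gets credit in this period, even if the wire published first. Crediting the true original would also require tracking publication times.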
While reading Google's patents on news ranking algorithms, I didn't encounter any argument that “longer articles or shorter articles are better”. But this metric can still affect news rankings. Most of the time, news readers want the actual information as soon as possible, so a long article can be a disadvantage. In other cases, length can be an advantage because it provides more information and detail.
So any comment on this topic is relative. Still, according to the patent's authors, average article length might be a factor for Source Rank, and it can be calculated with these possible methodologies.
- Calculating the length of original sentences.
- Calculating the length of non-duplicate articles.
It seems that Google is obsessed with finding the original news source so that it can rank it.
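Both averaging approaches can be sketched in a few lines. This assumes length is measured in words. It also assumes "non-duplicate" means the articles are not identical after lowercasing and collapsing whitespace. The function names are hypothetical.

```python
def _normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial copies compare equal."""
    return " ".join(text.lower().split())


def avg_length_non_duplicate(articles: list[str]) -> float:
    """Average word count over a source's articles, skipping duplicates."""
    seen: set[str] = set()
    lengths = []
    for text in articles:
        key = _normalize(text)
        if key in seen:
            continue
        seen.add(key)
        lengths.append(len(key.split()))
    return sum(lengths) / len(lengths) if lengths else 0.0


def avg_original_sentence_length(original_sentences: list[str]) -> float:
    """Average word count of the sentences already judged original."""
    if not original_sentences:
        return 0.0
    return sum(len(s.split()) for s in original_sentences) / len(original_sentences)


print(avg_length_non_duplicate(["A b c d", "a  b c D", "x y"]))
print(avg_original_sentence_length(["Our reporter spoke to her staff."]))
```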
How to Understand Which Article Length is Better?
The methodologies below can be used to find the optimum article length.
- Testing different article lengths for the same sub-topic with the same entities mentioned.
- Analyzing the competitors’ articles’ length compared to their rankings.
- Learning the industry norms for article length in different topics.
- Using only original sentences and articles for testing.
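For the second method, one simple option is a Spearman rank correlation between competitors' article lengths and their ranking positions. The sketch below implements it with the standard library only and uses average ranks for ties. The data are hypothetical placeholders, not measurements. A correlation on a handful of results says nothing about causation.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


lengths = [450, 800, 1200, 600, 300]  # hypothetical word counts of competing articles
positions = [2, 4, 5, 3, 1]           # their ranking positions, 1 = top result
print(spearman(lengths, positions))
```

Because position 1 is the best result, a positive value here would mean longer articles tended to rank lower in this sample.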
As I said before in my Semantic SEO case studies, I didn't focus on length. I have only two rules for content length in my SEO projects, whether the content is text, visual or audio:
- As short as possible.
- As long as necessary.
For breaking news, however, length might be a user experience factor. This is especially true if a news source hides the actual answer behind a long introduction to keep the reader on the page.
Since Topical Authority became an important factor in broad SEO success stories, Topical Coverage has also grown in prominence. Topical Coverage can be calculated by mapping the possible semantic search intent graph for a topic and checking how well a source's publications cover those queries. Topical Authority can be measured by Topical Coverage, historical data and the source's information quality.
According to the patent, the Coverage or Story Size Score might be calculated over a specific period, such as a week, two weeks or a month. It is the number of distinct articles on a specific subject. If a news source has 1,000 distinct articles on a subject, its story size for that subject is 1,000.
ScienceDaily is another news publisher covering scientific events. Unfortunately, it has lost more than 50% of its organic visibility over the last two years. The main reason is that its content is not unique. It syndicates content from certain sources, some of which say “Climate Change is not real”, and it publishes these types of content automatically.
Value Proportion is bi-directional: a news source can provide value to a search engine, and a search engine can provide value to a news source. But how can this help prioritize news sites? If a search engine gives a lot of value to a news source, that source ends up with a lot of organic traffic. In that sense, the actual source of the value is the search engine. For the patent's authors, however, it works differently.
Google is itself a very popular search engine. According to the patents, it might therefore use “traffic normalization”. A news source with lower traffic and user activity might have a higher value proportion if it also receives far fewer link opportunities from the search engine.
Imagine that one news source generates 5,000 organic sessions in ten minutes from 50,000 links shown by the search engine. Another generates 3,000 sessions from 3,000 links. In that situation, the latter has a better value proportion for both the search engine and users.
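The arithmetic in this example is simply sessions divided by links. The function below is a hypothetical helper that reproduces it. Treating "links from the search engine" as the denominator is my reading of the example, not a formula quoted from the patent.

```python
def value_proportion(organic_sessions: int, links_from_search: int) -> float:
    """Organic sessions earned per link the search engine showed."""
    if links_from_search <= 0:
        raise ValueError("links_from_search must be positive")
    return organic_sessions / links_from_search


first = value_proportion(5000, 50000)  # 0.1 sessions per link
second = value_proportion(3000, 3000)  # 1.0 session per link
print(first, second, "second source wins" if second > first else "first source wins")
```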
Human opinion can show a news source’s reliability and prestige to a search engine. According to Google's patents, Pulitzer Prizes and other honorable prizes can be used to identify important news sources. Search engines can also run surveys or polls, similar to the Quality Rater Guidelines. The age of a news source might be another human-opinion metric, since people tend to trust older sources more. Another possible methodology is giving each rater one article from every news source to gauge general opinion about the sources.
In other words, Google can use human opinion to evaluate different news sources. This again makes a reliable image and prestige important.
InfoWars.com is another example. It is a news publisher that claims the Covid-19 vaccine is a hoax, and it spreads aggressive propaganda against the Black Lives Matter movement. It usually uses aggressive and controversial titles for its news. As a result, it lost its organic visibility, although it still has significant search volume for its name in the US.
According to Google's patents, Google might also consider industry reports that interpret the usage statistics of different news sources, similar to human opinion. For the patent's authors, these public metrics can be used to identify the most prestigious and reliable news sources.
If people want to read a specific news source for specific topics, ranking it is not just about the algorithm. It is also about giving people what they want. With this methodology, Google might validate its own algorithmic results against statistics published by other sources.
When it comes to news, it is not just about the name of the newspaper. It is also about the names of its authors, journalists, writers and other employees. Google might care about the size of a news source's staff, and the expertise and reliability of these employees might also matter.
How can Google determine the size of the staff? According to the patent, Google can simply count the unique journalist names in the news articles, or it can check the About Us page. The journalists' expertise, fame and credibility might also play a role.
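Counting unique journalist names from bylines is easy to sketch. The byline format is an assumption: authors separated by commas or "and", with an optional "By" prefix. Simple title-casing will not resolve real-world name variants such as initials or nicknames.

```python
import re


def staff_size(bylines: list[str]) -> int:
    """Estimate staff size as the number of unique names across bylines."""
    names = set()
    for byline in bylines:
        # Split co-author bylines on commas or the standalone word "and".
        for part in re.split(r",|\band\b", byline):
            name = re.sub(r"^by\s+", "", part.strip(), flags=re.IGNORECASE)
            name = " ".join(name.split()).title()
            if name:
                names.add(name)
    return len(names)


print(staff_size(["By Jane Doe", "jane doe and John Smith", "John Smith, Alex Kim"]))
```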
CNBC is a good example in terms of Source Rank. The DOM size of its mobile pages is just 263. Its headlines are reliable and logical, and it leaves no room for conspiratorial headlines or news. Its writers also have years of experience in the subjects they cover.
According to Google's patents, Google might check a source's news bureaus and the members of these bureaus. This might help Google identify reliable news sources. Membership in these news bureaus or organizations might also help with prestige and human opinion.
According to Google's patents, Google might use the count of named entities and their relations as a numeric value for news articles and their sources. Google might cluster news articles from different news sources. It can then check the related articles from each source and count the entities and attributes they mention for a specific topic. A news source may include unique entities, information and attributes that other news sources don’t have. This might signal original reporting capability, along with unique information for users.
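Finding the entities that only one source mentions within an event cluster can be sketched with set operations. This assumes named entity recognition has already happened upstream (it is not shown). The entity lists below are made-up examples.

```python
def unique_entities_by_source(cluster: dict[str, set[str]]) -> dict[str, set[str]]:
    """For each source, return the entities no other source in the cluster mentions."""
    result = {}
    for source, entities in cluster.items():
        others = set().union(*(e for s, e in cluster.items() if s != source))
        result[source] = entities - others
    return result


event_cluster = {
    "siteA": {"Joe Biden", "White House", "Senate Budget Committee"},
    "siteB": {"Joe Biden", "White House"},
    "siteC": {"Joe Biden"},
}
unique = unique_entities_by_source(event_cluster)
print({source: len(ents) for source, ents in unique.items()})
```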
DrWeil is another lost cause in terms of news source authority. The main reasons appear to be heavy monetization on the landing pages and a lack of references and information in the health articles. DrWeil continues to lose traffic. It published more than 100 articles just on Covid-19 with the same approach. As a result, it ranks for only 16-20 Covid-19 keywords, which shows how the source is prioritized for urgent health topics.
In my opinion, this also recalls the information gain score from another Google patent. That patent discusses including unique information and its benefit for users. This might indicate a consistent point of view across Google's patents.
Lastly, when it comes to entities, we must remember Bill Slawski since he was the first person to mention entities for search engines in 2007. And now, we are all talking about it.
Topicality means segmenting articles and content by topic. In its patents, Google talks about segmenting news articles into topics and sub-topics. Since Google officially announced the “sub-topic update”, we can tell that it uses topical and contextual domains for search queries. According to this patent, a news source with broader topicality has more advantages. A news source with a deeper understanding of a topic, shown through more named entities and informational explanations, will also be better.
The patent also discusses machine learning methodologies for clustering news articles by topicality.
Diversity of traffic means that a news source’s users come from different countries, languages, cultures and demographics. Google can check Internet Protocol (IP) addresses to understand how diversified a website’s audience is. Google might not just check the audience: if there is an audience, there are also links. To see a news source’s diversification, it can also check how diversified the IP addresses of the referring domains are.
According to the patent, diversified traffic and a diversified referrer profile are better than a concentrated, homogeneous audience.
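The patent does not say how diversity is measured. One common, swapped-in choice is Shannon entropy over audience countries or over the /24 subnets of referring-domain IP addresses. The sketch below uses Python's standard `ipaddress` module, and higher values mean a more diversified profile. The sample data are hypothetical.

```python
import ipaddress
import math
from collections import Counter


def entropy_bits(labels) -> float:
    """Shannon entropy in bits of a categorical sample (0.0 means a single group)."""
    counts = Counter(labels)
    if len(counts) <= 1:
        return 0.0
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def subnet_24(ip: str) -> str:
    """Map an IPv4 address to its /24 network, e.g. 203.0.113.5 -> 203.0.113.0/24."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))


visitor_countries = ["US", "DE", "TR", "BR"] * 25
single_market = ["US"] * 100
referrer_ips = ["203.0.113.5", "203.0.113.9", "198.51.100.7"]

print(entropy_bits(visitor_countries))  # four equal groups -> 2.0 bits
print(entropy_bits(single_market))
print(entropy_bits(subnet_24(ip) for ip in referrer_ips))
```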
PrisonPlanet is another source that survives only on its brand name, without any organic visibility for non-brand searches. It also appears to publish content and news in support of racist sympathies.
Misspellings and grammar errors can affect a news article’s reliability in the eyes of search engines. According to the patents, Google might also check an article’s reading level and writing style along with grammar errors and misspellings.
Whether or not the search engine cares, finding and fixing a news source's grammar errors is useful. Grammar errors and misspellings make an article harder for algorithms to understand, which only makes things worse.
According to the patent, there are more possible metrics for qualifying a news source, called “additional metrics”, such as links. However, they are not included in the News Source qualifying metrics.
To generate a Source Rank score, Google might consider the factors below.
- News bureau memberships
- News prizes
- Employees, journalists and staff size, quality signals
- Grammar errors, misspellings
- News article publishing activity
- International Popularity
- Audience Diversification
- Topicality and Topical Breadth
- Topical Coverage for Specific Contexts
- Named Entity Count and Unique Information
- Original Article and Sentence Count
- Value Proportion
- Human Opinions
- Public Statistics and Traffic Information
- Article Length or Unique Sentence Length and Count
- Polls and Surveys
- Search Demand from the Users
The main obstacle for a search engine is not obtaining all of this information from the web. It is combining all of this data with the best possible formula to create an optimal Source Rank for adjusting news-related search results.
To adjust Source Rank for news sources, the patent gives a few possible methodologies that can help SEOs understand the search engine’s perspective.
Source Rank Usage and Generation representation
According to the patent, a search engine might calculate the Source Rank for news sources with different mixtures of metrics. For instance, for some types of queries and news, international popularity might be more important than article count. Because of these possible differences, each query might produce a different Source Rank based on its phrases.
Another possibility is generating Source Rank based on topical coverage and named entities for a specific topic. A news source may have a deeper article repository and better coverage for specific entities. It might then have a better Source Rank than other news sources.
For calculating Source Rank, Google also talks about “metric percentile”. For instance, if CNN has 2 for international popularity, and BBC has the highest rank for that metric which is 10, then the percentile score of CNN will be 0.2.
In another methodology, Google might calculate Source Rank based on just the best score for every metric. For instance, if CNN has the highest rankings for some of these metrics, its source rank will be calculated with just these metrics.
In another possible methodology, Google talks about “metric normalization”. Suppose one news source has a tremendous amount of traffic and another has 5% of that amount. Google can normalize the traffic values to a range between 0 and 1, so that differences between news sources are scaled for better adjustment and calculation.
Google might use all of these methodologies and more, and it can also add more metric types to the Source Rank calculation.
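The three approaches above can be compared side by side in a small sketch. The metric names and values are hypothetical. "Percentile" is implemented exactly as the CNN/BBC example describes: the value divided by the best value. "Best scores only" is interpreted here as averaging a source's top k percentile scores, which is an assumption.

```python
def percentile_scores(metric: dict[str, float]) -> dict[str, float]:
    """Each source's value divided by the best value for that metric."""
    best = max(metric.values())
    return {source: value / best for source, value in metric.items()}


def min_max(metric: dict[str, float]) -> dict[str, float]:
    """Rescale a metric to [0, 1]; the lowest source gets 0 and the highest gets 1."""
    low, high = min(metric.values()), max(metric.values())
    span = high - low
    return {s: (v - low) / span if span else 1.0 for s, v in metric.items()}


def best_k_source_rank(metrics: dict[str, dict[str, float]], source: str, k: int = 2) -> float:
    """Average of the source's k highest percentile scores across all metrics."""
    scores = sorted((percentile_scores(m)[source] for m in metrics.values()), reverse=True)
    top = scores[:k]
    return sum(top) / len(top)


metrics = {
    "international_popularity": {"CNN": 2, "BBC": 10, "LocalDaily": 1},
    "original_sentences": {"CNN": 900, "BBC": 700, "LocalDaily": 300},
    "staff_size": {"CNN": 400, "BBC": 500, "LocalDaily": 20},
}
print(percentile_scores(metrics["international_popularity"])["CNN"])  # 0.2
print(min_max({"BigSite": 1_000_000, "SmallSite": 50_000, "MidSite": 525_000}))
print({s: round(best_k_source_rank(metrics, s), 3) for s in ("CNN", "BBC", "LocalDaily")})
```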
News with Views is another source with tons of syndicated content and heavy monetization. None of its authors and writers has expertise or a solid background. As a result, it hasn't even started to grow its organic news visibility in many years.
After calculating the Source Rank for a news source, Google might store it for the news source or for possible sections of it. Google can also generate an ID for the news source based on its alternative names and original attributes on the web, such as social media accounts or websites.
Google might also use Source Rank for news sources' non-news articles and queries. It might be applied differently based on user interaction and the news source’s activity level. This could vary across geographies, cities, countries, languages, demographics, keywords or keyword lists, and entities.
According to Google's patents, a user might search for George Bush or any other named entity, such as Joe Biden. The user might then see a list of links, link descriptions and link titles based on their Relevance Score. From the link title and link URL, Google might identify the news sources and modify the search results.
If the link owner is a news source with a Source Rank, Google might generate another score, NEWSSCORE. This NEWSSCORE is similar to NewzDash’s Google News Score term. Below is a possible NEWSSCORE calculation.
NEWSSCORE(D) = alpha * OLDSCORE(D) + beta * SOURCERANK(SOURCE(D))
According to the patent, SOURCE(D) is the news source of link D, and alpha and beta are constants. OLDSCORE(D) is the relevance score of the article, and SOURCERANK(SOURCE(D)) is the Source Rank of the news source. These values and constants might be adjusted based on different metrics and parameters of the web and search ecosystem.
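Here is a minimal sketch of how NEWSSCORE could re-order a result list. The alpha and beta values, the source ranks and the rule of matching a result to its source by hostname are hypothetical. Results whose host has no Source Rank keep their original relevance score, following the "if the link owner is a news source" condition.

```python
from urllib.parse import urlparse


def news_score(old_score: float, source_rank: float, alpha: float, beta: float) -> float:
    """NEWSSCORE(D) = alpha * OLDSCORE(D) + beta * SOURCERANK(SOURCE(D))."""
    return alpha * old_score + beta * source_rank


def rerank(results, source_ranks, alpha=1.0, beta=0.2):
    """results: list of (url, relevance) pairs; source_ranks: hostname -> Source Rank."""
    rescored = []
    for url, relevance in results:
        host = urlparse(url).netloc
        if host in source_ranks:
            score = news_score(relevance, source_ranks[host], alpha, beta)
        else:
            score = relevance  # not a known news source: keep OLDSCORE
        rescored.append((url, score))
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


results = [
    ("https://a.example/story", 0.80),
    ("https://b.example/story", 0.75),
    ("https://blog.example/post", 0.78),
]
source_ranks = {"a.example": 0.1, "b.example": 0.9}
for url, score in rerank(results, source_ranks):
    print(f"{score:.2f}  {url}")
```

With these made-up numbers, the slightly less relevant article from the high-ranked source moves to the top. This shows how beta controls the weight of Source Rank against relevance.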
Another conspiracy site that opposes the election results and Covid-19 vaccines, with the same result.
Last Thoughts on Breaking News Score and Source Rank in the Context of Google Search Engine’s Methodologies
The Breaking News Score and Source Rank concepts can help SEOs understand the perspective of a news search engine. Google uses events, cases, incidents and any other kind of happening to meet the information needs of society.
Google, like any other search engine, doesn't have unlimited resources to crawl, render, understand, evaluate, rank and index. Search engines first prioritize articles based on specific criteria and then prioritize news sources based on further criteria. This helps them consolidate their resources to create the best possible search engine results page.
Beyond Google's patents, most news SEOs know that Google treats each news source differently. Some news sources have better coverage of specific types of entities, sub-topics and phrase patterns. Official Google spokespeople repeatedly mention news integrity, authority, coverage and journalist names.
Snopes is another source that loses organic traffic share every year. Its content is high-quality and unique, but it also has a lot of content on propaganda topics and negative news about individuals. Many of these headlines are of the same type as the content that is filtered out of Google’s autocomplete data.
In the Search Off the Record podcast series, Gary Illyes even said that Google's biggest cost is "cooling the servers". Accordingly, Google stores the highest-quality websites on its best servers. According to Gary Illyes, there are different indexing tiers, such as Tier 1, Tier 2 and Tier 3. Tier 1 is the highest-quality tier. Whatever happens, Google tries to keep the Tier 1 servers alive so that users continue to be satisfied by the search engine and its highest-quality sources.
There is more that a news SEO and a holistic SEO can learn from Google's patents, such as:
- Ranking News Articles Based on Opinions,
- Summarizing News Articles,
- Question and Answer Generation from News Articles,
- Updating News Articles for Update Score
We might cover these topics in future articles.
Koray Tuğberk GÜBÜR is the founder of Holistic SEO & Digital. He focuses on different verticals of SEO such as Data Science, Web Development, Search Engine Patents, and Methodologies. Koray publishes SEO Case Studies and articles regularly.