
Google’s Search Document Leak: 11 Ranking Factors to Pay Attention to

In March 2024, thousands of documents from Google’s internal Content Warehouse API were published online, and the SEO community discovered them that May. The cache describes more than 14,000 attributes, many of which appear to relate to ranking. 

Initially, there was speculation over the authenticity of the documents, but on May 31st Google confirmed that the leaked search documents are real. 

This leak is poised to be one of the biggest stories in SEO this year. It offers a rare glimpse into the complexity of Google’s ranking systems, reminiscent of the Yandex search leak in 2023. Among thousands of other data points, it suggests Google uses both user click metrics and Chrome data in rankings, despite the company’s previous assertions to the contrary. 

Although this suggests major implications for search, it’s important to understand the story will continue to develop in the coming weeks and months. Smart business owners and SEO professionals will take these assertions with a grain of salt, continue to experiment, and track performance before making major strategy changes. 

What Are the Content Warehouse API Documents? 

The documents at the center of the Google leak appear to be a set of internal files detailing the architecture and functionality of Google’s Content Warehouse API. This API appears to organize, store, and retrieve the vast amount of data that Google processes for its search operations. 

These documents may partly explain how Google’s search algorithms find and rank information in their databases. They may also outline different modules and attributes that help manage data within Google’s system.

Details of The Leak

The leak originated on March 13th, 2024 when an automated bot uploaded thousands of documents to GitHub. These documents were later shared with Rand Fishkin of SparkToro in May, who brought further attention to their potential significance. 

Timeline of Events

  • March 13th: A code repository mistake causes the documents to be uploaded to GitHub by an automated bot, yoshi-code-bot.
  • May 5th: An anonymous source shares the documents with Rand Fishkin, leading to wider public exposure and analysis by SEO experts.
  • May 7th: The code repository mistake is fixed, but the documentation remains live and accessible to the public.
  • May 27th: Michael King, founder of iPullRank, posts his analysis of the leaked documents.
  • May 28th: The anonymous source comes forward as Erfan Azimi, founder of EA Eagle Digital. He says his goal in sharing the leaked documents is to hold Google accountable and create more transparency surrounding ranking factors. 
  • May 31st: Google confirms the leaked documents are indeed internal company files.  

What’s Inside the Leaked Documents?

The leaked documents appear to reveal a myriad of details about Google’s ranking factors. Notably, references to a system called NavBoost suggest that engagement data may play a significant role in ranking decisions, challenging Google’s public statements that dismiss the direct impact of click data on rankings.

In total, the leak contains 2,596 modules with 14,014 attributes that outline possible ranking features. However, the documents do not say how heavily each attribute is weighted, or whether it is used at all. Pulling from Michael King’s analysis, these 11 factors stand out the most: 

1. User Click Data Matters

The documents mention terms like “goodClicks,” “badClicks,” “lastLongestClicks,” and “unicorn clicks.” These terms are associated with NavBoost and Glue, two systems that came to light during testimony in the Department of Justice’s antitrust case against Google in October 2023. This information seems to confirm that clicks and dwell time are important ranking factors. If so, it’s more important than ever to create enticing titles, meta descriptions, and content that draw users in and keep them on your site. 

2. Domain Authority is Real

Despite Google’s past denials that it uses any domain authority metric, the leaked documents indicate otherwise. According to the documents, Google stores a “siteAuthority” attribute (as it’s named in the documentation) and appears to use it in its ranking system. 

3. Google Measures Chrome Visits

Matt Cutts once stated that Google does not incorporate Chrome data into its organic search results, but according to the API documentation, Google uses several Chrome metrics to analyze pages and domains. These metrics may track website performance, offering insights into user engagement and experience. Some in the SEO community speculate that Chrome click metrics are used to select URLs for Google’s sitelinks feature.

4. Links Are Weighted Based on Click Data

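According to the documents, Google designates links as more valuable when they come from top-tier pages. Analyses of the leak suggest that click data helps determine which tier a page belongs to, which would tie link value back to user engagement. Fresh pages are also seen as higher quality. The bottom line: it’s important to source links from recent, high-quality pages.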

5. Authorship is Counted

The leak confirms that Google tracks who wrote a document and whether or not the author is mentioned on the page. They also check if the entities mentioned are the same as the author. Google appears to evaluate authors by mapping entities and embeddings extensively.

6. Font Size and Weight Matter

Google appears to keep track of attributes such as font size for links and text weight. Having larger links seems to have a positive impact, and Google interprets bolded text differently from regular text.

7. More Demotions Are Revealed

The leaked documents appear to confirm a number of algorithmic demotions including: 

  • SERP demotion: based on poor click rate or bounce rate
  • Nav demotion: for poor site navigation 
  • Anchor mismatch: for improperly labeled links
  • Exact match domains demotion: for domains with exact match keywords
  • Location demotion: for pages not associated with the relevant location

8. Length Mostly Doesn’t Matter

When considering content length, the leaked documents make two important clarifications:

  • Short content is scored for its originality rather than its length, meaning both successful short content and unsuccessful long content are possible; it all depends on the amount of unique information. 
  • The documents don’t mention page title length, which supports the theory that lengthy page titles can still rank well. (Concise page titles are typically more effective for clicks, however.)

9. Google Accounts For Small Sites

Another attribute mentioned in the documents is smallPersonalSite, which designates small personal websites or blogs. Some SEO experts speculate that Google may adjust the visibility of these sites using a function known as a Twiddler. However, the exact use and weight of this attribute are still unknown.

10. Page Titles Are Rewarded For Matching Queries 

According to the documentation, there is a metric called titlematchScore. The description implies that Google still considers the relevance of the page title to the query as a ranking factor. Additionally, putting your target keyword first still seems to be the right strategy. 
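As a rough way to audit your own titles against this advice, the sketch below finds where a target keyword phrase first appears in a page title. It is only a hypothetical helper for spot checks: the function name `keyword_position` and the sample business names are assumptions made for illustration, and it says nothing about how Google actually computes titlematchScore.

```python
import re
from typing import Optional


def keyword_position(title: str, keyword: str) -> Optional[int]:
    """Return the 0-based word index where the keyword phrase starts in the
    title, or None if the phrase does not appear.

    Hypothetical audit helper; not Google's titlematchScore.
    """
    # Lowercase and split into alphanumeric words so punctuation is ignored.
    title_words = re.findall(r"[a-z0-9]+", title.lower())
    keyword_words = re.findall(r"[a-z0-9]+", keyword.lower())
    if not keyword_words:
        return None

    span = len(keyword_words)
    # Slide a window over the title looking for an exact phrase match.
    for start in range(len(title_words) - span + 1):
        if title_words[start:start + span] == keyword_words:
            return start
    return None


# Made-up titles for a made-up business.
front_loaded = keyword_position(
    "Emergency Plumber in Austin | Acme Plumbing", "emergency plumber")
buried = keyword_position(
    "Acme Plumbing | Emergency Plumber in Austin", "emergency plumber")
missing = keyword_position("About Us", "emergency plumber")
```

A result of 0 means the title leads with the keyword, a larger number means the keyword sits further back, and None means the title does not contain the phrase at all.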

11. Fresh is Best

The API documents suggest that Google reads dates from several places on a page, so it makes sense to specify a date and keep it consistent across structured data, page titles, URLs, and XML sitemaps. If the dates in your URL contradict the dates in other areas of the page, it is likely to negatively impact the performance of your content. This is likely part of Google’s strong focus on fresh results. 
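One way to spot the kind of date mismatch described above is to gather the dates you publish for a page and check whether they agree. The sketch below assumes you have already collected those dates as ISO 8601 strings; the source labels (`url`, `structured_data`, `sitemap`, `title`) and both function names are hypothetical, and the check reflects this article’s advice rather than any confirmed Google logic.

```python
from datetime import date
from typing import Dict, List, Optional


def group_sources_by_date(
        page_dates: Dict[str, Optional[str]]) -> Dict[date, List[str]]:
    """Group source labels by the calendar date each one reports.

    Values are ISO 8601 strings such as "2024-05-27" or
    "2024-05-27T09:00:00+00:00"; sources with no date (None) are skipped.
    """
    groups: Dict[date, List[str]] = {}
    for source, value in page_dates.items():
        if value is None:
            continue
        # Keep only the YYYY-MM-DD part so timestamps compare by day.
        parsed = date.fromisoformat(value[:10])
        groups.setdefault(parsed, []).append(source)
    return groups


def has_date_conflict(page_dates: Dict[str, Optional[str]]) -> bool:
    """Return True when the sources report more than one distinct date."""
    return len(group_sources_by_date(page_dates)) > 1


# Made-up example: the URL still carries an old date after a content refresh.
example_page = {
    "url": "2023-01-15",
    "structured_data": "2024-05-27T09:00:00+00:00",
    "sitemap": "2024-05-27",
    "title": None,
}
groups = group_sources_by_date(example_page)
conflict = has_date_conflict(example_page)
```

In this example the URL date disagrees with the structured data and the sitemap, so the page would be flagged for review. A sitemap’s last-modified date can legitimately differ from the publication date, so treat a flag as a prompt to look closer, not as proof of a problem.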

What Does the Google API Information Mean for SMBs? 

The Google leak may hold a number of implications for small and medium-sized business owners. 

Create Content for Users, Not Search Engines

The leaked documents underscore the importance of creating content that genuinely engages and satisfies user needs, rather than merely catering to algorithm preferences. SMBs should focus on delivering value through high-quality, relevant content that addresses the interests and problems of their audience.

Value User Experience

User experience (UX) plays a critical role in how websites are ranked. Factors like site speed, easy navigation, and mobile-friendliness are essential. SMBs need to ensure that their websites are not only informative but also accessible and pleasant to use.

Build a Solid Brand Across Platforms

Google’s emphasis on site authority suggests that a strong, coherent brand across various online platforms can enhance your site’s authority and trustworthiness. SMBs should work on consistent branding and positive reputation management online to build trust with both users and search engines.

Follow E-E-A-T Even if Google Doesn’t Care

According to the leaked documentation, Google’s E-E-A-T (Experience, Expertise, Authoritativeness, and Trustworthiness) may not mean much for search rankings. But that doesn’t mean you should skip demonstrating these qualities to potential leads. Including original images, personal experiences, and testimonials on your site remains an important way to win your customers’ trust (and business). 

You Need a Smart SEO Team on Your Side

If the recent leak has made anything clear, it’s that the SEO field is constantly evolving, and we can’t always count on Google for accurate insights. To navigate search effectively, business owners must invest in a sharp SEO team. These strategists must be able to practice proven SEO tactics while employing a curious mindset to effectively experiment and evolve.

Moving Forward with Search

The Google leak offers a treasure trove of insights that may redefine what we understand about search engine optimization. From the importance of user clicks to the impact of Chrome user data and site authority, this revelation compels both SEO experts and business owners to carefully consider their digital strategies. 

For SMBs, this means prioritizing engaging content, enhancing user experience, and building a robust online brand. As the digital landscape evolves, staying informed and adaptable is paramount. Business owners should consult with a knowledgeable SEO company to ensure their online presence is both visible and valuable in this new era of search optimization.

 
