Google's Use Of Bloom Filters Explains Higher Filtered Data In … – Search Engine Journal

Google uses Bloom filters in Search Console, prioritizing speed over accuracy, causing higher filtered data volumes.
In the latest installment of Google’s monthly office-hours Q&A session, a question was asked regarding the higher volume of filtered data compared to overall data in Google Search Console.
The question prompted a detailed response from Gary Illyes, a Google Search Relations team member, who shed light on Google’s use of Bloom filters.
The question was, “Why is filtered data higher than overall data on Search Console, it doesn’t make any sense.”
On the surface, this looks like a contradiction.
The expectation is that overall data should be more comprehensive and, therefore, more extensive than any filtered subset.
Yet, this isn’t what users are experiencing. What’s going on here?
Illyes begins his response:
“The short answer is that we make heavy use of something called Bloom filters because we need to handle a lot of data, and Bloom filters can save us lots of time and storage.
When you handle a large number of items in a set, and I mean billions of items, if not trillions, looking up things fast becomes super hard. This is where Bloom filters come in handy.”
Bloom filters speed up lookups in large data sets by first checking a compact, separate structure built from hashes of the items, instead of searching the full data set.
This allows faster but less accurate analysis, Illyes explains:
“Since you’re looking up hashes first, it’s pretty fast, but hashing sometimes comes with data loss, either purposeful or not, and this missing data is what you’re experiencing: less data to go through means more accurate predictions about whether something exists in the main set or not.
Basically, Bloom filters speed up lookups by predicting if something exists in a data set, but at the expense of accuracy, and the smaller the data set is, the more accurate the predictions are.”
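To make the idea concrete, here is a minimal sketch of a textbook Bloom filter in Python. It is an illustration only, not Google's implementation: the class name, the filter size, the number of hashes, and the "query-N" keys are all assumptions made for this example. In a standard Bloom filter, a "no" answer is always correct, while a "yes" answer only means "probably present."

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: a bit array plus k salted hash functions."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable (not space-efficient).
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive k positions by salting SHA-256 with the hash index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means "definitely absent"; True means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))


def false_positive_count(num_items: int, num_probes: int = 2000) -> int:
    """Insert num_items keys, then count 'hits' for keys never inserted."""
    bloom = BloomFilter(num_bits=4096, num_hashes=3)  # hypothetical sizes
    for n in range(num_items):
        bloom.add(f"query-{n}")
    return sum(bloom.might_contain(f"absent-{n}") for n in range(num_probes))


if __name__ == "__main__":
    # Same filter size, different amounts of stored data.
    for size in (100, 1000, 2000):
        print(size, "items stored ->", false_positive_count(size), "false hits")
```

Running the script with the same filter size but more stored items should show the number of false hits climbing, which matches the point in the quote that a smaller data set gives more accurate predictions.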
Illyes’ explanation reveals a deliberate trade-off: speed and efficiency over perfect accuracy.
This approach might be surprising, but it’s a necessary strategy when dealing with the vast scale of data that Google handles daily.
Filtered data can be higher than overall data in Search Console because Google uses Bloom filters to quickly analyze vast amounts of data.
Bloom filters allow Google to work with trillions of data points, but they sacrifice some accuracy.
This trade-off is intentional. Google cares more about speed than 100% accuracy. The minor inaccuracies are worth it to Google to analyze data rapidly.
So, it’s not a mistake to see that filtered data is higher than overall data. It’s how Bloom filters work.
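The size of that accuracy trade-off can be estimated with the standard approximation for a Bloom filter's false-positive rate, (1 - e^(-k*n/m))^k, where n is the number of stored items, m is the number of bits, and k is the number of hash functions. The short Python sketch below applies it; the filter size and hash count are hypothetical values picked only to show the trend, not figures from Google.

```python
import math


def expected_false_positive_rate(num_items: int, num_bits: int, num_hashes: int) -> float:
    """Standard approximation: (1 - e^(-k*n/m))^k."""
    return (1.0 - math.exp(-num_hashes * num_items / num_bits)) ** num_hashes


# Hypothetical sizes, chosen only to illustrate the trend.
NUM_BITS = 1_000_000
NUM_HASHES = 4

if __name__ == "__main__":
    for num_items in (10_000, 100_000, 500_000):
        rate = expected_false_positive_rate(num_items, NUM_BITS, NUM_HASHES)
        print(f"{num_items:>7} items -> approx. false-positive rate {rate:.4%}")
```

For a fixed filter size, the estimated error rate rises as more items are stored, so the same structure gives sharper answers on a small set than on a large one.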
Featured Image: Tetiana Yurchenko/Shutterstock
Matt G. Southern, Senior News Writer, has been with Search Engine Journal since 2013. With a bachelor’s degree in communications, …
Copyright © 2023 Search Engine Journal. All rights reserved. Published by Alpha Brand Media.
