Why Did Google Gemini "Leak" Chat Data? – Search Engine Journal

Google Gemini chat pages seemingly leaked onto the Internet, but the reality of what happened is eye-opening.
It took only twenty-four hours after Google’s Gemini was publicly released for someone to notice that chats were being publicly displayed in Google’s search results. Google quickly responded to what appeared to be a leak. How this happened is quite surprising and not as sinister as it first appears.
@shemiadhikarath tweeted:
“A few hours after the launch of @Google Gemini, search engines like Bing have indexed public conversations from Gemini.”
They posted a screenshot of a site search for gemini.google.com/share/.
But if you look at the screenshot, you’ll see that there’s a message that says, “We would like to show you a description here but the site won’t allow us.”
By early morning on Tuesday, February 13th, the Google Gemini chats began dropping out of Google's search results, and Google was showing only three results. By the afternoon, the number of leaked Gemini chats in the search results had dwindled to just one.
Screenshot of Google's search results for pages indexed from the Google Gemini chat subdomain
Gemini offers a way to create a link to a publicly viewable version of a private chat.
Google does not automatically create webpages out of private chats. Users create the chat pages through a link at the bottom of each chat.
Screenshot of how to create a public webpage of a private Google Gemini Chat
The obvious explanation for why the chat pages were crawled and indexed would be that Google forgot to put a robots.txt file in the root of the Gemini subdomain (gemini.google.com).
A robots.txt file is a plain-text file for controlling crawler activity on websites. A publisher can block specific crawlers by using the directives standardized in the Robots Exclusion Protocol.
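To make the mechanism concrete, here is a minimal sketch of how a crawler might evaluate robots.txt rules, using Python's standard `urllib.robotparser`. The rules and the `/share/example-id` URL below are hypothetical and only illustrate the idea; they are not a copy of the actual Gemini robots.txt file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (not the real Gemini file).
ROBOTS_TXT = """\
User-agent: *
Disallow: /share/
"""


def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URLs on the subdomain discussed in the article.
shared_chat = "https://gemini.google.com/share/example-id"
home_page = "https://gemini.google.com/"

print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", shared_chat))  # False
print(is_crawl_allowed(ROBOTS_TXT, "bingbot", home_page))      # True
```

Note that a disallow rule only tells compliant crawlers not to fetch a URL. It says nothing about whether the URL itself may appear in a search index.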
I checked the robots.txt at 4:19 AM on February 13th and saw that one was in place:
Google Gemini robots.txt file
I next checked the Internet Archive to see how long the robots.txt file had been in place and discovered that it had been there since at least February 8th, the day the Gemini apps were announced.
Screenshot of Google Gemini robots.txt from Internet Archive showing it was there on February 8, 2024.
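For readers who want to repeat this kind of check, the sketch below builds a Wayback Machine address for a given page and date. It assumes the Internet Archive's usual `web.archive.org/web/<timestamp>/<url>` address pattern, which resolves to the capture nearest the requested date. The helper name `wayback_snapshot_url` is made up for this example, and the code only constructs the address without fetching it.

```python
from datetime import date


def wayback_snapshot_url(page_url: str, day: date) -> str:
    """Build a Wayback Machine address for the capture nearest to `day`."""
    timestamp = day.strftime("%Y%m%d")  # Wayback timestamps start with YYYYMMDD
    return f"https://web.archive.org/web/{timestamp}/{page_url}"


url = wayback_snapshot_url("https://gemini.google.com/robots.txt", date(2024, 2, 8))
print(url)
# https://web.archive.org/web/20240208/https://gemini.google.com/robots.txt
```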
That means the obvious explanation for why the chat pages were crawled is not the correct one; it is merely the most obvious.
If the Google Gemini subdomain had a robots.txt that blocked the web crawlers of both Bing and Google, how did those pages end up crawled and indexed?
Read: 6 Common Robots.txt Issues & How To Fix Them
A likelier explanation is that there are public links pointing to the chat pages.
I asked Bill Hartzer (@bhartzer) about it and he discovered a public link for one of the indexed pages:
Public link to a Google Gemini shared chat page
So now we know that it’s highly likely that a public link caused these Gemini Chat pages to be crawled and indexed.
Bill Hartzer offered this observation:
“Even though the Gemini URL is being blocked in the robots.txt file, there is a link to the Gemini URL in a blog comment, so that Gemini URL is getting indexed.
This just goes to show that Google will still index URLs that are blocked from crawling in the robots.txt file.
If Google really wanted to make sure that Gemini URL is not indexed, they would ALLOW crawling in the robots.txt file and add a noindex meta tag on the pages. Maybe Google should follow it’s own advice here?”
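The logic in that observation can be sketched in a few lines of Python. This is a deliberately simplified model, not a description of how Google's or Bing's indexing pipeline actually works: a crawler can only obey a `noindex` robots meta tag if it is allowed to fetch the page and read it. The function names and the sample HTML are made up for this example.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def may_be_indexed(crawl_allowed: bool, html: str) -> bool:
    """Simplified model: can a linked-to URL end up in the index?"""
    if not crawl_allowed:
        # The crawler never fetches the HTML, so a noindex tag goes unseen;
        # the bare URL can still be indexed because other pages link to it.
        return True
    return not has_noindex(html)


page = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'

print(may_be_indexed(crawl_allowed=False, html=page))  # True
print(may_be_indexed(crawl_allowed=True, html=page))   # False
```

Under this model, blocking the page in robots.txt hides the `noindex` tag from the crawler, which is why allowing the crawl is the more reliable way to keep a URL out of the index.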
But if there’s a public link then why did Google start dropping chat pages altogether? Did Google create an internal rule for the search crawler to exclude webpages from the /share/ folder from the search index, even if they’re publicly linked?
Now here’s the really interesting part for all the search geeks interested in how Google and Bing index content.
The Microsoft Bing search index responded to the Gemini content differently from how Google search did. While Google was still showing three search results in the early morning of February 13th, Bing was only showing one result from the subdomain. There was a seemingly random quality to what was indexed and how much of it.
Here are the known facts:
That brings us back to the question of why these pages started dropping out of the search results of both Google and Bing. My guess is that the Google Gemini chat pages are low-quality webpages that are not worth showing for what are essentially long-tail searches (site:gemini.google.com/share/). There’s really no useful reason to surface these pages in the search results.
Content that is blocked by robots.txt can still be discovered through links and end up in the search index, and such pages can even rank if they are useful. These chat pages most likely were not useful, and I think that may be why they were dropped.
 
Copyright © 2024 Search Engine Journal. All rights reserved. Published by Alpha Brand Media.
