
Google recently updated its Googlebot help documentation to clarify that Googlebot crawls only the first 15 MB of an HTML file or supported text-based file, and only that portion is considered for indexing. Google uses Googlebot to crawl an enormous number of web pages, and that number grows every day.

The limit itself is not new: it has been in place at Google for a long time, and the update simply documents it.

Some highlights from Google’s clarification, “Googlebot and the 15 MB thing”:

  • The 15 MB Googlebot limit is specific to the HTML file itself
  • The 15 MB applies only to the HTML markup; it does not include other elements such as images or videos
  • Embedded resources/content pulled in with IMG tags are not a part of the HTML file
  • Any resources referenced in the HTML such as images, videos, CSS, and JavaScript are fetched separately
  • The content after the first 15 MB is dropped by Googlebot, and only the first 15 MB gets forwarded to indexing
  • Data URIs do contribute to the HTML file size, since they are embedded in the HTML file itself
  • This limit only applies to the bytes (content) received for the initial request Googlebot makes, not the referenced resources within the page
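
The cut-off described above can be pictured with a short Python sketch. It is only an illustration of the stated behaviour, not Googlebot's actual implementation: the helper names are hypothetical, and reading "15 MB" as 15 * 1024 * 1024 bytes is an assumption, since the exact byte count is not given here.

```python
# Assumption: "15 MB" is taken as 15 * 1024 * 1024 bytes. The article does not
# say whether a binary or decimal megabyte is meant.
CRAWL_LIMIT_BYTES = 15 * 1024 * 1024


def split_at_crawl_limit(html: bytes, limit: int = CRAWL_LIMIT_BYTES):
    """Split a fetched HTML payload into (forwarded, dropped) byte strings.

    Hypothetical helper: it mimics the described behaviour (only the first
    `limit` bytes are forwarded to indexing), nothing more.
    """
    return html[:limit], html[limit:]


def describe(html: bytes, limit: int = CRAWL_LIMIT_BYTES) -> str:
    """Summarise how much of the payload falls past the limit."""
    forwarded, dropped = split_at_crawl_limit(html, limit)
    if not dropped:
        return f"{len(html)} bytes: within the limit, nothing dropped"
    return f"{len(html)} bytes: {len(dropped)} bytes past the limit would be dropped"


if __name__ == "__main__":
    # Only the HTML response body is measured; images, CSS, and JavaScript
    # referenced by URL are separate fetches and are not part of this payload.
    page = b"<html><body>" + b"x" * 1000 + b"</body></html>"
    print(describe(page))
```

The sketch measures only the bytes of the HTML response, which matches the point that referenced resources are fetched separately and do not count toward the limit.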

In response to questions about the 15 MB limit, Google’s John Mueller tweeted on 24 June to clarify that resources or content embedded with IMG tags do not count as part of the HTML file. He also confirmed that this is not a change, just official documentation of an existing policy.

Googlebot Limit: What Can Marketers Do?

To cope with this Googlebot limit, digital marketers and website owners can:

  • include all the necessary content near the top of web pages
  • structure the code so that the SEO-relevant data falls within the first 15 MB of an HTML or supported text-based file
  • compress images and videos, and avoid encoding them directly into the HTML whenever possible
  • keep HTML pages small; pages of around 100 KB or less are far below the limit, so this clarification will not affect many sites
  • check page size using tools such as Google PageSpeed Insights, Sitechecker, Chrome DevTools, or DebugBear
  • keep web pages light and fast to load and navigate, which makes them easy to reach for users as well as Googlebot
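
As a rough complement to the tools listed above, a few lines of Python can estimate how much of an HTML document's size comes from inline data URIs. This is a minimal sketch under stated assumptions: `size_report` and the regular expression are hypothetical and only a heuristic, not a full HTML parser, and the figures are byte counts of the markup as UTF-8.

```python
import re

# Heuristic pattern (an assumption, not a complete parser): a data URI starts
# with "data:" and runs until a quote, whitespace, ")" or ">".
DATA_URI_RE = re.compile(r"""\bdata:[^"'\s)>]+""")


def size_report(html: str) -> dict:
    """Return the total HTML size and the share taken up by inline data URIs."""
    total_bytes = len(html.encode("utf-8"))
    data_uri_bytes = sum(
        len(match.group(0).encode("utf-8")) for match in DATA_URI_RE.finditer(html)
    )
    share = data_uri_bytes / total_bytes if total_bytes else 0.0
    return {
        "total_bytes": total_bytes,
        "data_uri_bytes": data_uri_bytes,
        "data_uri_share": share,
    }


if __name__ == "__main__":
    sample = (
        '<img src="data:image/png;base64,iVBORw0KGgo=">'  # inline: counts as HTML
        '<img src="/images/logo.png">'                     # by URL: fetched separately
    )
    print(size_report(sample))
```

A high `data_uri_share` suggests that moving inline images to separate files would shrink the HTML, since only the URL reference would remain in the markup.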

Note that images, videos, CSS, and JavaScript being fetched separately does not mean Googlebot cannot see your page’s images or videos. Googlebot fetches the images and videos that are referenced in the HTML by URL separately, with subsequent fetches.

Does your website have pages exceeding 15 MB of HTML? It is time to fix those underlying problems to improve your rankings. Consider hiring a good digital marketing services company, and make sure your SEO partner keeps up to date with such changes in the industry.

 

Jargon Buster

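Data URI – A Uniform Resource Identifier (URI) is a character sequence that identifies a resource. A data URI is a URI using the data: scheme that carries the resource’s content, such as a base64-encoded image, directly inside the document instead of pointing to a separate file.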
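
To make the idea concrete, a data URI that embeds a file inline could be built roughly as follows. This is an illustrative sketch only: `make_data_uri` is a hypothetical helper and the 300 zero bytes stand in for a real image file.

```python
import base64


def make_data_uri(payload: bytes, mime_type: str = "image/png") -> str:
    """Build a base64 data URI that embeds `payload` directly in the markup."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    fake_image = bytes(300)  # placeholder bytes, not a valid PNG
    uri = make_data_uri(fake_image)
    # Base64 emits 4 characters for every 3 input bytes, so the embedded
    # version is larger than the original file, and all of it sits in the HTML.
    print(len(fake_image), "raw bytes ->", len(uri), "characters inside the HTML")
```

Because the whole encoded string lives in the HTML source, it counts toward the HTML file size, unlike an image referenced by URL.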

Googlebot – Googlebot is the generic name for Google’s web crawler. Google has two different types of crawlers: a desktop crawler that simulates a user on desktop, and a mobile crawler that simulates a user on a mobile device.