Wednesday, October 9, 2024

Improved handling of JavaScript

Dynamic websites have long been a challenge for web crawlers because some URLs are generated by program code rather than written directly into the page's HTML. Website menus, for example, may be built from data tables, with the HTML for the menu commands and the links to their landing pages assembled at runtime. Unless the crawler interprets the JavaScript in some way, it can miss those landing pages.

The Blossom spider has long scanned JavaScript for potential URLs, but it does not execute JavaScript code. The latest update enhances that scan to extract more URLs embedded in strings. As a result, the spider now finds dynamic URLs that were previously missed, and the number of pages in some indexes has grown.
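
To give a rough idea of what scanning JavaScript for URLs without executing it can look like, here is a minimal Python sketch. It is not the Blossom spider's actual code: the function name `extract_urls`, the two regular expressions, and the list of page extensions are all assumptions made for illustration. The sample script builds a menu from a data table, so the landing pages appear only inside string literals.

```python
import re
from urllib.parse import urljoin

# Matches single- or double-quoted JavaScript string literals on one line.
# Group 2 holds the text between the quotes.
STRING_LITERAL = re.compile(r"""(["'])((?:\\.|(?!\1)[^\\\n])*)\1""")

# Hypothetical heuristic (an assumption, not Blossom's real rule): a string
# looks like a URL if it is absolute, or is a path ending in a page extension.
URL_LIKE = re.compile(
    r"^(?:https?://\S+|/?[\w\-./]+\.(?:html?|php|aspx?|pdf))$",
    re.IGNORECASE,
)


def extract_urls(js_source, base_url):
    """Return URL-like string literals from js_source, resolved against base_url.

    The JavaScript is only scanned as text; nothing is executed.
    """
    found = []
    for match in STRING_LITERAL.finditer(js_source):
        candidate = match.group(2)
        if URL_LIKE.match(candidate):
            absolute = urljoin(base_url, candidate)
            if absolute not in found:  # keep first occurrence, preserve order
                found.append(absolute)
    return found


# A menu assembled at runtime from a data table: no href appears in static HTML.
SAMPLE_JS = """
var menu = [
  { label: "Products", page: "products.html" },
  { label: "Support",  page: '/help/support.php' },
  { label: "Blog",     page: "https://example.com/blog/" }
];
menu.forEach(function (item) {
  document.write('<a href="' + item.page + '">' + item.label + '</a>');
});
"""

if __name__ == "__main__":
    for url in extract_urls(SAMPLE_JS, "https://example.com/site/index.html"):
        print(url)
```

A text scan like this cannot find URLs that are computed from pieces at runtime, and it may pick up strings that merely resemble URLs, so a real crawler would likely need to verify each candidate before indexing it.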