Tuesday, October 29, 2019

New indexer live

A new spidering and indexing engine for Blossom Search is now live, with significant changes to improve site coverage and search results. Over time, websites have become much more dynamic, with much of the HTML generated at the time of delivery. Dynamic sites present two significant problems:
  1. Links to some content may only be generated by client-side programs.
  2. Multiple links may generate the same content.
These are not, of course, new concerns, but the increasing complexity of websites makes the spidering and indexing tasks more difficult. The new changes address both of these issues. As a result, you may see your search indexes grow, or perhaps shrink!

The index will grow if your site uses sitemaps. The Blossom spider will now routinely look for sitemap.xml in the root directory of a website. If the file exists, the spider will use it to help guide its traversal of the site. If you wish to prevent that behavior, log into the Search Configuration page for your index at Blossom.com, follow the "Spidering, Indexing, and Reporting" link, and uncheck the "Read Sitemaps" box.
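For readers who want to see what a spider can learn from a sitemap, here is a minimal Python sketch. It is not Blossom's code. It assumes the standard sitemaps.org XML format, and the function names sitemap_location and urls_from_sitemap are made up for this illustration.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 format (an assumption about your file).
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_location(page_url):
    """Return the sitemap.xml URL in the root directory of the page's site."""
    return urljoin(page_url, "/sitemap.xml")


def urls_from_sitemap(xml_text):
    """Return every non-empty <loc> value found in a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]


if __name__ == "__main__":
    sample = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/products?id=7</loc></url>"
        "</urlset>"
    )
    print(sitemap_location("https://example.com/blog/post.html"))
    for url in urls_from_sitemap(sample):
        print(url)
```

Pages that appear only in the sitemap, such as those reached through client-side links, are the ones most likely to be new to your index.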

The index will shrink if your site contains or generates very similar pages accessible by different URLs. Duplicate recognition has been improved to pick up pages with nearly identical content regardless of the tag structure. The indexer only judges pages to be duplicates when they share the same title, so giving unique titles to unique content will prevent it from ever treating very similar pages as duplicates.
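To make the idea concrete, here is a rough Python sketch of duplicate checking that ignores tag structure and never matches pages with different titles. It is an illustration only, not the algorithm the Blossom indexer uses. The word-shingle comparison, the 0.9 threshold, and the name looks_duplicate are all assumptions chosen for the example.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the <title> text and all other text, discarding the tags."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title_parts = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        (self.title_parts if self.in_title else self.text_parts).append(data)


def title_and_words(html):
    """Return (normalized title, lowercase word list) for an HTML page."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    title = " ".join("".join(parser.title_parts).split())
    words = " ".join(parser.text_parts).lower().split()
    return title, words


def shingles(words, size=3):
    """Return the set of overlapping word sequences of the given size."""
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def looks_duplicate(html_a, html_b, threshold=0.9):
    """Hypothetical check: same title and nearly identical visible text."""
    title_a, words_a = title_and_words(html_a)
    title_b, words_b = title_and_words(html_b)
    if title_a != title_b:
        return False  # different titles are never treated as duplicates
    a, b = shingles(words_a), shingles(words_b)
    if not a and not b:
        return True
    return len(a & b) / len(a | b) >= threshold
```

In this sketch, two pages that present the same sentences in a table on one page and in paragraphs on the other compare as duplicates, while changing the title of either page is enough to keep both.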