Tuesday, February 18, 2020

New spider fully deployed

We have completed the transition to the new spider, and it has been tested on every index. If we found a problem with your index, your technical contact should have received an email from Blossom Support recommending a fix. In most cases, we were able to implement the fix ourselves and only asked your technical contact to confirm the change.

The new spider has improved handling of dynamic websites and offers some new control over spidering. Among the changes:

  • The inclusion/exclusion lists are now more powerful. Read about the full capabilities in the Search Guide.
  • Of special note for include lists is the new $ prefix. It tells the spider to follow only the links given in an index file, such as a sitemap file. This is especially useful for richly interconnected sites like blogs.
  • The exclude list now allows multiple wildcards (the * character) and an end-of-URL mark (the $ character). Following the syntax of robots.txt files, the ? character is not special.
  • Redirections (HTTP codes 301-308) now adhere to include/exclude specifications. Redirections are cached between spidering runs, speeding updates.
  • Canonical links are used when possible.
  • Cookies are always saved and resent. This improves the experience on session-oriented sites.
  • From the search configuration page (at https://blossom.com) you can control the speed of spidering.
  • Chunking of document content into logical units (e.g., sentences) has been improved. This is reflected in improved snippets shown in search results.
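To illustrate the exclude-list syntax described above, here is a minimal Python sketch of a robots.txt-style matcher. It is not the spider's actual implementation: the function names `compile_exclude_pattern` and `is_excluded` are hypothetical, and the sketch assumes that a pattern matches from the start of the URL path, as robots.txt rules do. Consult the Search Guide for the authoritative rules.

```python
import re
from typing import Iterable


def compile_exclude_pattern(pattern: str) -> "re.Pattern[str]":
    """Translate a robots.txt-style pattern into a compiled regex (illustrative only)."""
    # A trailing '$' marks the end of the URL; strip it and remember it.
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece so characters such as '?' and '.' stay literal,
    # then join the pieces with '.*' to stand in for each '*' wildcard.
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + (r"\Z" if anchored else ""))


def is_excluded(url_path: str, patterns: Iterable[str]) -> bool:
    """Return True if any pattern matches url_path from its start (assumed semantics)."""
    return any(compile_exclude_pattern(p).match(url_path) for p in patterns)
```

Under these assumptions, `is_excluded("/archive/2020/02/index.html", ["/archive/*/index.html$"])` is true, while the same path with `?page=2` appended is not excluded because of the end-of-URL mark. A pattern such as `/search?q=*` treats the `?` as a literal question mark.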

If you see any problems due to the new spider, please let us know by emailing Blossom Support.