The new spider has improved handling of dynamic websites and offers some new control over spidering. Among the changes:
- The inclusion/exclusion lists are now more powerful. Read about the full capabilities in the Search Guide.
- Of special note for include lists is the new $ prefix. It tells the spider to follow only the links listed in an index file, such as a sitemap. This is especially useful for richly interconnected sites like blogs.
- The exclude list now allows multiple wildcards (the * character) and an end-of-URL anchor (the $ character). The ? character is not special, following the syntax of robots.txt files.
- Redirections (HTTP codes 301-308) now respect the include/exclude lists. Redirections are cached between spidering runs, speeding up updates.
- Canonical links are used when possible.
- Cookies are always saved and resent. This improves the experience on session-oriented sites.
- From the search configuration page (at https://blossom.com) you can control the speed of spidering.
- Chunking of document content into logical units (e.g., sentences) has been improved, yielding better snippets in search results.
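To make the index-file behavior concrete, here is a minimal sketch of what "follow only links given in an index file" means for a standard sitemap. This is not Blossom's implementation; the function name is hypothetical, and only the sitemaps.org XML format is assumed.

```python
import xml.etree.ElementTree as ET

# Standard namespace used by sitemaps.org sitemap files.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list[str]:
    """Extract the <loc> URLs from a sitemap document.

    A spider restricted to an index file would follow exactly
    these links and no others discovered on the pages themselves.
    """
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(SITEMAP_NS + "loc") if loc.text]
```

For example, a sitemap listing two pages yields exactly those two URLs as the crawl frontier, regardless of how densely the pages link to each other.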
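The exclude-list matching described above can be sketched as follows, assuming robots.txt-like semantics: * matches any run of characters, a trailing $ anchors the end of the URL, and ? is a literal character. The function names here are illustrative, not part of Blossom's API.

```python
import re

def exclude_pattern_to_regex(pattern: str) -> re.Pattern:
    """Compile an exclude-list pattern into a regex.

    '*' matches any run of characters, a trailing '$' anchors
    the end of the URL, and '?' is treated as a literal,
    mirroring robots.txt semantics.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape everything, then turn the escaped '*' back into '.*'.
    body = re.escape(pattern).replace(r"\*", ".*")
    return re.compile(body + ("$" if anchored else ""))

def is_excluded(url: str, pattern: str) -> bool:
    """True if the URL matches the pattern from its start
    (prefix match unless the pattern is '$'-anchored)."""
    return exclude_pattern_to_regex(pattern).match(url) is not None
```

Under these assumptions, a pattern like `*/print/*` excludes any URL containing a /print/ path segment, while `*.pdf$` excludes only URLs that end in .pdf, so a .pdf URL with a query string would still be crawled.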
If you see any problems due to the new spider, please let us know by emailing Blossom Support.