Search Engine News: October 2009

Wednesday, October 14, 2009

List of image-only PDF files

Some sites have PDF files that contain no text, often because they were generated from scanned documents. With no text to index, these files are invisible to the search engine. To avoid downloading image-only PDF files repeatedly, a list of them is kept for each index. You can now download that list from the Search Configuration page: follow the link "Retrieve the list of image-only PDF files being ignored" in the "Search Index Settings" section.

The list is cleared whenever a complete respidering of the index is triggered, such as when you request a non-scheduled index update (by following the "Update an index" link on the Search Configuration page).
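
The behaviour described above can be pictured with a small sketch. This is not Blossom's implementation; the class name, method names, and index name are hypothetical, and the only assumptions are the ones stated in this post: one list per index, consulted before downloading, and cleared on a complete respidering.

```python
class ImageOnlyPdfList:
    """Hypothetical per-index list of PDF URLs found to contain no text."""

    def __init__(self):
        # Maps an index name to the set of image-only PDF URLs seen for it.
        self._by_index = {}

    def mark(self, index_name, url):
        """Record that a PDF turned out to be image-only."""
        self._by_index.setdefault(index_name, set()).add(url)

    def should_skip(self, index_name, url):
        """True if the spider can skip downloading this PDF again."""
        return url in self._by_index.get(index_name, set())

    def retrieve(self, index_name):
        """Return the list for one index, sorted for readability."""
        return sorted(self._by_index.get(index_name, set()))

    def clear(self, index_name):
        """Forget the list, as happens on a complete respidering."""
        self._by_index.pop(index_name, None)


ignored = ImageOnlyPdfList()
ignored.mark("main", "www.domain.com/scans/report.pdf")
skip_before = ignored.should_skip("main", "www.domain.com/scans/report.pdf")
ignored.clear("main")  # complete respidering of the "main" index
skip_after = ignored.should_skip("main", "www.domain.com/scans/report.pdf")
```

In this sketch, a PDF that was replaced with a text version would only be noticed again after the list is cleared, which is why a full respidering starts from an empty list.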

Thursday, October 8, 2009

New option for database-driven sites

Some websites have pages generated dynamically from a repository. Often on these sites there are multiple URLs that can generate the same page. Blogs are an example; an article may have multiple labels, each label providing a path to the same text. For example, www.domain.com/july/article1 and www.domain.com/announcements/article1 might both refer to the same article.

A new option has been added to the Blossom spider that tells it to ignore all but the last component of a URL when determining whether two URLs are the same. In the example above, "article1" would be retrieved only once. This reduces the number of duplicate documents downloaded from a site, which saves bandwidth and keeps duplicates from adding to your page count.
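
To make the comparison concrete, here is a minimal sketch of the idea, assuming that "last component" means the final non-empty path segment of the URL. The function names are hypothetical and this is not the Blossom spider's code.

```python
from urllib.parse import urlsplit


def last_component(url):
    """Return the final non-empty path segment of a URL, or '' if there is none."""
    # urlsplit only recognises the host when a scheme is present, so add one if missing.
    if "://" not in url:
        url = "http://" + url
    path = urlsplit(url).path
    segments = [s for s in path.split("/") if s]
    return segments[-1] if segments else ""


def dedupe_by_last_component(urls):
    """Keep the first URL seen for each distinct last path component."""
    seen = set()
    kept = []
    for url in urls:
        key = last_component(url)
        if key in seen:
            continue  # treated as the same page as an earlier URL
        seen.add(key)
        kept.append(url)
    return kept


urls = [
    "www.domain.com/july/article1",
    "www.domain.com/announcements/article1",
    "www.domain.com/july/article2",
]
unique = dedupe_by_last_component(urls)
```

Here `unique` keeps the July URL for "article1" and drops the announcements URL. The same rule would also merge two genuinely different pages that happen to share a final component, which is one reason the option suits some sites and not others.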

Contact Blossom Support if you think your site might benefit from using this option.

Search Engine Library

We have built a library of documents to help you get the most out of the Blossom Search service:

The Search Guide. An introduction to all the features and options of the search service. Includes many examples.
What Makes a Good Search Engine? A peek into the philosophy behind the Blossom search engine.
Phrasal Query Suggestions. Describes a key aspect of Blossom's approach to guided search.
Fine tuning with page weights and meta tags. Details on how to influence the order of pages in the search results.
