solr

Three items on the Wishlist for the current Drupal apache_solr module

The current Solr-Drupal integration module written by Robert Douglass is very sophisticated. Three additions would be helpful:

  1. The ability to work with a Solr cluster -- split reads and writes -- so that updates are sent to a Solr update instance and queries are served from a read instance.
  2. A CCK field mapping wizard in the admin pages.
  3. Generation of a schema.xml so the Solr schema can be updated whenever the mapping of CCK fields changes.
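
To make the first wish concrete, here is a minimal sketch in Python (not the module's PHP, and not anything the module currently does). The class name `SolrCluster`, the host names, and the round-robin choice of read instance are all assumptions made for illustration; only the `/update` and `/select` handler paths are standard Solr.

```python
from urllib.parse import urlencode


class SolrCluster:
    """Hypothetical router: one update instance, one or more read instances."""

    def __init__(self, update_url, read_urls):
        if not read_urls:
            raise ValueError("at least one read instance is required")
        self.update_url = update_url.rstrip("/")
        self.read_urls = [url.rstrip("/") for url in read_urls]
        self._next_reader = 0  # round-robin cursor (an assumed policy)

    def update_endpoint(self):
        # All writes (adds, deletes, commits) go to the single update instance.
        return self.update_url + "/update"

    def select_endpoint(self, query):
        # Reads rotate over the read instances.
        base = self.read_urls[self._next_reader % len(self.read_urls)]
        self._next_reader += 1
        return base + "/select?" + urlencode({"q": query})


# Hypothetical hosts, for illustration only.
cluster = SolrCluster(
    "http://solr-master.example.com:8983/solr",
    ["http://solr-read1.example.com:8983/solr",
     "http://solr-read2.example.com:8983/solr"],
)
```

The sketch only builds URLs; sending the requests and keeping the read instances in sync with the update instance (for example through Solr's own replication) is left out.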

Drupal Apache Solr <----> Views integration under way

It seems there is an effort under way to integrate the Drupal Solr module with the Views family of modules; details here. I have offered to collaborate with Thomas -- the chap running the effort.

Lucene indexing performance parameters -- mergeFactor, maxMergeDocs, minMergeDocs

Key Lucene indexing performance parameters:

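  1. mergeFactor -- this controls how many index segments accumulate before they are merged. An interesting tidbit: merging follows a power law, so segments are combined once mergeFactor segments of similar size exist, and segment sizes grow in powers of mergeFactor. In short, a higher mergeFactor means more segments, which makes indexing quicker but searching slower.
  2. maxMergeDocs -- this caps the number of documents in a single index segment.
  3. minMergeDocs -- this controls how many documents have to accumulate in the in-memory buffer before they are written to disk as a new segment.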
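
To see how the three parameters above interact, here is a toy simulation in Python. It is a simplified model assumed for illustration, not Lucene's actual merge code: documents are buffered until minMergeDocs is reached, flushed as a segment, and the newest mergeFactor segments are merged whenever they are all the same size and the result would not exceed maxMergeDocs. The function name `simulate_segments` is made up for this sketch.

```python
def simulate_segments(num_docs, merge_factor, min_merge_docs,
                      max_merge_docs=2**31 - 1):
    """Return (segment sizes, number of merges) under a simplified merge model."""
    if merge_factor < 2 or min_merge_docs < 1:
        raise ValueError("merge_factor must be >= 2 and min_merge_docs >= 1")

    segments = []   # document count per on-disk segment, oldest first
    merges = 0
    buffered = 0    # documents waiting in the in-memory buffer

    for _ in range(num_docs):
        buffered += 1
        if buffered < min_merge_docs:
            continue
        # Buffer is full: flush it to disk as a new segment.
        segments.append(buffered)
        buffered = 0
        # Cascade: merge the newest merge_factor segments while they match in size.
        while len(segments) >= merge_factor:
            tail = segments[-merge_factor:]
            merged = sum(tail)
            if len(set(tail)) != 1 or merged > max_merge_docs:
                break
            segments[-merge_factor:] = [merged]
            merges += 1

    if buffered:
        segments.append(buffered)  # final partial flush
    return segments, merges


high = simulate_segments(990, merge_factor=10, min_merge_docs=10)
low = simulate_segments(990, merge_factor=2, min_merge_docs=10)
```

In this model, indexing 990 documents with mergeFactor 10 leaves 18 segments after only 9 merges, while mergeFactor 2 leaves 4 segments but needs 95 merges. That is the trade-off in miniature: fewer merges mean less work while indexing, more segments mean more work per search.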

On Lucene in Action -- mini review

Lucene in Action is a great book for anyone who has to solve practical problems with Lucene or its related family of products. For me, someone who is working on a fairly complex Solr install, the book is a gem because:

Benchmarking Solr and creating a standard document set from Gutenberg

As with any benchmarking exercise, a standard dataset is critical. My problems began when I was trying to index a large number of documents (1.5 million Drupal nodes). Googling the web, I found a lot of notes on tweaking the performance parameters of Solr, but many a time they were conflicting or did not make sense. As a result, I have decided to construct my own benchmarking platform.
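
The core of such a platform can be very small. The following Python sketch is an assumption about what a timing harness might look like, not the platform itself: `benchmark_indexing` and its parameters are hypothetical names, and `index_batch` stands in for whatever function actually posts a batch of documents to Solr.

```python
import time
from itertools import islice


def batches(docs, batch_size):
    """Yield lists of at most batch_size documents from any iterable."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def benchmark_indexing(docs, index_batch, batch_size=100,
                       clock=time.perf_counter):
    """Time index_batch over all docs and report throughput."""
    start = clock()
    count = 0
    for batch in batches(docs, batch_size):
        index_batch(batch)  # placeholder for the real call that sends to Solr
        count += len(batch)
    elapsed = clock() - start
    rate = count / elapsed if elapsed > 0 else float("inf")
    return {"docs": count, "seconds": elapsed, "docs_per_second": rate}
```

Running the same fixed document set through this harness after each configuration change gives numbers that can be compared directly, which is the point of having a standard dataset.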
