Welcome to Arman Anwar's Scrap Book

Welcome to Arman's digital scrap book. Here you will find his notes on things that he enjoys and wrestles with. Two subjects that will often be addressed are Drupal and Web engineering. On rare occasions, you will find opinions on matters he has no qualifications in.

Schema API: Drupal becomes reflective

I was at Affinity Labs yesterday, where Alex Barth gave an introductory talk about the Schema API in Drupal 6.0. I gleaned the following practical points from his informative presentation:

  1. Drupal 6.0 is now aware of its schema: modules can query this interface and find out what the structure of the database schema is.
  2. The install files will no longer have hard-coded DDL/SQL for the creation of tables in a Drupal instance's schema.
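
To make the second point concrete, here is a minimal sketch of what a schema declaration looks like instead of hand-written CREATE TABLE statements. The module name "scrapbook" and the table "scrapbook_note" are hypothetical and chosen only for illustration; the array keys ('fields', 'primary key', 'type', 'not null', and so on) follow the Drupal 6 Schema API conventions. The snippet runs outside Drupal, so it only inspects the array rather than creating anything.

```php
<?php
// Hypothetical module "scrapbook"; in a real module this function would
// live in scrapbook.install and be picked up as an implementation of hook_schema().
function scrapbook_schema() {
    $schema = array();
    $schema['scrapbook_note'] = array(
        'description' => 'Stores one scrap book note per row.',
        'fields' => array(
            'nid'     => array('type' => 'serial', 'unsigned' => TRUE, 'not null' => TRUE),
            'title'   => array('type' => 'varchar', 'length' => 255, 'not null' => TRUE, 'default' => ''),
            'created' => array('type' => 'int', 'not null' => TRUE, 'default' => 0),
        ),
        'primary key' => array('nid'),
    );
    return $schema;
}

// Inside Drupal 6, hook_install() would typically call
// drupal_install_schema('scrapbook'), and other modules could look the
// structure up with drupal_get_schema('scrapbook_note').
// Here we simply walk the declared structure.
$schema = scrapbook_schema();
foreach ($schema['scrapbook_note']['fields'] as $name => $spec) {
    echo $name, ': ', $spec['type'], PHP_EOL;
}
```

The point of the design is that the table structure becomes data that Drupal (and any module) can read, rather than SQL text that only the database understands.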

I've penciled in a note to myself to review this further, but what worries me is whether we are complicating Drupal to a level where it will suffer the fate of so many other technology platforms. Don't get me wrong, I still like object-relational (OR) managers such as Hibernate in the Java world, but I'm very cautious about anything that may lead to conceptual bloat in Drupal. Perhaps Death by Bloat is a fundamental law of nature.

On the Forrester report on Open Source CMSes

I have not been able to get my hands on the original document, but there is a distilled review of the report on CNet.

Interestingly, the report cites Drupal and Alfresco as two leading contenders for the OSS CMS (or rather Web-CMS) space.

As someone who followed Alfresco initially, I was quite surprised that it came up on the list. Alfresco was more of a classical content management system -- or rather what I like to call a digital asset management system, like Documentum, etc. They had recently released a WebCMS capability. Going by what I hear, the technical architecture of anything produced by Alfresco should have sufficient merit. But to me what differentiates Drupal is the number and variety of contributed modules. How Alfresco fares on that front, or will continue to fare, seems questionable. I think part of the reason is that the barrier to entry for someone to develop interesting functionality is much lower for Drupal, as it is based on the LAMP stack as opposed to J2EE. Pundits may question Drupal's scalability credentials, but as someone who manages the development of a very large Drupal install -- and I mean large in terms of features, content and traffic -- I find the issue of scalability tractable.

Drupal CDN module does not scale

Distressing comment about the Drupal module that works with CDNs: "This module is not yet production ready. It works fine on smaller sites, but it doesn't scale yet". I'm thinking of using ProxyHTMLURLMap (the mod_proxy_html directive) for this purpose.

Dearth of Drupal Architecture articles

I've had a hard time finding articles and diagrams that illustrate the Drupal architecture. I've decided to post items explicitly in that area. Stay tuned.

Drupal Architecture

Drupal subsystem-level architecture diagram

The organization of Drupal's architecture at the subsystem level is traditional. It requires a web server to handle transport over the HTTP protocol and a relational database for persistence. Drupal exhibits the traditional three-tiered architecture, as seen in the attached image:

  1. The Theme layer adds visual style to content generated by the business layer -- this layer generally sees heavy customization during the implementation of a target solution.
  2. The Business layer consists of components, known as modules, that implement both functional and non-functional features.
  3. The Utility stack cuts across the layers and provides access to the database layer, the controller (as in MVC), security, etc.
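
As a rough illustration of how the three layers divide the work, here is a toy sketch in plain PHP. All function names are hypothetical, the "database" is just a hard-coded array, and real Drupal routes rendering through its theme system rather than calling a theme function directly, so treat this as a picture of the separation of concerns and not as Drupal code.

```php
<?php
// Utility layer stand-in: in Drupal this would go through the database
// abstraction; here it is a hard-coded array (an assumption for the sketch).
function scrapbook_load_titles() {
    return array('Schema API', 'Spiral Generator', 'Generating Primes');
}

// Business layer (a "module"): decides WHAT is shown, knows nothing about markup.
function scrapbook_recent_titles($limit) {
    return array_slice(scrapbook_load_titles(), 0, $limit);
}

// Theme layer: decides HOW it is shown, knows nothing about storage.
function theme_scrapbook_titles(array $titles) {
    $items = '';
    foreach ($titles as $title) {
        $items .= '<li>' . htmlspecialchars($title) . '</li>';
    }
    return '<ul>' . $items . '</ul>';
}

$html = theme_scrapbook_titles(scrapbook_recent_titles(2));
echo $html, PHP_EOL;
```

Because the theme function only receives data, a site can restyle the output heavily without touching the module that produced it, which is where most implementation-time customization ends up.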

Three items on the Wishlist for the current Drupal apache_solr module

The current Solr-Drupal integration module written by Robert Douglass is very sophisticated. There are three things that would be helpful:

  1. The ability to work with a Solr cluster and split reads from writes -- you send updates to a Solr update instance and read from a read instance.
  2. A CCK field mapping wizard in the admin pages.
  3. Generation of a schema.xml for updating the Solr schema when the mapping of CCK fields is updated.
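
To show what items 1 and 3 might amount to, here is a small sketch. It is not taken from the module: the function names, the host names and the field-to-type mapping are all hypothetical, and the Solr types used ("text", "string") are assumed to be defined as fieldType entries in the target schema.xml.

```php
<?php
// Item 1: choose an endpoint by operation, so that updates go to one Solr
// instance and searches go to another. Host names are placeholders.
function solr_endpoint_for($operation, array $config) {
    return ($operation === 'update') ? $config['update_url'] : $config['select_url'];
}

// Item 3: turn a CCK-field-to-Solr-type mapping into <field/> lines that
// could be pasted into the <fields> section of schema.xml.
function solr_schema_fields(array $mapping) {
    $lines = array();
    foreach ($mapping as $cck_field => $solr_type) {
        $lines[] = sprintf(
            '<field name="%s" type="%s" indexed="true" stored="true"/>',
            htmlspecialchars($cck_field),
            htmlspecialchars($solr_type)
        );
    }
    return implode(PHP_EOL, $lines);
}

$config = array(
    'update_url' => 'http://solr-update.example.com:8983/solr/update',
    'select_url' => 'http://solr-read.example.com:8983/solr/select',
);
$mapping = array('field_author' => 'string', 'field_summary' => 'text');

echo solr_endpoint_for('update', $config), PHP_EOL;
echo solr_endpoint_for('search', $config), PHP_EOL;
echo solr_schema_fields($mapping), PHP_EOL;
```

A mapping wizard (item 2) would essentially be an admin form that edits the `$mapping` array.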

Drupal Apache Solr <----> Views integration under way

It seems that there is an effort to integrate the Solr module in Drupal with the Views family of modules, details here. I offered to collaborate with Thomas -- the chap running the effort.

Generating Primes -- generatively

I have a new resolution that I'll write simple snippets of code that exercise my brain :-).

This is a simple code snippet that generates primes in a "generative" fashion -- it uses
previously generated primes to generate new primes.

Checked its correctness by comparing it to http://primes.utm.edu/lis...

$num_primes = 1000;            // the number of primes you want to generate
$primes = array(2, 3, 5, 7);   // prime the pump
$test = 9;                     // first odd candidate to test
while (count($primes) < $num_primes) {
    foreach ($primes as $prime) {
        if ($prime > $test / 2) { // no divisor found up to $test/2, so $test is prime
            $primes[] = $test;
            break;
        } else if ($test % $prime === 0) { // caught -- not a prime
            break;
        }
    }
    $test += 2; // skip even numbers, they are never prime above 2
}
print_r($primes); // print out the goodies
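
For anyone who wants to re-check the result without an external list, here is a sketch that wraps the same idea in a function and compares it with plain trial division. Two things are my own substitutions rather than part of the snippet above: the function names are hypothetical, and the cut-off is tightened from $test/2 to the square root of $test (written as $prime * $prime > $test), which gives the same primes with fewer divisions.

```php
<?php
// Generative prime list: each candidate is tested only against primes
// that have already been found.
function generate_primes($num_primes) {
    $primes = array(2, 3, 5, 7);
    $test = 9;
    while (count($primes) < $num_primes) {
        foreach ($primes as $prime) {
            if ($prime * $prime > $test) { // no prime factor up to sqrt($test)
                $primes[] = $test;
                break;
            }
            if ($test % $prime === 0) {    // divisible, so not a prime
                break;
            }
        }
        $test += 2;
    }
    return array_slice($primes, 0, $num_primes);
}

// Independent check: naive trial division by every integer from 2 upward.
function is_prime_naive($n) {
    if ($n < 2) {
        return false;
    }
    for ($d = 2; $d * $d <= $n; $d++) {
        if ($n % $d === 0) {
            return false;
        }
    }
    return true;
}

$primes = generate_primes(200);
$last = $primes[count($primes) - 1];
$expected = array();
for ($n = 2; $n <= $last; $n++) {
    if (is_prime_naive($n)) {
        $expected[] = $n;
    }
}
echo ($primes === $expected ? 'match' : 'mismatch'), PHP_EOL;
```

Comparing the two full lists catches both kinds of mistake: a composite that slipped in and a prime that was skipped.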

Spiral Generator

Ever wanted to draw a spiral like the one below?

*****************************************
*                                       *
* ************************************* *
* *                                   * *
* * ********************************* * *
* * *                               * * *
* * * ***************************** * * *
* * * *                           * * * *
* * * * ************************* * * * *
* * * * *                       * * * * *
* * * * * ********************* * * * * *
* * * * * *                   * * * * * *
* * * * * * ***************** * * * * * *
* * * * * * *               * * * * * * *
* * * * * * * ************* * * * * * * *
* * * * * * * *           * * * * * * * *
* * * * * * * * ********* * * * * * * * *
* * * * * * * * *       * * * * * * * * *
* * * * * * * * * ***** * * * * * * * * *
* * * * * * * * * *   * * * * * * * * * *
* * * * * * * * * * * * * * * * * * * * *
* * * * * * * * * *** * * * * * * * * * *
* * * * * * * * *     * * * * * * * * * *
* * * * * * * * ******* * * * * * * * * *
* * * * * * * *         * * * * * * * * *
* * * * * * * *********** * * * * * * * *
* * * * * * *             * * * * * * * *
* * * * * * *************** * * * * * * *
* * * * * *                 * * * * * * *
* * * * * ******************* * * * * * *
* * * * *                     * * * * * *
* * * * *********************** * * * * *
* * * *                         * * * * *
* * * *************************** * * * *
* * *                             * * * *
* * ******************************* * * *
* *                                 * * *
* *********************************** * *
*                                     * *
*************************************** *
                                        *

The following code will do the trick -- I don't like the way it is written -- I would like an f(x,y) approach rather than this generative manner. If I have some time I'll write that up too.

function gen_spiral($side_len = 79) {
    // Start in the centre; the cast keeps the indices integral for odd sizes.
    $x = $y = (int) ($side_len / 2);
    $len = 1;       // length of the current arm, grows by one after every turn
    $direction = 0; // 0..3, see the switch below
    $grid = array();
    $dx = 0; // delta x
    $dy = 0; // delta y
    for ($count = 0; $count < $side_len; $count++) {
        $grid[] = array_fill(0, $side_len, 0); // init 2d grid
    }
    $grid[$x][$y] = 1;
    $run = true;
    while ($run) {
        switch ($direction) {
            case 0: $dx = 1;  $dy = 0;  break;
            case 1: $dx = 0;  $dy = -1; break;
            case 2: $dx = -1; $dy = 0;  break;
            case 3: $dx = 0;  $dy = 1;  break;
        }
        for ($count = 0; $count < $len; $count++) {
            $x += $dx;
            $y += $dy;
            if ($x < 0 || $y < 0 || $x >= $side_len || $y >= $side_len) {
                $run = false; // walked off the grid -- the spiral is complete
                break;
            }
            $grid[$x][$y] = 1;
        }
        $direction = ($direction + 1) % 4; // turn
        $len++;
    }
    return $grid;
}

$spiral = gen_spiral(41);
foreach ($spiral as $line) { // $x is the first index, so each $line is one printed row
    foreach ($line as $cell) {
        echo ($cell == 0 ? ' ' : '*');
    }
    echo PHP_EOL;
}
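
One way the f(x,y) idea could look is sketched below. It is not the promised write-up, just a closed form worked out from where the arms of gen_spiral land: measured from the centre cell, every arm sits on an even or odd offset and spans a known range, so a cell can be tested on its own without walking the spiral. The names is_spiral_cell and render_spiral are hypothetical, and the centre is assumed to be (int) ($side_len / 2), as in the generative version.

```php
<?php
// Returns true when the cell at ($row, $col) belongs to the spiral drawn
// by the generative version on a $side_len x $side_len grid.
function is_spiral_cell($row, $col, $side_len) {
    $c = (int) ($side_len / 2);
    $u = $row - $c; // vertical offset from the centre
    $v = $col - $c; // horizontal offset from the centre

    // Horizontal arms above the centre: even rows, two or more up.
    if ($u <= -2 && $u % 2 === 0 && $v >= $u && $v <= -$u) {
        return true;
    }
    // Horizontal arms below the centre: odd rows, one or more down.
    if ($u >= 1 && $u % 2 !== 0 && $v >= -($u + 1) && $v <= $u - 1) {
        return true;
    }
    // Vertical arms at or right of the centre: even columns.
    if ($v >= 0 && $v % 2 === 0 && $u >= -$v && $u <= $v + 1) {
        return true;
    }
    // Vertical arms left of the centre: even columns, two or more left.
    if ($v <= -2 && $v % 2 === 0 && $u >= $v && $u <= -$v - 1) {
        return true;
    }
    return false;
}

// Builds the picture row by row using only the per-cell test.
function render_spiral($side_len) {
    $rows = array();
    for ($row = 0; $row < $side_len; $row++) {
        $line = '';
        for ($col = 0; $col < $side_len; $col++) {
            $line .= is_spiral_cell($row, $col, $side_len) ? '*' : ' ';
        }
        $rows[] = $line;
    }
    return $rows;
}

echo implode(PHP_EOL, render_spiral(41)), PHP_EOL;
```

The per-cell test needs no grid in memory, so each row (or each cell) can be produced independently of the others.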

Lucene indexing performance parameters -- mergeFactor, maxMergeDocs, minMergeDocs

Key Lucene indexing performance parameters:

  1. mergeFactor -- this variable controls how many index segments get created. An interesting tidbit: it uses a power law to decide when to merge the segments. In short, the more segments, the quicker the indexing but the slower the searching.
  2. maxMergeDocs -- this limits the number of documents per index segment.
  3. minMergeDocs -- this controls how many documents have to accumulate in the in-memory buffer before they are written to disk.
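
To get a feel for how the three parameters interact, here is a toy model in PHP. It is not Lucene's code and makes simplifying assumptions that I want to state loudly: a segment is flushed every minMergeDocs documents, the newest mergeFactor segments are merged only when they are all the same size, and a merge is skipped if the result would exceed maxMergeDocs. The function name simulate_segments is hypothetical.

```php
<?php
// Toy model of segment growth. Returns the list of segment sizes
// (in documents, oldest first) after $num_docs documents were added.
function simulate_segments($num_docs, $merge_factor, $min_merge_docs, $max_merge_docs = PHP_INT_MAX) {
    if ($merge_factor < 2) {
        throw new InvalidArgumentException('merge factor must be at least 2');
    }
    $segments = array();
    $buffered = 0;
    for ($i = 0; $i < $num_docs; $i++) {
        $buffered++;
        if ($buffered < $min_merge_docs) {
            continue; // still collecting documents in memory
        }
        $segments[] = $buffered; // flush the buffer as a new segment
        $buffered = 0;
        // Cascade: merge the newest $merge_factor segments while they are equal in size.
        while (count($segments) >= $merge_factor) {
            $tail = array_slice($segments, -$merge_factor);
            $merged = array_sum($tail);
            if (count(array_unique($tail)) !== 1 || $merged > $max_merge_docs) {
                break;
            }
            array_splice($segments, -$merge_factor, $merge_factor, array($merged));
        }
    }
    return $segments; // documents still in the buffer are not counted
}

// Low merge factor: few segments on disk, but many merges while indexing.
echo count(simulate_segments(1000, 2, 10)), PHP_EOL;
// High merge factor: hardly any merging, but many segments to search.
echo count(simulate_segments(1000, 50, 10)), PHP_EOL;
```

In this model segment sizes only ever take the values minMergeDocs times a power of mergeFactor, which is the power-law behaviour mentioned in item 1.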