For one of my startups-still-in-stealth-mode I’m working on a professional and scalable solution for search suggestions. Considering the current low number of daily visits, I first thought I could easily get away with one of the many tutorials on search suggestions with MySQL. However, the complexity would force me to use JOINs in combination with UNIONs, which would in the end result in crappy performance. Also, I shoot for the stars, so anything I build needs to be super scalable.
So I chose a setup with a text-based search engine. These days there are many text-based search engines out there, but since I have some experience with the Lucene-based Solr, and since Solr is just a bit more suited for dummies, I chose Solr. Another popular alternative is Elasticsearch, but I seriously doubt that my website or yours would come to a point where you can’t do it with Solr. The Lucene engine is suitable for an extremely high number of concurrent requests, is super stable and serves many high-traffic websites out there.
So, this will be the first time I actually install Solr on a Webfaction server, and for the few out there struggling with the same I’ll share the complete process step by step.
In order for Solr to run it needs a servlet container like Tomcat or Jetty. Solr actually comes with Jetty built in, but for several reasons (security, doing-difficult-just-for-the-heck-of-it and others) I’ve decided to go with Tomcat 7.
OMG, this was a total nightmare and cost me >2 days to get it done, so kudos and other forms of appreciation are very welcome!
Step 1: Installing Tomcat 7
Just follow the steps as described in the post How-to install Tomcat 7 on Webfaction for Dummies. Next, stop the Tomcat service:
Step 2: Download the Solr Distribution
First go to the root folder of your newly added website, which probably is:
Next, find the link to the latest production version of Solr. Go to http://apache.spinellicreations.com/lucene/solr/ and click on the latest production version, which in this case is 4.0.0. Click on the folder link and then copy the link location of the tgz file. Watch out that you don’t copy the source code, which is identified by having -src- in the filename. Once you’ve copied the download location, go back to your ssh session and type:
And then wait till the download is finished, which can take quite a while. In my case I probably chose a mirror that’s quite far from my web server. Next, extract the downloaded file:
tar zxf apache-solr-4.0.0.tgz
Basically here I followed the instructions from the Apache Wiki on installing Solr on Tomcat:
Copy the example/solr directory from the source to the installation directory like /opt/solr/example/solr, hereafter $SOLR_HOME. Copy the .war file dist/apache-solr-*.war into $SOLR_HOME as solr.war.
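The two copy steps above can be sketched as a small shell function. This is only a rough sketch: stage_solr_home is a name I made up, and it assumes the distribution has already been unpacked and still has the example/solr and dist folders described above.

```shell
# Sketch: lay out $SOLR_HOME from an unpacked Solr distribution.
# stage_solr_home is a hypothetical helper name, not part of Solr or Tomcat.
stage_solr_home() {
    dist_dir=$1     # unpacked distribution folder
    solr_home=$2    # target folder, e.g. /opt/solr/example/solr

    # Refuse to continue if the example Solr home is missing.
    [ -d "$dist_dir/example/solr" ] || {
        echo "no example/solr in $dist_dir" >&2
        return 1
    }

    mkdir -p "$solr_home" || return 1
    # Copy the contents of example/solr (note the trailing /.) into $SOLR_HOME.
    cp -R "$dist_dir/example/solr/." "$solr_home/" || return 1

    # Copy the first matching .war under the fixed name solr.war.
    for war in "$dist_dir"/dist/apache-solr-*.war; do
        [ -f "$war" ] || {
            echo "no apache-solr-*.war in $dist_dir/dist" >&2
            return 1
        }
        cp "$war" "$solr_home/solr.war" || return 1
        break
    done
}
```

You would call it with the unpacked folder and the target, for example stage_solr_home ./apache-solr-4.0.0 /opt/solr/example/solr (both paths are just examples).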
The configuration file $SOLR_HOME/conf/solrconfig.xml in the example sets dataDir for the index to be ./solr/data relative to the current directory – which is true for running the Jetty server provided with the example, but incorrect for Tomcat running as a service. Modify the dataDir to specify the full path to $SOLR_HOME/data:
The dataDir can also be temporarily overridden with the JAVA_OPTS environment variable prior to starting Tomcat:
export JAVA_OPTS="$JAVA_OPTS -Dsolr.data.dir=/opt/solr/example/solr/data"
Create a Tomcat Context fragment to point docBase to the $SOLR_HOME/solr.war file and solr/home to $SOLR_HOME:
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/opt/solr/example/solr/solr.war" debug="0" crossContext="true">
<Environment name="solr/home" type="java.lang.String" value="/opt/solr/example/solr" override="true"/>
</Context>
Symlink or place the file in $CATALINA_HOME/conf/Catalina/localhost/solr-example.xml, where Tomcat will automatically pick it up. Tomcat deletes the file on undeploy (which happens automatically if the configuration is invalid).
Seemed all pretty straightforward and I was hopeful it would work, but it didn’t... Great, and all the articles I could find on this topic were either way outdated or not relevant. This is where a two-day quest began, during which I also installed Solr using the included Jetty just to see Solr in action. But since securing Jetty is a drag, I didn’t want to give up.
The errors I got varied from “Corrupt war file” to:
Nov 27, 2012 10:10:43 AM org.apache.catalina.startup.ContextConfig init
SEVERE: Exception fixing docBase for context [/solr]
java.util.zip.ZipException: error in opening zip file
Some posts pointed to changing permissions on files and folders, but in the end I found it had to do with the paths. Somehow relative paths seemed to fail, even though the paths logged were exactly the right paths. So I changed all paths in every config file to full absolute paths.
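A quick check like the one below can rule out the obvious causes before you start editing config files. It is only a sketch: check_war is a made-up helper name, and it looks at nothing more than whether the path is absolute, whether the file is readable, and whether it starts with the zip signature. It does not validate the whole archive.

```shell
# Sketch: sanity-check the path you are about to use as docBase.
# check_war is a hypothetical helper; it only inspects the zip signature,
# it does not test the full archive.
check_war() {
    war=$1

    # Relative paths were the culprit here, so insist on an absolute one.
    case $war in
        /*) ;;
        *) echo "not an absolute path: $war" >&2; return 1 ;;
    esac

    [ -r "$war" ] || { echo "missing or unreadable: $war" >&2; return 1; }

    # A .war is a zip archive, and a normal one starts with the bytes "PK".
    magic=$(head -c 2 "$war")
    [ "$magic" = "PK" ] || { echo "not a zip/war file: $war" >&2; return 1; }
}
```

If the function returns without printing anything, the file at least looks like a war at an absolute, readable location.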
So, first the most important one: the Tomcat Context fragment, which I saved as solr.xml instead of solr-example.xml just because it gives a nicer folder name.
This is the new Tomcat Context fragment, which you have to save as /conf/Catalina/localhost/solr.xml. Don’t forget to remove any other existing context fragment:
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="/home/username/webapps/tomcat/opt/solr/example/solr/solr.war" debug="0" crossContext="true">
<Environment name="solr/home" type="java.lang.String" value="/home/username/webapps/tomcat/opt/solr/example/solr/" override="true"/>
</Context>
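If you prefer to generate the fragment instead of typing it, something along these lines should work. Treat it as a sketch under assumptions: write_solr_context is a name I invented, the first argument is your Tomcat folder and the second is the Solr home that contains solr.war, and both have to be absolute paths.

```shell
# Sketch: generate the Tomcat context fragment with absolute paths.
# write_solr_context is a hypothetical helper name.
write_solr_context() {
    catalina_home=$1   # Tomcat install folder (absolute path)
    solr_home=$2       # Solr home folder (absolute path), contains solr.war

    # Both arguments must be absolute paths.
    for p in "$catalina_home" "$solr_home"; do
        case $p in
            /*) ;;
            *) echo "not an absolute path: $p" >&2; return 1 ;;
        esac
    done

    fragment_dir=$catalina_home/conf/Catalina/localhost
    mkdir -p "$fragment_dir" || return 1

    # The file name (solr.xml) determines the context path (/solr).
    cat > "$fragment_dir/solr.xml" <<EOF
<?xml version="1.0" encoding="utf-8"?>
<Context docBase="$solr_home/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="$solr_home" override="true"/>
</Context>
EOF
}
```

With the paths from this post the call would be write_solr_context /home/username/webapps/tomcat /home/username/webapps/tomcat/opt/solr/example/solr, with username replaced by your own account name.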
Next I also had to change the dataDir setting in /opt/solr/example/solr/collection1/conf/solrconfig.xml and changed it to a full absolute path:
And this was when my two-day quest was over: after restarting Tomcat I could access Solr, the core was started and I could add data to it. I’m pretty sure there must be a more graceful way to do this, but it did the trick. Any suggestions are obviously always welcome.
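If you would rather make that change from the shell than by hand, a sed one-liner wrapped in a function is one way to do it. This is a sketch, not a tested recipe for every solrconfig.xml: set_data_dir is a made-up name, and it assumes the dataDir element sits on a single line and that your path contains no | or & characters.

```shell
# Sketch: point <dataDir> in solrconfig.xml at an absolute path.
# set_data_dir is a hypothetical helper; it assumes the element is on one line.
set_data_dir() {
    config=$1      # path to solrconfig.xml
    data_dir=$2    # absolute path for the index data

    # Only accept absolute paths, since relative ones were the problem here.
    case $data_dir in
        /*) ;;
        *) echo "dataDir must be an absolute path: $data_dir" >&2; return 1 ;;
    esac

    # Keep a backup, then rewrite whatever is between the tags. '|' is the
    # sed delimiter so the slashes in the path need no escaping; a path
    # containing '|' or '&' would need escaping first.
    cp "$config" "$config.bak" || return 1
    sed "s|<dataDir>.*</dataDir>|<dataDir>$data_dir</dataDir>|" "$config.bak" > "$config"
}
```

The original file is kept next to it with a .bak extension, so you can compare the two before restarting Tomcat.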
Some More Tips
When I was trying to add another core I kept running into errors about field types, even though I had copied that whole section from the example into my new schema. The solution was to add a version field to the schema:
<field name="_version_" type="long" indexed="true" stored="true" multiValued="false" />
Don’t fill this field, it will be handled by Solr itself.
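To check whether a schema already has the field before you add a new core, a simple grep will do. Again just a sketch: has_version_field is a name I made up, and it only looks for a field element whose name attribute is _version_ on a single line.

```shell
# Sketch: check that a schema.xml declares the _version_ field.
# has_version_field is a hypothetical helper name.
has_version_field() {
    schema=$1
    # Look for a <field> element whose name attribute is exactly _version_.
    grep -q '<field[^>]*name="_version_"' "$schema"
}
```

The function returns success when the field is there, so you can use it as has_version_field schema.xml || echo "add the _version_ field first".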