Thursday, April 19, 2012

Tested software agent server with Solr 3.6

I just ran a couple of simple tests to see how well the Base Technology software agent server could connect to Apache Solr 3.6 (open source enterprise search platform) which was just released last week. I did have to make a few changes to the agent server code, to add support for the HTTP POST verb and to permit HTTP GET to bypass the web page cache manager of the agent server.
 
Originally, I was going to access Solr via the SolrJ interface (Solr for Java), but I figured I would start with direct HTTP access to see how bad it would be. It wasn't so bad at all. I may still add support for SolrJ, but one downside is that it wouldn't be subject to the same administrative web access controls that normal HTTP access is. I'll have to think about it some more, but I could probably encapsulate the various SolrJ methods as if they were the comparable HTTP access verbs (GET for query, POST for adding documents, etc.) so that the administrative controls would work just as well with SolrJ. At least that's the theory.
 
For now, at least I verified that a software agent can easily add documents to and query a Solr server running Solr 3.6.
 
The code changes are already up on GitHub.
 
I do need to add a new option, "enable_writable_web", which permits agents to do more than just GET from the web. I had held off on implementing POST since it is one thing to permit agents to read from the web, but permitting them to write to the web is a big step that adds some risk for rogue and buggy agents. For example, with one POST command you can delete all documents from a Solr server. Powerful, yes, dangerous, also yes.
 
I also need to make "enable_writable_web" a per-user and even per-agent option so that an agent server administrator can allow only some users or agents to have write access to the web. There will probably be two global settings for the server, one for the default for all users, and one which controls whether any users can ever have write access to the web. The goal is to make the agent server as safe as possible by default, but to allow convenient access when needed and acceptable as well.
 
Unfortunately, after all of that, it turns out that Solr has a "stream.body" feature that allow documents to be added and deleted using an HTTP GET verb. Oh well, that's life. You can't cover all bases all of the time.

No comments:

Post a Comment