[Fsf-india] HTML.Template.java (fwd)

Frederick Noronha fred@bytesforall.org
Tue, 19 Mar 2002 10:47:50 +0530 (IST)


From Philip S Tellis <philip.tellis@iname.com>

---------- Forwarded message ----------

Hi, just thought I'd let you know about developments on my contributions
to the OSI.

I've started a new project - HTML.Template.java.  This is a templating
system for servlet writers.  It is based on the HTML::Template Perl
module written by Sam Tregar.  The purpose of the module is to separate
code from design, which is so important in software development.  The
project aims to be 100% compatible with HTML::Template and not to include
any extra features.  Ideally, we'd like to use the same templates for
our Perl and Java programs.
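
A minimal sketch of what separating code from design looks like with an
HTML::Template-style template.  It is written in Python purely for
illustration and handles only the <TMPL_VAR NAME=...> tag; the render
function and the sample page are hypothetical and are not taken from
HTML.Template.java or from HTML::Template itself.

```python
import re

# Matches <TMPL_VAR NAME=foo> or <TMPL_VAR NAME="foo"> (tag is case-insensitive).
TMPL_VAR = re.compile(r'<TMPL_VAR\s+NAME\s*=\s*"?([\w.]+)"?\s*>', re.IGNORECASE)


def render(template, params):
    """Replace each TMPL_VAR tag with its value; unknown names become ''."""
    return TMPL_VAR.sub(lambda m: str(params.get(m.group(1), "")), template)


# The designer owns the template text, the programmer owns the params.
page = "<h1>Hello, <TMPL_VAR NAME=user></h1>"
print(render(page, {"user": "Philip"}))  # prints <h1>Hello, Philip</h1>
```

Because the template carries no program logic, the same file could in
principle be fed to a Perl or a Java implementation unchanged.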

You can check out the project homepage at 
http://html-tmpl-java.sourceforge.net/

The code is distributed under either the GPL or the Artistic licence.


Apart from that project, which I run entirely on my own, I'm also
contributing to the everybuddy project.  Everybuddy is an instant
messenger client for
MSN, Yahoo, AOL, ICQ, Jabber and more.  The new modular structure means 
that anyone can write a module for a new service and plug it in.  You 
don't even have to restart everybuddy for that.

My contribution involves a lot of work on the currently unmaintained
Yahoo code.  I've added an option for invisible logins (a must for
people like me who get bombarded with chat requests the moment I appear
online).  I've also added code to authorise add-contact requests, and
fixed a few bugs that were present initially and caused eb to either
crash or hang.

You can see the everybuddy homepage at http://www.everybuddy.com/.  It 
hasn't been updated in a while.

I've also contributed code to the namazu project.  Namazu is a search
engine that uses a full text index, i.e. no database is required.  This
is ideal in situations where you have a small website and don't need a
database.  There's no point installing a database server only for your
search engine.  Namazu helps out here.  Namazu is a Japanese project
(namazu is the Japanese word for catfish).  You can see the namazu
project at http://www.namazu.org/

My contributions include adding code for running the search through 
server side includes and passing the index name through the PATH_INFO 
variable instead of a query string (this helped in integrating it with 
mailman).

These two are officially part of the namazu tree (the second one will 
make it into 2.1).
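
A rough sketch of the PATH_INFO idea, in Python for illustration only.
It is not namazu's code: the function name, the "main" default and the
example URL are assumptions.  With a request such as
/cgi-bin/search.cgi/lug the web server sets PATH_INFO to "/lug", and the
script can pick the index from that instead of from a query string.

```python
def index_from_path_info(environ, default="main"):
    """Return the index name carried in PATH_INFO, e.g. '/lug' -> 'lug'."""
    path_info = environ.get("PATH_INFO", "")
    # Use only the first path segment as the index name.
    name = path_info.strip("/").split("/", 1)[0]
    # Reject empty names and anything that is not letters, digits, '-' or '_',
    # so a value like '..' cannot point outside the index directory.
    if not name or not name.replace("-", "").replace("_", "").isalnum():
        return default
    return name


print(index_from_path_info({"PATH_INFO": "/lug"}))  # prints lug
```

Keeping the index name in the path leaves the query string free for the
search terms themselves.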

I've also developed a namazu add-on that handles stop words and synonyms.
If you've used Google, you'll have seen stop words before.  Basically,
there's no point searching for words like `a, to, the, it' which occur
in almost every document.  A stop list is a list of all such
words, and my code simply eliminates them from the search.
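
A minimal sketch of stop word removal, in Python for illustration; it is
not the add-on's code.  The stop list holds only the four words quoted
above, and falling back to the original query when nothing is left is an
assumption of this sketch.

```python
# Only the examples from the text; a real stop list would be much longer.
STOP_WORDS = {"a", "to", "the", "it"}


def strip_stop_words(query):
    """Drop stop words from a whitespace-separated query."""
    kept = [word for word in query.split() if word.lower() not in STOP_WORDS]
    # If every word was a stop word, keep the query as typed rather than
    # searching for nothing (a design choice of this sketch).
    return " ".join(kept) if kept else query


print(strip_stop_words("When is the CST exam?"))  # prints When is CST exam?
```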

The synonym part was developed as part of our work at NCST.  While
analysing search patterns (the search is mainly used by students at
NCST), I noticed three things.

- people entered questions.
  eg: When is the CST exam?
      How much are the fees for G-level?
      etc.
- people can't spell
- people used alternate terms (synonyms :) for words on the pages
  eg: technical assistant instead of technical associate
     
  or `umbrella words' that covered a broad range.
  eg: oops for oopj, oopc and oops
      adbms for dbms, rdbms, odbms and adbms

Naturally, the exact match search would return zero results in most
cases.  This wasn't the kind of service we wanted to provide (I'll get
to the kind we wanted later).

So, I developed the synonyms add-on, which basically replaces words with
their alternatives.

So, I have things like this:

when /when|da(te|y)|time|sun|mon(th)?|tue|wed|thu|fri|sat|week/
where /where|place|centre|location/
conducted /conduct|held/
qualification /qualif|require|score/
begin /begin|start/
oops /oop[cjs]/
adbms /[a-z]dbms|advance database/
assistantship /associate/
assistant /ass(ociate|istant)/
assistanceship /associate/
cursors /curses/
collage /college/


The word on the left is what is to be replaced; the regex on the right
is what it is replaced with.

So, if someone searches for `when', we actually search for when or day
or date or time or sun, mon, tue..., week, month.

Similarly for the others.

The last two you see are common spelling mistakes specific to our
domain.
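
The replacement step can be sketched as below, in Python for
illustration; it is not the add-on's code.  The rules are a few rows
copied from the table above, and the sketch assumes the search engine
accepts a /regex/ term in place of a plain word, as the table suggests.

```python
import re

# A few rows from the table: word typed by the user -> regex searched for.
SYNONYMS = {
    "when": r"when|da(te|y)|time|sun|mon(th)?|tue|wed|thu|fri|sat|week",
    "where": r"where|place|centre|location",
    "oops": r"oop[cjs]",
    "collage": r"college",
}


def expand_query(query):
    """Replace each known word with its /regex/; leave other words alone."""
    terms = []
    for word in query.split():
        pattern = SYNONYMS.get(word.lower())
        terms.append("/%s/" % pattern if pattern else word)
    return " ".join(terms)


print(expand_query("where collage"))
# prints /where|place|centre|location/ /college/

# The umbrella word `oops' now also finds pages that only mention oopj.
print(bool(re.fullmatch(SYNONYMS["oops"], "oopj")))  # prints True
```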

For the Linux users group, I have other synonyms, eg:

indianization /(indian|local)i[sz]ation/


These two patches are useful to a few people, but the namazu developers
have decided not to include them in the main tree yet.  We do, however,
make them available to anyone who wants them under the terms of namazu
itself (GPL).

Our ultimate goal with this kind of search system is to build a system
(Sandesh) that will answer student queries by email.  A student sends in
a standard query to some staff (sometimes they even send it to the
director!).  The staff will forward the mail to Sandesh.  Sandesh in
turn will analyse the mail, and look for text that looks like a
question or something the student wants to know.  It will then generate
a search string, and pass it on to our search engine.  The search engine
will return a list of pages, which Sandesh will look at, returning clips
from those pages that match the query.

This is good for cases where we have FAQs on a single page (in fact, the 
perldoc -q method comes to mind).
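
The first two steps (spotting the question in a mail and turning it into
a search string) might look roughly like this.  This is a sketch in
Python and not Sandesh's code: treating any sentence that ends in a
question mark as the question, the small stop list and the sample mail
are all assumptions made for the example.

```python
import re

# Illustrative stop list only.
STOP_WORDS = {"a", "to", "the", "it", "is", "are", "for"}


def questions(mail_body):
    """Return the sentences of a mail that end in a question mark."""
    sentences = re.split(r"(?<=[.?!])\s+", mail_body.strip())
    return [s for s in sentences if s.endswith("?")]


def search_string(question):
    """Lower-case the question, drop punctuation and stop words."""
    words = re.findall(r"[A-Za-z0-9-]+", question.lower())
    return " ".join(w for w in words if w not in STOP_WORDS)


mail = "Dear Sir. I am a student. When is the CST exam? Thanks."
for q in questions(mail):
    print(search_string(q))  # prints when cst exam
```

The resulting string would then go through the synonym rules before
being handed to the search engine.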

As far as the official open source projects from NCST go, I guess I
should just go ahead and tell you what they are.

The first one (that we're sure of releasing) is a web-based calendar.
It was made primarily for scheduling courses over the web, so students
could check up on their classes.  The design was extensible, and we
managed to use it for other purposes, even sending out birthday
wishes for staff in our department.

We're now planning on releasing it part by part.  Each part would be
usable on its own for different projects.  The first to be
released is the Calendar Perl module.  This is nothing but an
abstraction around the Unix cal program.  It provides an
object-oriented interface to cal for Perl programmers.  Additionally, it
has hooks to attach text to each date.  This will be used in conjunction
with a Schedule class that will populate the Calendar with events.
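
A sketch of the idea of a calendar object with text attached to dates.
It is in Python and uses Python's standard calendar module instead of
wrapping the cal program, so it is not the Calendar Perl module's
interface; the class name, method names and the sample event are all
hypothetical.

```python
import calendar


class NotedCalendar:
    """A month grid, as cal would print it, plus text attached to days."""

    def __init__(self, year, month):
        self.year = year
        self.month = month
        self.notes = {}  # day of month -> list of attached texts

    def attach(self, day, text):
        """Attach a line of text to a day; reject days not in the month."""
        last_day = calendar.monthrange(self.year, self.month)[1]
        if not 1 <= day <= last_day:
            raise ValueError("no such day in this month: %d" % day)
        self.notes.setdefault(day, []).append(text)

    def render(self):
        """Return the month grid followed by one line per attached text."""
        lines = [calendar.month(self.year, self.month).rstrip()]
        for day in sorted(self.notes):
            for text in self.notes[day]:
                lines.append("%2d: %s" % (day, text))
        return "\n".join(lines)


cal = NotedCalendar(2002, 3)
cal.attach(19, "staff birthday")  # made-up event for the example
print(cal.render())
```

A Schedule class, as described above, would then only need to call
attach for each event it knows about.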

The only delay right now is in setting up a download server in NCST (or
at least in India) to host these.  We had initially planned on
SourceForge, and I even created the account there, but then their terms
of service changed, and we decided not to go ahead.  The next option was
Savannah from GNU, but at that point we decided that it would be good to
have a server in India itself.

Well, hope you find this information useful.

Philip

-- 
One uses power by grasping it lightly.  To grasp with too much force is to be 
taken over by power, thus becoming its victim.

-Bene Gesserit Axiom


Visit my webpage at http://www.ncst.ernet.in/~philip/
Read my writings at http://www.ncst.ernet.in/~philip/writings/

  MSN  philiptellis                         Yahoo!  philiptellis
  AIM  philiptellis                         ICQ     129711328