﻿id	summary	reporter	owner	description	type	status	priority	milestone	component	version	resolution	keywords	cc
82	cool comparison tools -- find similar	Fred T. Hamster	bugdock	"*    Examines a directory tree and builds an index of file name portions.     *
*  The files with similar names are reported.                                 *


   1.  text comparison tool for supporting nechung:
         1. it would be nice to have a tool that could check that there are no duplicate fortunes in the nechung database.
         2. more generally, this should be able to do a fuzzy compare of quotes against each other and report ones that incorporate a large amount of another quote or that are too similar to another quote.
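
A minimal sketch of the duplicate and large-overlap check described above. It assumes the fortunes have already been read into a list of strings (the nechung database format is not handled here), and the names normalize, shingles, containment and find_suspects, as well as the 0.6 threshold, are made up for illustration. Overlap is measured with three-word shingles, which is only one of several possible techniques.

```python
import re
from itertools import combinations

def normalize(text):
    # Reduce the quote to one lowercase stream of words; line breaks vanish.
    return re.findall(r'[a-z0-9]+', text.lower())

def shingles(words, size=3):
    # Set of overlapping word runs; short texts become a single shingle.
    if len(words) <= size:
        return {tuple(words)}
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def containment(a, b):
    # Fraction of the shingles of a that also occur in b (0.0 to 1.0).
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa)

def find_suspects(quotes, threshold=0.6):
    # Returns (i, j, score) for every pair where one quote is largely
    # contained in the other; exact duplicates score 1.0.
    words = [normalize(q) for q in quotes]
    hits = []
    for i, j in combinations(range(len(quotes)), 2):
        if not words[i] or not words[j]:
            continue
        score = max(containment(words[i], words[j]),
                    containment(words[j], words[i]))
        if score >= threshold:
            hits.append((i, j, round(score, 2)))
    return hits
```

Comparing every pair is quadratic in the number of quotes, which is probably acceptable for a fortune file but would need an index for much larger collections.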



idea could encompass both...

==============

generalized text comparison giving a score based on:

  words used in common.

  similar word orders.

  similar sizes for words.

  sizes for words distributed similarly.
     (e.g. long words at the same places in both texts)

  how many large chunks are the same.

  must work on text as a single stream, ignoring carriage returns.




probably most important is how similar the word choices are.
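
One way the criteria above could be folded into a single score, as a rough sketch only. The function names, the weights, and the three-word minimum for a large chunk are all assumptions; word choice gets the largest weight because the notes call it the most important factor. It relies on difflib.SequenceMatcher from the Python standard library.

```python
from difflib import SequenceMatcher

def words_of(text):
    # split() with no arguments treats line breaks like any other
    # whitespace, so the text is handled as a single stream.
    return text.lower().split()

def similarity(a, b, weights=(0.4, 0.2, 0.1, 0.1, 0.2), chunk_min=3):
    # Returns a score from 0.0 to 1.0; the weights are illustrative guesses.
    wa, wb = words_of(a), words_of(b)
    if not wa or not wb:
        return 0.0
    sa, sb = set(wa), set(wb)
    # words used in common (Jaccard overlap of the vocabularies)
    common = len(sa & sb) / len(sa | sb)
    # similar word orders
    matcher = SequenceMatcher(None, wa, wb, autojunk=False)
    order = matcher.ratio()
    # similar sizes for words (ratio of mean word lengths)
    la, lb = [len(w) for w in wa], [len(w) for w in wb]
    mean_a, mean_b = sum(la) / len(la), sum(lb) / len(lb)
    sizes = min(mean_a, mean_b) / max(mean_a, mean_b)
    # sizes for words distributed similarly (compare the length sequences)
    spread = SequenceMatcher(None, la, lb, autojunk=False).ratio()
    # how many large chunks are the same (share of words in long matching runs)
    shared = sum(m.size for m in matcher.get_matching_blocks()
                 if m.size >= chunk_min)
    chunks = shared / min(len(wa), len(wb))
    parts = (common, order, sizes, spread, chunks)
    return sum(w * p for w, p in zip(weights, parts))
```

Texts shorter than chunk_min words can never earn the chunk portion of the score, so very short quotes top out below 1.0 even when identical.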
  
  
------------

basic idea...

comparator:
  can take something like a file system tree and report the names within it
  that are similar or identical.  should operate on any list.
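
A small sketch of such a comparator, built around an index of file name portions. The helper names (name_portions, similar_names, names_in_tree) and the rule that a portion needs at least three characters are assumptions for illustration. The comparator itself takes any iterable of strings; walking a directory tree is just one way to produce that list.

```python
import os
import re
from collections import defaultdict

def name_portions(name):
    # Split the base name (without extension) into lowercase alphanumeric
    # portions, ignoring very short ones.
    stem = os.path.splitext(os.path.basename(name))[0]
    return {p for p in re.split(r'[^a-z0-9]+', stem.lower()) if len(p) >= 3}

def similar_names(names):
    # Index each portion to the names that contain it, then keep only the
    # portions shared by two or more names.
    index = defaultdict(set)
    for name in names:
        for portion in name_portions(name):
            index[portion].add(name)
    return {p: sorted(group) for p, group in index.items() if len(group) > 1}

def names_in_tree(root):
    # Yield the path of every file below root.
    for folder, _dirs, files in os.walk(root):
        for filename in files:
            yield os.path.join(folder, filename)
```

For a directory tree this would be called as similar_names(names_in_tree(some_dir)), where some_dir is whatever root the caller chooses. Files with the same base name in different folders land in the same group, which covers the identical-name case.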


"	new-feature	assigned	minor		feistymeow-nucleus				
