
ddc-concordance - a search engine for linguists

by superuser last modified 2008-01-10 18:01

Introduction to DDC-Concordance


DDC-Concordance is an open-source (LGPL) search engine developed specifically to meet the needs of linguistic researchers. Its main features are:
  • Sentence-based or document-based searches
  • Statistical queries that return exact results rather than approximations
  • In addition to classical search engine features such as Boolean operators (AND, OR, NOT), left and right truncation, and distance operators, ddc-concordance can search for word forms. For example, a search for "child" finds all documents containing word forms such as "child" and "children". This functionality is currently available for English, German, and Russian.
  • Metadata from XML documents can be indexed
  • Words can be indexed with searchable annotations, in particular word forms, lemmas, part-of-speech tags, and semantic categories
  • Interval searches, both directed and symmetric (e.g. FOLLOWED_BY and NEAR)
  • Phrase searches
  • A relevance ranking operator for documents
  • Fast indexing and retrieval: indexing a 100-million-word corpus takes approximately 1.5 hours, and the first ten hits for simple queries are shown in about 0.2 seconds.
  • Support for very large corpora through a distributed clustering architecture. The largest known corpus contains about 1 billion tokens, and no upper limit has been reached so far.
  • Client libraries for Perl, PHP, Python, and C/C++ are available for developers, along with ready-to-use command-line clients and a simple CGI script
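
To make the word-form and interval features above more concrete, here is a minimal Python sketch of the underlying ideas. It is not ddc-concordance's API or query syntax: the lemma table, the function names (build_index, followed_by, near) and the sample sentences are all hypothetical, and a real system would use a morphological analyser and an on-disk index rather than a small in-memory dictionary.

```python
from collections import defaultdict

# Hypothetical toy lemma table (an assumption for this sketch only).
LEMMAS = {"children": "child", "played": "play", "plays": "play"}


def lemma(token):
    """Reduce a token to its lemma; unknown tokens are just lowercased."""
    token = token.lower()
    return LEMMAS.get(token, token)


def build_index(sentences):
    """Map each lemma to a list of (sentence_id, position) pairs."""
    index = defaultdict(list)
    for sid, sentence in enumerate(sentences):
        for pos, token in enumerate(sentence.split()):
            index[lemma(token)].append((sid, pos))
    return index


def followed_by(index, first, second, max_dist):
    """Directed interval: `second` occurs 1..max_dist tokens after `first`
    within the same sentence. Returns sorted sentence ids."""
    hits = set()
    for sid1, p1 in index.get(lemma(first), []):
        for sid2, p2 in index.get(lemma(second), []):
            if sid1 == sid2 and 0 < p2 - p1 <= max_dist:
                hits.add(sid1)
    return sorted(hits)


def near(index, a, b, max_dist):
    """Symmetric interval: the two words may appear in either order."""
    forward = followed_by(index, a, b, max_dist)
    backward = followed_by(index, b, a, max_dist)
    return sorted(set(forward) | set(backward))


sentences = [
    "The children played outside",
    "She plays with the child",
    "A child sleeps",
]
index = build_index(sentences)

# Word-form search: every sentence containing any form of "child".
child_sentences = sorted({sid for sid, _ in index["child"]})
```

In this sketch, followed_by(index, "child", "play", 2) matches only the first sentence, whereas near(index, "child", "play", 3) also matches the second, where the two words appear in the opposite order. Indexing by sentence id rather than document id corresponds to the sentence-based search mode.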

Download the software

You can download the ddc-concordance software and some extensions here:

http://sourceforge.net/projects/ddc-concordance

Are you doing something interesting with ddc-concordance? Big site deployments, interesting use cases? Tell us about it!

Thanks for using our product!


