Company

Products

Services

Partners

Media

 

 

 

Home

Empress Markets

News & Events


Presentations

Press Releases

Product Profile

Success Stories

Technical News


White Papers

Join Mailing List

For More Info

           

      Empress Technical News January 2009

    Database Text SearchIndex Fast Text Data Retrieval Part 1

     

    Introduction

    Empress Ultra Embedded 10.20 offers many additional features to application developers. One of those features is a Text Search Index capability.

    Empress Text Search Index

    Empress Ultra Embedded V10.20 Text Search Index capability empowers application developers to implement an efficient search for database records using keywords/tokens/phrases. The most typical usage would be to associate those keywords with particular character/text based attribute in a database table.

    The search index capability is developed as an additional set of C calls that are used in conjunction with Empress C/C++ Kernel Level API mr Routines. The search index is a user maintained index an index not maintained via Empress database engine calls.

    Application would supply the list (array) of tokens/keywords/phrases on insertion in the Empress database in order to create a text search index.

    Lets take an example of storing song titles in an Empress TEXT data type attribute. For the famous Beatles song I Want To Hold Your Hand the list of tokens/keywords/phrases could look like as follows:

    Want
    Hold
    Hand
    I
    Hold Your
    Your Hand
    Hold Your Hand

    In the above example the list contains seven tokens. Tokens can contain a single word or multiple words. Words do not necessarily have to be contained in the original song title. This could be beneficial when, for example, using either token Mc or Mac to get a song title Old Macdonald.

    Once created an Empress text search index enables very powerful searches using inserted tokens/keywords.

    In the presence of hundreds of thousands of song titles a search for the song title with a token Hold could result in the result set containing the following titles:

    You Really Got A Hold on Me (Beatles)
    If We Hold On
    (Diana Ross)
    I Want To Hold Your Hand
    (Beatles)
    Hold Me Back
    (Ac/Dc)
    Hold Me, Thrill Me, Kill Me
    (U2)


    Retrieval Using Search Index

    The following example provides the actual program code (select_text.c) using Empress C/C++ Kernel Level API mr Routines to show how to retrieve records from the table songs using the search index on the same table. The search index C API calls are in bold. The example performs the retrieval of all the records from the table that contain token Hold. When translated into SQL the example does something like:

    SELECT id, title FROM songs WHERE title HAS Hold

    #include      < mscc.h>
    #include
          < tsi_api.hx>
    #ifdef
      WINCE

    global_shared_func    (char* format, ...)
    #define
                    swprintf               
    #define
                    printf                      mssio_printf
    #endif
    #define
            DATABASE    L" karaokedb"
    int
              msmain (int      argc,
                                       
    char*  argv[])
    {

                 
    void*                         
                 
    void*                         
                 
    void*                         
                 
    void*                         
                 
    void*                         
                 
    void*                         
                 
    wchar_t*                                  id_value
                 
    wchar_t*                                  title_value
                 
    void*                         
                 
    long*                         
               
    char                   

    }

                  = mropen (DATABASE, L" songs" , 'r')
                 
    songs_recdesc = mrmkrec (songs_tabdesc)
                 
    id_attrdesc = mrngeta (songs_tabdesc, L" id"
                 
    title_attrdesc = mrngeta (songs_tabdesc, L" title" )
                 
    index_handle= mstsi_open (songs_tabdesc , title_attrdesc, 'r')
                 
    id_value = mrspv (id_attrdesc)
                 
    title_value = mrspv (title_attrdesc)
                 
    printf (" Songs that contain token Hold\n\n"
                     
      printf (buf, %-12s %-22s\n" , " Id" Title" )
                 
    record_list = mstsi_get (index_handle, L" Hold" , 0)
                 
    if (record_list == 0)
                 
    {
                                       
    printf (" No songs with token Hold\n"
                                       
    return 0
                 
    }   
                 
    qual = mrqlst (songs_tabdesc, record_list)
                 
    retrieve_desc = mrgetbegin (qual, songs_recdesc, (void*) 0)

                  (mrget (retrieve_desc))
                 
    {
                       
    mrcopyv (songs_recdesc, id_attrdesc, id_value)
                       
    mrcopyv (songs_recdesc, title_attrdesc, title_value)
                     
            sprintf (buf, " %-12ls %-22ls\n" , id_value, title_value)
                     
            printf (" %s" , buf)
                 
    }
                 
    mrgetend (retrieve_desc)
                 
    mrfree (id_value)
         
            mrfree (title_value)
                 
    mrfrrec (songs_recdesc)
                 
    mstsi_close (index_handle)    
                 
    mrclose (songs_tabdesc)
                     
      printf ( select1_text is done\n" )
                 
    return 0
    }
     

    Performance Considerations

    When executing the above mr program the performance could be several orders of magnitude better than when performing the same search using an SQL query, e.g.:

    SELECT id, title FROM songs WHERE title LIKE %Hold%

    or

    SELECT id, title FROM songs WHERE title SMATCH *

    In another example a search for a keyword " Embedded" was performed against all titles stored in Empress database containing Wikipedia data. The table pages contained 7,649,051 titles and related information such as and author.

    Using the text search index, the search completed in 0.077 seconds with 131 titles in the result set.

    For the comparison, an SQL query was executed to do the similar search on the TEXT attribute title

    SELECT id, title FROM pages WHERE title SMATCH '

    It required 35.314 seconds for the SQL query to complete.

    The Empress text search index was about 500 times faster than the SQL query!!

     

    EmpressSoftware Inc.
    www.empress.com

     

                       
                             
                         

Company
Information
GSA Contracts
Careers
Privacy Policy 
Contact US

Products
Empress RDBMS
Empress Servers
Empress API
Free Trial 
Empress iPad Apps
 

Services
Consulting
Mobile Apps Development
Technical Support

Training Courses
Empress Extreme  

Partners
Distributors
Business Alliances
Partner Programs  
GSA Program
Distributors Program

Media 
Empress Markets
News and Events
Press releases
Technical News
Success Stories

        USA: 301-220-1919                 Canada & International:       905-513-8888         
Copyright 2014           Empress Software Inc.           info@empress.com           Join Empress Mailing List