Akom's Tech Ruminations

Various tech outbursts - code and solutions to practical problems

Simple flat file site search in PHP/Smarty

Posted by Admin • Wednesday, January 19. 2011 • Category: Linux

Sometimes a real search implementation (Lucene, Sphinx) is just too much. The particular site I was working on has something like 30 pages and is maintained as flat files (Smarty templates, but basically HTML on disk). It really, really should not require megabytes of code and cron jobs to be able to search it!

That said, this is a simple search solution - it makes a lot of assumptions:

Assumptions

  1. Site is hosted on a Linux/Unix OS
  2. Files that correspond to pages are easily discerned from files that should not be searched
  3. You can exec a system command from PHP
  4. You can infer the name of the page to present in search results from the name of the file (or you don't care)

Solution

  1. Use grep
  2. No, seriously, use grep. Sound scary? Not if it's only a few dozen pages
  3. Combine grep -l calls to create "AND"-ed searches for all the keywords


So for a query like 'tall tree', the resulting command would be something like:

find templates -name 'pageprefix_*.tpl' | xargs grep -il 'tall' | xargs grep -il 'tree'
 

The output of this command is simply the list of files that contain both 'tall' and 'tree' (try it on the command line first). The PHP function in the Code section strips these down to clean, bare page names (this is appropriate in my environment) so that the search page can present them however it wishes.
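To try the pipeline without touching a real site, here is a minimal shell sketch. The temporary directory, the page names and the page contents are all made up for the demo; only the find/grep pipeline itself comes from this post.

```shell
# Build a throwaway "site" (all file names and contents here are invented).
demo=$(mktemp -d)
mkdir "$demo/templates"
printf 'A tall oak tree\n'           > "$demo/templates/pageprefix_oak.tpl"
printf 'A Tall building\n'           > "$demo/templates/pageprefix_tower.tpl"
printf 'A small tree\n'              > "$demo/templates/pageprefix_bonsai.tpl"
printf 'tall tree, but not a page\n' > "$demo/templates/header.tpl"

# Same pipeline as in the post: find selects the page files, and each
# "grep -il" stage only passes on the files that also contain its keyword.
matches=$(cd "$demo" && find templates -name 'pageprefix_*.tpl' \
    | xargs grep -il 'tall' \
    | xargs grep -il 'tree')

echo "$matches"   # prints: templates/pageprefix_oak.tpl
rm -rf "$demo"
```

Only the oak page survives both stages: the tower page lacks 'tree', the bonsai page lacks 'tall', and header.tpl never enters the pipeline because it does not match the find pattern.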

Code

(As my site is PHP + Smarty, my solution is implemented as a Smarty plugin - but you can take the code and do whatever you like)

/**
 * Smarty plugin
 * -------------------------------------------------------------
 * Type:     function
 * Name:     filesearch - performs a file search among the public pages
 *           using filesystem grep
 * Params:   keywords - space-separated search terms
 *           assign   - (optional) template variable that receives the results
 * -------------------------------------------------------------
 */

function smarty_function_filesearch($params, &$smarty) {

    if (checkInvalidValues($params, 'keywords')) {  // this is one of my helpers, this is up to you to do your way
        return "NO GOOD";
    }
    $assign = isset($params['assign']) ? $params['assign'] : null;

    // sanitize incoming keywords for extra hacking safety: every listed character becomes a space
    $cleankeywordparam = str_replace(array('/', '.', '#', '&', '*', ';', '<', '>', '\'', '"', '}', '{', ']', '[', '$', '\\'), ' ', $params['keywords']);

    $results = array();
    try {
        // split on whitespace and drop empty entries, so repeated spaces don't produce empty patterns
        $keywords = preg_split('/\s+/', $cleankeywordparam, -1, PREG_SPLIT_NO_EMPTY);
        if (count($keywords) > 0) {
            // This is where we hardcode the find expression for your pages, adjust as appropriate:
            $cmd = "find templates -name 'pageprefix_*.tpl' | xargs grep -il ";
            $cmd .= implode(" | xargs grep -il ", array_map('escapeshellarg', $keywords));
//          echo "Will execute '$cmd'";

            $output = shell_exec($cmd);

            $results = explode("\n", (string) $output); // the results are just filenames, one per line
            foreach ($results as &$result) {
                $result = basename($result, '.tpl');
//              echo "<br/>processing $result ";
                $temp = explode('_', $result, 2);  // my files are named like pageprefix_pagename.tpl
                $result = isset($temp[1]) ? trim($temp[1]) : '';  // don't want the pageprefix_ part
            }
            unset($result);  // break the reference left behind by the foreach
            $results = array_filter($results, '_trimPages');  // remove empty entries
        }
    } catch (Exception $e) {
        echo "Search error : " . $e->getMessage();
    }
    if (checkInvalidValues($params, 'assign')) {
        return $results;
    } else {
        $smarty->assign_by_ref($assign, $results);
    }
}

function _trimPages($value) {
        return isset($value) && strlen($value) > 0;
}
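
The filename cleanup done in the loop (basename, then dropping everything up to the first underscore) can also be checked from the shell. This is only a sketch: `page_name` is a hypothetical helper, not part of the plugin, and it assumes the pageprefix_pagename.tpl naming described above.

```shell
# Hypothetical helper: turn a result path into a bare page name.
page_name() {
    name=$(basename "$1" .tpl)    # templates/pageprefix_about_us.tpl -> pageprefix_about_us
    printf '%s\n' "${name#*_}"    # drop everything through the first underscore
}

page_name templates/pageprefix_about_us.tpl   # prints: about_us
```

Only the first underscore acts as the separator, so page names that themselves contain underscores survive intact, just like `explode` with a limit of 2 in the PHP code.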


The actual Smarty template is:

        <form method="post">
                <input type="text" name="keywords" size="40"
                {if $smarty.post.keywords}value="{$smarty.post.keywords|escape:'html'}"{/if}
                />
                <input type="submit" value="Search"/>
        </form>
 

        {if $smarty.post.keywords}
                {filesearch keywords=$smarty.post.keywords assign=searchresults}
                <h2>Search Results:</h2>
                <ul>
                {foreach from=$searchresults item=pagename}
                      {* This interpolates the filename back to a human-readable string *}
                        <li><a href="/{$pagename}">{$pagename|replace:'_':' '|replace:'-':' '|ucfirst}</a></li>
                {/foreach}
                </ul>
        {/if}
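
For a feel of what the modifier chain in the link text does, here is a rough shell equivalent. It is a sketch only: `humanize` is a hypothetical name, and the sample page name is invented. It replaces underscores and hyphens with spaces, then upper-cases the first character, as `replace` and `ucfirst` do in the template.

```shell
# Hypothetical helper mirroring {$pagename|replace:'_':' '|replace:'-':' '|ucfirst}
humanize() {
    spaced=$(printf '%s' "$1" | tr '_-' '  ')    # '_' and '-' both become spaces
    first=$(printf '%s' "$spaced" | cut -c1 | tr '[:lower:]' '[:upper:]')
    rest=$(printf '%s' "$spaced" | cut -c2-)
    printf '%s%s\n' "$first" "$rest"
}

humanize about_us-team   # prints: About us team
```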
 
