Simple flat file site search in PHP/Smarty
Posted by Admin • Wednesday, January 19. 2011 • Category: Linux
Sometimes using a real search implementation (Lucene, Sphinx) is just too much. The particular site I was working on is something like 30 pages. It's maintained as flat files (Smarty templates, but it's basically HTML on disk), and it really, really should not require megabytes of code and cron jobs to be able to search it!
That said, this is a simple search solution - it makes a lot of assumptions:
Assumptions
- Site is hosted on a Linux/Unix OS
- Files that correspond to pages are easily discerned from files that should not be searched
- You can exec a system command from PHP
- You can infer the name of the page to present in search results from the name of the file (or you don't care)
Solution
- Use grep
- No, seriously, use grep. Sound scary? Not if it's only a few dozen pages
- Combine grep -l calls to create "AND"-ed searches for all the keywords
So for a query like 'tall tree', the resulting command would be something like:
find templates -name 'pageprefix_*.tpl' | xargs grep -il 'tall' | xargs grep -il 'tree'
The output of this command would simply be the list of files that contain both 'tall' and 'tree' (try it on the command line first). The function returns the clean bare page names (this is appropriate in my environment) so that the search page can present them the way it wishes.
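To try the pipeline on the command line without touching a real site, a throwaway directory is enough. This is only a sketch: the file names and their contents below are made up for the demo, and it assumes file names without spaces (xargs splits on whitespace).

```shell
#!/bin/sh
# Build a tiny fake site in a temporary directory (all names are hypothetical).
demo_dir=$(mktemp -d)
mkdir "$demo_dir/templates"
echo "A tall tree grows here."     > "$demo_dir/templates/pageprefix_oak.tpl"
echo "A tall building, no plants." > "$demo_dir/templates/pageprefix_tower.tpl"
echo "A short TREE."               > "$demo_dir/templates/pageprefix_bonsai.tpl"
echo "tall tree, wrong prefix"     > "$demo_dir/templates/other_notes.tpl"

# grep -l prints only the names of matching files and -i ignores case;
# xargs hands that list of names to the next grep as its file arguments,
# so every extra "| xargs grep -il" narrows the list further (an AND search).
matches=$(cd "$demo_dir" &&
    find templates -name 'pageprefix_*.tpl' | xargs grep -il 'tall' | xargs grep -il 'tree')

echo "$matches"   # prints: templates/pageprefix_oak.tpl
rm -rf "$demo_dir"
```

Only the oak page survives both filters: the tower page lacks 'tree', the bonsai page lacks 'tall', and other_notes.tpl never gets past the find expression.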
Code
(As my site is PHP + Smarty, my solution is implemented as a Smarty plugin - but you can take the code and do whatever you like)
/**
Smarty plugin
-------------------------------------------------------------
Type: function
Name: filesearch - performs a file search among the public pages
using filesystem grep
Param
-------------------------------------------------------------
**/
function smarty_function_filesearch($params, &$smarty) {
    if (checkInvalidValues($params, 'keywords')) { // this is one of my helpers, this is up to you to do your way
        return "NO GOOD";
    }
    // Sanitize incoming keywords for extra hacking safety: each listed character becomes a space.
    // The replacement has to be the string ' '. With array(' ') only the first character ('/')
    // is turned into a space; the others are deleted outright, which glues words together.
    $cleankeywordparam = str_replace(array('/', '.', '#', '&', '*', ';', '<', '>', '\'', '"', '}', '{', ']', '[', '$', '\\'), ' ', $params['keywords']);
    $results = array();
    try {
        // Split on whitespace and drop empty keywords (an empty pattern would match every file)
        $keywords = preg_split('/\s+/', $cleankeywordparam, -1, PREG_SPLIT_NO_EMPTY);
        if (count($keywords) > 0) {
            // This is where we hardcode the find expression for your pages, adjust as appropriate.
            // -e marks the next argument as the pattern, so a keyword like '-v' is not read as an option.
            $cmd = "find templates -name 'pageprefix_*.tpl' | xargs grep -il -e ";
            foreach ($keywords as &$keyword) {
                $keyword = escapeshellarg($keyword);
            }
            unset($keyword); // break the reference left behind by foreach
            $cmd .= implode(" | xargs grep -il -e ", $keywords);
            // echo "Will execute '$cmd'";
            $output = (string) shell_exec($cmd); // shell_exec returns null when nothing is printed
            $results = explode("\n", $output); // the results are just filenames, one per line
            foreach ($results as &$result) {
                $result = basename(trim($result), '.tpl');
                // echo "<br/>processing $result ";
                $temp = explode('_', $result, 2); // my files are named like pageprefix_pagename.tpl
                $result = isset($temp[1]) ? trim($temp[1]) : ''; // don't want the pageprefix part
            }
            unset($result);
            $results = array_values(array_filter($results, '_trimPages')); // remove empty entries
        }
    } catch (Exception $e) {
        echo "Search error: " . $e->getMessage(); // PHP concatenates with '.', not '+'
    }
    if (checkInvalidValues($params, 'assign')) {
        return $results;
    } else {
        $smarty->assign_by_ref($params['assign'], $results);
    }
}

function _trimPages($value) {
    return isset($value) && strlen($value) > 0;
}
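The same two steps (chain one grep per keyword, then strip the path, extension and prefix) can be tried outside PHP before wiring up the plugin. The helpers below are a rough shell sketch, not part of the plugin: the names and_search and page_names are hypothetical, the 'pageprefix_*.tpl' pattern is the same placeholder as above, and file names are assumed to contain no spaces.

```shell
#!/bin/sh
# and_search DIR KEYWORD...
# Lists the pageprefix_*.tpl files under DIR that contain every keyword.
and_search() {
    dir=$1
    shift
    files=$(find "$dir" -name 'pageprefix_*.tpl')
    for kw in "$@"; do
        # Stop early once nothing is left to narrow down.
        [ -n "$files" ] || break
        # grep exits non-zero when nothing matches; "|| true" keeps that from
        # aborting scripts that run under "set -e".
        files=$(printf '%s\n' "$files" | xargs grep -il -e "$kw" || true)
    done
    printf '%s\n' "$files"
}

# page_names
# Reads file paths on stdin and prints bare page names:
# templates/pageprefix_tall_tree.tpl -> tall_tree
page_names() {
    while IFS= read -r path; do
        [ -n "$path" ] || continue
        name=$(basename "$path" .tpl)
        # Drop everything up to and including the first underscore (the prefix).
        printf '%s\n' "${name#*_}"
    done
}

# Usage: and_search templates tall tree | page_names
```

Unlike the one-liner, the loop stops as soon as the list of candidate files is empty, so no grep is ever started without file arguments.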
The actual Smarty template is:
<form method="post">
  <input type="text" name="keywords" size="40"
    {if $smarty.post.keywords}value="{$smarty.post.keywords|escape}"{/if}
  />
  <input type="submit" value="Search"/>
</form>
{if $smarty.post.keywords}
  {filesearch keywords=$smarty.post.keywords assign=searchresults}
  <h2>Search Results:</h2>
  <ul>
  {foreach from=$searchresults item=pagename}
    {* This interpolates the filename back to a human-readable string *}
    <li><a href="/{$pagename}">{$pagename|replace:'_':' '|replace:'-':' '|ucfirst}</a></li>
  {/foreach}
  </ul>
{/if}