The query language processor is activated in the GUI simple search entry when the search mode selector is set to Query Language. It can also be used with the KIO slave or the command line search. It broadly has the same capabilities as the complex search interface in the GUI.
The language is based on the (seemingly defunct) Xesam user search language specification.
If the results of a query language search puzzle you and you
are unsure what was actually searched for, you can use the GUI
Show Query link at the top of the result list to
check the exact query which was finally executed by Xapian.
Here follows a sample request that we are going to explain:
author:"john doe" Beatles OR Lennon Live OR Unplugged -potatoes
This would search for all documents with John Doe appearing as a
phrase in the author field (exactly what this is depends on the
document type, e.g. the From: header for an email message), and
containing either beatles or lennon and either live or unplugged,
but not potatoes (in any part of the document).
An element is composed of an optional field specification,
and a value, separated by a colon (the field separator is the last
colon in the element). Example:
Eugenie,
author:balzac,
dc:title:grandet
The colon, if present, means "contains". Xesam defines other relations, which are mostly unsupported for now (except in special cases, described further down).
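The "last colon" rule above can be illustrated with a minimal Python sketch. This is not Recoll's parser: the function name split_element is hypothetical, and the sketch assumes the element contains no quoted phrase with a colon inside it.

```python
from typing import Tuple


def split_element(element: str) -> Tuple[str, str]:
    """Split a query element into (field, value) at its last colon.

    Illustrative assumption: quoted values containing colons are not handled.
    """
    field, sep, value = element.rpartition(":")
    if not sep:
        # No colon at all: the element is a plain value with no field.
        return "", element
    return field, value


if __name__ == "__main__":
    for element in ("Eugenie", "author:balzac", "dc:title:grandet"):
        print(element, "->", split_element(element))
```

With the three examples above, dc:title:grandet splits into the field dc:title and the value grandet, because only the last colon acts as the separator.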
All elements in the search entry are normally combined
with an implicit AND. It is possible to specify that elements be
OR'ed instead, as in Beatles OR Lennon. The OR must be entered
literally (capitals), and it has priority over the AND associations:
word1 word2 OR word3
means
word1 AND (word2 OR word3)
not
(word1 AND word2) OR word3.
Explicit parentheses are not supported.
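The precedence rule above (OR binds tighter than the implicit AND) can be sketched in a few lines of Python. This is only an illustration of the grouping, not the actual query processor: the function name group_clauses is hypothetical, and the sketch splits on whitespace, so it assumes there are no quoted phrases containing spaces.

```python
from typing import List


def group_clauses(query: str) -> List[List[str]]:
    """Return a list of OR-groups; the groups themselves are ANDed together.

    Illustrative assumption: elements are separated by whitespace only.
    """
    groups: List[List[str]] = []
    join_next = False
    for token in query.split():
        if token == "OR":  # only the capitalized form is an operator
            join_next = bool(groups)  # a leading OR has nothing to join
            continue
        if join_next:
            groups[-1].append(token)  # extend the current OR-group
        else:
            groups.append([token])  # start a new AND-ed group
        join_next = False
    return groups


if __name__ == "__main__":
    print(group_clauses("word1 word2 OR word3"))
    print(group_clauses("Beatles OR Lennon Live OR Unplugged"))
```

For word1 word2 OR word3 this yields two groups, [word1] and [word2, word3], which corresponds to word1 AND (word2 OR word3). A lowercase "or" is treated as an ordinary search term.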
An element preceded by a - specifies a
term that should not appear. Pure negative
queries are forbidden.
As usual, words inside quotes define a phrase
(the order of words is significant), so that
title:"prejudice pride" is not the same as
title:prejudice title:pride, and is
unlikely to find a result.
Modifiers can be set on a phrase clause, for example to specify a proximity search (unordered). See the modifier section.
Recoll currently manages the following default fields:
title,
subject or caption are
synonyms which specify data to be searched for in the
document title or subject.
author or from for searching the document's
originators.
recipient or to for searching the document's
recipients.
keyword for searching the
document-specified keywords (few documents actually have
any).
filename for the document's
file name.
ext specifies the file
name extension (Ex: ext:html)
The field syntax also supports a few field-like, but special, criteria:
dir for filtering the
results on file location
(Ex: dir:/home/me/somedir).
-dir
also works to find results not in the specified directory
(release >= 1.15.8). A tilde inside the value will be
expanded to the home directory. Wildcards will be
expanded, but
please have a
look at an important limitation of wildcards in
path filters.
Relative paths also make sense, for example,
dir:share/doc would match either
/usr/share/doc or
/usr/local/share/doc
Several dir clauses can be specified,
both positive and negative. For example the following makes sense:
dir:recoll dir:src -dir:utils -dir:common
This would select results which have both recoll and src in the
path (in any order), and which have neither utils nor common.
You can also use OR conjunctions
with dir: clauses.
A special aspect of dir clauses is
that the values in the index are not transcoded to UTF-8, and
never lower-cased or unaccented, but stored as binary. This means
that you need to enter the values in the exact lower or upper
case, and that searches for names with diacritics may sometimes
be impossible because of character set conversion
issues. Non-ASCII UNIX file paths are an unending source of
trouble and are best avoided.
You need to use double-quotes around the path value if it contains space characters.
size for filtering the
results on file size. Example:
size<10000. You can use
<, > or
= as operators. You can specify a range like the
following: size>100 size<1000. The usual
k/K, m/M, g/G, t/T can be used as (decimal)
multipliers. Ex: size>1k to search for files
bigger than 1000 bytes.
date for searching or filtering
on dates. The syntax for the argument is based on the ISO8601
standard for dates and time intervals. Only dates are supported, no
times. The general syntax is 2 elements separated by a
/ character. Each element can be a date or a
period of time. Periods are specified as
PnYnMnD.
The n numbers are the respective numbers
of years, months or days, any of which may be missing. Dates are
specified as
YYYY-MM-DD.
The days and months parts may be missing. If the
/ is present but an element is missing, the
missing element is interpreted as the lowest or highest date in the
index. Examples:
2001-03-01/2002-05-01 the
basic syntax for an interval of dates.
2001-03-01/P1Y2M the
same specified with a period.
2001/ from the beginning of
2001 to the latest date in the index.
2001 the whole year of
2001
P2D/ means 2 days ago up to
now if there are no documents with dates in the future.
/2003 all documents from
2003 or older.
Periods can also be specified with small letters (e.g. p2y).
mime or
format for specifying the
mime type. This one is quite special because you can specify
several values which will be OR'ed (the normal default for the
language is AND). Ex: mime:text/plain
mime:text/html. Specifying an explicit boolean
operator before a
mime specification is not supported and
will produce strange results. You can filter out certain types
by using negation (-mime:some/type), and you can
use wildcards in the value (mime:text/*).
Note that mime is the ONLY field with an OR default. With ext
terms, for example, you do need to use an explicit OR.
type or
rclcat for specifying the category (as in
text/media/presentation/etc.). The classification of mime
types in categories is defined in the Recoll configuration
(mimeconf), and can be modified or
extended. The default category names are those which permit
filtering results in the main GUI screen. Categories are OR'ed
like mime types above. Unlike mime, this cannot be negated with
-.
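To make the date syntax described above more concrete, here is a rough Python sketch that expands a date specification into a (start, end) pair, with None standing for "lowest or highest date in the index". It is not Recoll's implementation. The names expand_date_spec and today are hypothetical, and the handling of inclusive end dates and of month-end clamping when adding a period are assumptions made for the example.

```python
import calendar
import re
from datetime import date, timedelta

_DATE = re.compile(r"^(\d{4})(?:-(\d{1,2}))?(?:-(\d{1,2}))?$")
_PERIOD = re.compile(r"^[Pp](?:(\d+)[Yy])?(?:(\d+)[Mm])?(?:(\d+)[Dd])?$")


def _date_bounds(text):
    """Return (first_day, last_day) covered by a possibly partial date, or None."""
    m = _DATE.match(text)
    if m is None:
        return None
    year = int(m.group(1))
    if m.group(2) is None:  # year only
        return date(year, 1, 1), date(year, 12, 31)
    month = int(m.group(2))
    if m.group(3) is None:  # year and month
        return date(year, month, 1), date(year, month, calendar.monthrange(year, month)[1])
    day = date(year, month, int(m.group(3)))
    return day, day


def _shift(day, period, sign):
    """Move a date forward (sign=+1) or backward (sign=-1) by a PnYnMnD period."""
    m = _PERIOD.match(period)
    if m is None:
        raise ValueError("not a date or period: %r" % period)
    years, months, days = (int(g or 0) for g in m.groups())
    total = day.year * 12 + (day.month - 1) + sign * (years * 12 + months)
    year, month0 = divmod(total, 12)
    # Assumption: clamp to the last day of the target month if needed.
    dom = min(day.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, dom) + timedelta(days=sign * days)


def expand_date_spec(spec, today=None):
    """Expand a date: argument into (start, end); None means an open bound."""
    today = today or date.today()
    if "/" not in spec:
        bounds = _date_bounds(spec)
        if bounds is None:
            raise ValueError("not a date: %r" % spec)
        return bounds
    left, right = spec.split("/", 1)
    left_b = _date_bounds(left) if left else None
    right_b = _date_bounds(right) if right else None
    start = left_b[0] if left_b else None
    end = right_b[1] if right_b else None
    if left and left_b is None:  # left element is a period counted backwards
        start = _shift(end or today, left, -1)
    if right and right_b is None:  # right element is a period counted forwards
        end = _shift(start or today, right, +1)
    return start, end


if __name__ == "__main__":
    for spec in ("2001-03-01/2002-05-01", "2001-03-01/P1Y2M", "2001/", "2001", "P2D/", "/2003"):
        print(spec, "->", expand_date_spec(spec, today=date(2020, 1, 10)))
```

Under these assumptions, 2001-03-01/P1Y2M expands to the same interval as 2001-03-01/2002-05-01, and P2D/ leaves the end open, which matches the description above as long as the index holds no documents dated in the future.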
Words inside phrases and capitalized words are not stem-expanded. Wildcards may be used anywhere inside a term. Specifying a wildcard on the left of a term can produce a very slow search (or even an incorrect one if the expansion is truncated because of excessive size). Also see More about wildcards.
The document filters used while indexing can create other fields with arbitrary names, and aliases may be defined in the configuration, so the exact field search possibilities may be different for you if someone took care of the customisation.
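Coming back to the size criterion described earlier, the following Python sketch shows how a size clause with a decimal multiplier could be interpreted, and how several clauses combine into a range. The function names parse_size_clause and size_matches are hypothetical, and the sketch is an illustration of the documented syntax rather than Recoll's own code.

```python
import operator
import re

# Decimal multipliers, as stated in the description of the size criterion.
_MULTIPLIERS = {"": 1, "k": 10**3, "m": 10**6, "g": 10**9, "t": 10**12}
_OPERATORS = {"<": operator.lt, ">": operator.gt, "=": operator.eq}
_SIZE = re.compile(r"^size([<>=])(\d+)([kKmMgGtT]?)$")


def parse_size_clause(clause):
    """Return (operator_char, size_in_bytes) for a clause such as 'size>1k'."""
    m = _SIZE.match(clause)
    if m is None:
        raise ValueError("not a size clause: %r" % clause)
    op, number, suffix = m.groups()
    return op, int(number) * _MULTIPLIERS[suffix.lower()]


def size_matches(size_in_bytes, clauses):
    """True if the size satisfies every clause (clauses are ANDed)."""
    for clause in clauses:
        op, limit = parse_size_clause(clause)
        if not _OPERATORS[op](size_in_bytes, limit):
            return False
    return True


if __name__ == "__main__":
    print(parse_size_clause("size>1k"))
    print(size_matches(500, ["size>100", "size<1000"]))
```

Here size>1k is read as "bigger than 1000 bytes", and the pair size>100 size<1000 selects sizes strictly between the two limits.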
Some characters are recognized as search modifiers when found
immediately after the closing double quote of a phrase, as in
"some term"modifierchars. The actual "phrase"
can be a single term of course. Supported modifiers:
l can be used to turn off
stemming (mostly makes sense with p because
stemming is off by default for phrases).
o can be used to specify a
"slack" for phrase and proximity searches: the number of
additional terms that may be found between the specified
ones. If o is followed by an integer number,
this is the slack, else the default is 10.
p can be used to turn the
default phrase search into a proximity one
(unordered). Example:"order any in"p
C will turn on case
sensitivity (if the index supports it).
D will turn on diacritics
sensitivity (if the index supports it).
A weight can be specified for a query element
by specifying a decimal value at the start of the
modifiers. Example: "Important"2.5.
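As a summary of the modifier syntax, here is a small Python sketch that decodes a clause of the form "some term"modifierchars into a dictionary. It is a reading aid under stated assumptions, not the real parser: the function name parse_phrase_clause and the dictionary keys are hypothetical, the weight is reported as None when absent because the text above does not give a default, and the phrase is assumed not to contain a double quote.

```python
import re

# Quoted phrase, optional decimal weight, then any mix of l, p, C, D and o[N].
_CLAUSE = re.compile(r'^"([^"]*)"(\d+(?:\.\d+)?)?((?:[lpCD]|o\d*)*)$')
_MODIFIER = re.compile(r"o(\d*)|[lpCD]")


def parse_phrase_clause(clause):
    """Decode a quoted clause and its trailing modifiers into a dict."""
    m = _CLAUSE.match(clause)
    if m is None:
        raise ValueError("unsupported clause: %r" % clause)
    phrase, weight, modifiers = m.groups()
    result = {
        "phrase": phrase,
        "weight": float(weight) if weight else None,
        "no_stemming": False,           # l
        "slack": None,                  # o or oN
        "proximity": False,             # p
        "case_sensitive": False,        # C
        "diacritics_sensitive": False,  # D
    }
    for mod in _MODIFIER.finditer(modifiers):
        letter = mod.group(0)[0]
        if letter == "o":
            # An explicit integer is the slack, otherwise the default is 10.
            result["slack"] = int(mod.group(1)) if mod.group(1) else 10
        elif letter == "l":
            result["no_stemming"] = True
        elif letter == "p":
            result["proximity"] = True
        elif letter == "C":
            result["case_sensitive"] = True
        elif letter == "D":
            result["diacritics_sensitive"] = True
    return result


if __name__ == "__main__":
    for clause in ('"order any in"p', '"Important"2.5', '"some term"o3p'):
        print(clause, "->", parse_phrase_clause(clause))
```

The weight is only looked for directly after the closing quote, so the integer following o is always taken as the slack and never as a weight.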