Module aho_corasick::packed[][src]

A lower level API for packed multiple substring search, principally for a small number of patterns.

This sub-module provides vectorized routines for quickly finding matches of a small number of patterns. In general, users of this crate shouldn’t need to interface with this module directory, as the primary AhoCorasick searcher will use these routines automatically as a prefilter when applicable. However, in some cases, callers may want to bypass the Aho-Corasick machinery entirely and use this vectorized searcher directly.

Overview

The primary types in this sub-module are:

Examples

This example shows how to create a searcher from an iterator of patterns. By default, leftmost-first match semantics are used. (See the top-level MatchKind type for more details about match semantics, which apply similarly to packed substring search.)

use aho_corasick::packed::{MatchKind, Searcher};

let searcher = Searcher::new(["foobar", "foo"].iter().cloned())?;
let matches: Vec<usize> = searcher
    .find_iter("foobar")
    .map(|mat| mat.pattern())
    .collect();
assert_eq!(vec![0], matches);

This example shows how to use Config to change the match semantics to leftmost-longest:

use aho_corasick::packed::{Config, MatchKind};

let searcher = Config::new()
    .match_kind(MatchKind::LeftmostLongest)
    .builder()
    .add("foo")
    .add("foobar")
    .build()?;
let matches: Vec<usize> = searcher
    .find_iter("foobar")
    .map(|mat| mat.pattern())
    .collect();
assert_eq!(vec![1], matches);

Packed substring searching

Packed substring searching refers to the use of SIMD (Single Instruction, Multiple Data) to accelerate the detection of matches in a haystack. Unlike conventional algorithms, such as Aho-Corasick, SIMD algorithms for substring search tend to do better with a small number of patterns, where as Aho-Corasick generally maintains reasonably consistent performance regardless of the number of patterns you give it. Because of this, the vectorized searcher in this sub-module cannot be used as a general purpose searcher, since building the searcher may fail. However, in exchange, when searching for a small number of patterns, searching can be quite a bit faster than Aho-Corasick (sometimes by an order of magnitude).

The key take away here is that constructing a searcher from a list of patterns is a fallible operation. While the precise conditions under which building a searcher can fail is specifically an implementation detail, here are some common reasons:

Structs

Builder

A builder for constructing a packed searcher from a collection of patterns.

Config

The configuration for a packed multiple pattern searcher.

FindIter

An iterator over non-overlapping matches from a packed searcher.

Searcher

A packed searcher for quickly finding occurrences of multiple patterns.

Enums

MatchKind

A knob for controlling the match semantics of a packed multiple string searcher.