When applying a SpaCy pipeline to the input text, the annotations can be filtered by a custom set of rules.
These rules are based on linguistic features of the word tokens, such as part-of-speech tags (for example NOUN) or stopword status.
In order to be part of the output set, an annotation needs to fulfill all filter rules.
Three different rule types are available:
length: may exclude annotations that exceed (max) or fall below (min) a certain character length threshold.
non-stopwords: may only include annotations in which any or all word tokens are not stopwords (require). Setting deny causes the service to block annotations without stopwords. (This is not recommended.)
linguistics: may only include annotations in which any or all word tokens are members of the comma-separated list for a given linguistic feature (require). Setting deny causes the service to block annotations that match the given linguistic features.
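The three rule types can be sketched in plain Python to make the any/all and require/deny semantics concrete. This is only an illustration, not the service's implementation: the Token class, the function names, the inclusive length bounds, and the use of the part-of-speech tag as the linguistic feature are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a spaCy token (text, POS tag, stopword flag)."""
    text: str
    pos: str
    is_stop: bool = False


Annotation = List[Token]
Rule = Callable[[Annotation], bool]


def length_rule(min_len: int = 0, max_len: Optional[int] = None) -> Rule:
    """Keep annotations whose character length lies within [min_len, max_len] (assumed inclusive)."""
    def rule(annotation: Annotation) -> bool:
        n = len(" ".join(t.text for t in annotation))
        return n >= min_len and (max_len is None or n <= max_len)
    return rule


def _quantified(predicate: Callable[[Token], bool], mode: str, action: str) -> Rule:
    """Combine a per-token test with any/all; 'deny' inverts the 'require' result."""
    if action not in ("require", "deny"):
        raise ValueError(f"unknown action: {action}")
    quantifier = {"any": any, "all": all}[mode]

    def rule(annotation: Annotation) -> bool:
        matched = quantifier(predicate(t) for t in annotation)
        return matched if action == "require" else not matched
    return rule


def non_stopword_rule(mode: str = "any", action: str = "require") -> Rule:
    return _quantified(lambda t: not t.is_stop, mode, action)


def linguistics_rule(values: str, mode: str = "any", action: str = "require") -> Rule:
    # values is a comma-separated list such as "NOUN,PROPN"
    allowed = {v.strip() for v in values.split(",")}
    return _quantified(lambda t: t.pos in allowed, mode, action)


def keep(annotation: Annotation, rules: List[Rule]) -> bool:
    """An annotation is part of the output only if it fulfills all rules."""
    return all(rule(annotation) for rule in rules)


rules = [
    length_rule(min_len=5),                # longer than 4 characters
    linguistics_rule("NOUN", mode="any"),  # at least one NOUN
    non_stopword_rule(mode="all"),         # no stopword at all
]
heart = [Token("heart", "NOUN")]
the_heart = [Token("the", "DET", is_stop=True), Token("heart", "NOUN")]
print(keep(heart, rules))      # True
print(keep(the_heart, rules))  # False: "the" is a stopword
```

In this sketch every rule is a predicate over the whole annotation, so combining rules is a plain conjunction, which mirrors the requirement that an annotation must fulfill all filter rules.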
In addition to the linguistic ruleset, a lemmatization step can be enabled to lemmatize the text before entity search. To enable it, add an additional rule component of type lemmatize to the filter rules.
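The effect of lemmatizing before the entity search can be illustrated with a small sketch. The Token class below is a hypothetical stand-in that only mirrors spaCy's `lemma_` attribute, and the single-word term lookup is a simplification chosen for this example; the service's actual search is not described here.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Token:
    """Hypothetical stand-in for a spaCy token; spaCy exposes the lemma as `lemma_`."""
    text: str
    lemma_: str


def find_terms(tokens: List[Token], terms: Set[str], use_lemmas: bool) -> List[str]:
    """Return the words that match a term list, optionally comparing lemmas instead of surface forms."""
    words = [(t.lemma_ if use_lemmas else t.text).lower() for t in tokens]
    return [w for w in words if w in terms]


tokens = [
    Token("Both", "both"),
    Token("kidneys", "kidney"),
    Token("were", "be"),
    Token("enlarged", "enlarge"),
]
terms = {"kidney"}

print(find_terms(tokens, terms, use_lemmas=False))  # []
print(find_terms(tokens, terms, use_lemmas=True))   # ['kidney']
```

Without lemmatization the plural form "kidneys" does not match the term "kidney"; with it, inflected forms are reduced to their base form before the lookup.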
You can add additional pretrained SpaCy pipelines in the settings page. The specific pipeline must be downloadable through:
python3 -m spacy download PIPELINE_NAME
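Before adding a pipeline in the settings, it can help to check whether it is already available in the Python environment. The sketch below relies on the fact that spaCy pipelines are installed as regular Python packages; the helper names are hypothetical, and en_core_web_sm is only used as an example pipeline name.

```python
import importlib.util
import sys
from typing import List


def is_pipeline_installed(name: str) -> bool:
    """Check whether a package with the given pipeline name can be imported."""
    return importlib.util.find_spec(name) is not None


def download_command(name: str) -> List[str]:
    """Build the download command for the current Python interpreter."""
    return [sys.executable, "-m", "spacy", "download", name]


name = "en_core_web_sm"  # example pipeline name
if not is_pipeline_installed(name):
    print("Not installed. Run:", " ".join(download_command(name)))
```

The command is only printed, not executed, so the sketch has no side effects; passing the list to `subprocess.run` would perform the download.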
Example filter rules:
Look for annotations longer than 4 characters
Look for annotations that consist of at least one NOUN
Look for annotations that do not include a single stopword