conf.py: improve SearchEnglish to handle terms with dots

While search queries already handled words with hyphens correctly, they
did not do so for words with dots.

To fix this:

- enhance the word tokenizer to treat both dots ('.') and hyphens ('-')
  as valid characters within words.
  (For robustness, explicitly exclude dots/hyphens at the start or end
  of a word from indexing.)
- adjust query processing to avoid splitting on dots in search input.

This allows search queries to correctly match terms such as
'local.conf' and 'site.conf'.

Fixes: [YOCTO #14534]

(From yocto-docs rev: 80084a4cabdf7f61c7e93eda8ddbd5bc7d54e041)

Signed-off-by: Enrico Jörns <ejo@pengutronix.de>
Signed-off-by: Antonin Godard <antonin.godard@bootlin.com>
Signed-off-by: Richard Purdie <richard.purdie@linuxfoundation.org>
Enrico Jörns 2025-07-03 12:23:01 +02:00 committed by Richard Purdie
parent eab29e5b54
commit b1acce1955


@@ -179,13 +179,13 @@ from sphinx.search import SearchEnglish
 from sphinx.search import languages
 class DashFriendlySearchEnglish(SearchEnglish):
-    # Accept words that can include hyphens
-    _word_re = re.compile(r'[\w\-]+')
+    # Accept words that can include 'inner' hyphens or dots
+    _word_re = re.compile(r'[\w]+(?:[\.\-][\w]+)*')
     js_splitter_code = r"""
 function splitQuery(query) {
     return query
-        .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}-]+/gu)
+        .split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}\-\.]+/gu)
         .filter(term => term.length > 0);
 }
 """
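
The effect of the two `_word_re` patterns in this change can be checked in isolation. The sketch below is only an illustration, not part of the commit: it assumes the indexer extracts words by calling `findall` on `_word_re`, and the helpers `tokenize` and `split_query` as well as the sample strings are hypothetical. Python's `re` module has no `\p{...}` classes, so `split_query` approximates the JavaScript splitter with `\w` and ignores the emoji class.

```python
import re

# Word patterns copied from the change: old (hyphens only) and new
# (hyphens or dots allowed, but only between word characters).
OLD_WORD_RE = re.compile(r'[\w\-]+')
NEW_WORD_RE = re.compile(r'[\w]+(?:[\.\-][\w]+)*')


def tokenize(pattern, text):
    """Hypothetical helper: return every match of pattern in text."""
    return pattern.findall(text)


def split_query(query, keep_dots=True):
    """Rough Python stand-in for splitQuery() (assumption: \\w replaces
    the \\p{Letter}\\p{Number}_ classes; emoji are not handled)."""
    separators = r'[^\w\-.]+' if keep_dots else r'[^\w\-]+'
    return [term for term in re.split(separators, query) if term]


sample = "Edit local.conf and -bitbake-layers. afterwards"

# Old pattern: splits at the dot and keeps the leading hyphen.
print(tokenize(OLD_WORD_RE, sample))
# New pattern: keeps 'local.conf' whole and drops leading/trailing
# hyphens and dots.
print(tokenize(NEW_WORD_RE, sample))

print(split_query("local.conf, site.conf", keep_dots=False))
print(split_query("local.conf, site.conf"))
```

With the new pattern a dot or hyphen only counts as part of a word when word characters surround it, which is what the "inner" in the new code comment refers to.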