conf.py: improve SearchEnglish to handle terms with dots

While search queries already handled words with hyphens correctly, they
did not do so for words with dots.

To fix this, we

- enhance the word tokenizer to treat both dots ('.') and hyphens ('-')
  as valid characters within words.
  (For robustness, explicitly exclude dots/hyphens at the start or end
  of a word from indexing.)
- adjust query processing to avoid splitting on dots in search input

This allows search queries to correctly match terms such as
'local.conf', 'site.conf', and similar ones now.

Fixes: [YOCTO #14534]

(From yocto-docs rev: 462059ebbf25ed362a5ace1987f0684444441a7f)

Signed-off-by: Enrico Jörns <ejo@pengutronix.de>
Signed-off-by: Antonin Godard <antonin.godard@bootlin.com>
(cherry picked from commit 80084a4cabdf7f61c7e93eda8ddbd5bc7d54e041)
Signed-off-by: Antonin Godard <antonin.godard@bootlin.com>
Signed-off-by: Steve Sakoman <steve@sakoman.com>
This commit is contained in:
Enrico Jörns 2025-07-03 12:23:01 +02:00 committed by Steve Sakoman
parent db7fcba5dd
commit 2a8a9502f5

View File

@ -179,13 +179,13 @@ from sphinx.search import SearchEnglish
from sphinx.search import languages
class DashFriendlySearchEnglish(SearchEnglish):
# Accept words that can include hyphens
_word_re = re.compile(r'[\w\-]+')
# Accept words that can include 'inner' hyphens or dots
_word_re = re.compile(r'[\w]+(?:[\.\-][\w]+)*')
js_splitter_code = r"""
function splitQuery(query) {
return query
.split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}-]+/gu)
.split(/[^\p{Letter}\p{Number}_\p{Emoji_Presentation}\-\.]+/gu)
.filter(term => term.length > 0);
}
"""