
After the update to zstd-1.4.10, passing -O3 is no longer necessary to get
good performance from zstd. The default optimization level, -O2, is
sufficient. I've measured no significant change to compression speed, and
a ~1% decompression speed loss, which is acceptable.

This fixes the reported parisc -Wframe-larger-than=1536 errors [0]. The
gcc-8-hppa-linux-gnu compiler performed very poorly with -O3, generating
stacks that are ~3KB. With -O2 these same functions generate stacks
under 100B, completely fixing the problem. Function size deltas are listed
below:

ZSTD_compressBlock_fast_extDict_generic:       3800 -> 68
ZSTD_compressBlock_fast:                       2216 -> 40
ZSTD_compressBlock_fast_dictMatchState:        1848 -> 64
ZSTD_compressBlock_doubleFast_extDict_generic: 3744 -> 76
ZSTD_fillDoubleHashTable:                      3252 -> 0
ZSTD_compressBlock_doubleFast:                 5856 -> 36
ZSTD_compressBlock_doubleFast_dictMatchState:  5380 -> 84
ZSTD_compressBlock_lazy2:                      2420 -> 72

Additionally, this improves the reported code bloat [1]. With gcc-11,
bloat-o-meter shows an 80KB code size improvement:

```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 31/8 grow/shrink: 24/155 up/down: 25734/-107924 (-82190)
Total: Before=6418562, After=6336372, chg -1.28%
```

Compared to before the zstd-1.4.10 update, we see a total code size
regression of 105KB, down from 374KB at v5.16-rc1:

```
> ../scripts/bloat-o-meter vmlinux.old vmlinux
add/remove: 292/62 grow/shrink: 56/88 up/down: 235009/-127487 (107522)
Total: Before=6228850, After=6336372, chg +1.73%
```

[0] https://lkml.org/lkml/2021/11/15/710
[1] https://lkml.org/lkml/2021/11/14/189

Link: https://lore.kernel.org/r/20211117014949.1169186-4-nickrterrell@gmail.com/
Link: https://lore.kernel.org/r/20211117201459.1194876-4-nickrterrell@gmail.com/
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Nick Terrell <terrelln@fb.com>
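
The two bloat-o-meter summaries in the message above can be cross-checked by hand. The following is only a minimal sketch of that arithmetic, using the figures quoted in the message; the helper name `summarize` is hypothetical and is not part of the kernel's `scripts/bloat-o-meter`.

```python
def summarize(up, down, before):
    """Recompute a bloat-o-meter style summary from its raw byte counts.

    up     -- total bytes added by growing/new symbols
    down   -- total bytes removed by shrinking/deleted symbols (negative)
    before -- total size before the change, in bytes
    """
    net = up + down                     # net change in bytes
    after = before + net                # total size after the change
    pct = round(100.0 * net / before, 2)
    return net, after, pct


# -O3 vs -O2 with gcc-11 (first bloat-o-meter run in the message).
o3_vs_o2 = summarize(25734, -107924, 6418562)

# Pre-zstd-1.4.10 tree vs this change (second run in the message).
vs_pre_update = summarize(235009, -127487, 6228850)

print(o3_vs_o2)
print(vs_pre_update)
```

Both runs end at the same `After` total, which is what one would expect if the second `vmlinux` is the same build in each comparison.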
# SPDX-License-Identifier: GPL-2.0+ OR BSD-3-Clause
# Copyright (c) Facebook, Inc.
# All rights reserved.
#
# This source code is licensed under both the BSD-style license (found in the
# LICENSE file in the root directory of this source tree) and the GPLv2 (found
# in the COPYING file in the root directory of this source tree).
# You may select, at your option, one of the above-listed licenses.
obj-$(CONFIG_ZSTD_COMPRESS) += zstd_compress.o
obj-$(CONFIG_ZSTD_DECOMPRESS) += zstd_decompress.o

zstd_compress-y := \
		zstd_compress_module.o \
		common/debug.o \
		common/entropy_common.o \
		common/error_private.o \
		common/fse_decompress.o \
		common/zstd_common.o \
		compress/fse_compress.o \
		compress/hist.o \
		compress/huf_compress.o \
		compress/zstd_compress.o \
		compress/zstd_compress_literals.o \
		compress/zstd_compress_sequences.o \
		compress/zstd_compress_superblock.o \
		compress/zstd_double_fast.o \
		compress/zstd_fast.o \
		compress/zstd_lazy.o \
		compress/zstd_ldm.o \
		compress/zstd_opt.o \

zstd_decompress-y := \
		zstd_decompress_module.o \
		common/debug.o \
		common/entropy_common.o \
		common/error_private.o \
		common/fse_decompress.o \
		common/zstd_common.o \
		decompress/huf_decompress.o \
		decompress/zstd_ddict.o \
		decompress/zstd_decompress.o \
		decompress/zstd_decompress_block.o \