Commit Graph

24476 Commits

Author SHA1 Message Date
Thomas Fourier
39dac98aca scsi: qla4xxx: Fix missing DMA mapping error in qla4xxx_alloc_pdu()
[ Upstream commit 00f452a1b0 ]

dma_map_XXX() can fail and should be tested for errors with
dma_mapping_error().

Fixes: b3a271a94d ("[SCSI] qla4xxx: support iscsiadm session mgmt")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Link: https://lore.kernel.org/r/20250618071742.21822-2-fourier.thomas@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-10 16:03:06 +02:00
Thomas Fourier
c3ec87fbb0 scsi: qla2xxx: Fix DMA mapping test in qla24xx_get_port_database()
[ Upstream commit c3b214719a ]

dma_map_XXX() functions return as error values DMA_MAPPING_ERROR which is
often ~0.  The error value should be tested with dma_mapping_error() like
it was done in qla26xx_dport_diagnostics().

Fixes: 818c7f87a1 ("scsi: qla2xxx: Add changes in preparation for vendor extended FDMI/RDP")
Signed-off-by: Thomas Fourier <fourier.thomas@gmail.com>
Link: https://lore.kernel.org/r/20250617161115.39888-2-fourier.thomas@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-07-10 16:03:06 +02:00
Chen Yu
bf2c1643ab scsi: megaraid_sas: Fix invalid node index
commit 752eb816b5 upstream.

On a system with DRAM interleave enabled, out-of-bound access is
detected:

megaraid_sas 0000:3f:00.0: requested/available msix 128/128 poll_queue 0
------------[ cut here ]------------
UBSAN: array-index-out-of-bounds in ./arch/x86/include/asm/topology.h:72:28
index -1 is out of range for type 'cpumask *[1024]'
dump_stack_lvl+0x5d/0x80
ubsan_epilogue+0x5/0x2b
__ubsan_handle_out_of_bounds.cold+0x46/0x4b
megasas_alloc_irq_vectors+0x149/0x190 [megaraid_sas]
megasas_probe_one.cold+0xa4d/0x189c [megaraid_sas]
local_pci_probe+0x42/0x90
pci_device_probe+0xdc/0x290
really_probe+0xdb/0x340
__driver_probe_device+0x78/0x110
driver_probe_device+0x1f/0xa0
__driver_attach+0xba/0x1c0
bus_for_each_dev+0x8b/0xe0
bus_add_driver+0x142/0x220
driver_register+0x72/0xd0
megasas_init+0xdf/0xff0 [megaraid_sas]
do_one_initcall+0x57/0x310
do_init_module+0x90/0x250
init_module_from_file+0x85/0xc0
idempotent_init_module+0x114/0x310
__x64_sys_finit_module+0x65/0xc0
do_syscall_64+0x82/0x170
entry_SYSCALL_64_after_hwframe+0x76/0x7e

Fix it accordingly.

Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Link: https://lore.kernel.org/r/20250604042556.3731059-1-yu.c.chen@intel.com
Fixes: 8049da6f39 ("scsi: megaraid_sas: Use irq_set_affinity_and_hint()")
Cc: stable@vger.kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-07-06 11:00:14 +02:00
Vitaliy Shevtsov
a77d0a14ed scsi: elx: efct: Fix memory leak in efct_hw_parse_filter()
[ Upstream commit 2a8a5a5dd0 ]

strsep() modifies the address of the pointer passed to it so that it no
longer points to the original address. This means kfree() gets the wrong
pointer.

Fix this by passing unmodified pointer returned from kstrdup() to
kfree().

Found by Linux Verification Center (linuxtesting.org) with Svace.

Fixes: 4df84e8466 ("scsi: elx: efct: Driver initialization routines")
Signed-off-by: Vitaliy Shevtsov <v.shevtsov@mt-integration.ru>
Link: https://lore.kernel.org/r/20250612163616.24298-1-v.shevtsov@mt-integration.ru
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27 11:09:04 +01:00
Dexuan Cui
8d60df50e6 scsi: storvsc: Increase the timeouts to storvsc_timeout
commit b2f966568f upstream.

Currently storvsc_timeout is only used in storvsc_sdev_configure(), and
5s and 10s are used elsewhere. It turns out that rarely the 5s is not
enough on Azure, so let's use storvsc_timeout everywhere.

In case a timeout happens and storvsc_channel_init() returns an error,
close the VMBus channel so that any host-to-guest messages in the
channel's ringbuffer, which might come late, can be safely ignored.

Add a "const" to storvsc_timeout.

Cc: stable@kernel.org
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Link: https://lore.kernel.org/r/1749243459-10419-1-git-send-email-decui@microsoft.com
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27 11:08:59 +01:00
Daniel Wagner
34c0a67055 scsi: lpfc: Use memcpy() for BIOS version
[ Upstream commit ae82eaf4ae ]

The strlcat() with FORTIFY support is triggering a panic because it
thinks the target buffer will overflow although the correct target
buffer size is passed in.

Anyway, instead of memset() with 0 followed by a strlcat(), just use
memcpy() and ensure that the resulting buffer is NULL terminated.

BIOSVersion is only used for the lpfc_printf_log() which expects a
properly terminated string.

Signed-off-by: Daniel Wagner <wagi@kernel.org>
Link: https://lore.kernel.org/r/20250409-fix-lpfc-bios-str-v1-1-05dac9e51e13@kernel.org
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27 11:08:56 +01:00
Justin Tee
32f25633f3 scsi: lpfc: Fix lpfc_check_sli_ndlp() handling for GEN_REQUEST64 commands
[ Upstream commit 05ae6c9c73 ]

In lpfc_check_sli_ndlp(), the get_job_els_rsp64_did remote_id assignment
does not apply for GEN_REQUEST64 commands as it only has meaning for a
ELS_REQUEST64 command.  So, if (iocb->ndlp == ndlp) is false, we could
erroneously return the wrong value.  Fix by replacing the fallthrough
statement with a break statement before the remote_id check.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20250425194806.3585-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-27 11:08:55 +01:00
Alok Tiwari
d4cbcf274c scsi: iscsi: Fix incorrect error path labels for flashnode operations
[ Upstream commit 9b17621366 ]

Correct the error handling goto labels used when host lookup fails in
various flashnode-related event handlers:

 - iscsi_new_flashnode()
 - iscsi_del_flashnode()
 - iscsi_login_flashnode()
 - iscsi_logout_flashnode()
 - iscsi_logout_flashnode_sid()

scsi_host_put() is not required when shost is NULL, so jumping to the
correct label avoids unnecessary operations. These functions previously
jumped to the wrong goto label (put_host), which did not match the
intended cleanup logic.

Use the correct exit labels (exit_new_fnode, exit_del_fnode, etc.) to
ensure proper error handling.  Also remove the unused put_host label
under iscsi_new_flashnode() as it is no longer needed.

No functional changes beyond accurate error path correction.

Fixes: c6a4bb2ef5 ("[SCSI] scsi_transport_iscsi: Add flash node mgmt support")
Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com>
Link: https://lore.kernel.org/r/20250530193012.3312911-1-alok.a.tiwari@oracle.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19 15:28:40 +02:00
Yihang Li
624b4cf6c4 scsi: hisi_sas: Call I_T_nexus after soft reset for SATA disk
[ Upstream commit e4d953ca55 ]

In commit 21c7e97247 ("scsi: hisi_sas: Disable SATA disk phy for severe
I_T nexus reset failure"), if the softreset fails upon certain
conditions, the PHY connected to the disk is disabled directly. Manual
recovery is required, which is inconvenient for users in actual use.

In addition, SATA disks do not support simultaneous connection of multiple
hosts. Therefore, when multiple controllers are connected to a SATA disk
at the same time, the controller which is connected later failed to issue
an ATA softreset to the SATA disk. As a result, the PHY associated with
the disk is disabled and cannot be automatically recovered.

Now that, we will not focus on the execution result of softreset. No
matter whether the execution is successful or not, we will directly carry
out I_T_nexus_reset.

Fixes: 21c7e97247 ("scsi: hisi_sas: Disable SATA disk phy for severe I_T nexus reset failure")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://lore.kernel.org/r/20250414080845.1220997-4-liyihang9@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19 15:28:12 +02:00
Kees Cook
ee96502062 scsi: qedf: Use designated initializer for struct qed_fcoe_cb_ops
[ Upstream commit d8720235d5 ]

Recent fixes to the randstruct GCC plugin allowed it to notice
that this structure is entirely function pointers and is therefore
subject to randomization, but doing so requires that it always use
designated initializers. Explicitly specify the "common" member as being
initialized. Silences:

drivers/scsi/qedf/qedf_main.c:702:9: error: positional initialization of field in 'struct' declared with 'designated_init' attribute [-Werror=designated-init]
  702 |         {
      |         ^

Fixes: 035f7f87b7 ("randstruct: Enable Clang support")
Link: https://lore.kernel.org/r/20250502224156.work.617-kees@kernel.org
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-19 15:28:08 +02:00
Kai Mäkisara
32bcf54138 scsi: st: Restore some drive settings after reset
[ Upstream commit 7081dc75df ]

Some of the allowed operations put the tape into a known position to
continue operation assuming only the tape position has changed.  But reset
sets partition, density and block size to drive default values. These
should be restored to the values before reset.

Normally the current block size and density are stored by the drive.  If
the settings have been changed, the changed values have to be saved by the
driver across reset.

Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
Link: https://lore.kernel.org/r/20250120194925.44432-2-Kai.Makisara@kolumbus.fi
Reviewed-by: John Meneghini <jmeneghi@redhat.com>
Tested-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04 14:42:13 +02:00
Justin Tee
1960bb56a9 scsi: lpfc: Free phba irq in lpfc_sli4_enable_msi() when pci_irq_vector() fails
[ Upstream commit f0842902b3 ]

Fix smatch warning regarding missed calls to free_irq().  Free the phba IRQ
in the failed pci_irq_vector cases.

lpfc_init.c: lpfc_sli4_enable_msi() warn: 'phba->pcidev->irq' from
             request_irq() not released.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20250131000524.163662-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04 14:42:13 +02:00
Justin Tee
3dfeee957a scsi: lpfc: Handle duplicate D_IDs in ndlp search-by D_ID routine
[ Upstream commit 56c3d809b7 ]

After a port swap between separate fabrics, there may be multiple nodes in
the vport's fc_nodes list with the same fabric well known address.
Duplication is temporary and eventually resolves itself after dev_loss_tmo
expires, but nameserver queries may still occur before dev_loss_tmo.  This
possibly results in returning stale fabric ndlp objects.  Fix by adding an
nlp_state check to ensure the ndlp search routine returns the correct newer
allocated ndlp fabric object.

Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20250131000524.163662-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04 14:42:13 +02:00
Shivasharan S
f38a1b35c8 scsi: mpt3sas: Send a diag reset if target reset fails
[ Upstream commit 5612d6d51e ]

When an IOCTL times out and driver issues a target reset, if firmware
fails the task management elevate the recovery by issuing a diag reset to
controller.

Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Link: https://lore.kernel.org/r/1739410016-27503-5-git-send-email-shivasharan.srikanteshwara@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04 14:42:06 +02:00
Kai Mäkisara
9f51fa1971 scsi: st: ERASE does not change tape location
[ Upstream commit ad77cebf97 ]

The SCSI ERASE command erases from the current position onwards.  Don't
clear the position variables.

Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
Link: https://lore.kernel.org/r/20250311112516.5548-3-Kai.Makisara@kolumbus.fi
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04 14:41:58 +02:00
Kai Mäkisara
01195aa1d6 scsi: st: Tighten the page format heuristics with MODE SELECT
[ Upstream commit 8db816c6f1 ]

In the days when SCSI-2 was emerging, some drives did claim SCSI-2 but did
not correctly implement it. The st driver first tries MODE SELECT with the
page format bit set to set the block descriptor.  If not successful, the
non-page format is tried.

The test only tests the sense code and this triggers also from illegal
parameter in the parameter list. The test is limited to "old" devices and
made more strict to remove false alarms.

Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
Link: https://lore.kernel.org/r/20250311112516.5548-4-Kai.Makisara@kolumbus.fi
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04 14:41:58 +02:00
Ranjan Kumar
c36f5f659a scsi: mpi3mr: Add level check to control event logging
[ Upstream commit b0b7ee3b57 ]

Ensure event logs are only generated when the debug logging level
MPI3_DEBUG_EVENT is enabled. This prevents unnecessary logging.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20250415101546.204018-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-06-04 14:41:53 +02:00
Steve Siwinski
c682a19344 scsi: sd_zbc: block: Respect bio vector limits for REPORT ZONES buffer
commit e8007fad54 upstream.

The REPORT ZONES buffer size is currently limited by the HBA's maximum
segment count to ensure the buffer can be mapped. However, the block
layer further limits the number of iovec entries to 1024 when allocating
a bio.

To avoid allocation of buffers too large to be mapped, further restrict
the maximum buffer size to BIO_MAX_INLINE_VECS.

Replace the UIO_MAXIOV symbolic name with the more contextually
appropriate BIO_MAX_INLINE_VECS.

Fixes: b091ac6168 ("sd_zbc: Fix report zones buffer allocation")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Siwinski <ssiwinski@atto.com>
Link: https://lore.kernel.org/r/20250508200122.243129-1-ssiwinski@atto.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22 14:12:22 +02:00
Michael Kelley
c0f3f0c88f Drivers: hv: Allow vmbus_sendpacket_mpb_desc() to create multiple ranges
commit 380b75d307 upstream.

vmbus_sendpacket_mpb_desc() is currently used only by the storvsc driver
and is hardcoded to create a single GPA range. To allow it to also be
used by the netvsc driver to create multiple GPA ranges, no longer
hardcode as having a single GPA range. Allow the calling driver to
specify the rangecount in the supplied descriptor.

Update the storvsc driver to reflect this new approach.

Cc: <stable@vger.kernel.org> # 6.1.x
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://patch.msgid.link/20250513000604.1396-2-mhklinux@outlook.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-22 14:12:21 +02:00
Igor Pylypiv
a862d24e1f scsi: pm80xx: Set phy_attached to zero when device is gone
[ Upstream commit f7b705c238 ]

When a fatal error occurs, a phy down event may not be received to set
phy->phy_attached to zero.

Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Signed-off-by: Salomon Dushimirimana <salomondush@google.com>
Link: https://lore.kernel.org/r/20250319230305.3172920-1-salomondush@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02 07:51:01 +02:00
Xingui Yang
fa99f1886e scsi: hisi_sas: Fix I/O errors caused by hardware port ID changes
[ Upstream commit daff37f00c ]

The hw port ID of phy may change when inserting disks in batches, causing
the port ID in hisi_sas_port and itct to be inconsistent with the hardware,
resulting in I/O errors. The solution is to set the device state to gone to
intercept I/O sent to the device, and then execute linkreset to discard and
find the disk to re-update its information.

Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Link: https://lore.kernel.org/r/20250312095135.3048379-3-yangxingui@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02 07:51:00 +02:00
Damien Le Moal
164bc7e26d scsi: Improve CDL control
commit 14a3cc7558 upstream.

With ATA devices supporting the CDL feature, using CDL requires that the
feature be enabled with a SET FEATURES command. This command is issued
as the translated command for the MODE SELECT command issued by
scsi_cdl_enable() when the user enables CDL through the device
cdl_enable sysfs attribute.

However, the implementation of scsi_cdl_enable() always issues a MODE
SELECT command for ATA devices when the enable argument is true, even if
CDL is already enabled on the device. While this does not cause any
issue with using CDL descriptors with read/write commands (the CDL
feature will be enabled on the drive), issuing the MODE SELECT command
even when the device CDL feature is already enabled will cause a reset
of the ATA device CDL statistics log page (as defined in ACS, any CDL
enable action must reset the device statistics).

Avoid this needless actions (and the implied statistics log page reset)
by modifying scsi_cdl_enable() to issue the MODE SELECT command to
enable CDL if and only if CDL is not reported as already enabled on the
device.

And while at it, simplify the initialization of the is_ata boolean
variable and move the declaration of the scsi mode data and sense header
variables to within the scope of ATA device handling.

Fixes: 1b22cfb141 ("scsi: core: Allow enabling and disabling command duration limits")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Igor Pylypiv <ipylypiv@google.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-02 07:50:48 +02:00
Anastasia Kovaleva
4d6919dd7c scsi: core: Clear flags for scsi_cmnd that did not complete
[ Upstream commit 54bebe4687 ]

Commands that have not been completed with scsi_done() do not clear the
SCMD_INITIALIZED flag and therefore will not be properly reinitialized.
Thus, the next time the scsi_cmnd structure is used, the command may
fail in scsi_cmd_runtime_exceeded() due to the old jiffies_at_alloc
value:

  kernel: sd 16:0:1:84: [sdts] tag#405 timing out command, waited 720s
  kernel: sd 16:0:1:84: [sdts] tag#405 FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK cmd_age=66636s

Clear flags for commands that have not been completed by SCSI.

Fixes: 4abafdc436 ("block: remove the initialize_rq_fn blk_mq_ops method")
Signed-off-by: Anastasia Kovaleva <a.kovaleva@yadro.com>
Link: https://lore.kernel.org/r/20250324084933.15932-2-a.kovaleva@yadro.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-05-02 07:50:43 +02:00
Chandrakanth Patil
92d8a4e621 scsi: megaraid_sas: Block zero-length ATA VPD inquiry
commit aad9945623 upstream.

A firmware bug was observed where ATA VPD inquiry commands with a
zero-length data payload were not handled and failed with a non-standard
status code of 0xf0.

Avoid sending ATA VPD inquiry commands without data payload by setting
the device no_vpd_size flag to 1. In addition, if the firmware returns a
status code of 0xf0, set scsi_cmnd->result to CHECK_CONDITION to
facilitate proper error handling.

Suggested-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20250402193735.5098-1-chandrakanth.patil@broadcom.com
Tested-by: Ryan Lahfa <ryan@lahfa.xyz>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-04-25 10:45:51 +02:00
Miaoqian Lin
b0348f3394 scsi: iscsi: Fix missing scsi_host_put() in error path
[ Upstream commit 72eea84a10 ]

Add goto to ensure scsi_host_put() is called in all error paths of
iscsi_set_host_param() function. This fixes a potential memory leak when
strlen() check fails.

Fixes: ce51c81700 ("scsi: iscsi: Add strlen() check in iscsi_if_set{_host}_param()")
Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Link: https://lore.kernel.org/r/20250318094344.91776-1-linmq006@gmail.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25 10:45:38 +02:00
Xingui Yang
a70ea92964 scsi: hisi_sas: Enable force phy when SATA disk directly connected
[ Upstream commit 8aa580cd92 ]

when a SATA disk is directly connected the SAS controller determines the
disk to which I/Os are delivered based on the port ID in the DQ entry.

When many phys are disconnected and reconnect, the port ID of phys were
changed and used by other link, resulting in I/O being sent to incorrect
disk. Data inconsistency on the SATA disk may occur during I/O retries
using the old port ID. So enable force phy, then force the command to be
executed in a certain phy, and if the actual phy ID of the port does not
match the phy configured in the command, the chip will stop delivering the
I/O to disk.

Fixes: ce60689e12 ("scsi: hisi_sas: add v3 code to send ATA frame")
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Link: https://lore.kernel.org/r/20250312095135.3048379-2-yangxingui@huawei.com
Reviewed-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25 10:45:38 +02:00
Kai Mäkisara
e4d1ca0a84 scsi: st: Fix array overflow in st_setup()
[ Upstream commit a018d1cf99 ]

Change the array size to follow parms size instead of a fixed value.

Reported-by: Chenyuan Yang <chenyuan0y@gmail.com>
Closes: https://lore.kernel.org/linux-scsi/CALGdzuoubbra4xKOJcsyThdk5Y1BrAmZs==wbqjbkAgmKS39Aw@mail.gmail.com/
Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
Link: https://lore.kernel.org/r/20250311112516.5548-2-Kai.Makisara@kolumbus.fi
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-25 10:45:16 +02:00
Steven Rostedt (Google)
f568fbe8c6 tracing: Allow creating instances with specified system events
[ Upstream commit d23569979c ]

A trace instance may only need to enable specific events. As the eventfs
directory of an instance currently creates all events which adds overhead,
allow internal instances to be created with just the events in systems
that they care about. This currently only deals with systems and not
individual events, but this should bring down the overhead of creating
instances for specific use cases quite bit.

The trace_array_get_by_name() now has another parameter "systems". This
parameter is a const string pointer of a comma/space separated list of
event systems that should be created by the trace_array. (Note if the
trace_array already exists, this parameter is ignored).

The list of systems is saved and if a module is loaded, its events will
not be added unless the system for those events also match the systems
string.

Link: https://lore.kernel.org/linux-trace-kernel/20231213093701.03fddec0@gandalf.local.home

Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Arun Easi   <aeasi@marvell.com>
Cc: Daniel Wagner <dwagner@suse.de>
Tested-by: Dmytro Maluka <dmaluka@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Stable-dep-of: 0b4ffbe488 ("tracing: Correct the refcount if the hist/hist_debug file fails to open")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-04-10 14:37:41 +02:00
Magnus Lindholm
ea371d1cde scsi: qla1280: Fix kernel oops when debug level > 2
[ Upstream commit 5233e3235d ]

A null dereference or oops exception will eventually occur when qla1280.c
driver is compiled with DEBUG_QLA1280 enabled and ql_debug_level > 2.  I
think its clear from the code that the intention here is sg_dma_len(s) not
length of sg_next(s) when printing the debug info.

Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
Link: https://lore.kernel.org/r/20250125095033.26188-1-linmag7@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22 12:50:40 -07:00
Rik van Riel
e9d4044f4b scsi: core: Use GFP_NOIO to avoid circular locking dependency
[ Upstream commit 5363ee9d11 ]

Filesystems can write to disk from page reclaim with __GFP_FS
set. Marc found a case where scsi_realloc_sdev_budget_map() ends up in
page reclaim with GFP_KERNEL, where it could try to take filesystem
locks again, leading to a deadlock.

WARNING: possible circular locking dependency detected
6.13.0 #1 Not tainted
------------------------------------------------------
kswapd0/70 is trying to acquire lock:
ffff8881025d5d78 (&q->q_usage_counter(io)){++++}-{0:0}, at: blk_mq_submit_bio+0x461/0x6e0

but task is already holding lock:
ffffffff81ef5f40 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0x9f/0x760

The full lockdep splat can be found in Marc's report:

https://lkml.org/lkml/2025/1/24/1101

Avoid the potential deadlock by doing the allocation with GFP_NOIO, which
prevents both filesystem and block layer recursion.

Reported-by: Marc Aurèle La France <tsi@tuyoix.net>
Signed-off-by: Rik van Riel <riel@surriel.com>
Link: https://lore.kernel.org/r/20250129104525.0ae8421e@fangorn
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-22 12:50:40 -07:00
Ye Bin
06518de45e scsi: core: Clear driver private data when retrying request
[ Upstream commit dce5c4afd0 ]

After commit 1bad6c4a57 ("scsi: zero per-cmd private driver data for each
MQ I/O"), the xen-scsifront/virtio_scsi/snic drivers all removed code that
explicitly zeroed driver-private command data.

In combination with commit 464a00c9e0 ("scsi: core: Kill DRIVER_SENSE"),
after virtio_scsi performs a capacity expansion, the first request will
return a unit attention to indicate that the capacity has changed. And then
the original command is retried. As driver-private command data was not
cleared, the request would return UA again and eventually time out and fail.

Zero driver-private command data when a request is retried.

Fixes: f7de50da14 ("scsi: xen-scsifront: Remove code that zeroes driver-private command data")
Fixes: c2bb87318b ("scsi: virtio_scsi: Remove code that zeroes driver-private command data")
Fixes: c3006a9264 ("scsi: snic: Remove code that zeroes driver-private command data")
Signed-off-by: Ye Bin <yebin10@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20250217021628.2929248-1-yebin@huaweicloud.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-03-07 16:45:37 +01:00
Igor Pylypiv
3cfce644d8 scsi: core: Do not retry I/Os during depopulation
[ Upstream commit 9ff7c383b8 ]

Fail I/Os instead of retry to prevent user space processes from being
blocked on the I/O completion for several minutes.

Retrying I/Os during "depopulation in progress" or "depopulation restore in
progress" results in a continuous retry loop until the depopulation
completes or until the I/O retry loop is aborted due to a timeout by the
scsi_cmd_runtime_exceeced().

Depopulation is slow and can take 24+ hours to complete on 20+ TB HDDs.
Most I/Os in the depopulation retry loop end up taking several minutes
before returning the failure to user space.

Cc: stable@vger.kernel.org # 4.18.x: 2bbeb8d scsi: core: Handle depopulation and restoration in progress
Cc: stable@vger.kernel.org # 4.18.x
Fixes: e37c7d9a03 ("scsi: core: sanitize++ in progress")
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Link: https://lore.kernel.org/r/20250131184408.859579-1-ipylypiv@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-27 04:10:46 -08:00
Douglas Gilbert
7f818ac0ac scsi: core: Handle depopulation and restoration in progress
[ Upstream commit 2bbeb8d124 ]

The default handling of the NOT READY sense key is to wait for the device
to become ready. The "wait" is assumed to be relatively short. However
there is a sub-class of NOT READY that have the "... in progress" phrase in
their additional sense code and these can take much longer.  Following on
from commit 505aa4b6a8 ("scsi: sd: Defer spinning up drive while SANITIZE
is in progress") we now have element depopulation and restoration that can
take a long time.  For example, over 24 hours for a 20 TB, 7200 rpm hard
disk to depopulate 1 of its 20 elements.

Add handling of ASC/ASCQ: 0x4,0x24 (depopulation in progress)
and ASC/ASCQ: 0x4,0x25 (depopulation restoration in progress)
to sd.c . The scsi_lib.c has incomplete handling of these
two messages, so complete it.

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Link: https://lore.kernel.org/r/20231015050650.131145-1-dgilbert@interlog.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stable-dep-of: 9ff7c383b8 ("scsi: core: Do not retry I/Os during depopulation")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-27 04:10:46 -08:00
Long Li
7df68980e8 scsi: storvsc: Set correct data length for sending SCSI command without payload
commit 87c4b5e8a6 upstream.

In StorVSC, payload->range.len is used to indicate if this SCSI command
carries payload. This data is allocated as part of the private driver data
by the upper layer and may get passed to lower driver uninitialized.

For example, the SCSI error handling mid layer may send TEST_UNIT_READY or
REQUEST_SENSE while reusing the buffer from a failed command. The private
data section may have stale data from the previous command.

If the SCSI command doesn't carry payload, the driver may use this value as
is for communicating with host, resulting in possible corruption.

Fix this by always initializing this value.

Fixes: be0cf6ca30 ("scsi: storvsc: Set the tablesize based on the information given by the host")
Cc: stable@kernel.org
Tested-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Roman Kisel <romank@linux.microsoft.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Long Li <longli@microsoft.com>
Link: https://lore.kernel.org/r/1737601642-7759-1-git-send-email-longli@linuxonhyperv.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17 09:40:27 +01:00
Quinn Tran
217230bc87 scsi: qla2xxx: Move FCE Trace buffer allocation to user control
commit 841df27d61 upstream.

Currently FCE Tracing is enabled to log additional ELS events. Instead,
user will enable or disable this feature through debugfs.

Modify existing DFS knob to allow user to enable or disable this
feature.

echo [1 | 0] > /sys/kernel/debug/qla2xxx/qla2xxx_??/fce
cat  /sys/kernel/debug/qla2xxx/qla2xxx_??/fce

Cc: stable@vger.kernel.org
Fixes: df613b9607 ("[SCSI] qla2xxx: Add Fibre Channel Event (FCE) tracing support.")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20241115130313.46826-4-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17 09:40:26 +01:00
Kai Mäkisara
7bfa83ee25 scsi: st: Don't set pos_unknown just after device recognition
commit 98b37881b7 upstream.

Commit 9604eea5bd ("scsi: st: Add third party poweron reset handling") in
v6.6 added new code to handle the Power On/Reset Unit Attention (POR UA)
sense data. This was in addition to the existing method. When this Unit
Attention is received, the driver blocks attempts to read, write and some
other operations because the reset may have rewinded the tape. Because of
the added code, also the initial POR UA resulted in blocking operations,
including those that are used to set the driver options after the device is
recognized. Also, reading and writing are refused, whereas they succeeded
before this commit.

Add code to not set pos_unknown to block operations if the POR UA is
received from the first test_ready() call after the st device has been
created. This restores the behavior before v6.6.

Signed-off-by: Kai Mäkisara <Kai.Makisara@kolumbus.fi>
Link: https://lore.kernel.org/r/20241216113755.30415-1-Kai.Makisara@kolumbus.fi
Fixes: 9604eea5bd ("scsi: st: Add third party poweron reset handling")
CC: stable@vger.kernel.org
Closes: https://lore.kernel.org/linux-scsi/2201CF73-4795-4D3B-9A79-6EE5215CF58D@kolumbus.fi/
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-17 09:40:26 +01:00
Paul Menzel
495dcb00d4 scsi: mpt3sas: Set ioc->manu_pg11.EEDPTagMode directly to 1
[ Upstream commit ad7c3c0cb8 ]

Currently, the code does:

    if (x == 0) {
    	x &= ~0x3;
	x |= 0x1;
    }

Zeroing bits 0 and 1 of a variable that is 0 is not necessary. So directly
set the variable to 1.

Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Fixes: f92363d123 ("[SCSI] mpt3sas: add new driver supporting 12GB SAS")
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Link: https://lore.kernel.org/r/20241212221817.78940-2-pmenzel@molgen.mpg.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-08 09:52:26 +01:00
Easwar Hariharan
088bde862f scsi: storvsc: Ratelimit warning logs to prevent VM denial of service
commit d2138eab8c upstream.

If there's a persistent error in the hypervisor, the SCSI warning for
failed I/O can flood the kernel log and max out CPU utilization,
preventing troubleshooting from the VM side. Ratelimit the warning so
it doesn't DoS the VM.

Closes: https://github.com/microsoft/WSL/issues/9173
Signed-off-by: Easwar Hariharan <eahariha@linux.microsoft.com>
Link: https://lore.kernel.org/r/20250107-eahariha-ratelimit-storvsc-v1-1-7fc193d1f2b0@linux.microsoft.com
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-02-01 18:37:55 +01:00
Xiang Zhang
44c485f0fc scsi: iscsi: Fix redundant response for ISCSI_UEVENT_GET_HOST_STATS request
[ Upstream commit 63ca02221c ]

The ISCSI_UEVENT_GET_HOST_STATS request is already handled in
iscsi_get_host_stats(). This fix ensures that redundant responses are
skipped in iscsi_if_rx().

 - On success: send reply and stats from iscsi_get_host_stats()
   within if_recv_msg().

 - On error: fall through.

Signed-off-by: Xiang Zhang <hawkxiang.cpp@gmail.com>
Link: https://lore.kernel.org/r/20250107022432.65390-1-hawkxiang.cpp@gmail.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-02-01 18:37:51 +01:00
Yihang Li
9722973ad0 scsi: hisi_sas: Remove redundant checks for automatic debugfs dump
commit 3f03055047 upstream.

In commit 63f0733d07 ("scsi: hisi_sas: Allocate DFX memory during dump
trigger"), the memory allocation time of the DFX is changed from device
initialization to dump occurs, so .debugfs_itct is not a valid address and
do not need to check.

The parameter hisi_sas_debugfs_enable is enough to check whether automatic
debugfs dump is triggered, so remove redunant checks.

Fixes: 63f0733d07 ("scsi: hisi_sas: Allocate DFX memory during dump trigger")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1705904747-62186-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-01-09 13:32:09 +01:00
Yihang Li
a47f0b0314 scsi: hisi_sas: Fix a deadlock issue related to automatic dump
[ Upstream commit 3c4f53b2c3 ]

If we issue a disabling PHY command, the device attached with it will go
offline, if a 2 bit ECC error occurs at the same time, a hung task may be
found:

[ 4613.652388] INFO: task kworker/u256:0:165233 blocked for more than 120 seconds.
[ 4613.666297] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4613.674809] task:kworker/u256:0  state:D stack:    0 pid:165233 ppid:     2 flags:0x00000208
[ 4613.683959] Workqueue: 0000:74:02.0_disco_q sas_revalidate_domain [libsas]
[ 4613.691518] Call trace:
[ 4613.694678]  __switch_to+0xf8/0x17c
[ 4613.698872]  __schedule+0x660/0xee0
[ 4613.703063]  schedule+0xac/0x240
[ 4613.706994]  schedule_timeout+0x500/0x610
[ 4613.711705]  __down+0x128/0x36c
[ 4613.715548]  down+0x240/0x2d0
[ 4613.719221]  hisi_sas_internal_abort_timeout+0x1bc/0x260 [hisi_sas_main]
[ 4613.726618]  sas_execute_internal_abort+0x144/0x310 [libsas]
[ 4613.732976]  sas_execute_internal_abort_dev+0x44/0x60 [libsas]
[ 4613.739504]  hisi_sas_internal_task_abort_dev.isra.0+0xbc/0x1b0 [hisi_sas_main]
[ 4613.747499]  hisi_sas_dev_gone+0x174/0x250 [hisi_sas_main]
[ 4613.753682]  sas_notify_lldd_dev_gone+0xec/0x2e0 [libsas]
[ 4613.759781]  sas_unregister_common_dev+0x4c/0x7a0 [libsas]
[ 4613.765962]  sas_destruct_devices+0xb8/0x120 [libsas]
[ 4613.771709]  sas_do_revalidate_domain.constprop.0+0x1b8/0x31c [libsas]
[ 4613.778930]  sas_revalidate_domain+0x60/0xa4 [libsas]
[ 4613.784716]  process_one_work+0x248/0x950
[ 4613.789424]  worker_thread+0x318/0x934
[ 4613.793878]  kthread+0x190/0x200
[ 4613.797810]  ret_from_fork+0x10/0x18
[ 4613.802121] INFO: task kworker/u256:4:316722 blocked for more than 120 seconds.
[ 4613.816026] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4613.824538] task:kworker/u256:4  state:D stack:    0 pid:316722 ppid:     2 flags:0x00000208
[ 4613.833670] Workqueue: 0000:74:02.0 hisi_sas_rst_work_handler [hisi_sas_main]
[ 4613.841491] Call trace:
[ 4613.844647]  __switch_to+0xf8/0x17c
[ 4613.848852]  __schedule+0x660/0xee0
[ 4613.853052]  schedule+0xac/0x240
[ 4613.856984]  schedule_timeout+0x500/0x610
[ 4613.861695]  __down+0x128/0x36c
[ 4613.865542]  down+0x240/0x2d0
[ 4613.869216]  hisi_sas_controller_prereset+0x58/0x1fc [hisi_sas_main]
[ 4613.876324]  hisi_sas_rst_work_handler+0x40/0x8c [hisi_sas_main]
[ 4613.883019]  process_one_work+0x248/0x950
[ 4613.887732]  worker_thread+0x318/0x934
[ 4613.892204]  kthread+0x190/0x200
[ 4613.896118]  ret_from_fork+0x10/0x18
[ 4613.900423] INFO: task kworker/u256:1:348985 blocked for more than 121 seconds.
[ 4613.914341] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 4613.922852] task:kworker/u256:1  state:D stack:    0 pid:348985 ppid:     2 flags:0x00000208
[ 4613.931984] Workqueue: 0000:74:02.0_event_q sas_port_event_worker [libsas]
[ 4613.939549] Call trace:
[ 4613.942702]  __switch_to+0xf8/0x17c
[ 4613.946892]  __schedule+0x660/0xee0
[ 4613.951083]  schedule+0xac/0x240
[ 4613.955015]  schedule_timeout+0x500/0x610
[ 4613.959725]  wait_for_common+0x200/0x610
[ 4613.964349]  wait_for_completion+0x3c/0x5c
[ 4613.969146]  flush_workqueue+0x198/0x790
[ 4613.973776]  sas_porte_broadcast_rcvd+0x1e8/0x320 [libsas]
[ 4613.979960]  sas_port_event_worker+0x54/0xa0 [libsas]
[ 4613.985708]  process_one_work+0x248/0x950
[ 4613.990420]  worker_thread+0x318/0x934
[ 4613.994868]  kthread+0x190/0x200
[ 4613.998800]  ret_from_fork+0x10/0x18

This is because when the device goes offline, we obtain the hisi_hba
semaphore and send the ABORT_DEV command to the device. However, the
internal abort timed out due to the 2 bit ECC error and triggers automatic
dump. In addition, since the hisi_hba semaphore has been obtained, the dump
cannot be executed and the controller cannot be reset.

Therefore, the deadlocks occur on the following circular dependencies:
hisi_sas_dev_gone() -> down() -> hisi_sas_internal_task_abort_dev() -> ...
-> hisi_sas_internal_abort_timeout() -> down().

The deadlock is triggered only when the timeout occurs during device goes
offline. To fix this issue, use .rst_ha_timeout to distinguish the scenario
where a device goes offline from other scenarios.

Fixes: 2ff07b5c6f ("scsi: hisi_sas: Directly call register snapshot instead of using workqueue")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1705904747-62186-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-09 13:31:53 +01:00
Ranjan Kumar
f5a2042408 scsi: mpi3mr: Start controller indexing from 0
[ Upstream commit 0d32014f1e ]

Instead of displaying the controller index starting from '1' make the
driver display the controller index starting from '0'.

Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20241110194405.10108-4-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-09 13:31:51 +01:00
Guixin Liu
d424303d8d scsi: mpi3mr: Use ida to manage mrioc ID
[ Upstream commit 29b75184f7 ]

To ensure that the same ID is not obtained during concurrent execution of
the probe, an ida is used to manage the mrioc's ID.

Signed-off-by: Guixin Liu <kanie@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231229040331.52518-1-kanie@linux.alibaba.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stable-dep-of: 0d32014f1e ("scsi: mpi3mr: Start controller indexing from 0")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-09 13:31:51 +01:00
Yihang Li
7c8c50c985 scsi: hisi_sas: Create all dump files during debugfs initialization
[ Upstream commit 9f564f15f8 ]

For the current debugfs of hisi_sas, after user triggers dump, the
driver allocate memory space to save the register information and create
debugfs files to display the saved information. In this process, the
debugfs files created after each dump.

Therefore, when the dump is triggered while the driver is unbind, the
following hang occurs:

[67840.853907] Unable to handle kernel NULL pointer dereference at virtual address 00000000000000a0
[67840.862947] Mem abort info:
[67840.865855]   ESR = 0x0000000096000004
[67840.869713]   EC = 0x25: DABT (current EL), IL = 32 bits
[67840.875125]   SET = 0, FnV = 0
[67840.878291]   EA = 0, S1PTW = 0
[67840.881545]   FSC = 0x04: level 0 translation fault
[67840.886528] Data abort info:
[67840.889524]   ISV = 0, ISS = 0x00000004, ISS2 = 0x00000000
[67840.895117]   CM = 0, WnR = 0, TnD = 0, TagAccess = 0
[67840.900284]   GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
[67840.905709] user pgtable: 4k pages, 48-bit VAs, pgdp=0000002803a1f000
[67840.912263] [00000000000000a0] pgd=0000000000000000, p4d=0000000000000000
[67840.919177] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[67840.996435] pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[67841.003628] pc : down_write+0x30/0x98
[67841.007546] lr : start_creating.part.0+0x60/0x198
[67841.012495] sp : ffff8000b979ba20
[67841.016046] x29: ffff8000b979ba20 x28: 0000000000000010 x27: 0000000000024b40
[67841.023412] x26: 0000000000000012 x25: ffff20202b355ae8 x24: ffff20202b35a8c8
[67841.030779] x23: ffffa36877928208 x22: ffffa368b4972240 x21: ffff8000b979bb18
[67841.038147] x20: ffff00281dc1e3c0 x19: fffffffffffffffe x18: 0000000000000020
[67841.045515] x17: 0000000000000000 x16: ffffa368b128a530 x15: ffffffffffffffff
[67841.052888] x14: ffff8000b979bc18 x13: ffffffffffffffff x12: ffff8000b979bb18
[67841.060263] x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa368b1289b18
[67841.067640] x8 : 0000000000000012 x7 : 0000000000000000 x6 : 00000000000003a9
[67841.075014] x5 : 0000000000000000 x4 : ffff002818c5cb00 x3 : 0000000000000001
[67841.082388] x2 : 0000000000000000 x1 : ffff002818c5cb00 x0 : 00000000000000a0
[67841.089759] Call trace:
[67841.092456]  down_write+0x30/0x98
[67841.096017]  start_creating.part.0+0x60/0x198
[67841.100613]  debugfs_create_dir+0x48/0x1f8
[67841.104950]  debugfs_create_files_v3_hw+0x88/0x348 [hisi_sas_v3_hw]
[67841.111447]  debugfs_snapshot_regs_v3_hw+0x708/0x798 [hisi_sas_v3_hw]
[67841.118111]  debugfs_trigger_dump_v3_hw_write+0x9c/0x120 [hisi_sas_v3_hw]
[67841.125115]  full_proxy_write+0x68/0xc8
[67841.129175]  vfs_write+0xd8/0x3f0
[67841.132708]  ksys_write+0x70/0x108
[67841.136317]  __arm64_sys_write+0x24/0x38
[67841.140440]  invoke_syscall+0x50/0x128
[67841.144385]  el0_svc_common.constprop.0+0xc8/0xf0
[67841.149273]  do_el0_svc+0x24/0x38
[67841.152773]  el0_svc+0x38/0xd8
[67841.156009]  el0t_64_sync_handler+0xc0/0xc8
[67841.160361]  el0t_64_sync+0x1a4/0x1a8
[67841.164189] Code: b9000882 d2800002 d2800023 f9800011 (c85ffc05)
[67841.170443] ---[ end trace 0000000000000000 ]---

To fix this issue, create all directories and files during debugfs
initialization. In this way, the driver only needs to allocate memory
space to save information each time the user triggers dumping.

Signed-off-by: Yihang Li <liyihang9@huawei.com>
Link: https://lore.kernel.org/r/20241008021822.2617339-13-liyihang9@huawei.com
Reviewed-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-09 13:31:44 +01:00
Yihang Li
0449286798 scsi: hisi_sas: Allocate DFX memory during dump trigger
[ Upstream commit 63f0733d07 ]

Currently, if CONFIG_SCSI_HISI_SAS_DEBUGFS_DEFAULT_ENABLE is enabled, the
memory space used by DFX is allocated during device initialization, which
occupies a large number of memory resources. The memory usage before and
after the driver is loaded is as follows:

Memory usage before the driver is loaded:
$ free -m
         total        used        free      shared  buff/cache   available
Mem:     867352        2578      864037          11         735  861681
Swap:    4095           0        4095

Memory usage after the driver which include 4 HBAs is loaded:
$ insmod hisi_sas_v3_hw.ko
$ free -m
         total        used        free      shared  buff/cache	available
Mem:     867352        4760      861848          11	  743   859495
Swap:    4095           0        4095

The driver with 4 HBAs connected will allocate about 110 MB of memory
without enabling debugfs.

Therefore, to avoid wasting memory resources, DFX memory is allocated
during dump triggering. The dump may fail due to memory allocation
failure. After this change, each dump costs about 10 MB of memory, and each
dump lasts about 100 ms.

Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1694571327-78697-4-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stable-dep-of: 9f564f15f8 ("scsi: hisi_sas: Create all dump files during debugfs initialization")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-09 13:31:44 +01:00
Yihang Li
91e035e98f scsi: hisi_sas: Directly call register snapshot instead of using workqueue
[ Upstream commit 2ff07b5c6f ]

Currently, register information dump is performed via workqueue, regardless
of the trigger mode (automatic or manual). There is a delay in dumping
register through workqueue, the exact register information at trigger time
cannot be obtained.

Call register snapshot directly instead of through a workqueue.

Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1694571327-78697-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stable-dep-of: 9f564f15f8 ("scsi: hisi_sas: Create all dump files during debugfs initialization")
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-09 13:31:44 +01:00
Cathy Avery
3556af9a68 scsi: storvsc: Do not flag MAINTENANCE_IN return of SRB_STATUS_DATA_OVERRUN as an error
[ Upstream commit b1aee7f034 ]

This partially reverts commit 812fe6420a ("scsi: storvsc: Handle
additional SRB status values").

HyperV does not support MAINTENANCE_IN resulting in FC passthrough
returning the SRB_STATUS_DATA_OVERRUN value. Now that
SRB_STATUS_DATA_OVERRUN is treated as an error, multipath ALUA paths go
into a faulty state as multipath ALUA submits RTPG commands via
MAINTENANCE_IN.

[    3.215560] hv_storvsc 1d69d403-9692-4460-89f9-a8cbcc0f94f3:
tag#230 cmd 0xa3 status: scsi 0x0 srb 0x12 hv 0xc0000001
[    3.215572] scsi 1:0:0:32: alua: rtpg failed, result 458752

Make MAINTENANCE_IN return success to avoid the error path as is
currently done with INQUIRY and MODE_SENSE.

Suggested-by: Michael Kelley <mhklinux@outlook.com>
Signed-off-by: Cathy Avery <cavery@redhat.com>
Link: https://lore.kernel.org/r/20241127181324.3318443-1-cavery@redhat.com
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-02 10:32:05 +01:00
Ranjan Kumar
cf4bea16bb scsi: mpt3sas: Diag-Reset when Doorbell-In-Use bit is set during driver load time
[ Upstream commit 3f5eb062e8 ]

Issue a Diag-Reset when the "Doorbell-In-Use" bit is set during the
driver load/initialization.

Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20241110173341.11595-2-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-02 10:32:05 +01:00
Tomas Henzl
f50783148e scsi: megaraid_sas: Fix for a potential deadlock
[ Upstream commit 50740f4dc7 ]

This fixes a 'possible circular locking dependency detected' warning
      CPU0                    CPU1
      ----                    ----
 lock(&instance->reset_mutex);
                              lock(&shost->scan_mutex);
                              lock(&instance->reset_mutex);
 lock(&shost->scan_mutex);

Fix this by temporarily releasing the reset_mutex.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Link: https://lore.kernel.org/r/20240923174833.45345-1-thenzl@redhat.com
Acked-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-02 10:32:04 +01:00
Magnus Lindholm
fbd7deb459 scsi: qla1280: Fix hw revision numbering for ISP1020/1040
[ Upstream commit c064de86d2 ]

Fix the hardware revision numbering for Qlogic ISP1020/1040 boards.  HWMASK
suggests that the revision number only needs four bits, this is consistent
with how NetBSD does things in their ISP driver. Verified on a IPS1040B
which is seen as rev 5 not as BIT_4.

Signed-off-by: Magnus Lindholm <linmag7@gmail.com>
Link: https://lore.kernel.org/r/20241113225636.2276-1-linmag7@gmail.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2025-01-02 10:32:04 +01:00