To improve performance, we have implemented a zero-copy optimization for
the neutron NPU, so that third-party inference engines such as tflite can
use neutron memory directly. This avoids memcpy between neutron DDR
memory and the application context, as follows:
- Avoid copying input data from application to neutron memory.
- Avoid copying output data from neutron memory back to the application.
This patch enables the memory cache and lets the driver maintain
memory and cache coherency. The main changes are:
- Flush the input buffer cache for the device before starting inference.
- Invalidate the output buffer cache for the CPU after inference is complete.
- Flush other constant data for the device via IOCTL during preparation.
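The ordering of the cache maintenance around one zero-copy inference job can be sketched as a small userspace model. This is not the driver code: sync_for_device() and sync_for_cpu() are hypothetical stand-ins for the kernel's dma_sync_single_for_device(..., DMA_TO_DEVICE) and dma_sync_single_for_cpu(..., DMA_FROM_DEVICE), and the buffer structure and operation log are assumptions used only to make the order observable.

```c
/* Hypothetical buffer handle; not the driver's real structure. */
struct buf { unsigned long dma_addr; unsigned long size; };

enum cache_op { OP_FLUSH_INPUT, OP_RUN_INFERENCE, OP_INVALIDATE_OUTPUT };

#define MAX_OPS 8
static enum cache_op op_log[MAX_OPS];   /* order in which operations ran */
static int op_count;

static void log_op(enum cache_op op)
{
    if (op_count < MAX_OPS)
        op_log[op_count++] = op;
}

/* Stand-in for dma_sync_single_for_device(): write back dirty CPU cache
 * lines so the NPU sees the input data the application wrote. */
static void sync_for_device(const struct buf *b) { (void)b; log_op(OP_FLUSH_INPUT); }

/* Stand-in for dma_sync_single_for_cpu(): discard stale cache lines so
 * the CPU reads the results the NPU wrote to DDR. */
static void sync_for_cpu(const struct buf *b) { (void)b; log_op(OP_INVALIDATE_OUTPUT); }

/* Stand-in for submitting the job and waiting for completion. */
static void start_and_wait_npu(void) { log_op(OP_RUN_INFERENCE); }

/* Zero-copy job: no memcpy, only cache maintenance around the run. */
static void run_zero_copy(const struct buf *in, const struct buf *out)
{
    sync_for_device(in);      /* 1. flush input before the NPU reads it */
    start_and_wait_npu();     /* 2. NPU works directly on shared memory */
    sync_for_cpu(out);        /* 3. invalidate output before the CPU reads */
}
```

The key point is the asymmetry: input buffers need a write-back before the device touches them, while output buffers need an invalidate after the device is done.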
Signed-off-by: Jiwei.Fu <jiwei.fu@nxp.com>
Reviewed-by: Forrest Shi <xuelin.shi@nxp.com>
Acked-by: Jason Liu <jason.hui.liu@nxp.com>
While preparing the firmware, we should not stop neutron, as other tasks
may be running. Otherwise those tasks may fail because the neutron core
has stopped.
Signed-off-by: Jiwei.Fu <jiwei.fu@nxp.com>
Reviewed-by: Iuliana Prodan <iuliana.prodan@nxp.com>
Acked-by: Jason Liu <jason.hui.liu@nxp.com>
error log:
error: variable 'data_ddr' is uninitialized when used here [-Werror,-Wuninitialized]
ret = neutron_rproc_elf_load(rproc, buf->firmware_p, data_ddr, 0x1);
^~~~~~~~
Change-Id: If3852cf0214f594b7fc6bcc147226558ecb75ec0
Signed-off-by: Zhipeng Wang <zhipeng.wang_1@nxp.com>
There is a new neutron fine-tuning workflow in which different model files
require different firmware, so the neutron Linux driver must support
loading different firmware for different models. This patch adds new
interfaces to support loading firmware at runtime.
This is a temporary workflow, but it will be around for a while.
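One way to picture per-model firmware loading: the driver remembers which firmware image is currently loaded and reloads only when a model needs a different one. The reload-only-on-change policy, the names fw_state, load_firmware_image() and select_firmware(), and the file names in the usage are hypothetical; the commit itself only states that firmware can be loaded at runtime.

```c
#include <stdbool.h>
#include <string.h>

#define FW_NAME_MAX 64

/* Hypothetical bookkeeping for the currently loaded firmware. */
struct fw_state {
    char loaded[FW_NAME_MAX];   /* empty string: nothing loaded yet */
    int  load_count;            /* number of real (re)loads performed */
};

/* Stand-in for the actual runtime firmware load. */
static bool load_firmware_image(struct fw_state *st, const char *name)
{
    size_t len = strlen(name);

    if (len >= FW_NAME_MAX)
        return false;
    memcpy(st->loaded, name, len + 1);
    st->load_count++;
    return true;
}

/* Make sure the firmware a model needs is loaded, reloading only when
 * it differs from the image already in use (assumed policy). */
static bool select_firmware(struct fw_state *st, const char *model_fw)
{
    if (strcmp(st->loaded, model_fw) == 0)
        return true;            /* already loaded, nothing to do */
    return load_firmware_image(st, model_fw);
}
```

Running two jobs of the same model back to back would then trigger only one load, while switching to another model triggers a second one.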
Signed-off-by: Jiwei.Fu <jiwei.fu@nxp.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
When the Neutron NPU (Neural Processing Unit) is idle and not
running inference tasks, it is desirable for the system to automatically
move the NPU into the suspend state. This reduces power consumption
for the whole SoC.
This patch updates the Neutron driver to integrate autosuspend
and resume capabilities. This allows the driver to manage the power
state of the NPU, automatically placing it into the suspend state when
it detects an extended period of inactivity.
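In the kernel, this behavior is normally built on the runtime PM autosuspend helpers (pm_runtime_set_autosuspend_delay(), pm_runtime_use_autosuspend(), pm_runtime_mark_last_busy() and pm_runtime_put_autosuspend()). The decision those helpers make can be modeled in userspace as below; the npu_pm structure, the millisecond clock and the delay value are assumptions, not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of runtime-PM autosuspend state. */
struct npu_pm {
    int      usage_count;      /* active users, e.g. in-flight jobs */
    uint64_t last_busy_ms;     /* time of the last activity */
    uint64_t delay_ms;         /* autosuspend delay */
    bool     suspended;
};

/* Job starts: resume if needed and take a reference. */
static void npu_get(struct npu_pm *pm, uint64_t now_ms)
{
    pm->usage_count++;
    pm->suspended = false;
    pm->last_busy_ms = now_ms;
}

/* Job completes: drop the reference and record activity,
 * similar in spirit to pm_runtime_mark_last_busy(). */
static void npu_put(struct npu_pm *pm, uint64_t now_ms)
{
    if (pm->usage_count > 0)
        pm->usage_count--;
    pm->last_busy_ms = now_ms;
}

/* Periodic check: suspend only when idle for at least delay_ms. */
static bool npu_maybe_suspend(struct npu_pm *pm, uint64_t now_ms)
{
    if (!pm->suspended && pm->usage_count == 0 &&
        now_ms - pm->last_busy_ms >= pm->delay_ms)
        pm->suspended = true;
    return pm->suspended;
}
```

With a 1000 ms delay and the last job finishing at 200 ms, the NPU stays up at 1100 ms and is suspended from 1200 ms onward, unless a new job arrives first.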
Signed-off-by: Jiwei.Fu <jiwei.fu@nxp.com>
Reviewed-by: Forrest Shi <xuelin.shi@nxp.com>
The resolution of the standard Linux timer is too coarse to retrieve
inference results promptly. In polling mode, use an hrtimer to get the
results as quickly as possible. This also helps when measuring performance.
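The effect of the polling period on latency can be worked out directly: a result that becomes ready at time t is only seen at the next poll tick, so polling adds up to one full period of delay. The sketch assumes a 4000 us period for a jiffy-based timer (HZ=250) and 50 us for the hrtimer; both values are illustrative, not the driver's settings. In the kernel, the high-resolution path would be driven by hrtimer_start() with HRTIMER_MODE_REL.

```c
#include <stdint.h>

/* Time at which a periodic poller first observes a result that became
 * ready at done_us, assuming polls at period_us, 2*period_us, ... */
static uint64_t detect_time_us(uint64_t done_us, uint64_t period_us)
{
    uint64_t ticks = (done_us + period_us - 1) / period_us;  /* ceiling */

    if (ticks == 0)
        ticks = 1;              /* the first poll happens after one period */
    return ticks * period_us;
}

/* Extra latency added by polling on top of the real inference time. */
static uint64_t polling_overhead_us(uint64_t done_us, uint64_t period_us)
{
    return detect_time_us(done_us, period_us) - done_us;
}
```

For an inference that finishes at 1030 us, the 4000 us poller reports it at 4000 us (2970 us of overhead), while the 50 us poller reports it at 1050 us (20 us of overhead), which is why a coarse timer distorts performance measurements.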
Signed-off-by: Jiwei.Fu <jiwei.fu@nxp.com>
Reviewed-by: Forrest Shi <xuelin.shi@nxp.com>
Add an inference ioctl interface, which currently only supports querying
the inference job status.
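A minimal sketch of an ioctl handler that only answers a status query. The command number, the status values and the inference_job structure are hypothetical; the real UAPI is not shown in this log. Unknown commands return -ENOTTY, the usual kernel convention for unsupported ioctls.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical job states and command number, for illustration only. */
enum job_status { JOB_PENDING, JOB_RUNNING, JOB_DONE, JOB_ERROR };
#define HYP_INFERENCE_QUERY_STATUS 0x01u

struct inference_job {
    enum job_status status;
};

/* Dispatch an ioctl on an inference job; only the status query exists. */
static long inference_ioctl(struct inference_job *job, unsigned int cmd,
                            enum job_status *out)
{
    switch (cmd) {
    case HYP_INFERENCE_QUERY_STATUS:
        if (!out)
            return -EFAULT;
        *out = job->status;     /* a kernel driver would use copy_to_user() */
        return 0;
    default:
        return -ENOTTY;         /* unsupported command */
    }
}
```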
Signed-off-by: Jiwei.Fu <jiwei.fu@nxp.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
In multi-threaded scenarios, reading and updating queue_count may race,
and the following kernel call trace is occasionally observed during
intensive testing.
Fix this by moving the reading and updating of queue_count into the
spin_lock() context. Since atomic_xxx() functions are no longer required
inside the spin_lock() context, queue_count is now defined as a normal
variable instead of atomic_t.
[ 1301.081486] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000068
[ 1301.189933] Workqueue: neutron_workqueue inference_done_callback
[ 1301.195938] pstate: 80400009 (Nzcv daif +PAN UAO -TCO -DIT -SSBS BTYPE=-)
[ 1301.202888] pc : neutron_inference_run+0x50/0x2ac
[ 1301.207585] lr : inference_done_callback+0x154/0x218
[ 1301.212534] sp : ffff8000834dbd30
[...]
[ 1301.287069] Call trace:
[ 1301.287071] neutron_inference_run+0x50/0x2ac
[ 1301.287077] inference_done_callback+0x154/0x218
[ 1301.298456] process_one_work+0x138/0x248
[ 1301.298464] worker_thread+0x320/0x438
[ 1301.306196] kthread+0x110/0x114
[ 1301.306201] ret_from_fork+0x10/0x20
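The fix boils down to checking and updating queue_count inside one critical section, so no other thread can change the count in between. The userspace sketch below uses a pthread mutex in place of the kernel spin_lock(); the job_queue structure and the helper names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: queue_count is a plain int protected by a lock,
 * mirroring the change from atomic_t to a normal variable. */
struct job_queue {
    pthread_mutex_t lock;       /* stands in for the kernel spinlock */
    int queue_count;
};

static void queue_push(struct job_queue *q)
{
    pthread_mutex_lock(&q->lock);
    q->queue_count++;
    pthread_mutex_unlock(&q->lock);
}

/* Check and update in the same critical section; returns true if a job
 * was taken. Reading the count outside the lock could let two threads
 * both see "1" and both take the same job. */
static bool queue_pop(struct job_queue *q)
{
    bool taken = false;

    pthread_mutex_lock(&q->lock);
    if (q->queue_count > 0) {
        q->queue_count--;
        taken = true;
    }
    pthread_mutex_unlock(&q->lock);
    return taken;
}

#define WORKERS 4
#define ITERS   10000

static void *worker(void *arg)
{
    struct job_queue *q = arg;

    for (int i = 0; i < ITERS; i++) {
        queue_push(q);
        queue_pop(q);   /* succeeds: this thread's push keeps count >= 1 */
    }
    return NULL;
}

/* Run WORKERS threads concurrently and return the final count. */
static int run_workers(struct job_queue *q)
{
    pthread_t t[WORKERS];

    for (int i = 0; i < WORKERS; i++)
        pthread_create(&t[i], NULL, worker, q);
    for (int i = 0; i < WORKERS; i++)
        pthread_join(t[i], NULL);
    return q->queue_count;
}
```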
Signed-off-by: Jiwei Fu <jiwei.fu@nxp.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
There is a build warning when using the clang compiler:
drivers/staging/neutron/neutron_device.c:308:10:
warning: variable 'ret' is uninitialized when used here [-Wuninitialized]
Signed-off-by: Jiwei Fu <jiwei.fu@nxp.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
In rare cases, the neutron NPU may get stuck.
This patch adds a function to restart the firmware when it gets stuck.
We will continue to improve the firmware so that it can detect a hang and recover on the firmware side.
For now, this patch is required in case the firmware fails to recover on its own.
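A simple way to detect a stuck NPU is a per-job timeout: if a job has not completed within a deadline, the firmware is restarted and the job is failed. The commit does not say how the hang is detected, so the timeout policy, the npu_watchdog structure and restart_firmware() below are all hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical watchdog state for the NPU firmware. */
struct npu_watchdog {
    bool     job_active;
    uint64_t job_start_ms;
    uint64_t timeout_ms;
    int      restarts;          /* number of firmware restarts performed */
};

/* Stand-in for stopping and rebooting the firmware. */
static void restart_firmware(struct npu_watchdog *wd)
{
    wd->restarts++;
    wd->job_active = false;     /* assumed: the hung job is failed, not retried */
}

/* Periodic check; returns true if a restart was triggered. */
static bool npu_check_hang(struct npu_watchdog *wd, uint64_t now_ms)
{
    if (wd->job_active && now_ms - wd->job_start_ms > wd->timeout_ms) {
        restart_firmware(wd);
        return true;
    }
    return false;
}
```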
Signed-off-by: Jiwei Fu <jiwei.fu@nxp.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
If there are no inference jobs, the neutron core is not started by default.
Therefore, if the neutron core was stopped before suspend, it should not be started in the resume() function.
This patch adds the power_state flag to determine whether the neutron core needs to be started in the resume() function.
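The resume logic can be sketched as a saved flag: suspend records whether the core was running, and resume starts it again only in that case. Apart from power_state, the field names and helper functions below are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of the power_state flag described above. */
struct neutron_pm {
    bool core_running;   /* current state of the neutron core */
    bool power_state;    /* core state saved at suspend time */
    int  starts;         /* how many times the core was started */
};

static void core_start(struct neutron_pm *n) { n->core_running = true; n->starts++; }
static void core_stop(struct neutron_pm *n)  { n->core_running = false; }

static void neutron_suspend(struct neutron_pm *n)
{
    n->power_state = n->core_running;   /* remember whether it was running */
    if (n->core_running)
        core_stop(n);
}

static void neutron_resume(struct neutron_pm *n)
{
    if (n->power_state)                 /* restart only if it was running */
        core_start(n);
}
```

A suspend/resume cycle with no inference jobs therefore leaves the core stopped instead of starting it needlessly.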
Signed-off-by: Jiwei Fu <jiwei.fu@nxp.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
The rproc reference counter should be released before removing the neutron device;
otherwise the neutron-rproc module can't be removed via rmmod when neutron-rproc and the neutron driver are built as modules.
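Assuming the neutron driver obtains its rproc handle with rproc_get_by_phandle(), which also takes a reference on the remoteproc driver's module, that reference is only dropped by rproc_put(). The userspace model below illustrates why the put must happen in the remove path; module_ref and the helper names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of the module reference held via the rproc handle. */
struct module_ref { int refcnt; };

/* Roughly what rproc_get_by_phandle() does to the module refcount. */
static void rproc_handle_get(struct module_ref *m) { m->refcnt++; }

/* Roughly what rproc_put() does to the module refcount. */
static void rproc_handle_put(struct module_ref *m)
{
    if (m->refcnt > 0)
        m->refcnt--;
}

/* rmmod only succeeds when no one holds a reference. */
static bool can_unload(const struct module_ref *m) { return m->refcnt == 0; }

/* Neutron device removal: drop the rproc reference before teardown. */
static void neutron_remove(struct module_ref *rproc_mod)
{
    rproc_handle_put(rproc_mod);
    /* ... remaining device teardown would follow here ... */
}
```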
Signed-off-by: Jiwei Fu <jiwei.fu@nxp.com>
Reviewed-by: Peng Fan <peng.fan@nxp.com>
The user space application uses the ioctl interface to send commands to neutron.
Currently supported IOCTL commands: allocate buffer, load neural network kernel and run inference.
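The three commands can be pictured as a dispatch switch in the device's ioctl handler. The command numbers, the device structure, and the rule that an inference needs a loaded kernel are hypothetical placeholders, not the driver's UAPI.

```c
#include <errno.h>

/* Hypothetical command numbers; the real UAPI is not shown in this log. */
enum neutron_cmd {
    HYP_NEUTRON_ALLOC_BUFFER  = 1,
    HYP_NEUTRON_LOAD_KERNEL   = 2,
    HYP_NEUTRON_RUN_INFERENCE = 3,
};

struct neutron_dev_model {
    int buffers;       /* buffers allocated so far */
    int kernels;       /* neural network kernels loaded */
    int inferences;    /* inference requests submitted */
};

static long neutron_ioctl(struct neutron_dev_model *d, unsigned int cmd)
{
    switch (cmd) {
    case HYP_NEUTRON_ALLOC_BUFFER:
        d->buffers++;
        return 0;
    case HYP_NEUTRON_LOAD_KERNEL:
        d->kernels++;
        return 0;
    case HYP_NEUTRON_RUN_INFERENCE:
        if (d->kernels == 0)
            return -EINVAL;    /* assumed precondition: a kernel is loaded */
        d->inferences++;
        return 0;
    default:
        return -ENOTTY;        /* unsupported command */
    }
}
```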
Signed-off-by: Jiwei Fu <jiwei.fu@nxp.com>
Acked-by: Peng Fan <peng.fan@nxp.com>
The inference driver receives requests from user space and sends them to neutron.
It also uses a queue to support multiple inference jobs.
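A FIFO is a natural structure for queuing multiple inference jobs: requests are appended at the tail and the head is handed to the NPU when it becomes free. The sketch uses a singly linked list instead of the kernel's list_head, and the names are hypothetical.

```c
#include <stddef.h>

struct infer_job {
    int id;
    struct infer_job *next;
};

struct infer_queue {
    struct infer_job *head;   /* next job to send to neutron */
    struct infer_job *tail;   /* most recently submitted job */
};

/* Append a job received from user space. */
static void queue_submit(struct infer_queue *q, struct infer_job *job)
{
    job->next = NULL;
    if (q->tail)
        q->tail->next = job;
    else
        q->head = job;
    q->tail = job;
}

/* Remove and return the oldest job, or NULL when the queue is empty. */
static struct infer_job *queue_next(struct infer_queue *q)
{
    struct infer_job *job = q->head;

    if (job) {
        q->head = job->next;
        if (!q->head)
            q->tail = NULL;
    }
    return job;
}
```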
Signed-off-by: Jiwei Fu <jiwei.fu@nxp.com>
Acked-by: Peng Fan <peng.fan@nxp.com>
There are 8 mbox registers used to exchange messages between the host and neutron:
5 registers are used for sending messages and 3 for receiving.
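The 5 + 3 split can be modeled as two register arrays: the host writes a command and its arguments into the five transmit registers and reads a reply from the three receive registers. The register meaning (command in register 0, arguments after it, command written last) and all names are hypothetical; a real driver would access the mapped MMIO region with writel()/readl() rather than plain arrays.

```c
#include <stddef.h>
#include <stdint.h>

#define MBOX_TX_REGS 5   /* host -> neutron */
#define MBOX_RX_REGS 3   /* neutron -> host */

struct mbox_model {
    volatile uint32_t tx[MBOX_TX_REGS];
    volatile uint32_t rx[MBOX_RX_REGS];
};

/* Assumed layout: tx[0] holds the command, tx[1..4] up to 4 arguments.
 * Returns -1 if too many arguments are given. */
static int mbox_send(struct mbox_model *mb, uint32_t cmd,
                     const uint32_t *args, size_t nargs)
{
    if (nargs > (size_t)(MBOX_TX_REGS - 1))
        return -1;
    for (size_t i = 0; i < nargs; i++)
        mb->tx[i + 1] = args[i];
    mb->tx[0] = cmd;     /* assumed: writing the command acts as a doorbell */
    return 0;
}

/* Copy the three receive registers into a reply buffer. */
static void mbox_receive(const struct mbox_model *mb,
                         uint32_t reply[MBOX_RX_REGS])
{
    for (size_t i = 0; i < MBOX_RX_REGS; i++)
        reply[i] = mb->rx[i];
}
```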
Signed-off-by: Jiwei Fu <jiwei.fu@nxp.com>
Acked-by: Peng Fan <peng.fan@nxp.com>
The neutron hardware operates on physical addresses, so the
buffer allocation driver is used to allocate DMA memory for neutron.
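In the kernel, such buffers are typically obtained with dma_alloc_coherent(dev, size, &dma_handle, GFP_KERNEL), which returns a CPU virtual address and fills in the device-visible address that is passed to the NPU. The userspace model below only illustrates the address bookkeeping: a bump allocator over an assumed physical window with 4 KiB alignment. The base address, pool size, alignment and names are hypothetical.

```c
#include <stdint.h>

#define HYP_POOL_BASE 0x90000000ull   /* assumed physical base of the pool */
#define HYP_POOL_SIZE (16ull << 20)   /* assumed 16 MiB pool */
#define HYP_ALIGN     4096ull         /* assumed alignment for neutron */

struct dma_pool_model {
    uint64_t next;   /* offset of the next free byte in the pool */
};

/* Returns the physical address of a new buffer, or 0 on failure. */
static uint64_t pool_alloc(struct dma_pool_model *p, uint64_t size)
{
    /* Round the next free offset up to the alignment boundary. */
    uint64_t off = (p->next + HYP_ALIGN - 1) & ~(HYP_ALIGN - 1);

    if (size == 0 || off > HYP_POOL_SIZE || size > HYP_POOL_SIZE - off)
        return 0;
    p->next = off + size;
    return HYP_POOL_BASE + off;
}
```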
Signed-off-by: Jiwei Fu <jiwei.fu@nxp.com>
Acked-by: Peng Fan <peng.fan@nxp.com>
NXP Neutron Neural Processing Unit (NPU) is a highly scalable accelerator
providing machine learning (ML) acceleration.
The Neutron NPU device consists of the following components:
- RISC-V microcontroller: runs the firmware.
- DataMover: the DMA engine used to move data between host DDR and TCM on the Neutron side.
- Neutron-Core: performs the actual MAC computation for the inference job. It is not controlled by the host; it is programmed by the RISC-V core.
- Mbox: mailbox registers used to exchange messages between the host and Neutron.
This patch includes only basic functionality:
- Device registration.
- Neutron firmware boot via the remoteproc interface.
- Open, close, read and other interfaces for user space.
The driver depends on IMX_NEUTRON_REMOTEPROC, which is used to load the Neutron NPU firmware via
the remoteproc framework.
Signed-off-by: Jiwei Fu <jiwei.fu@nxp.com>
Acked-by: Peng Fan <peng.fan@nxp.com>