linux-yocto

mirror of git://git.yoctoproject.org/linux-yocto.git synced 2025-07-07 22:35:42 +02:00

Author	SHA1	Message	Date
Ingo Molnar	0bffedbce9	Linux 5.7-rc7 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl7K9iEeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGzTAH/0ifZEG4BQ8x/WlB 8YLSLE6QQTSXYi25nyExuJbFkkKY5Tik8M2HD/36xwY/HnZOlH9jH6m0ntqZxpaA 3EU9lr1ct79nCBMYhiJssvz8d9AOZXlyogFW9y2y9pmPjlmUtseZ7yGh1xD465cj B5Ty2w2W34cs7zF3og2xn5agOJMtWWXLXZ5mRa9EOquKC5zeYyRicmd0T+plYQD6 hbRYmxFfDfppVnBCBARPNN0+NU5JJD94H+8bOuf1tl48XNrLiZMOicmtohKNQ6+W rZNpJNEGEp7KMtqWH0Nl3hmy3yfZHMwe1DXM/AZDqR7jTHZY4mZ0GEpLyfI9AU4n 34jVHwU= =SmJ9 -----END PGP SIGNATURE----- Merge tag 'v5.7-rc7' into perf/core, to pick up fixes Signed-off-by: Ingo Molnar <mingo@kernel.org>	2020-05-28 07:58:12 +02:00
Ville Syrjälä	802a5820fc	drm/i915: Extract i915_cs_timestamp_{ns_to_ticks,tick_to_ns}() Pull the code to do the CS timestamp ns<->ticks conversion into helpers and use them all over. The check in i915_perf_noa_delay_set() seems a bit dubious, so we switch it to do what I assume it wanted to do all along (ie. make sure the resulting delay in CS timestamp ticks doesn't exceed 32bits)? Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200302143943.32676-5-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2020-05-14 20:04:02 +03:00
Ville Syrjälä	56f1b31f1d	drm/i915: Store CS timestamp frequency in Hz kHz isn't accurate enough for storing the CS timestamp frequency on some of the platforms. Store the value in Hz instead. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200302143943.32676-2-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>	2020-05-14 19:59:53 +03:00
Ville Syrjälä	2e2701582a	drm/i915: Nuke pointless div by 64bit Bunch of places use a 64bit divisor needlessly. Switch to 32bit divisor. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200302143943.32676-1-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>	2020-05-14 19:58:11 +03:00
Chris Wilson	1bc6a60143	drm/i915/execlists: Track inflight CCID The presumption is that by using a circular counter that is twice as large as the maximum ELSP submission, we would never reuse the same CCID for two inflight contexts. However, if we continually preempt an active context such that it always remains inflight, it can be resubmitted with an arbitrary number of paired contexts. As each of its paired contexts will use a new CCID, eventually it will wrap and submit two ELSP with the same CCID. Rather than use a simple circular counter, switch over to a small bitmap of inflight ids so we can avoid reusing one that is still potentially active. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796 Fixes: `2935ed5339` ("drm/i915: Remove logical HW ID") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-2-chris@chris-wilson.co.uk (cherry picked from commit `5c4a53e3b1`) (cherry picked from commit 134711240307d5586ae8e828d2699db70a8b74f2) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-05-06 15:37:59 -07:00
Chris Wilson	53b2622e77	drm/i915/execlists: Avoid reusing the same logical CCID The bspec is confusing on the nature of the upper 32bits of the LRC descriptor. Once upon a time, it said that it uses the upper 32b to decide if it should perform a lite-restore, and so we must ensure that each unique context submitted to HW is given a unique CCID [for the duration of it being on the HW]. Currently, this is achieved by using a small circular tag, and assigning every context submitted to HW a new id. However, this tag is being cleared on repinning an inflight context such that we end up re-using the 0 tag for multiple contexts. To avoid accidentally clearing the CCID in the upper 32bits of the LRC descriptor, split the descriptor into two dwords so we can update the GGTT address separately from the CCID. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796 Fixes: `2935ed5339` ("drm/i915: Remove logical HW ID") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-1-chris@chris-wilson.co.uk (cherry picked from commit `2632f174a2`) (cherry picked from commit a4b70fcc587860f4b972f68217d8ebebe295ec15) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-05-06 15:37:46 -07:00
Al Viro	598caf1a40	i915: alloc_oa_regs(): get rid of pointless access_ok() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2020-05-01 20:35:35 -04:00
Chris Wilson	5c4a53e3b1	drm/i915/execlists: Track inflight CCID The presumption is that by using a circular counter that is twice as large as the maximum ELSP submission, we would never reuse the same CCID for two inflight contexts. However, if we continually preempt an active context such that it always remains inflight, it can be resubmitted with an arbitrary number of paired contexts. As each of its paired contexts will use a new CCID, eventually it will wrap and submit two ELSP with the same CCID. Rather than use a simple circular counter, switch over to a small bitmap of inflight ids so we can avoid reusing one that is still potentially active. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796 Fixes: `2935ed5339` ("drm/i915: Remove logical HW ID") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-2-chris@chris-wilson.co.uk	2020-04-28 22:17:36 +01:00
Chris Wilson	2632f174a2	drm/i915/execlists: Avoid reusing the same logical CCID The bspec is confusing on the nature of the upper 32bits of the LRC descriptor. Once upon a time, it said that it uses the upper 32b to decide if it should perform a lite-restore, and so we must ensure that each unique context submitted to HW is given a unique CCID [for the duration of it being on the HW]. Currently, this is achieved by using a small circular tag, and assigning every context submitted to HW a new id. However, this tag is being cleared on repinning an inflight context such that we end up re-using the 0 tag for multiple contexts. To avoid accidentally clearing the CCID in the upper 32bits of the LRC descriptor, split the descriptor into two dwords so we can update the GGTT address separately from the CCID. Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/1796 Fixes: `2935ed5339` ("drm/i915: Remove logical HW ID") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: <stable@vger.kernel.org> # v5.5+ Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200428184751.11257-1-chris@chris-wilson.co.uk	2020-04-28 22:17:36 +01:00
Mika Kuoppala	b4892e4404	drm/i915: Make define for lrc state offset More often than not, we need a byte offset into lrc register state from the start of the hw state. Make it so. Signed-off-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200423182355.21837-3-mika.kuoppala@linux.intel.com	2020-04-24 00:52:14 +01:00
Ingo Molnar	87cfeb1920	perf/core fixes and improvements: kernel + tools/perf: Alexey Budankov: - Introduce CAP_PERFMON to kernel and user space. callchains: Adrian Hunter: - Allow using Intel PT to synthesize callchains for regular events. Kan Liang: - Stitch LBR records from multiple samples to get deeper backtraces, there are caveats, see the csets for details. perf script: Andreas Gerstmayr: - Add flamegraph.py script BPF: Jiri Olsa: - Synthesize bpf_trampoline/dispatcher ksymbol events. perf stat: Arnaldo Carvalho de Melo: - Honour --timeout for forked workloads. Stephane Eranian: - Force error in fallback on :k events, to avoid counting nothing when the user asks for kernel events but is not allowed to. perf bench: Ian Rogers: - Add event synthesis benchmark. tools api fs: Stephane Eranian: - Make xxx__mountpoint() more scalable libtraceevent: He Zhe: - Handle return value of asprintf. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCXp2LlQAKCRCyPKLppCJ+ J95oAP0ZihVUhESv/gdeX0IDE5g6Rd2V6LNcRj+jb7gX9NlQkwD/UfS454WV1ftQ qTwrkKPzY/5Tm2cLuVE7r7fJ6naDHgU= =FHm4 -----END PGP SIGNATURE----- Merge tag 'perf-core-for-mingo-5.8-20200420' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core Pull perf/core fixes and improvements from Arnaldo Carvalho de Melo: kernel + tools/perf: Alexey Budankov: - Introduce CAP_PERFMON to kernel and user space. callchains: Adrian Hunter: - Allow using Intel PT to synthesize callchains for regular events. Kan Liang: - Stitch LBR records from multiple samples to get deeper backtraces, there are caveats, see the csets for details. perf script: Andreas Gerstmayr: - Add flamegraph.py script BPF: Jiri Olsa: - Synthesize bpf_trampoline/dispatcher ksymbol events. perf stat: Arnaldo Carvalho de Melo: - Honour --timeout for forked workloads. Stephane Eranian: - Force error in fallback on :k events, to avoid counting nothing when the user asks for kernel events but is not allowed to. perf bench: Ian Rogers: - Add event synthesis benchmark. tools api fs: Stephane Eranian: - Make xxx__mountpoint() more scalable libtraceevent: He Zhe: - Handle return value of asprintf. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>	2020-04-22 14:08:28 +02:00
Alexey Budankov	4e3d3456b7	drm/i915/perf: Open access for CAP_PERFMON privileged process Open access to i915_perf monitoring for CAP_PERFMON privileged process. Providing the access under CAP_PERFMON capability singly, without the rest of CAP_SYS_ADMIN credentials, excludes chances to misuse the credentials and makes operation more secure. CAP_PERFMON implements the principle of least privilege for performance monitoring and observability operations (POSIX IEEE 1003.1e 2.2.2.39 principle of least privilege: A security design principle that states that a process or program be granted only those privileges (e.g., capabilities) necessary to accomplish its legitimate function, and only for the time that such privileges are actually required) For backward compatibility reasons access to i915_events subsystem remains open for CAP_SYS_ADMIN privileged processes but CAP_SYS_ADMIN usage for secure i915_events monitoring is discouraged with respect to CAP_PERFMON capability. Signed-off-by: Alexey Budankov <alexey.budankov@linux.intel.com> Reviewed-by: James Morris <jamorris@linux.microsoft.com> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Igor Lubashev <ilubashe@akamai.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Serge Hallyn <serge@hallyn.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: intel-gfx@lists.freedesktop.org Cc: linux-doc@vger.kernel.org Cc: linux-man@vger.kernel.org Cc: linux-security-module@vger.kernel.org Cc: selinux@vger.kernel.org Link: http://lore.kernel.org/lkml/e3e3292f-f765-ea98-e59c-fbe2db93fd34@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2020-04-16 12:19:08 -03:00
Ashutosh Dixit	bcad588dea	drm/i915/perf: Do not clear pollin for small user read buffers It is wrong to block the user thread in the next poll when OA data is already available which could not fit in the user buffer provided in the previous read. In several cases the exact user buffer size is not known. Blocking user space in poll can lead to data loss when the buffer size used is smaller than the available data. This change fixes this issue and allows user space to read all OA data even when using a buffer size smaller than the available data using multiple non-blocking reads rather than staying blocked in poll till the next timer interrupt. v2: Fix ret value for blocking reads (Umesh) v3: Mistake during patch send (Ashutosh) v4: Remove -EAGAIN from comment (Umesh) v5: Improve condition for clearing pollin and return (Lionel) v6: Improve blocking read loop and other cleanups (Lionel) v7: Added Cc stable Testcase: igt/perf/polling-small-buf Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200403010120.3067-1-ashutosh.dixit@intel.com (cherry-picked from commit `6352219c39`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-04-13 14:09:48 -07:00
Chris Wilson	442dbc5c68	drm/i915: Make exclusive awaits on i915_active optional Later use will require asynchronous waits on the active timelines, but will not utilize an async wait on the exclusive channel. Make the await on the exclusive fence explicit in the selection flags. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200406155840.1728-1-chris@chris-wilson.co.uk	2020-04-06 19:48:05 +01:00
Ashutosh Dixit	6352219c39	drm/i915/perf: Do not clear pollin for small user read buffers It is wrong to block the user thread in the next poll when OA data is already available which could not fit in the user buffer provided in the previous read. In several cases the exact user buffer size is not known. Blocking user space in poll can lead to data loss when the buffer size used is smaller than the available data. This change fixes this issue and allows user space to read all OA data even when using a buffer size smaller than the available data using multiple non-blocking reads rather than staying blocked in poll till the next timer interrupt. v2: Fix ret value for blocking reads (Umesh) v3: Mistake during patch send (Ashutosh) v4: Remove -EAGAIN from comment (Umesh) v5: Improve condition for clearing pollin and return (Lionel) v6: Improve blocking read loop and other cleanups (Lionel) v7: Added Cc stable Testcase: igt/perf/polling-small-buf Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200403010120.3067-1-ashutosh.dixit@intel.com	2020-04-03 18:55:02 +01:00
Lionel Landwerlin	d16e137e7f	drm/i915/perf: don't read head/tail pointers outside critical section Reading or writing those fields should only happen under stream->oa_buffer.ptr_lock. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `d1df41eb72` ("drm/i915/perf: rework aging tail workaround") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200330091411.37357-1-lionel.g.landwerlin@intel.com	2020-03-31 11:47:19 +03:00
Chris Wilson	d7d50f801d	drm/i915/perf: Schedule oa_config after modifying the contexts We wish that the scheduler emit the context modification commands prior to enabling the oa_config, for which we must explicitly inform it of the ordering constraints. This is especially important as we now wait for the final oa_config setup to be completed and as this wait may be on a distinct context to the state modifications, we need that command packet to be always last in the queue. We borrow the i915_active for its ability to track multiple timelines and the last dma_fence on each; a flexible dma_resv. Keeping track of each dma_fence is important for us so that we can efficiently schedule the requests and reprioritise as required. Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200327112212.16046-3-chris@chris-wilson.co.uk	2020-03-30 18:20:34 +01:00
Lionel Landwerlin	4ef10fe05b	drm/i915/perf: add new open param to configure polling of OA buffer This new parameter let's the application choose how often the OA buffer should be checked on the CPU side for data availability. Longer polling period tend to reduce CPU overhead if the application does not care about somewhat real time data collection. v2: Allow disabling polling completely with 0 value (Lionel) v3: Version the new parameter (Joonas) v4: Rebase (Umesh) v5: Make poll delay value of 0 invalid (Umesh) v6: - Describe poll_oa_period (Ashutosh) - Fix comment for new poll parameter (Lionel) - Drop open_flags in read_properties_unlocked (Lionel) - Rename uapi parameter (Ashutosh) v7: Reword the comment in uapi (Ashutosh) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200324185457.14635-4-umesh.nerlige.ramappa@intel.com	2020-03-27 13:10:05 +02:00
Lionel Landwerlin	c51dbc6e8f	drm/i915/perf: move pollin setup to non hw specific code This isn't really gen specific stuff, so just move it to the common code. v2: Rebase (Umesh) v3: Remove comment, pollin is a per stream state already (Ashutosh) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200324185457.14635-3-umesh.nerlige.ramappa@intel.com	2020-03-27 13:10:04 +02:00
Lionel Landwerlin	d1df41eb72	drm/i915/perf: rework aging tail workaround We're about to introduce an options to open the perf stream, giving the user ability to configure how often it wants the kernel to poll the OA registers for available data. Right now the workaround against the OA tail pointer race condition requires at least twice the internal kernel polling timer to make any data available. This changes introduce checks on the OA data written into the circular buffer to make as much data as possible available on the first iteration of the polling timer. v2: Use OA_TAKEN macro without the gtt_offset (Lionel) v3: (Umesh) - Rebase - Change report to report32 from below review https://patchwork.freedesktop.org/patch/330704/?series=66697&rev=1 v4: (Ashutosh, Lionel) - Fix checkpatch errors - Fix aging_timestamp initialization - Check for only one valid landed report - Fix check for unlanded report v5: (Ashutosh) - Fix bug in accurately determining landed report. - Optimize the check for landed reports by going as far as the previously determined aged tail. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200324185457.14635-2-umesh.nerlige.ramappa@intel.com	2020-03-27 13:10:03 +02:00
Umesh Nerlige Ramappa	c06aa1b438	drm/i915/perf: Invalidate OA TLB on when closing perf stream On running several back to back perf capture sessions involving closing and opening the perf stream, invalid OA reports are seen in the beginning of the OA buffer in some sessions. Fix this by invalidating OA TLB when the perf stream is closed or disabled on gen12. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `00a7f0d715` ("drm/i915/tgl: Add perf support on TGL") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309211057.38575-1-umesh.nerlige.ramappa@intel.com (cherry picked from commit `a639b0c150`) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2020-03-20 07:04:44 -07:00
Umesh Nerlige Ramappa	a639b0c150	drm/i915/perf: Invalidate OA TLB on when closing perf stream On running several back to back perf capture sessions involving closing and opening the perf stream, invalid OA reports are seen in the beginning of the OA buffer in some sessions. Fix this by invalidating OA TLB when the perf stream is closed or disabled on gen12. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `00a7f0d715` ("drm/i915/tgl: Add perf support on TGL") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200309211057.38575-1-umesh.nerlige.ramappa@intel.com	2020-03-18 00:30:52 +02:00
Lionel Landwerlin	11ecbdddf2	drm/i915/perf: introduce global sseu pinning On Gen11 powergating half the execution units is a functional requirement when using the VME samplers. Not fullfilling this requirement can lead to hangs. This unfortunately plays fairly poorly with the NOA requirements. NOA requires a stable power configuration to maintain its configuration. As a result using OA (and NOA feeding into it) so far has required us to use a power configuration that can work for all contexts. The only power configuration fullfilling this is powergating half the execution units. This makes performance analysis for 3D workloads somewhat pointless. Failing to find a solution that would work for everybody, this change introduces a new i915-perf stream open parameter that punts the decision off to userspace. If this parameter is omitted, the existing Gen11 behavior remains (half EU array powergating). This change takes the initiative to move all perf related sseu configuration into i915_perf.c v2: Make parameter priviliged if different from default v3: Fix context modifying its sseu config while i915-perf is enabled v4: Always consider global sseu a privileged operation (Tvrtko) Override req_sseu point in intel_sseu_make_rpcs() (Tvrtko) Remove unrelated changes (Tvrtko) v5: Some typos (Tvrtko) Process sseu param in read_properties_unlocked() (Tvrtko) v6: Actually commit the bits from v5... Fixup some checkpath warnings v7: Only compare engine uabi field (Chris) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200317132222.2638719-3-lionel.g.landwerlin@intel.com	2020-03-17 15:27:55 +02:00
Lionel Landwerlin	371aba6e26	drm/i915/perf: remove redundant power configuration register override The caller of i915_oa_init_reg_state() already sets this. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200317132222.2638719-2-lionel.g.landwerlin@intel.com	2020-03-17 15:27:54 +02:00
Lionel Landwerlin	9aba9c188d	drm/i915/perf: remove generated code A little bit of history : Back when i915-perf was introduced (4.13), there was no way to dynamically add new OA configurations to i915. Only the generated configs baked in at build time were allowed. It quickly became obvious that we would need to allow applications to upload their own configurations, for instance to be able to test new ones, and so by the next stable version (4.14) we added uAPIs to allow uploading new configurations. When adding that capability, we took the opportunity to remove most HW configurations except the TestOa one which is a configuration IGT would rely on to verify that the HW is outputting correct values. At the time it made sense to have that confiuration in at the same time a given HW platform added to the i915-perf driver. Now that IGT has become the reference point for HW configurations (see commit 53f8f541ca ("lib: Add i915_perf library"), previously this was located in the GPUTop repository), the need for having those configurations in i915-perf is gone. On the Mesa side, we haven't relied on this test configuration for a while. The MDAPI library always required 4.14 feature level and always loaded its configuration into i915. I'm sure nobody will miss this generated stuff in i915 :) v2: Fix selftests by creating an empty config v3: Fix unlocking on allocation error (Dan Carpenter) v4: Fixup checkpatch warnings v5: Fix incorrect unlock in error path (Umesh) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200317132222.2638719-1-lionel.g.landwerlin@intel.com	2020-03-17 15:27:50 +02:00
Chris Wilson	4b4e973d5e	drm/i915/perf: Reintroduce wait on OA configuration completion We still need to wait for the initial OA configuration to happen before we enable OA report writes to the OA buffer. Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `15d0ace1f8` ("drm/i915/perf: execute OA configuration from command stream") Closes: https://gitlab.freedesktop.org/drm/intel/issues/1356 Testcase: igt/perf/stream-open-close Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200302085812.4172450-7-chris@chris-wilson.co.uk	2020-03-02 20:34:18 +00:00
Chris Wilson	d236e2ac53	drm/i915/perf: Manually acquire engine-wakeref around use of kernel_context The engine->kernel_context is a special case for request emission. Since it is used as the barrier within the engine's wakeref, we must acquire the wakeref before submitting a request to the kernel_context. Reported-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-3-chris@chris-wilson.co.uk	2020-02-28 14:05:34 +00:00
Chris Wilson	a5af081d01	drm/i915/perf: Mark up the racy use of perf->exclusive_stream Inside the general i915_oa_init_reg_state() we avoid using the perf->mutex. However, we rely on perf->exclusive_stream being valid to access at that point, and for that we have to control the race with disabling perf. This relies on the disabling being a heavy barrier that inspects all active contexts, after marking the perf->exclusive_stream as not available. This should ensure that there are no more concurrent accesses to the perf->exclusive_stream as we destroy it. Mark up the races around the perf->exclusive_stream so that they stand out much more. (And hopefully we will be running kcsan to start validating that the only races we have are carefully controlled.) Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200227085723.1961649-2-chris@chris-wilson.co.uk	2020-02-28 14:05:33 +00:00
Wambui Karuga	0bf857358f	drm/i915/perf: conversion to struct drm_device based logging macros. Manual conversion of instances of printk based drm logging macros to the struct drm_device based logging macros in i915/i915_perf.c. Also involves extraction of the struct drm_i915_private device from various intel types for use in the macros. Instances of the DRM_DEBUG printk macro were not converted due to the lack of an analogous struct drm_device based logging macro. v2: remove instances of DRM_DEBUG that were converted. References: https://lists.freedesktop.org/archives/dri-devel/2020-January/253381.html Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200218173936.19664-1-wambui.karugax@gmail.com	2020-02-21 11:20:42 +02:00
Wambui Karuga	00376ccfb2	drm/i915: conversion to drm_device logging macros when drm_i915_private is present. Converts various instances of the printk drm logging macros to the struct drm_device based logging macros in the drm/i915 folder using the following coccinelle script that transforms based on the existence of the struct drm_i915_private device pointer: @@ identifier fn, T; @@ fn(...) { ... struct drm_i915_private T = ...; <+... ( -DRM_INFO( +drm_info(&T->drm, ...) \| -DRM_ERROR( +drm_err(&T->drm, ...) \| -DRM_WARN( +drm_warn(&T->drm, ...) \| -DRM_DEBUG_KMS( +drm_dbg_kms(&T->drm, ...) \| -DRM_DEBUG_DRIVER( +drm_dbg(&T->drm, ...) \| -DRM_DEBUG_ATOMIC( +drm_dbg_atomic(&T->drm, ...) ) ...+> } @@ identifier fn, T; @@ fn(...,struct drm_i915_private T,...) { <+... ( -DRM_INFO( +drm_info(&T->drm, ...) \| -DRM_ERROR( +drm_err(&T->drm, ...) \| -DRM_WARN( +drm_warn(&T->drm, ...) \| -DRM_DEBUG_DRIVER( +drm_dbg(&T->drm, ...) \| -DRM_DEBUG_KMS( +drm_dbg_kms(&T->drm, ...) \| -DRM_DEBUG_ATOMIC( +drm_dbg_atomic(&T->drm, ...) ) ...+> } Checkpatch warnings were fixed manually. Instances of the DRM_DEBUG macro were not converted due to lack of a consensus of an analogous struct drm_device based macro. References: https://lists.freedesktop.org/archives/dri-devel/2020-January/253381.html Signed-off-by: Wambui Karuga <wambui.karugax@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200131093416.28431-2-wambui.karugax@gmail.com	2020-02-04 11:29:27 +02:00
Umesh Nerlige Ramappa	6f280b133d	drm/i915/perf: Fix OA context id overlap with idle context id Engine context pinned in perf OA was set to same context id as the idle context. Set the context id to an unused value. Clear the sw context id field in lrc descriptor before ORing with ce->tag (Chris) Closes: https://gitlab.freedesktop.org/drm/intel/issues/756 Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20200124013701.40609-1-umesh.nerlige.ramappa@intel.com	2020-01-27 21:11:59 +00:00
Pankaj Bharadiya	a9f236d1fc	drm/i915: Make WARN* drm specific where uncore or stream ptr is available drm specific WARN* calls include device information in the backtrace, so we know what device the warnings originate from. Covert all the calls of WARN* with device specific drm_WARN* variants in functions where intel_uncore/i915_perf_stream struct pointer is readily available. The conversion was done automatically with below coccinelle semantic patch. checkpatch errors/warnings are fixed manually. @@ identifier func, T; @@ func(...) { ... struct intel_uncore T = ...; <... ( -WARN( +drm_WARN(&T->i915->drm, ...) \| -WARN_ON( +drm_WARN_ON(&T->i915->drm, ...) \| -WARN_ONCE( +drm_WARN_ONCE(&T->i915->drm, ...) \| -WARN_ON_ONCE( +drm_WARN_ON_ONCE(&T->i915->drm, ...) ) ...> } @@ identifier func, T; @@ func(struct intel_uncore T,...) { <... ( -WARN( +drm_WARN(&T->i915->drm, ...) \| -WARN_ON( +drm_WARN_ON(&T->i915->drm, ...) \| -WARN_ONCE( +drm_WARN_ONCE(&T->i915->drm, ...) \| -WARN_ON_ONCE( +drm_WARN_ON_ONCE(&T->i915->drm, ...) ) ...> } @@ identifier func, T; @@ func(struct i915_perf_stream T,...) { +struct drm_i915_private i915 = T->perf->i915; <+... ( -WARN( +drm_WARN(&i915->drm, ...) \| -WARN_ON( +drm_WARN_ON(&i915->drm, ...) \| -WARN_ONCE( +drm_WARN_ONCE(&i915->drm, ...) \| -WARN_ON_ONCE( +drm_WARN_ON_ONCE(&i915->drm, ...) ) ...+> } command: ls drivers/gpu/drm/i915/*.c \| xargs spatch --sp-file <script> \ --linux-spacing --in-place Signed-off-by: Pankaj Bharadiya <pankaj.laxminarayan.bharadiya@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200115034455.17658-11-pankaj.laxminarayan.bharadiya@intel.com	2020-01-22 17:57:39 +02:00
Chris Wilson	feed5c7be2	drm/i915: Pin the context as we work on it Since we now allow the intel_context_unpin() to run unserialised, we risk our operations under the intel_context_lock_pinned() being run as the context is unpinned (and thus invalidating our state). We can atomically acquire the pin, testing to see if it is pinned in the process, thus ensuring that the state remains consistent during the course of the whole operation. Fixes: `8413502238` ("drm/i915/gt: Drop mutex serialisation between context pin/unpin") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20200109085142.871563-1-chris@chris-wilson.co.uk	2020-01-09 12:50:26 +00:00
Chris Wilson	e6ba764802	drm/i915: Remove i915->kernel_context Allocate only an internal intel_context for the kernel_context, forgoing a global GEM context for internal use as we only require a separate address space (for our own protection). Now having weaned GT from requiring ce->gem_context, we can stop referencing it entirely. This also means we no longer have to create random and unnecessary GEM contexts for internal use. GEM contexts are now entirely for tracking GEM clients, and intel_context the execution environment on the GPU. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Acked-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191221160324.1073045-1-chris@chris-wilson.co.uk	2019-12-21 16:37:10 +00:00
Chris Wilson	9f3ccd40ac	drm/i915: Drop GEM context as a direct link from i915_request Keep the intel_context as being the primary state for i915_request, with the GEM context a backpointer from the low level state for the rarer cases we need client information. Our goal is to remove such references to clients from the backend, and leave the HW submission agnostic to client interfaces and self-contained. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Andi Shyti <andi.shyti@intel.com> Reviewed-by: Andi Shyti <andi.shyti@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191220101230.256839-1-chris@chris-wilson.co.uk	2019-12-20 10:52:21 +00:00
Venkata Sandeep Dhanalakota	3dc716fd3c	drm/i915/perf: Register sysctl path globally We do not require to register the sysctl paths per instance, so making registration global. v2: make sysctl path register and unregister function driver specific (Tvrtko and Lucas). Cc: Sudeep Dutt <sudeep.dutt@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191213155152.69182-1-venkata.s.dhanalakota@intel.com	2019-12-13 20:16:23 +00:00
Umesh Nerlige Ramappa	ccdeed4970	drm/i915/perf: Configure OAR for specific context Gen12 supports saving/restoring render counters per context. Apply OAR configuration only for the context that is passed in to perf. v2: - Fix OACTXCONTROL value to only stop/resume counters. - Remove gen12_update_reg_state_unlocked as power state is already applied by the caller. v3: (Lionel) - Move register initialization into the array - Assume a valid oa_config in enable_metric_set Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Fixes: `00a7f0d715` ("drm/i915/tgl: Add perf support on TGL") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191206194339.31356-2-umesh.nerlige.ramappa@intel.com	2019-12-09 15:38:17 +02:00
Umesh Nerlige Ramappa	322d56aa31	drm/i915/perf: Allow non-privileged access when OA buffer is not sampled SAMPLE_OA_REPORT enables sampling of OA reports from the OA buffer. Since reports from OA buffer had system wide visibility, collecting samples from the OA buffer was a privileged operation on previous platforms. Prior to TGL, it was also necessary to sample the OA buffer to normalize reports from MI REPORT PERF COUNT. TGL has a dedicated OAR unit to sample perf reports for a specific render context. This removes the necessity to sample OA buffer. - If not sampling the OA buffer, allow non-privileged access. An earlier patch allows the non-privilege access: https://patchwork.freedesktop.org/patch/337716/?series=68582&rev=1 - Clear up the path for non-privileged access in this patch Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Fixes: `00a7f0d715` ("drm/i915/tgl: Add perf support on TGL") Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191206194339.31356-1-umesh.nerlige.ramappa@intel.com	2019-12-09 15:38:16 +02:00
Mao Wenan	c415ef2a26	drm/i915/perf: drop pointless static qualifier in i915_perf_add_config_ioctl() There is no need to have the 'T *v' variable static since new value always be assigned before use it. Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191204010154.152396-1-maowenan@huawei.com	2019-12-04 15:11:44 +00:00
Chris Wilson	de5825beae	drm/i915: Serialise with engine-pm around requests on the kernel_context As the engine->kernel_context is used within the engine-pm barrier, we have to be careful when emitting requests outside of the barrier, as the strict timeline locking rules do not apply. Instead, we must ensure the engine_park() cannot be entered as we build the request, which is simplest by taking an explicit engine-pm wakeref around the request construction. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191125105858.1718307-1-chris@chris-wilson.co.uk	2019-11-25 13:17:18 +00:00
Lionel Landwerlin	dd590f6800	drm/i915/perf: Add preemption check while waiting for OA While we're waiting for the OA configuration to apply, let's give a chance to other contexts that might need to run other workloads. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191114140224.21818-1-lionel.g.landwerlin@intel.com	2019-11-15 16:43:33 +00:00
Lionel Landwerlin	93937659dc	drm/i915/perf: don't forget noa wait after oa config I'm observing incoherence metric values, changing from run to run. It appears the patches introducing noa wait & reconfiguration from command stream switched places in the series multiple times during the review. This lead to the dependency of one onto the order to go missing... Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `15d0ace1f8` ("drm/i915/perf: execute OA configuration from command stream") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191113154639.27144-1-lionel.g.landwerlin@intel.com	2019-11-14 07:20:56 +00:00
Lionel Landwerlin	0b0120d4c7	drm/i915/perf: always consider holding preemption a privileged op The ordering of the checks in the existing code can lead to holding preemption not being considered as privileged op. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `9cd20ef780` ("drm/i915/perf: allow holding preemption on filtered ctx") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191111095308.2550-1-lionel.g.landwerlin@intel.com	2019-11-11 12:14:01 +02:00
Chris Wilson	9278bbb6e4	drm/i915/perf: Reverse a ternary to make sparse happy Avoid drivers/gpu/drm/i915/i915_perf.c:2442:85: warning: dubious: x \| !y simply by inverting the predicate and reversing the ternary. v2: Move the long lines into their own function so there is no confusion on operator precedence. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191101192116.12647-1-chris@chris-wilson.co.uk	2019-11-01 19:45:56 +00:00
Lionel Landwerlin	00a7f0d715	drm/i915/tgl: Add perf support on TGL The design of the OA unit has been split into several units. We now have a global unit (OAG) and a render specific unit (OAR). This leads to some changes on how we program things. Some details : OAR: - has its own set of counter registers, they are per-context saved/restored - counters are not written to the circular OA buffer - a snapshot of the counters can be acquired with MI_RECORD_PERF_COUNT, or a single counter can be read with MI_STORE_REGISTER_MEM. OAG: - has global counters that increment across context switches - counters are written into the circular OA buffer (if requested) v2: Fix checkpatch warnings on code style (Lucas) v3: (Umesh) - Update register from which tail, status and head are read - Update logic to sample context reports - Update whitelist mux and b counter regs v4: Fix a bug when updating context image for new contexts (Umesh) v5: Squash patch enabling save/restore of counters into context image We want this so we can preempt performance queries and keep the system responsive even when long running queries are ongoing. We avoid doing it for all contexts. - use LRI to modify context control (Chris) - use MASKED_FIELD to program just the masked bits (Chris) - disable save/restore of counters on cleanup (Chris) v6: Do not use implicit parameters (Chris) BSpec: 28727, 30021 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Acked-by: Chris Wilson <chris.p.wilson@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191025193746.47155-2-umesh.nerlige.ramappa@intel.com	2019-10-29 12:53:54 +02:00
Umesh Nerlige Ramappa	fc21523041	drm/i915/perf: Add helper macros for comparing with whitelisted registers Add helper macros for range and equality comparisons and use them to check with whitelisted registers in oa configurations. Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191025193746.47155-1-umesh.nerlige.ramappa@intel.com	2019-10-29 12:53:54 +02:00
Michal Wajdeczko	19c17b763f	drm/i915/execlists: Use vfunc to check engine submission mode While processing CSB there is no need to look at GuC submission settings, just check if engine is configured for execlists mode. While today GuC submission is disabled it's settings are still based on modparam values that might not correctly reflect actual submission status in case of any fallback. Until that is fully fixed, use alternate method to confirm that engine really runs in execlists mode by comparing set_default_submission vfunc. v2: add other immediate use of new helper Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191028164520.31772-1-michal.wajdeczko@intel.com	2019-10-28 20:40:07 +00:00
Chris Wilson	2871ea85c1	drm/i915/gt: Split intel_ring_submission Split the legacy submission backend from the common CS ring buffer handling. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191024100344.5041-1-chris@chris-wilson.co.uk	2019-10-24 12:14:21 +01:00
Chris Wilson	0587152bf9	drm/i915: Drop assertion that ce->pin_mutex guards state updates The actual conditions are that we know the GPU is not accessing the context, and we hold a pin on the context image to allow CPU access. We used a fake lock on ce->pin_mutex so that we could try and use lockdep to assert that access is serialised, but the various different hardirq/softirq contexts where we need to fake holding the pin_mutex are causing more trouble. Still it would be nice if we did have a way to reassure ourselves that the direct update to the context image is serialised with GPU execution. In the meantime, stop lockdep complaining about false irq inversions. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111923 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191022122845.25038-1-chris@chris-wilson.co.uk	2019-10-22 13:32:01 +01:00
Lionel Landwerlin	8814c6d01f	drm/i915/perf: fix oa config reconfiguration The current logic just reapplies the same configuration already stored into stream->oa_config instead of the newly selected one. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `7831e9a965` ("drm/i915/perf: Allow dynamic reconfiguration of the OA stream") Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191019214647.27866-1-lionel.g.landwerlin@intel.com	2019-10-20 10:01:11 +01:00
Lionel Landwerlin	9cd20ef780	drm/i915/perf: allow holding preemption on filtered ctx We would like to make use of perf in Vulkan. The Vulkan API is much lower level than OpenGL, with applications directly exposed to the concept of command buffers (pretty much equivalent to our batch buffers). In Vulkan, queries are always limited in scope to a command buffer. In OpenGL, the lack of command buffer concept meant that queries' duration could span multiple command buffers. With that restriction gone in Vulkan, we would like to simplify measuring performance just by measuring the deltas between the counter snapshots written by 2 MI_RECORD_PERF_COUNT commands, rather than the more complex scheme we currently have in the GL driver, using 2 MI_RECORD_PERF_COUNT commands and doing some post processing on the stream of OA reports, coming from the global OA buffer, to remove any unrelated deltas in between the 2 MI_RECORD_PERF_COUNT. Disabling preemption only apply to a single context with which want to query performance counters for and is considered a privileged operation, by default protected by CAP_SYS_ADMIN. It is possible to enable it for a normal user by disabling the paranoid stream setting. v2: Store preemption setting in intel_context (Chris) v3: Use priorities to avoid preemption rather than the HW mechanism v4: Just modify the port priority reporting function v5: Add nopreempt flag on gem context and always flag requests appropriately, regarless of OA reconfiguration. Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-4-chris@chris-wilson.co.uk	2019-10-14 21:30:28 +01:00
Chris Wilson	7831e9a965	drm/i915/perf: Allow dynamic reconfiguration of the OA stream Introduce a new perf_ioctl command to change the OA configuration of the active stream. This allows the OA stream to be reconfigured between batch buffers, giving greater flexibility in sampling. We inject a request into the OA context to reconfigure the stream asynchronously on the GPU in between and ordered with execbuffer calls. Original patch for dynamic reconfiguration by Lionel Landwerlin. Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-3-chris@chris-wilson.co.uk	2019-10-14 21:30:27 +01:00
Lionel Landwerlin	4f6ccc74a8	drm/i915: add support for perf configuration queries Listing configurations at the moment is supported only through sysfs. This might cause issues for applications wanting to list configurations from a container where sysfs isn't available. This change adds a way to query the number of configurations and their content through the i915 query uAPI. v2: Fix sparse warnings (Lionel) Add support to query configuration using uuid (Lionel) v3: Fix some inconsistency in uapi header (Lionel) Fix unlocking when not locked issue (Lionel) Add debug messages (Lionel) v4: Fix missing unlock (Dan) v5: Drop lock when copying config content to userspace (Chris) v6: Drop lock when copying config list to userspace (Chris) Fix deadlock when calling i915_perf_get_oa_config() under perf.metrics_lock (Lionel) Add i915_oa_config_get() (Chris) Link: https://gitlab.freedesktop.org/mesa/mesa/merge_requests/932 Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-2-chris@chris-wilson.co.uk	2019-10-14 21:30:26 +01:00
Lionel Landwerlin	b8d49f28aa	drm/i915/perf: introduce a versioning of the i915-perf uapi Reporting this version will help application figure out what level of the support the running kernel provides. v2: Add i915_perf_ioctl_version() (Chris) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191014201404.22468-1-chris@chris-wilson.co.uk	2019-10-14 21:30:25 +01:00
Chris Wilson	c2fba936d3	drm/i915/perf: Avoid polluting the i915_oa_config with error pointers Use a local variable to track the allocation errors to avoid polluting the struct and keep the free simple. Reported-by: kbuild test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191013095211.2922-1-chris@chris-wilson.co.uk	2019-10-13 13:17:19 +01:00
Chris Wilson	5f5c382ecf	drm/i915/perf: Prefer using the pinned_ctx for emitting delays on config When we are watching a particular context, we want the OA config to be applied inline with that context such that it takes effect before the next submission. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191012091056.28686-1-chris@chris-wilson.co.uk	2019-10-12 17:04:08 +01:00
Lionel Landwerlin	15d0ace1f8	drm/i915/perf: execute OA configuration from command stream We haven't run into issues with programming the global OA/NOA registers configuration from CPU so far, but HW engineers actually recommend doing this from the command streamer. On TGL in particular one of the clock domain in which some of that programming goes might not be powered when we poke things from the CPU. Since we have a command buffer prepared for the execbuffer side of things, we can reuse that approach here too. This also allows us to significantly reduce the amount of time we hold the main lock. v2: Drop the global lock as much as possible v3: Take global lock to pin global v4: Create i915 request in emit_oa_config() to avoid deadlocks (Lionel) v5: Move locking to the stream (Lionel) v6: Move active reconfiguration request into i915_perf_stream (Lionel) v7: Pin VMA outside request creation (Chris) Lock VMA before move to active (Chris) v8: Fix double free on stream->initial_oa_config_bo (Lionel) Don't allow interruption when waiting on active config request (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-3-chris@chris-wilson.co.uk	2019-10-12 09:08:40 +01:00
Lionel Landwerlin	daed3e4439	drm/i915/perf: implement active wait for noa configurations NOA configuration take some amount of time to apply. That amount of time depends on the size of the GT. There is no documented time for this. For example, past experimentations with powergating configuration changes seem to indicate a 60~70us delay. We go with 500us as default for now which should be over the required amount of time (according to HW architects). v2: Don't forget to save/restore registers used for the wait (Chris) v3: Name used CS_GPR registers (Chris) Fix compile issue due to rebase (Lionel) v4: Fix save/restore helpers (Umesh) v5: Move noa_wait from drm_i915_private to i915_perf_stream (Lionel) v6: Add missing struct declarations in i915_perf.h Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-2-chris@chris-wilson.co.uk	2019-10-12 09:08:33 +01:00
Lionel Landwerlin	6a45008ab7	drm/i915/perf: allow for CS OA configs to be created lazily Here we introduce a mechanism by which the execbuf part of the i915 driver will be able to request that a batch buffer containing the programming for a particular OA config be created. We'll execute these OA configuration buffers right before executing a set of userspace commands so that a particular user batchbuffer be executed with a given OA configuration. This mechanism essentially allows the userspace driver to go through several OA configuration without having to open/close the i915/perf stream. v2: No need for locking on object OA config object creation (Chris) Flush cpu mapping of OA config (Chris) v3: Properly deal with the perf_metric lock (Chris/Lionel) v4: Fix oa config unref/put when not found (Lionel) v5: Allocate BOs for configurations on the stream instead of globally (Lionel) v6: Fix 64bit division (Chris) v7: Store allocated config BOs into the stream (Lionel) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191012072308.30312-1-chris@chris-wilson.co.uk	2019-10-12 09:08:27 +01:00
Chris Wilson	a5efcde69b	drm/i915/perf: Replace global wakeref tracking with engine-pm As we now have a specific engine to use OA on, exchange the top-level runtime-pm wakeref with the engine-pm. This still results in the same top-level runtime-pm, but with more nuances to keep the engine and its gt awake. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191011190325.10979-1-chris@chris-wilson.co.uk	2019-10-12 07:53:20 +01:00
Chris Wilson	52111c4628	drm/i915/perf: Store shortcut to intel_uncore Now that we have the engine stored in i915_perf, we have a means of accessing intel_gt should we require it. However, we are currently only using the intel_gt to find the right intel_uncore, so replace our i915_perf.gt pointer with the more useful i915_perf.uncore. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191010150520.26488-2-chris@chris-wilson.co.uk	2019-10-10 18:44:24 +01:00
Lionel Landwerlin	9a61363a63	drm/i915/perf: store the associated engine of a stream We'll use this information later to verify that a client trying to reconfigure the stream does so on the right engine. For now, we want to pull the knowledge of which engine we use into a central property. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191010150520.26488-1-chris@chris-wilson.co.uk	2019-10-10 18:44:13 +01:00
Lionel Landwerlin	23b9e41a3d	drm/i915/perf: drop list of streams At some point in time there was the idea that we could have multiple stream from the same piece of HW but that never materialized and given the hard time we already have making everything work with the submission side, there is no real point having this list of 1 element around. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20191008140111.5437-1-chris@chris-wilson.co.uk	2019-10-08 16:22:19 +01:00
Chris Wilson	a4c969d107	drm/i915/perf: Set the exclusive stream under perf->lock The BKL struct_mutex is no more, the only serialisation we required for setting the exclusive stream is already managed by ce->pin_mutex in gen8_configure_all_contexts(). As such, we can manipulate i915_perf.exclusive_stream underneath our own (already held) perf->lock. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191007140812.10963-2-chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20191007210942.18145-2-chris@chris-wilson.co.uk	2019-10-08 07:52:36 +01:00
Chris Wilson	8f8b1171e1	drm/i915/perf: Wean ourselves off dev_priv Use the local uncore accessors for the GT rather than using the [not-so] magic global dev_priv mmio routines. In the process, we also teach the perf stream to use backpointers to the i915_perf rather than digging it out of dev_priv. v2: Rebase onto i915_perf_types.h Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #v1 Link: https://patchwork.freedesktop.org/patch/msgid/20191007140812.10963-1-chris@chris-wilson.co.uk Link: https://patchwork.freedesktop.org/patch/msgid/20191007210942.18145-1-chris@chris-wilson.co.uk	2019-10-08 07:52:35 +01:00
Chris Wilson	a4e7ccdac3	drm/i915: Move context management under GEM Keep track of the GEM contexts underneath i915->gem.contexts and assign them their own lock for the purposes of list management. v2: Focus on lock tracking; ctx->vm is protected by ctx->mutex v3: Correct split with removal of logical HW ID Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-15-chris@chris-wilson.co.uk	2019-10-04 15:39:34 +01:00
Chris Wilson	2935ed5339	drm/i915: Remove logical HW ID With the introduction of ctx->engines[] we allow multiple logical contexts to be used on the same engine (e.g. with virtual engines). According to bspec, aach logical context requires a unique tag in order for context-switching to occur correctly between them. [Simple experiments show that it is not so easy to trick the HW into performing a lite-restore with matching logical IDs, though my memory from early Broadwell experiments do suggest that it should be generating lite-restores.] We only need to keep a unique tag for the active lifetime of the context, and for as long as we need to identify that context. The HW uses the tag to determine if it should use a lite-restore (why not the LRCA?) and passes the tag back for various status identifies. The only status we need to track is for OA, so when using perf, we assign the specific context a unique tag. v2: Calculate required number of tags to fill ELSP. Fixes: `976b55f0e1` ("drm/i915: Allow a context to define its set of engines") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=111895 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-14-chris@chris-wilson.co.uk	2019-10-04 15:39:30 +01:00
Chris Wilson	2850748ef8	drm/i915: Pull i915_vma_pin under the vm->mutex Replace the struct_mutex requirement for pinning the i915_vma with the local vm->mutex instead. Note that the vm->mutex is tainted by the shrinker (we require unbinding from inside fs-reclaim) and so we cannot allocate while holding that mutex. Instead we have to preallocate workers to do allocate and apply the PTE updates after we have we reserved their slot in the drm_mm (using fences to order the PTE writes with the GPU work and with later unbind). In adding the asynchronous vma binding, one subtle requirement is to avoid coupling the binding fence into the backing object->resv. That is the asynchronous binding only applies to the vma timeline itself and not to the pages as that is a more global timeline (the binding of one vma does not need to be ordered with another vma, nor does the implicit GEM fencing depend on a vma, only on writes to the backing store). Keeping the vma binding distinct from the backing store timelines is verified by a number of async gem_exec_fence and gem_exec_schedule tests. The way we do this is quite simple, we keep the fence for the vma binding separate and only wait on it as required, and never add it to the obj->resv itself. Another consequence in reducing the locking around the vma is the destruction of the vma is no longer globally serialised by struct_mutex. A natural solution would be to add a kref to i915_vma, but that requires decoupling the reference cycles, possibly by introducing a new i915_mm_pages object that is own by both obj->mm and vma->pages. However, we have not taken that route due to the overshadowing lmem/ttm discussions, and instead play a series of complicated games with trylocks to (hopefully) ensure that only one destruction path is called! v2: Add some commentary, and some helpers to reduce patch churn. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20191004134015.13204-4-chris@chris-wilson.co.uk	2019-10-04 15:39:02 +01:00
Chris Wilson	7dc56af526	drm/i915/selftests: Verify the LRC register layout between init and HW Before we submit the first context to HW, we need to construct a valid image of the register state. This layout is defined by the HW and should match the layout generated by HW when it saves the context image. Asserting that this should be equivalent should help avoid any undefined behaviour and verify that we haven't missed anything important! Of course, having insisted that the initial register state within the LRC should match that returned by HW, we need to ensure that it does. v2: Drop the RELATIVE_MMIO flag from gen11, we ignore it for constructing the lrc image. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190924145950.3011-1-chris@chris-wilson.co.uk	2019-09-24 17:27:19 +01:00
Chris Wilson	dffa8feb30	drm/i915/perf: Assert locking for i915_init_oa_perf_state() We use the context->pin_mutex to serialise updates to the OA config and the registers values written into each new context. Document this relationship and assert we do hold the context->pin_mutex as used by gen8_configure_all_contexts() to serialise updates to the OA config itself. v2: Add a white-lie for when we call intel_gt_resume() from init. v3: Lie while we have the context pinned inside atomic reset. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #v1 Link: https://patchwork.freedesktop.org/patch/msgid/20190830181929.18663-1-chris@chris-wilson.co.uk	2019-08-31 16:08:28 +01:00
Michel Thierry	45e9c829eb	drm/i915/tgl/perf: use the same oa ctx_id format as icl Compared to Icelake, Tigerlake's MAX_CONTEXT_HW_ID is smaller by one, but since we just use the upper 32 bits of the lrc_desc, it's guaranteed OA will use the correct one. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Michel Thierry <michel.thierry@intel.com> Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190823082055.5992-19-lucas.demarchi@intel.com	2019-08-27 08:47:31 -07:00
Jani Nikula	db94e9f133	drm/i915: extract i915_perf.h from i915_drv.h It used to be handy that we only had a couple of headers, but over time i915_drv.h has become unwieldy. Extract declarations to a separate header file corresponding to the implementation module, clarifying the modularity of the driver. Ensure the new header is self-contained, and do so with minimal further includes, using forward declarations as needed. Include the new header only where needed, and sort the modified include directives while at it and as needed. No functional changes. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/d7826e365695f691a3ac69a69ff6f2bbdb62700d.1565271681.git.jani.nikula@intel.com	2019-08-09 11:52:04 +03:00
Umesh Nerlige Ramappa	a37f08a882	drm/i915/perf: Refactor oa object to better manage resources The oa object manages the oa buffer and must be allocated when the user intends to read performance counter snapshots. This can be achieved by making the oa object part of the stream object which is allocated when a stream is opened by the user. Attributes in the oa object that are gen-specific are moved to the perf object so that they can be initialized on driver load. The split provides a better separation of the objects used in perf implementation of i915 driver so that resources are allocated and initialized only when needed. v2: Fix checkpatch warnings v3: Addressed Lionel's review comment v4: Rebase v5: Fix rebase/merge issue with ratelimit_state_init Signed-off-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190806233002.984-1-umesh.nerlige.ramappa@intel.com	2019-08-07 20:34:39 +01:00
Chris Wilson	750e76b4f9	drm/i915/gt: Move the [class][inst] lookup for engines onto the GT To maintain a fast lookup from a GT centric irq handler, we want the engine lookup tables on the intel_gt. To avoid having multiple copies of the same multi-dimension lookup table, move the generic user engine lookup into an rbtree (for fast and flexible indexing). v2: Split uabi_instance cf uabi_class v3: Set uabi_class/uabi_instance after collating all engines to provide a stable uabi across parallel unordered construction. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> #v2 Link: https://patchwork.freedesktop.org/patch/msgid/20190806124300.24945-2-chris@chris-wilson.co.uk	2019-08-06 15:00:43 +01:00
Chris Wilson	cb0c43f30c	drm/i915: Avoid ce->gem_context->i915 My plan for the future is to have kernel contexts not to have a GEM context backpointer (as they will not belong to any GEM context). In a few places, we use ce->gem_context to simply obtain the i915 backpointer, for which we can use ce->engine->i915 instead. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190730163441.16477-1-chris@chris-wilson.co.uk	2019-07-31 07:43:42 +01:00
Rodrigo Vivi	ed32f8d42c	Merge drm/drm-next into drm-intel-next-queued Catching up with 5.3-rc* Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>	2019-07-29 08:51:48 -07:00
Chris Wilson	5cca503817	drm/i915/perf: Initialise err to 0 before looping over ce->engines Smatch warning that the loop may be empty causing us to check err before it had been set. Ensure that it is initialised to 0, just in case. v2: Refactor the inner loop for better scooping and clarity Fixes: `a9877da2d6` ("drm/i915/oa: Reconfigure contexts on the fly") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190726131458.8310-1-chris@chris-wilson.co.uk	2019-07-26 15:10:53 +01:00
Matteo Croce	eec4844fae	proc/sysctl: add shared variables for range check In the sysctl code the proc_dointvec_minmax() function is often used to validate the user supplied value between an allowed range. This function uses the extra1 and extra2 members from struct ctl_table as minimum and maximum allowed value. On sysctl handler declaration, in every source file there are some readonly variables containing just an integer which address is assigned to the extra1 and extra2 members, so the sysctl range is enforced. The special values 0, 1 and INT_MAX are very often used as range boundary, leading duplication of variables like zero=0, one=1, int_max=INT_MAX in different source files: $ git grep -E '\.extra[12].*&(zero\|one\|int_max)' \|wc -l 248 Add a const int array containing the most commonly used values, some macros to refer more easily to the correct array member, and use them instead of creating a local one for every object file. This is the bloat-o-meter output comparing the old and new binary compiled with the default Fedora config: # scripts/bloat-o-meter -d vmlinux.o.old vmlinux.o add/remove: 2/2 grow/shrink: 0/2 up/down: 24/-188 (-164) Data old new delta sysctl_vals - 12 +12 __kstrtab_sysctl_vals - 12 +12 max 14 10 -4 int_max 16 - -16 one 68 - -68 zero 128 28 -100 Total: Before=20583249, After=20583085, chg -0.00% [mcroce@redhat.com: tipc: remove two unused variables] Link: http://lkml.kernel.org/r/20190530091952.4108-1-mcroce@redhat.com [akpm@linux-foundation.org: fix net/ipv6/sysctl_net_ipv6.c] [arnd@arndb.de: proc/sysctl: make firmware loader table conditional] Link: http://lkml.kernel.org/r/20190617130014.1713870-1-arnd@arndb.de [akpm@linux-foundation.org: fix fs/eventpoll.c] Link: http://lkml.kernel.org/r/20190430180111.10688-1-mcroce@redhat.com Signed-off-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Aaron Tomlin <atomlin@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-07-18 17:08:07 -07:00
Chris Wilson	a9877da2d6	drm/i915/oa: Reconfigure contexts on the fly Avoid a global idle barrier by reconfiguring each context by rewriting them with MI_STORE_DWORD from the kernel context. v2: We only need to determine the desired register values once, they are the same for all contexts. v3: Don't remove the kernel context from the list of known GEM contexts; the world is not ready for that yet. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190716213443.9874-1-chris@chris-wilson.co.uk	2019-07-17 07:58:27 +01:00
Lionel Landwerlin	14bfcd3e0d	drm/i915/perf: add missing delay for OA muxes configuration This was dropped from the original patch series, we weren't sure whether it was needed at the time. More recent tests show it's definitely needed to have acurate performance data. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `19f81df285` ("drm/i915/perf: Add OA unit support for Gen 8+") Acked-by: Chris Wilson <chris@chris-wilson.co.uk> [ickle: combine duplicate code and comments] Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190710105524.23017-1-chris@chris-wilson.co.uk	2019-07-10 12:55:48 +01:00
Lionel Landwerlin	a5af1df716	drm/i915/perf: ensure we keep a reference on the driver The i915 perf stream has its own file descriptor and is tied to reference of the driver. We haven't taken care of keep the driver alive. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Fixes: `eec688e142` ("drm/i915: Add i915 perf infrastructure") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190709123351.5645-2-lionel.g.landwerlin@intel.com	2019-07-09 21:26:40 +01:00
Michal Wajdeczko	5ed7a0cf33	drm/i915: Move OA files to separate folder OA files look to be auto-generated so we can keep them all in dedicated subdirectory. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190626123826.39760-1-michal.wajdeczko@intel.com	2019-06-26 23:23:59 +01:00
Lionel Landwerlin	8dcfdfb450	drm/i915/perf: fix ICL perf register offsets We got the wrong offsets (could they have changed?). New values were computed off an error state by looking up the register offset in the context image as written by the HW. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `1de401c08f` ("drm/i915/perf: enable perf support on ICL") Acked-by: Kenneth Graunke <kenneth@whitecape.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190610081914.25428-1-lionel.g.landwerlin@intel.com	2019-06-25 14:03:05 +03:00
Daniele Ceraolo Spurio	d858d5695f	drm/i915: update rpm_get/put to use the rpm structure The functions where internally already only using the structure, so we need to just flip the interface. v2: rebase Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Imre Deak <imre.deak@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190613232156.34940-7-daniele.ceraolospurio@intel.com	2019-06-14 15:58:33 +01:00
Lionel Landwerlin	bf210f6c9e	drm/i915/perf: fix whitelist on Gen10+ Gen10 added an additional NOA_WRITE register (high bits) and we forgot to whitelist it for userspace. Fixes: `95690a02fb` ("drm/i915/perf: enable perf support on CNL") Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> Link: https://patchwork.freedesktop.org/patch/msgid/20190601225845.12600-1-lionel.g.landwerlin@intel.com	2019-06-10 11:48:24 +03:00
Chris Wilson	10be98a77c	drm/i915: Move more GEM objects under gem/ Continuing the theme of separating out the GEM clutter. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-8-chris@chris-wilson.co.uk	2019-05-28 12:45:29 +01:00
Chris Wilson	8475355f7a	drm/i915: Move shmem object setup to its own file Split the plain old shmem object into its own file to start decluttering i915_gem.c v2: Lose the confusing, hysterical raisins, suffix of _gtt. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190528092956.14910-4-chris@chris-wilson.co.uk	2019-05-28 12:45:29 +01:00
Chris Wilson	5e2a0419ef	drm/i915: Switch back to an array of logical per-engine HW contexts We switched to a tree of per-engine HW context to accommodate the introduction of virtual engines. However, we plan to also support multiple instances of the same engine within the GEM context, defeating our use of the engine as a key to looking up the HW context. Just allocate a logical per-engine instance and always use an index into the ctx->engines[]. Later on, this ctx->engines[] may be replaced by a user specified map. v2: Add for_each_gem_engine() helper to iterator within the engines lock v3: intel_context_create_request() helper v4: s/unsigned long/unsigned int/ 4 billion engines is quite enough. v5: Push iterator locking to caller Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-7-chris@chris-wilson.co.uk	2019-04-26 18:32:11 +01:00
Chris Wilson	fa9f668141	drm/i915: Export intel_context_instance() We want to pass in a intel_context into intel_context_pin() and that requires us to first be able to lookup the intel_context! Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190426163336.15906-2-chris@chris-wilson.co.uk	2019-04-26 18:32:00 +01:00
Chris Wilson	2ccdf6a1c3	drm/i915: Pass intel_context to i915_request_create() Start acquiring the logical intel_context and using that as our primary means for request allocation. This is the initial step to allow us to avoid requiring struct_mutex for request allocation along the perma-pinned kernel context, but it also provides a foundation for breaking up the complex request allocation to handle different scenarios inside execbuf. For the purpose of emitting a request from inside retirement (see the next patch for engine power management), we also need to lift control over the timeline mutex to the caller. v2: Note that the request carries the active reference upon construction. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424200717.1686-4-chris@chris-wilson.co.uk	2019-04-24 22:25:35 +01:00
Chris Wilson	112ed2d31a	drm/i915: Move GraphicsTechnology files under gt/ Start partitioning off the code that talks to the hardware (GT) from the uapi layers and move the device facing code under gt/ One casualty is s/intel_ringbuffer.h/intel_engine.h/ with the plan to subdivide that header and body further (and split out the submission code from the ringbuffer and logical context handling). This patch aims to be simple motion so git can fixup inflight patches with little mess. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424174839.7141-1-chris@chris-wilson.co.uk	2019-04-24 21:01:46 +01:00
Chris Wilson	09407579ab	drm/i915: Store the default sseu setup on the engine As we push for better compartmentalisation, it is more convenient to copy the default sseu configuration from the engine into the derived logical context, than it is to dig it out from i915->runtime_info. v2: Use intel_sseu_from_device_info() to describe the converter Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190424095134.30249-1-chris@chris-wilson.co.uk	2019-04-24 16:37:20 +01:00
Daniele Ceraolo Spurio	97a04e0d07	drm/i915: switch intel_wait_for_register to uncore The intel_uncore structure is the owner of register access, so subclass the function to it. While at it, use a local uncore var and switch to the new read/write functions where it makes sense. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190325214940.23632-9-daniele.ceraolospurio@intel.com	2019-03-26 20:20:24 +00:00
Chris Wilson	a679f58d05	drm/i915: Flush pages on acquisition When we return pages to the system, we ensure that they are marked as being in the CPU domain since any external access is uncontrolled and we must assume the worst. This means that we need to always flush the pages on acquisition if we need to use them on the GPU, and from the beginning have used set-domain. Set-domain is overkill for the purpose as it is a general synchronisation barrier, but our intent is to only flush the pages being swapped in. If we move that flush into the pages acquisition phase, we know then that when we have obj->mm.pages, they are coherent with the GPU and need only maintain that status without resorting to heavy handed use of set-domain. The principle knock-on effect for userspace is through mmap-gtt pagefaulting. Our uAPI has always implied that the GTT mmap was async (especially as when any pagefault occurs is unpredicatable to userspace) and so userspace had to apply explicit domain control itself (set-domain). However, swapping is transparent to the kernel, and so on first fault we need to acquire the pages and make them coherent for access through the GTT. Our use of set-domain here leaks into the uABI that the first pagefault was synchronous. This is unintentional and baring a few igt should be unoticed, nevertheless we bump the uABI version for mmap-gtt to reflect the change in behaviour. Another implication of the change is that gem_create() is presumed to create an object that is coherent with the CPU and is in the CPU write domain, so a set-domain(CPU) following a gem_create() would be a minor operation that merely checked whether we could allocate all pages for the object. On applying this change, a set-domain(CPU) causes a clflush as we acquire the pages. This will have a small impact on mesa as we move the clflush here on !llc from execbuf time to create, but that should have minimal performance impact as the same clflush exists but is now done early and because of the clflush issue, userspace recycles bo and so should resist allocating fresh objects. Internally, the presumption that objects are created in the CPU write-domain and remain so through writes to obj->mm.mapping is more prevalent than I expected; but easy enough to catch and apply a manual flush. For the future, we should push the page flush from the central set_pages() into the callers so that we can more finely control when it is applied, but for now doing it one location is easier to validate, at the cost of sometimes flushing when there is no need. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Matthew Auld <matthew.william.auld@gmail.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Antonio Argenziano <antonio.argenziano@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190321161908.8007-1-chris@chris-wilson.co.uk	2019-03-21 17:28:12 +00:00
Daniele Ceraolo Spurio	3ceea6a1b4	drm/i915: use intel_uncore for all forcewake get/put Now that the internal code all works on intel_uncore, flip the external-facing interface. v2: fix GVT. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Paulo Zanoni <paulo.r.zanoni@intel.com> Reviewed-by: Paulo Zanoni <paulo.r.zanoni@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20190319183543.13679-4-daniele.ceraolospurio@intel.com	2019-03-20 21:12:31 +00:00
Rodrigo Vivi	2dd24a9c2c	drm/i915/gen11+: First assume next platforms will inherit stuff This exactly same approach was already used from gen9 to gen10 and from gen10 to gen11. Let's also use it for gen11+. Let's first assume that we inherit a similar platform and than we apply the differences on top. Different from the previous attempts this will be done this time with coccinelle. We obviously need to exclude some case that is really exclusive for gen11 like PCH, Firmware, and few others. Luckly this was easy to filter by selecting the files we are touching with coccinelle as exposed below: spatch -sp_file gen11\+.cocci --in-place i915_perf.c \ intel_bios.c intel_cdclk.c intel_ddi.c \ intel_device_info.c intel_display.c intel_dpll_mgr.c \ intel_dsi_vbt.c intel_hdmi.c intel_mocs.c intel_color.c @noticelake@ expression e; @@ -!IS_ICELAKE(e) +INTEL_GEN(e) < 11 @notgen11@ expression e; @@ -!IS_GEN(e, 11) +INTEL_GEN(e) < 11 @icelake@ expression e; @@ -IS_ICELAKE(e) +INTEL_GEN(e) >= 11 @gen11@ expression e; @@ -IS_GEN(e, 11) +INTEL_GEN(e) >= 11 No functional change. v2: Remove intel_lrc.c per Tvrtko request since those were w/a for ICL hw issuea and media related configuration. Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308214300.25057-1-rodrigo.vivi@intel.com	2019-03-13 13:00:24 -07:00
Chris Wilson	c4d52feb2c	drm/i915: Move over to intel_context_lookup() In preparation for an ever growing number of engines and so ever increasing static array of HW contexts within the GEM context, move the array over to an rbtree, allocated upon first use. Unfortunately, this imposes an rbtree lookup at a few frequent callsites, but we should be able to mitigate those by moving over to using the HW context as our primary type and so only incur the lookup on the boundary with the user GEM context and engines. v2: Check for no HW context in guc_stage_desc_init Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190308132522.21573-4-chris@chris-wilson.co.uk	2019-03-08 13:59:52 +00:00
Chris Wilson	b146e5efe6	drm/i915: Pass around the intel_context Instead of passing the gem_context and engine to find the instance of the intel_context to use, pass around the intel_context instead. This is useful for the next few patches, where the intel_context is no longer a direct lookup. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190306084704.15755-1-chris@chris-wilson.co.uk	2019-03-06 10:16:33 +00:00
Chris Wilson	8a68d46436	drm/i915: Store the BIT(engine->id) as the engine's mask In the next patch, we are introducing a broad virtual engine to encompass multiple physical engines, losing the 1:1 nature of BIT(engine->id). To reflect the broader set of engines implied by the virtual instance, lets store the full bitmask. v2: Use intel_engine_mask_t (s/ring_mask/engine_mask/) v3: Tvrtko voted for moah churn so teach everyone to not mention ring and use $class$instance throughout. v4: Comment upon the disparity in bspec for using VCS1,VCS2 in gen8 and VCS[0-4] in later gen. We opt to keep the code consistent and use 0-index naming throughout. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190305180332.30900-1-chris@chris-wilson.co.uk	2019-03-05 18:19:50 +00:00
Rodrigo Vivi	993298af26	drm/i915: Yet another if/else sort of newer to older platforms. No functional change. Just a reorg to match the preferred behavior. When rebasing internal branch on top of latest sort I noticed few more cases that needs to get reordered. Let's do in a bundle this time and hoping there's no other missing places. v2: Check for HSW/BDW ULT before generic IS_HASWELL or IS_BROADWELL or it doesn't work as pointed by Ville. But also ULT came afterwards anyway. v3: Accepting suggestions from Lucas: Sort CNL/CFL, KBL/SKL, and use <= 8 removing chv and bdw. Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190301172703.12139-1-rodrigo.vivi@intel.com	2019-03-04 10:08:13 -08:00
Lionel Landwerlin	ec431eae8f	drm/i915/perf: lock powergating configuration to default when active If some of the contexts submitting workloads to the GPU have been configured to shutdown slices/subslices, we might loose the NOA configurations written in the NOA muxes. One possible solution to this problem is to reprogram the NOA muxes when we switch to a new context. We initially tried this in the workaround batchbuffer but some concerns where raised about the cost of reprogramming at every context switch. This solution is also not without consequences from the userspace point of view. Reprogramming of the muxes can only happen once the powergating configuration has changed (which happens after context switch). This means for a window of time during the recording, counters recorded by the OA unit might be invalid. This requires userspace dealing with OA reports to discard the invalid values. Minimizing the reprogramming could be implemented by tracking of the last programmed configuration somewhere in GGTT and use MI_PREDICATE to discard some of the programming commands, but the command streamer would still have to parse all the MI_LRI instructions in the workaround batchbuffer. Another solution, which this change implements, is to simply disregard the user requested configuration for the period of time when i915/perf is active. On most platforms there are no issues with this apart from a performance penality for some media workloads that benefit from running on a partially powergated GPU. We already prevent RC6 from affecting the programming so it doesn't sound completely unreasonable to hold on powergating for the same reason. On Icelake however there would a functional problem if the slices not- containing the VME block were left enabled with a running media workload which explicitly disabled them. To avoid a GPU hang in this case, on Icelake we lock the enablement to only slices which contain VME blocks. Downside is that it means degraded GPU performance when OA is active but there is no known alternative solution for this. v2: Leave RPCS programming in intel_lrc.c (Lionel) v3: Update for s/union intel_sseu/struct intel_sseu/ (Lionel) More to_intel_context() (Tvrtko) s/dev_priv/i915/ (Tvrtko) Tvrtko Ursulin: v4: * Rebase for make_rpcs changes. v5: * Apply OA restriction from make_rpcs directly. v6: * Rebase for context image setup changes. v7: * Move stream assignment before metric enable. v8-9: * Rebase. v10: * Squashed with ICL support patch. Bspec: 21140 Co-developed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> # v9 Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190205095032.22673-2-tvrtko.ursulin@linux.intel.com	2019-02-05 11:31:52 +00:00
Jani Nikula	739f3abdbf	drm/i915: small isolated c99 types to kernel types switch Mixed C99 and kernel types use is getting ugly. Prefer kernel types. sed -i 's/\buint$8\\|16\\|32\\|64$_t\b/u\1/g' Minor checkpatch fixes sprinkled on top of the changed lines. Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: José Roberto de Souza <jose.souza@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/14ed72e7f04c9340a057855c5950b54811f8a477.1547629303.git.jani.nikula@intel.com	2019-01-17 09:02:00 +02:00
Chris Wilson	6619c0075f	drm/i915/perf: Track the rpm wakeref Keep track of our wakeref used to keep the device awake so we can catch any leak. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-7-chris@chris-wilson.co.uk	2019-01-14 16:18:09 +00:00
Chris Wilson	16e4dd0342	drm/i915: Markup paired operations on wakerefs The majority of runtime-pm operations are bounded and scoped within a function; these are easy to verify that the wakeref are handled correctly. We can employ the compiler to help us, and reduce the number of wakerefs tracked when debugging, by passing around cookies provided by the various rpm_get functions to their rpm_put counterpart. This makes the pairing explicit, and given the required wakeref cookie the compiler can verify that we pass an initialised value to the rpm_put (quite handy for double checking error paths). For regular builds, the compiler should be able to eliminate the unused local variables and the program growth should be minimal. Fwiw, it came out as a net improvement as gcc was able to refactor rpm_get and rpm_get_if_in_use together, v2: Just s/rpm_put/rpm_put_unchecked/ everywhere, leaving the manual mark up for smaller more targeted patches. v3: Mention the cookie in Returns Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20190114142129.24398-2-chris@chris-wilson.co.uk	2019-01-14 16:17:53 +00:00
Jani Nikula	3eb0930a42	Merge drm/drm-next into drm-intel-next-queued Generally catch up with 5.0-rc1, and specifically get the changes: `96d4f267e4` ("Remove 'type' argument from access_ok() function") `0b2c8f8b6b` ("i915: fix missing user_access_end() in page fault exception case") `594cc251fd` ("make 'user_access_begin()' do 'access_ok()'") Signed-off-by: Jani Nikula <jani.nikula@intel.com>	2019-01-08 10:50:22 +02:00
Linus Torvalds	96d4f267e4	Remove 'type' argument from access_ok() function Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument of the user address range verification function since we got rid of the old racy i386-only code to walk page tables by hand. It existed because the original 80386 would not honor the write protect bit when in kernel mode, so you had to do COW by hand before doing any user access. But we haven't supported that in a long time, and these days the 'type' argument is a purely historical artifact. A discussion about extending 'user_access_begin()' to do the range checking resulted this patch, because there is no way we're going to move the old VERIFY_xyz interface to that model. And it's best done at the end of the merge window when I've done most of my merges, so let's just get this done once and for all. This patch was mostly done with a sed-script, with manual fix-ups for the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form. There were a couple of notable cases: - csky still had the old "verify_area()" name as an alias. - the iter_iov code had magical hardcoded knowledge of the actual values of VERIFY_{READ,WRITE} (not that they mattered, since nothing really used it) - microblaze used the type argument for a debug printout but other than those oddities this should be a total no-op patch. I tried to fix up all architectures, did fairly extensive grepping for access_ok() uses, and the changes are trivial, but I may have missed something. Any missed conversion should be trivially fixable, though. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2019-01-03 18:57:57 -08:00
Jani Nikula	0258404f9d	drm/i915: start moving runtime device info to a separate struct First move the low hanging fruit, the fields that are only initialized runtime. Use RUNTIME_INFO() exclusively to access the fields. Cc: Tvrtko Ursulin <tvrtko.ursulin@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/c24fe7a4b0492a888690c46814c0ff21ce2f12b1.1546267488.git.jani.nikula@intel.com	2019-01-02 12:46:29 +02:00
Lucas De Marchi	f3ce44a09a	drm/i915: merge gen checks to use range Instead of using IS_GEN() for consecutive gen checks, let's pass the range to IS_GEN_RANGE(). By code inspection these were the ranges deemed necessary for spatch: @@ expression e; @@ ( - IS_GEN(e, 3) \|\| IS_GEN(e, 2) + IS_GEN_RANGE(e, 2, 3) \| - IS_GEN(e, 3) \|\| IS_GEN(e, 4) + IS_GEN_RANGE(e, 3, 4) \| - IS_GEN(e, 5) \|\| IS_GEN(e, 6) + IS_GEN_RANGE(e, 5, 6) \| - IS_GEN(e, 6) \|\| IS_GEN(e, 7) + IS_GEN_RANGE(e, 6, 7) \| - IS_GEN(e, 7) \|\| IS_GEN(e, 8) + IS_GEN_RANGE(e, 7, 8) \| - IS_GEN(e, 8) \|\| IS_GEN(e, 9) + IS_GEN_RANGE(e, 8, 9) \| - IS_GEN(e, 10) \|\| IS_GEN(e, 9) + IS_GEN_RANGE(e, 9, 10) \| - IS_GEN(e, 9) \|\| IS_GEN(e, 10) + IS_GEN_RANGE(e, 9, 10) ) After conversion, checking we don't have any missing IS_GEN_RANGE() \|\| IS_GEN() was also done. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181212181044.15886-3-lucas.demarchi@intel.com	2018-12-12 16:54:09 -08:00
Lucas De Marchi	cf819eff90	drm/i915: replace IS_GEN<N> with IS_GEN(..., N) Define IS_GEN() similarly to our IS_GEN_RANGE(). but use gen instead of gen_mask to do the comparison. Now callers can pass then gen as a parameter, so we don't require one macro for each gen. The following spatch was used to convert the users of these macros: @@ expression e; @@ ( - IS_GEN2(e) + IS_GEN(e, 2) \| - IS_GEN3(e) + IS_GEN(e, 3) \| - IS_GEN4(e) + IS_GEN(e, 4) \| - IS_GEN5(e) + IS_GEN(e, 5) \| - IS_GEN6(e) + IS_GEN(e, 6) \| - IS_GEN7(e) + IS_GEN(e, 7) \| - IS_GEN8(e) + IS_GEN(e, 8) \| - IS_GEN9(e) + IS_GEN(e, 9) \| - IS_GEN10(e) + IS_GEN(e, 10) \| - IS_GEN11(e) + IS_GEN(e, 11) ) v2: use IS_GEN rather than GT_GEN and compare to info.gen rather than using the bitmask Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181212181044.15886-2-lucas.demarchi@intel.com	2018-12-12 16:52:10 -08:00
Lucas De Marchi	0069000877	drm/i915: Rename IS_GEN to IS_GEN_RANGE RANGE makes it longer, but clearer. We are also going to add a macro to check an individual gen, so add the _RANGE prefix here. Diff generated with: sed 's/IS_GEN(/IS_GEN_RANGE(/g' drivers/gpu/drm/i915/{/,}.{c,h} -i v2: use IS_GEN rather than GT_GEN Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181212181044.15886-1-lucas.demarchi@intel.com	2018-12-12 16:51:49 -08:00
Joonas Lahtinen	6b671c27ff	Revert "drm/i915/perf: Fix warning in documentation" Userspace portion is still missing. This reverts commit `9fa6e2f760`. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181116135510.13807-2-joonas.lahtinen@linux.intel.com	2018-11-19 13:08:48 +02:00
Joonas Lahtinen	fe84168647	Revert "drm/i915/perf: add a parameter to control the size of OA buffer" Userspace portion is still missing. This reverts commit `cd956bfcd0`. Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Acked-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181116135510.13807-1-joonas.lahtinen@linux.intel.com	2018-11-19 13:07:29 +02:00
Lionel Landwerlin	9fa6e2f760	drm/i915/perf: Fix warning in documentation Forgot to add the description of this option in a previous commit. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `cd956bfcd0` ("drm/i915/perf: add a parameter to control the size of OA buffer") Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20181024105158.4732-1-lionel.g.landwerlin@intel.com	2018-10-24 13:57:18 +01:00
Lionel Landwerlin	cd956bfcd0	drm/i915/perf: add a parameter to control the size of OA buffer The way our hardware is designed doesn't seem to let us use the MI_RECORD_PERF_COUNT command without setting up a circular buffer. In the case where the user didn't request OA reports to be available through the i915 perf stream, we can set the OA buffer to the minimum size to avoid consuming memory which won't be used by the driver. v2: Simplify oa buffer size exponent selection (Chris) Reuse vma size field (Lionel) v3: Restrict size opening parameter to values supported by HW (Chris) v4: Drop out of date comment (Matt) Add debug message when buffer size is rejected (Matt) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181023100707.31738-5-lionel.g.landwerlin@intel.com	2018-10-23 15:09:25 +01:00
Lionel Landwerlin	5728de2f4f	drm/i915/perf: pass stream to vfuncs when possible We want to use some of the properties of the perf stream to program the hardware in a later commit. v2: Pass only perf stream as argument (Matthew) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181023100707.31738-4-lionel.g.landwerlin@intel.com	2018-10-23 15:09:24 +01:00
Lionel Landwerlin	784b1a8435	drm/i915/perf: remove redundant oa buffer initialization We initialize the OA buffer everytime we enable the OA unit (first call in gen[78]_oa_enable), so we don't need to initialize when preparing the metric set. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181023100707.31738-3-lionel.g.landwerlin@intel.com	2018-10-23 15:09:22 +01:00
Chris Wilson	666424abfb	drm/i915/execlists: Use coherent writes into the context image That we use a WB mapping for updating the RING_TAIL register inside the context image even on !llc machines has been a source of consternation for every reader. It appears to work on bsw+, but it may just have been that we have been incredibly bad at detecting the errors. v2: With extra enthusiasm. v3: Drop force of map type for pinned default_state as by the time we pin it, the map type is always WB and doesn't conflict with the earlier use by ce->state. v4: Transfer engine->default_state from MAP_WC to MAP_WB on creation so we do not need the MAP_FORCE littered around the backends Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180914123504.2062-3-chris@chris-wilson.co.uk	2018-09-14 14:23:34 +01:00
Tvrtko Ursulin	722f3de39e	i915/oa: Simplify updating contexts We can remove the update-via-batch-buffer code path, which is basically an effective duplicate of update-via-context-image path, if we notice that after we have idled the GPU, we can update the context image even of the kernel context directly. (Update-via-batch-buffer path existed only to solve the problem of how to update the kernel context image.) Only additional thing needed is to activate the edited configuration by sending one empty request down the pipe. This accomplishes context restore of the updated kernel context and so the OA configuration gets written out to it's control registers. Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180912152930.28237-1-tvrtko.ursulin@linux.intel.com	2018-09-13 09:41:12 +01:00
Lionel Landwerlin	35ab4fd2b9	drm/i915/perf: reuse intel_lrc ctx regs macro Abstract the context image access a bit. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180813080218.28994-3-tvrtko.ursulin@linux.intel.com	2018-08-31 16:18:43 +01:00
Lionel Landwerlin	1c71bc565c	drm/i915/perf: simplify configure all context function We don't need any special treatment on error so just return as soon as possible. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/20180813080218.28994-2-tvrtko.ursulin@linux.intel.com	2018-08-31 16:18:42 +01:00
Chris Wilson	6a2f59e45a	drm/i915: Pull unpin map into vma release A reasonably common operation is to pin the map of the vma alongside the vma itself for the lifetime of the vma, and so release both pins at the same time as destroying the vma. It is common enough to pull into the release function, making that central function more attractive to a couple of other callsites. The continual ulterior motive is to sweep over errors on module load aborting... Testcase: igt/drv_module_reload/basic-reload-inject Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180721125037.20127-1-chris@chris-wilson.co.uk	2018-07-24 09:55:12 +01:00
Chris Wilson	ec625fb932	drm/i915: Provide a timeout to i915_gem_wait_for_idle() Usually we have no idea about the upper bound we need to wait to catch up with userspace when idling the device, but in a few situations we know the system was idle beforehand and can provide a short timeout in order to very quickly catch a failure, long before hangcheck kicks in. In the following patches, we will use the timeout to curtain two overly long waits, where we know we can expect the GPU to complete within a reasonable time or declare it broken. In particular, with a broken GPU we expect it to fail during the initial GPU setup where do a couple of context switches to record the defaults. This is a task that takes a few milliseconds even on the slowest of devices, but we may have to wait 60s for hangcheck to give in and declare the machine inoperable. In this a case where any gpu hang is unacceptable, both from a timeliness and practical standpoint. The other improvement is that in selftests, we do not need to arm an independent timer to inject a wedge, as we can just limit the timeout on the wait directly. v2: Include the timeout parameter in the trace. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180709122044.7028-1-chris@chris-wilson.co.uk	2018-07-09 13:55:41 +01:00
Jani Nikula	6ebb6d8ebe	drm/i915/perf: make oa format tables const No reason not to be const. Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180613114929.14541-1-jani.nikula@intel.com	2018-06-14 13:47:27 +03:00
Michel Thierry	2b9a820318	drm/i915/perf: fix gen11 engine class shift Use the correct engine class shift value while storing the ctx hw id. Fixes the copy+paste error from commit `61d5676b55` ("drm/i915/perf: fix ctx_id read with GuC & ICL"). Apologies for not spotting this in the original review, the specific_ctx_id_mask is correct, only the specific_ctx_id had this problem. v2: Just use the upper 32 bits of lrc_desc (Chris) v3: If we use the lrc_desc, we must apply the ctx_id_mask too (Lionel) Fixes: `61d5676b55` ("drm/i915/perf: fix ctx_id read with GuC & ICL") Signed-off-by: Michel Thierry <michel.thierry@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Michel Thierry <michel.thierry@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-gfx@lists.freedesktop.org Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180604233250.609-2-michel.thierry@intel.com	2018-06-11 11:58:43 +01:00
Michel Thierry	9904b1560e	drm/i915/perf: use the lrc_desc to get the ctx hw id in gen8-10 The upper 32 bits of the lrc_desc (bits 52-32 to be precise) are the context hw id in GEN8-10, so use them and have one less thing to maintain in the unlikely case we change the descriptor sw fields. v2: If we use the lrc_desc, we must apply the ctx_id_mask too (Lionel) Signed-off-by: Michel Thierry <michel.thierry@intel.com> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180604233250.609-1-michel.thierry@intel.com	2018-06-11 11:58:13 +01:00
Lionel Landwerlin	61d5676b55	drm/i915/perf: fix ctx_id read with GuC & ICL One thing we didn't really understand about the OA report is that the ContextID field (dword 2) is copy of the context descriptor (dword 1). On Gen8->10 and without using GuC we didn't notice the issue because we only checked the 21bits of the ContextID field in the OA reports which matches exactly the hw_id stored into the context descriptor. When using GuC submission we have an issue of a non matching hw_id because GuC uses bit 20 of the hw_id to signal proxy submission. This change introduces a mask to compare only the relevant bits. On ICL the context descriptor format has changed and we failed to address this. On top of using a mask we also need to shift the bits properly. v2: Reuse lrc_desc rather than recomputing part of it (Chris/Michel) v3: Always pin the context we're filtering with (Chris) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Fixes: `1de401c08f` ("drm/i915/perf: enable perf support on ICL") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104252 BSpec: 1237 Testcase: igt/perf/gen8-unprivileged-single-ctx-counters Acked-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Michel Thierry <michel.thierry@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180602112946.30803-3-lionel.g.landwerlin@intel.com Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-gfx@lists.freedesktop.org	2018-06-04 18:16:08 +01:00
Chris Wilson	1fc44d9b1a	drm/i915: Store a pointer to intel_context in i915_request To ease the frequent and ugly pointer dance of &request->gem_context->engine[request->engine->id] during request submission, store that pointer as request->hw_context. One major advantage that we will exploit later is that this decouples the logical context state from the engine itself. v2: Set mock_context->ops so we don't crash and burn in selftests. Cleanups from Tvrtko. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Acked-by: Zhenyu Wang <zhenyuw@linux.intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180517212633.24934-3-chris@chris-wilson.co.uk	2018-05-18 09:35:22 +01:00
Chris Wilson	e896d29a54	drm/i915/oa: Check that OA is disabled before unpinning Before we unpin the buffer used for OA reports and return it to the system, we need to be sure that the HW has finished writing into it. For lack of a better idea, poll OACONTROL to check it is switched off. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=106379 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Tested-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180511135207.12880-1-chris@chris-wilson.co.uk	2018-05-11 17:17:46 +01:00
Chris Wilson	a89d1f921c	drm/i915: Split i915_gem_timeline into individual timelines We need to move to a more flexible timeline that doesn't assume one fence context per engine, and so allow for a single timeline to be used across a combination of engines. This means that preallocating a fence context per engine is now a hindrance, and so we want to introduce the singular timeline. From the code perspective, this has the notable advantage of clearing up a lot of mirky semantics and some clumsy pointer chasing. By splitting the timeline up into a single entity rather than an array of per-engine timelines, we can realise the goal of the previous patch of tracking the timeline alongside the ring. v2: Tweak wait_for_idle to stop the compiling thinking that ret may be uninitialised. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180502163839.3248-2-chris@chris-wilson.co.uk	2018-05-02 23:57:18 +01:00
Chris Wilson	ab82a0635c	drm/i915: Wrap engine->context_pin() and engine->context_unpin() Make life easier in upcoming patches by moving the context_pin and context_unpin vfuncs into inline helpers. v2: Fixup mock_engine to mark the context as pinned on use. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180430131503.5375-2-chris@chris-wilson.co.uk	2018-04-30 16:01:13 +01:00
Lionel Landwerlin	9bd9be6660	drm/i915/perf: add more debug message on perf open & configs This will make it easier to spot issues related to config creation/usage. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180326090831.22686-9-lionel.g.landwerlin@intel.com	2018-03-29 13:37:41 +01:00
Lionel Landwerlin	b82ed43de5	drm/i915: rename PPGTT/GGTT fields OA registers We had a generic field name used across 2 registers but it feels like it's clearer we make it obvious what register this field belongs to. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180326090831.22686-7-lionel.g.landwerlin@intel.com	2018-03-29 13:34:27 +01:00
Lionel Landwerlin	53744104be	drm/i915/perf: remove empty line This was added by mistake in commit `28964cf25e` ("drm/i915/perf: disable NOA logic when not used"). Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180326090831.22686-6-lionel.g.landwerlin@intel.com	2018-03-29 13:33:53 +01:00
Lionel Landwerlin	1105130334	drm/i915/perf: simplify OA unit enabling on gen7 In commit `d79651522e` ("drm/i915: Enable i915 perf stream for Haswell OA unit") the enable/disable vfunc hadn't appear yet and the same function would deal with enabling/disabling the OA unit. This was split later on for gen8 but the gen7 retained some code that isn't actually useful anymore. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180326090831.22686-4-lionel.g.landwerlin@intel.com	2018-03-29 13:33:25 +01:00
Lionel Landwerlin	b6dd47b9c8	drm/i915/perf: check the value of PROP_SAMPLE_OA uapi parameter We've been a bit loose about this opening parameter. We should only add the flag for writing OA reports when the value of this parameter is != 0. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180326090831.22686-3-lionel.g.landwerlin@intel.com	2018-03-29 13:30:02 +01:00
Lionel Landwerlin	1de401c08f	drm/i915/perf: enable perf support on ICL No significant changes from either context offsets, nor report formats, nor register whitelist. v2: Also drop slice/unslice clock ratio changes (Matt) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180326133949.12469-3-lionel.g.landwerlin@intel.com	2018-03-29 13:25:30 +01:00
Lionel Landwerlin	41d3fdcd15	drm/i915/perf: fix perf stream opening lock We're seeing on CI that some contexts don't have the programmed OA period timer that directs the OA unit on how often to write reports. The issue is that we're not holding the drm lock from when we edit the context images down to when we set the exclusive_stream variable. This leaves a window for the deferred context allocation to call i915_oa_init_reg_state() that will not program the expected OA timer value, because we haven't set the exclusive_stream yet. v2: Drop need_lock from gen8_configure_all_contexts() (Matt) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Fixes: `701f8231a2` ("drm/i915/perf: prune OA configs") Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102254 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103715 Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103755 Link: https://patchwork.freedesktop.org/patch/msgid/20180301110613.1737-1-lionel.g.landwerlin@intel.com Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-gfx@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v4.14+	2018-03-01 14:32:37 +00:00
Joonas Lahtinen	bba73071b6	Merge drm-next into drm-intel-next-queued (this time for real) To pull in the HDCP changes, especially wait_for changes to drm/i915 that Chris wants to build on top of. Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2018-03-01 11:14:24 +02:00
Chris Wilson	e61e0f51ba	drm/i915: Rename drm_i915_gem_request to i915_request We want to de-emphasize the link between the request (dependency, execution and fence tracking) from GEM and so rename the struct from drm_i915_gem_request to i915_request. That is we may implement the GEM user interface on top of requests, but they are an abstraction for tracking execution rather than an implementation detail of GEM. (Since they are not tied to HW, we keep the i915 prefix as opposed to intel.) In short, the spatch: @@ @@ - struct drm_i915_gem_request + struct i915_request A corollary to contracting the type name, we also harmonise on using 'rq' shorthand for local variables where space if of the essence and repetition makes 'request' unwieldy. For globals and struct members, 'request' is still much preferred for its clarity. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Michał Winiarski <michal.winiarski@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20180221095636.6649-1-chris@chris-wilson.co.uk Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com> Acked-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>	2018-02-21 20:57:22 +00:00
Linus Torvalds	a9a08845e9	vfs: do bulk POLL* -> EPOLL* replacement This is the mindless scripted replacement of kernel use of POLL* variables as described by Al, done by this script: for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do L=`git grep -l -w POLL$V \| grep -v '^t' \| grep -v /um/ \| grep -v '^sa' \| grep -v '/poll.h$'\|grep -v '^D'` for f in $L; do sed -i "-es/^$[^\"]$$\<POLL$V\>$/\\1E\\2/" $f; done done with de-mangling cleanups yet to come. NOTE! On almost all architectures, the EPOLL constants have the same values as the POLL* constants do. But they keyword here is "almost". For various bad reasons they aren't the same, and epoll() doesn't actually work quite correctly in some cases due to this on Sparc et al. The next patch from Al will sort out the final differences, and we should be all done. Scripted-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2018-02-11 14:34:03 -08:00
Linus Torvalds	4bf772b146	drm/graphics pull request for v4.16-rc1 -----BEGIN PGP SIGNATURE----- iQIcBAABAgAGBQJacnVwAAoJEAx081l5xIa+HhIP/0yDg5tuco0QN3YskE/bIa3o 4VDWsLi+WCoSZoV4uWLKYK8OHiNzKdnGfNoUNWqRqaYilWDtpgBX86Wjg5hxnGwA /6jGfU1nhb0teG9clGBbzgxHXW6iKvT+p/Pp1pC8HXU+zEUaungJcWY120hITwMD NqUGK6kYRsJVYj+4b+5Ho7Fvv912bbjK0YAptD6RdzX4rDPN0D+XrtXlYsg1PJYx jv/NNWEP5mCesYKsS8JzHYcfOF/vdQpPwAV4C3LKaQy5k3pVVIDOEuOycIZTKMf3 K/fSsbvhHMH3Ck+lPcK+etcoQbkLCcmKbw+3uvM/7njkn7Dp24Ryk9FXB3dXXOgb 3kLs7f0gY9j/NAi3uKAMvACPvXNA7eptIvAmN/VKzmEiqgx+l0sveSuU73DVoe/x Jko8ijyiKchcN+/CTgZ7FNyEd0UWO06+9B0RMrlEezE8f14EhR51wIQQTNFJRJn/ kqRM1hC2Cvb00vAwq7jjZcDa7hRCI0OoVU9N37smtPuTJY94tR/CUbq10g4pSlu8 h8FiHnLuhlyh1DQNNS19HQfOSh0yYgEGRQcIKy3vqshsO3/hbe8bQD5UerqMZPZB ZpMEWe5VHSWIVjAxgzHNXFd9F/jSeWDVkCztKfx0CLmzHZNLNjw+/zgbIdF3vj9T S1cwFZLWr/ngf5mbyR88 =pLN1 -----END PGP SIGNATURE----- Merge tag 'drm-for-v4.16' of git://people.freedesktop.org/~airlied/linux Pull drm updates from Dave Airlie: "This seems to have been a comparatively quieter merge window, I assume due to holidays etc. The "biggest" change is AMD header cleanups, which merge/remove a bunch of them. The AMD gpu scheduler is now being made generic with the etnaviv driver wanting to reuse the code, hopefully other drivers can go in the same direction. Otherwise it's the usual lots of stuff in i915/amdgpu, not so much stuff elsewhere. Core: - Add .last_close and .output_poll_changed helpers to reduce driver footprints - Fix plane clipping - Improved debug printing support - Add panel orientation property - Update edid derived properties at edid setting - Reduction in fbdev driver footprint - Move amdgpu scheduler into core for other drivers to use. i915: - Selftest and IGT improvements - Fast boot prep work on IPS, pipe config - HW workarounds for Cannonlake, Geminilake - Cannonlake clock and HDMI2.0 fixes - GPU cache invalidation and context switch improvements - Display planes cleanup - New PMU interface for perf queries - New firmware support for KBL/SKL - Geminilake HW workaround for perforamce - Coffeelake stolen memory improvements - GPU reset robustness work - Cannonlake horizontal plane flipping - GVT work amdgpu/radeon: - RV and Vega header file cleanups (lots of lines gone!) - TTM operation context support - 48-bit GPUVM support for Vega/RV - ECC support for Vega - Resizeable BAR support - Multi-display sync support - Enable swapout for reserved BOs during allocation - S3 fixes on Raven - GPU reset cleanup and fixes - 2+1 level GPU page table amdkfd: - GFX7/8 SDMA user queues support - Hardware scheduling for multiple processes - dGPU prep work rcar: - Added R8A7743/5 support - System suspend/resume support sun4i: - Multi-plane support for YUV formats - A83T and LVDS support msm: - Devfreq support for GPU tegra: - Prep work for adding Tegra186 support - Tegra186 HDMI support - HDMI2.0 and zpos support by using generic helpers tilcdc: - Misc fixes omapdrm: - Support memory bandwidth limits - DSI command mode panel cleanups - DMM error handling exynos: - drop the old IPP subdriver. etnaviv: - Occlusion query fixes - Job handling fixes - Prep work for hooking in gpu scheduler armada: - Move closer to atomic modesetting - Allow disabling primary plane if overlay is full screen imx: - Format modifier support - Add tile prefetch to PRE - Runtime PM support for PRG ast: - fix LUT loading" * tag 'drm-for-v4.16' of git://people.freedesktop.org/~airlied/linux: (1471 commits) drm/ast: Load lut in crtc_commit drm: Check for lessee in DROP_MASTER ioctl drm: fix gpu scheduler link order drm/amd/display: Demote error print to debug print when ATOM impl missing dma-buf: fix reservation_object_wait_timeout_rcu once more v2 drm/amdgpu: Avoid leaking PM domain on driver unbind (v2) drm/amd/amdgpu: Add Polaris version check drm/amdgpu: Reenable manual GPU reset from sysfs drm/amdgpu: disable MMHUB power gating on raven drm/ttm: Don't unreserve swapped BOs that were previously reserved drm/ttm: Don't add swapped BOs to swap-LRU list drm/amdgpu: only check for ECC on Vega10 drm/amd/powerplay: Fix smu_table_entry.handle type drm/ttm: add VADDR_FLAG_UPDATED_COUNT to correctly update dma_page global count drm: Fix PANEL_ORIENTATION_QUIRKS breaking the Kconfig DRM menuconfig drm/radeon: fill in rb backend map on evergreen/ni. drm/amdgpu/gfx9: fix ngg enablement to clear gds reserved memory (v2) drm/ttm: only free pages rather than update global memory count together drm/amdgpu: fix CPU based VM updates drm/amdgpu: fix typo in amdgpu_vce_validate_bo ...	2018-02-01 17:48:47 -08:00
Al Viro	afc9a42b74	the rest of drivers/*: annotate ->poll() instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2017-11-28 11:06:58 -05:00
Chris Wilson	fb5c551ad5	drm/i915: Remove i915.enable_execlists module parameter Execlists and legacy ringbuffer submission are no longer feature comparable (execlists now offer greater functionality that should overcome their performance hit) and obsoletes the unsafe module parameter, i.e. comparing the two modes of execution is no longer useful, so remove the debug tool. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Reviewed-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> #i915_perf.c Link: https://patchwork.freedesktop.org/patch/msgid/20171120205504.21892-1-chris@chris-wilson.co.uk	2017-11-20 21:53:59 +00:00
Lionel Landwerlin	9f9b2792b6	drm/i915/perf: reuse timestamp frequency from device info Now that we have this stored in the device info, we can drop it from perf part of the driver. Note that this requires to init perf after we've computed the frequency, hence why we move i915_perf_init() from i915_driver_init_early() to after intel_device_info_runtime_init(). v2: Use div_u64 (Chris) v3: Drop u64 divs by switching to kHz (Chris/Ville) Move i915_perf_fini to i915_driver_cleanup_hw (Matthew) Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171113181902.12411-2-lionel.g.landwerlin@intel.com	2017-11-20 16:09:04 +00:00
Chris Wilson	3fef5cda97	drm/i915: Automatic i915_switch_context for legacy During request construction, after pinning the context we know whether or not we have to emit a context switch. So move this common operation from every caller into i915_gem_request_alloc() itself. v2: Always submit the request if we emitted some commands during request construction, as typically it also involves changes in global state. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171120102002.22254-2-chris@chris-wilson.co.uk	2017-11-20 15:56:16 +00:00
Lionel Landwerlin	7c52a2219d	drm/i915/perf: replace .reg accesses with i915_mmio_reg_offset This replaces accesses to the reg field of the i915_reg_t structure with the i915_mmio_reg_offset() inline function. Suggested-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Ewelina Musial <ewelina.musial@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171113233455.12085-2-lionel.g.landwerlin@intel.com	2017-11-20 15:46:02 +00:00
Lionel Landwerlin	95690a02fb	drm/i915/perf: enable perf support on CNL This adds new registers to the whitelist to configs emitted from userspace. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110190845.32574-6-lionel.g.landwerlin@intel.com	2017-11-13 15:59:24 +00:00
Lionel Landwerlin	ba6b7c1ab2	drm/i915/perf: refactor perf setup Gen8/9 aren't very different and we can merge some of this code. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110190845.32574-4-lionel.g.landwerlin@intel.com	2017-11-13 15:59:14 +00:00
Lionel Landwerlin	4407eaa9b0	drm/i915/perf: add support for Coffeelake GT3 We can enable GT3 as well as GT2. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110190845.32574-3-lionel.g.landwerlin@intel.com	2017-11-13 15:59:03 +00:00
Lionel Landwerlin	a54b19f177	drm/i915/perf: complete whitelisting for OA programming on HSW We were missing some registers and also can name one for which we only had the offset. Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20171110190845.32574-2-lionel.g.landwerlin@intel.com	2017-11-13 15:59:00 +00:00

1 2 3 4 5 ...

299 Commits