Skip to content

Commit

Permalink
drm/i915: Update rules for reading cache lines through the LLC
Browse files Browse the repository at this point in the history
The LLC is a fun device. The cache is a distinct functional block within
the SA that arbitrates access from both the CPU and GPU cores. As such
all writes to memory land first in the LLC before further action is
taken. For example, an uncached write from either the CPU or GPU will
then proceed to memory and evict the cacheline from the LLC. This means that
a read from the LLC always returns the correct information even if the PTE
bit in the GPU differs from the PAT bit in the CPU. For the older
snooping architecture on non-LLC, the fundamental principle still holds
except that some coordination is required between the CPU and GPU to
explicitly perform the snooping (which is handled by our request
tracking).

The upshot of this is that we know that we can issue a read from either
LLC devices or snoopable memory and trust the contents of the cache -
i.e. we can forgo a clflush before a read in these circumstances.
Writing to memory from the CPU is a little more tricky as we have to
consider that the scanout does not read from the CPU cache at all, but
from main memory. So we have to currently treat all requests to write to
uncached memory as having to be flushed to main memory for coherency
with all consumers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  • Loading branch information
Chris Wilson authored and Daniel Vetter committed Aug 10, 2013
1 parent 5c53661 commit c76ce03
Showing 1 changed file with 14 additions and 8 deletions.
22 changes: 14 additions & 8 deletions drivers/gpu/drm/i915/i915_gem.c
Original file line number Diff line number Diff line change
Expand Up @@ -61,6 +61,12 @@ static long i915_gem_purge(struct drm_i915_private *dev_priv, long target);
static void i915_gem_shrink_all(struct drm_i915_private *dev_priv);
static void i915_gem_object_truncate(struct drm_i915_gem_object *obj);

static bool cpu_cache_is_coherent(struct drm_device *dev,
enum i915_cache_level level)
{
return HAS_LLC(dev) || level != I915_CACHE_NONE;
}

static inline void i915_gem_object_fence_lost(struct drm_i915_gem_object *obj)
{
if (obj->tiling_mode)
Expand Down Expand Up @@ -420,8 +426,7 @@ i915_gem_shmem_pread(struct drm_device *dev,
* read domain and manually flush cachelines (if required). This
* optimizes for the case when the gpu will dirty the data
* anyway again before the next pread happens. */
if (obj->cache_level == I915_CACHE_NONE)
needs_clflush = 1;
needs_clflush = !cpu_cache_is_coherent(dev, obj->cache_level);
if (i915_gem_obj_bound_any(obj)) {
ret = i915_gem_object_set_to_gtt_domain(obj, false);
if (ret)
Expand Down Expand Up @@ -745,11 +750,11 @@ i915_gem_shmem_pwrite(struct drm_device *dev,
return ret;
}
}
/* Same trick applies for invalidate partially written cachelines before
* writing. */
if (!(obj->base.read_domains & I915_GEM_DOMAIN_CPU)
&& obj->cache_level == I915_CACHE_NONE)
needs_clflush_before = 1;
/* Same trick applies to invalidate partially written cachelines read
* before writing. */
if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0)
needs_clflush_before =
!cpu_cache_is_coherent(dev, obj->cache_level);

ret = i915_gem_object_get_pages(obj);
if (ret)
Expand Down Expand Up @@ -3597,7 +3602,8 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)

/* Flush the CPU cache if it's still invalid. */
if ((obj->base.read_domains & I915_GEM_DOMAIN_CPU) == 0) {
i915_gem_clflush_object(obj);
if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level))
i915_gem_clflush_object(obj);

obj->base.read_domains |= I915_GEM_DOMAIN_CPU;
}
Expand Down

0 comments on commit c76ce03

Please sign in to comment.