drm/amdgpu: support to find RAS bad pages via old TA
Old versions of the RAS TA cannot convert an MCA address stored in the
eeprom to a physical address (PA). Support finding all bad pages in one
memory row by PA when running with an old RAS TA. This approach is only
suitable for NPS1 mode.

Signed-off-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Tao Zhou authored and Alex Deucher committed Dec 10, 2024
1 parent b02ef40 commit 07dd49e
Showing 1 changed file with 25 additions and 3 deletions.
28 changes: 25 additions & 3 deletions drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2765,9 +2765,10 @@ int amdgpu_ras_add_bad_pages(struct amdgpu_device *adev,
 	struct ras_err_handler_data *data;
 	struct ras_err_data err_data;
 	struct eeprom_table_record *err_rec;
+	enum amdgpu_memory_partition nps = AMDGPU_NPS1_PARTITION_MODE;
 	int ret = 0;
 	uint32_t i, j, loop_cnt = 1;
-	bool is_mca_add = true;
+	bool is_mca_add = true, find_pages_per_pa = false;

 	if (!con || !con->eh_data || !bps || pages <= 0)
 		return 0;
@@ -2797,12 +2798,33 @@ int amdgpu_ras_add_bad_pages(struct amdgpu_device *adev,
 		}

 		loop_cnt = adev->umc.retire_unit;
+		if (adev->gmc.gmc_funcs->query_mem_partition_mode)
+			nps = adev->gmc.gmc_funcs->query_mem_partition_mode(adev);
 	}

 	for (i = 0; i < pages; i++) {
 		if (is_mca_add) {
-			if (amdgpu_ras_mca2pa(adev, &bps[i], &err_data))
-				goto free;
+			if (!find_pages_per_pa) {
+				if (amdgpu_ras_mca2pa(adev, &bps[i], &err_data)) {
+					if (!i && nps == AMDGPU_NPS1_PARTITION_MODE) {
+						/* may use old RAS TA, use PA to find pages in
+						 * one row
+						 */
+						if (amdgpu_umc_pages_in_a_row(adev, &err_data,
+								bps[i].retired_page << AMDGPU_GPU_PAGE_SHIFT))
+							goto free;
+						else
+							find_pages_per_pa = true;
+					} else {
+						/* unsupported cases */
+						goto free;
+					}
+				}
+			} else {
+				if (amdgpu_umc_pages_in_a_row(adev, &err_data,
+						bps[i].retired_page << AMDGPU_GPU_PAGE_SHIFT))
+					goto free;
+			}

 			err_rec = err_data.err_addr;
 		} else {
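
The control flow added by this hunk can be modeled outside the kernel. The following is a minimal user-space sketch, not the driver code: `mca2pa()` and `pages_in_a_row()` are hypothetical stand-ins for `amdgpu_ras_mca2pa()` and `amdgpu_umc_pages_in_a_row()`, `PAGE_SHIFT` is an assumed stand-in for `AMDGPU_GPU_PAGE_SHIFT`, and the `ta_supports_mca` flag and the `pa_lookups` counter exist only for illustration. It shows that the MCA conversion is tried first, that a failure on the first record in NPS1 mode switches every record to the PA-based row lookup, and that any other failure bails out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* assumed stand-in for AMDGPU_GPU_PAGE_SHIFT */

enum nps_mode { NPS1 = 1, NPS2 = 2 }; /* simplified partition modes */

struct record { uint64_t retired_page; }; /* simplified eeprom record */

/* Hypothetical stand-in for amdgpu_ras_mca2pa(): nonzero means the
 * (old) RAS TA cannot convert the MCA address. */
static int mca2pa(bool ta_supports_mca, const struct record *r)
{
	(void)r;
	return ta_supports_mca ? 0 : -1;
}

/* Hypothetical stand-in for amdgpu_umc_pages_in_a_row(): takes a byte
 * address and always succeeds in this sketch. */
static int pages_in_a_row(uint64_t pa)
{
	(void)pa;
	return 0;
}

/* Returns the number of records handled, or -1 where the driver would
 * "goto free". *pa_lookups counts the PA-based row lookups performed. */
static int add_bad_pages(const struct record *bps, int pages,
			 enum nps_mode nps, bool ta_supports_mca,
			 int *pa_lookups)
{
	bool find_pages_per_pa = false;
	int i;

	assert(bps && pa_lookups);
	*pa_lookups = 0;

	for (i = 0; i < pages; i++) {
		if (!find_pages_per_pa) {
			if (mca2pa(ta_supports_mca, &bps[i])) {
				/* Fall back only on the first record and only in NPS1. */
				if (i != 0 || nps != NPS1)
					return -1;
				/* Page number to byte address, then row lookup. */
				if (pages_in_a_row(bps[i].retired_page << PAGE_SHIFT))
					return -1;
				find_pages_per_pa = true;
				(*pa_lookups)++;
			}
		} else {
			/* Old TA already detected: skip the MCA conversion. */
			if (pages_in_a_row(bps[i].retired_page << PAGE_SHIFT))
				return -1;
			(*pa_lookups)++;
		}
	}
	return pages;
}
```

Calling `add_bad_pages()` with three records, `NPS1`, and `ta_supports_mca = false` takes the fallback path for all three records, while the same call with `NPS2` returns -1 on the first record. Latching `find_pages_per_pa` after the first failure avoids retrying a conversion that an old TA cannot perform for the remaining records.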