Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

host_get_copy_bytes_normal: "no bytes to copy" log spam #8761

Closed
marc-hb opened this issue Jan 18, 2024 · 19 comments
Closed

host_get_copy_bytes_normal: "no bytes to copy" log spam #8761

marc-hb opened this issue Jan 18, 2024 · 19 comments
Labels
bug Something isn't working as expected MTL Applies to Meteor Lake platform P2 Critical bugs or normal features

Comments

@marc-hb
Copy link
Collaborator

marc-hb commented Jan 18, 2024

1 line / ms

This mtrace spam still makes logs unusable.

Try opening
https://sof-ci.01.org/sofpr/PR8754/build1979/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=multiple-pause-resume-50 ,then click on the "mtrace" tab. The logs are so big that it takes at least 10-15 seconds to just open that tab. Probably longer if you don't have a high speed connection.

[ 2782.210465] <inf> pipe: pipeline_trigger: pipe:2 0x0 pipe trigger cmd 2
[ 2782.211160] <inf> ll_schedule: zephyr_ll_task_done: task complete 0xa0117cc0 0x20210U
[ 2782.211178] <inf> ll_schedule: zephyr_ll_task_done: num_tasks 1 total_num_tasks 3
[ 2782.211186] <inf> ll_schedule: zephyr_domain_unregister: zephyr_domain_unregister domain->type 1 domain->clk 3
[ 2782.221346] <inf> ipc: ipc_cmd: rx	: 0x13000004|0x1
[ 2782.221371] <inf> pipe: pipeline_trigger: pipe:2 0x0 pipe trigger cmd 8
[ 2782.221380] <inf> ll_schedule: zephyr_ll_task_schedule_common: task add 0xa0117cc0 0x20210U priority 0 flags 0x0
[ 2782.221398] <inf> ll_schedule: zephyr_domain_register: zephyr_domain_register domain->type 1 domain->clk 3 domain->ticks_per_ms 38400 period 1000
[ 2782.222170] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.223161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.224160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.225161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.226160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.227160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.228160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.229160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.230160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.231158] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.231463] <inf> pipe: pipeline_trigger: pipe:1 0x0 pipe trigger cmd 8
[ 2782.231488] <inf> ll_schedule: zephyr_ll_task_schedule_common: task add 0xa0118dc0 0x20210U priority 0 flags 0x0
[ 2782.232158] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.233161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.234161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.235161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.236160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.237160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.238160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.239161] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.240160] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.241158] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.241546] <inf> pipe: pipeline_trigger: pipe:0 0x0 pipe trigger cmd 8
[ 2782.241571] <inf> ll_schedule: zephyr_ll_task_schedule_common: task add 0xa0119240 0x20210U priority 0 flags 0x0
[ 2782.242231] <inf> host_comp: host_get_copy_bytes_normal: comp:2 0x30004 no bytes to copy, available samples: 0, free_samples: 768
[ 2782.242280] <inf> dma_intel_adsp_gpdma: intel_adsp_gpdma_power_off: intel_adsp_gpdma_power_off: dma dma@7c000 power off
[ 2782.242320] <inf> dma_intel_adsp_gpdma: intel_adsp_gpdma_power_on: intel_adsp_gpdma_power_on: dma dma@7c000 initialized
[ 2782.242363] <inf> dai_intel_dmic: dai_dmic_update_bits: dai_dmic_update_bits base 10000, reg 0, mask 6000000, value 4000000
[ 2782.242370] <inf> dai_intel_dmic: dai_dmic_update_bits: dai_dmic_update_bits base 10000, reg 1000, mask 10000, value 0

Originally posted by @marc-hb in #4672 (comment)

cc:

@kv2019i
Copy link
Collaborator

kv2019i commented Jan 19, 2024

This is potentially related to (and fixed by) #8756

The "host_get_copy_bytes_norma" logs repeat exactly 10ms which is the delay we now have in set-pipeline-state processing (fixed by 8756).

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jan 22, 2024

@marc-hb marc-hb added the P1 Blocker bugs or important features label Feb 14, 2024
@marc-hb
Copy link
Collaborator Author

marc-hb commented Feb 14, 2024

This massive spam really makes it impossible to do anthing. Try for instance looking at the mtrace logs in https://sof-ci.01.org/sofpr/PR8855/build2720/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=multiple-pause-resume-50

Good luck.

The sheer workload is probably changing timings considerably too, which means turning debug on/off makes reproductions come and go.

@lgirdwood
Copy link
Member

@marc-hb can you try the following patch in your failing test case. If it helps then we can upstream.

From aad766dc742525df67f579c0b5afadeb0b102244 Mon Sep 17 00:00:00 2001
From: Liam Girdwood <liam.r.girdwood@linux.intel.com>
Date: Mon, 19 Feb 2024 14:11:51 +0000
Subject: [PATCH] host-zephyr: flow control "no data to copy" message.

Flow control the important "no data to copy" message which can be
indicative that an error state has been entered but can also flood the
logs worsening the log SNR for debug.

Signed-off-by: Liam Girdwood <liam.r.girdwood@linux.intel.com>
---
 src/audio/copier/host_copier.h |  1 +
 src/audio/host-zephyr.c        | 16 +++++++++++++---
 2 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/src/audio/copier/host_copier.h b/src/audio/copier/host_copier.h
index 08cbba43b..07346a868 100644
--- a/src/audio/copier/host_copier.h
+++ b/src/audio/copier/host_copier.h
@@ -68,6 +68,7 @@ struct host_data {
 	uint16_t no_stream_position; /**< 1 means don't send stream position */
 	uint64_t total_data_processed;
 	uint8_t cont_update_posn; /**< 1 means continuous update stream position */
+	uint8_t dma_copy_info_count;	/**! flow control for DMA "no bytes" message */
 
 	/* host component attributes */
 	enum comp_copy_type copy_type;	/**< Current host copy type */
diff --git a/src/audio/host-zephyr.c b/src/audio/host-zephyr.c
index fd629838a..3abbba85e 100644
--- a/src/audio/host-zephyr.c
+++ b/src/audio/host-zephyr.c
@@ -397,9 +397,19 @@ static uint32_t host_get_copy_bytes_normal(struct host_data *hd, struct comp_dev
 	if (!(hd->ipc_host.feature_mask & BIT(IPC4_COPIER_FAST_MODE)))
 		dma_copy_bytes = MIN(hd->period_bytes, dma_copy_bytes);
 
-	if (!dma_copy_bytes)
-		comp_info(dev, "no bytes to copy, available samples: %d, free_samples: %d",
-			  avail_samples, free_samples);
+	if (!dma_copy_bytes) {
+		/* flow control this message, its important but can flood the logs,
+		 * so dial down to ~4 times per second when continual (based on
+		 * 1 LL tick per ms and dma_copy_info_count being a uint8_t).
+		 */
+		if (hd->dma_copy_info_count++ == 0) {
+			comp_info(dev, "no bytes to copy, available samples: %d, free_samples: %d",
+				  avail_samples, free_samples);
+		}
+	} else {
+		/* DMA back to normal now with data to copy so reset counter */
+		hd->dma_copy_info_count = 0;
+	}
 
 	/* dma_copy_bytes should be aligned to minimum possible chunk of
 	 * data to be copied by dma.
-- 
2.40.1

@kv2019i
Copy link
Collaborator

kv2019i commented Mar 11, 2024

This is potentially fixed by zephyrproject-rtos/zephyr#69480 . Assigning to @serhiy-katsyuba-intel as owner of the Zephyr fix.

@kv2019i
Copy link
Collaborator

kv2019i commented Mar 22, 2024

This should be fixed with #8954 , @marc-hb please close if you can confirm.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Mar 28, 2024

I will skim recent logs, give me some time until next week.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Apr 1, 2024

Not fixed. For instance the mtrace tab in https://sof-ci.01.org/sofpr/PR8995/build3690/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=multiple-pause-resume-50 has still 30,000 lines with host_get_copy_bytes_normal ... and still crashes the browser.

That's for just 50 pause/resume iterations.

What's new: these lines are not consecutive anymore.

@kv2019i kv2019i added the MTL Applies to Meteor Lake platform label Apr 3, 2024
@wszypelt
Copy link

The original issue has been fixed, after talking to the developers, setting the issue to P2

@wszypelt wszypelt added P2 Critical bugs or normal features and removed P1 Blocker bugs or important features labels Apr 12, 2024
@thesofproject thesofproject deleted a comment from marc-hb Apr 15, 2024
@marc-hb
Copy link
Collaborator Author

marc-hb commented Apr 26, 2024

I'm not sure why now, but https://sof-ci.01.org/sofpr/PR9086/build4356/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=multiple-pause-resume-50 is still totally unusable (which is why I'm not sure why). Try it.

@lrudyX
Copy link

lrudyX commented May 10, 2024

@marc-hb this won't be fixed in fw. Phrase "no bytes" was found about 33 000 times in 1000 000 log lines. Please close the ticket.

marc-hb added a commit to marc-hb/sof-test that referenced this issue May 15, 2024
Protect storage in case this is not run first and immediately after boot
as it should. Also makes the logs more relevant and usable: long files
can make the user interface unusable, see example in
thesofproject/sof#8761

This also gives us --no-pager "for free" in case it was ever needed.

Fixes recent commits c8565e8 and 914c5c0.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to thesofproject/sof-test that referenced this issue May 16, 2024
Protect storage in case this is not run first and immediately after boot
as it should. Also makes the logs more relevant and usable: long files
can make the user interface unusable, see example in
thesofproject/sof#8761

This also gives us --no-pager "for free" in case it was ever needed.

Fixes recent commits c8565e8 and 914c5c0.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@wszypelt
Copy link

@marc-hb could you close the ticket? or maybe the issue is still reproducing?

@marc-hb
Copy link
Collaborator Author

marc-hb commented May 29, 2024

Still 30,000 "no bytes to copy" spam in https://sof-ci.01.org/sofpr/PR9174/build4946/devicetest/index.html?model=LNLM_SDW_AIOC&testcase=multiple-pause-resume-50

There are 1200 such lines in 1.2 seconds near the end, so the 1 line/ms is probably not a coincidence.

@dbaluta
Copy link
Collaborator

dbaluta commented May 29, 2024

@marc-hb can we implemented log_ratelimit? So that if thare are more that we only alow for about 10 similar logs in about 1s?

@lgirdwood
Copy link
Member

@dbaluta pls go ahead with ratelimit. Thanks !

@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 18, 2024

@marc-hb marc-hb closed this as completed Jul 18, 2024
@marc-hb
Copy link
Collaborator Author

marc-hb commented Jul 18, 2024

Actually, still a lot of repetition in for instance in:

But at least the logs are usable.

I don't know what changed

Maybe it was just me reducing the default count from 5 down to 1:

@marc-hb

This comment was marked as duplicate.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Aug 20, 2024

can we implemented log_ratelimit? So that if thare are more that we only alow for about 10 similar logs in about 1s?

Someone did it??

https://sof-ci.01.org/sofpr/PR9319/build7075/devicetest/index.html?model=MTLP_RVP_NOCODEC&testcase=check-pause-resume-capture-100

[ 1321.674795] <inf> pipe: pipeline_trigger: pipe:1 0x0 pipe trigger cmd 2
[ 1321.675398] <inf> ll_schedule: zephyr_ll_task_done: task complete 0xa015a940 0xa00f905cU
[ 1321.675446] <inf> ll_schedule: zephyr_ll_task_done: num_tasks 1 total_num_tasks 1
[ 1321.675541] <inf> ll_schedule: zephyr_domain_unregister: zephyr_domain_unregister domain->type 1 domain->clk 3
[ 1322.730845] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x20004 no bytes to copy, available samples: 384, free_samples: 0
--- 27 messages dropped ---
[ 1322.735423] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x20004 no bytes to copy, available samples: 384, free_samples: 0
--- 12 messages dropped ---
[ 1322.739406] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x20004 no bytes to copy, available samples: 384, free_samples: 0
--- 18 messages dropped ---
[ 1322.747573] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x20004 no bytes to copy, available samples: 384, free_samples: 0
--- 24 messages dropped ---
[ 1322.755735] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x20004 no bytes to copy, available samples: 384, free_samples: 0
--- 21 messages dropped ---
[ 1322.766646] <inf> host_comp: host_get_copy_bytes_normal: comp:1 0x20004 no bytes to copy, available samples: 384, free_samples: 0
--- 17 messages dropped ---
[ 1322.771691] <wrn> dai_comp: dai_common_copy: comp:0 0x4 dai_common_copy(): Copy_bytes 0 + free bytes 352 < period bytes 768, possible glitch
--- 15 messages dropped ---
[ 1322.779690] <wrn> dai_comp: dai_common_copy: comp:0 0x4 dai_common_copy(): Copy_bytes 0 + free bytes 352 < period bytes 768, possible glitch
--- 19 messages dropped ---
[ 1322.790786] <wrn> dai_comp: dai_common_copy: comp:0 0x4 dai_zephyr_copy(): nothing to copy
--- 16 messages dropped ---

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected MTL Applies to Meteor Lake platform P2 Critical bugs or normal features
Projects
None yet
Development

No branches or pull requests

8 participants