In my ongoing series Writing Better Audio Applications for Linux, here's another installment: a little explanation of how fragments/periods and buffer sizes should be chosen when doing audio playback with traditional audio APIs such as ALSA and OSS. This originates from some emails I exchanged with the Ekiga folks. In the last few weeks I kept copying this explanation to various other folks, so I guess it makes sense to post it on my blog here too, to reach a wider audience. So here it is, mostly unedited:
Yes. You shouldn't misuse the fragment logic of sound devices. It's like this:

The latency is defined by the buffer size. The wakeup interval is defined by the fragment size. The buffer fill level will oscillate between 'full buffer' and 'full buffer minus 1x fragment size minus OS scheduling latency'. Setting smaller fragment sizes will increase the CPU load and decrease battery time, since you force the CPU to wake up more often. OTOH it increases drop-out safety, since you fill up the playback buffer earlier. Choosing the fragment size is hence a trade-off between power consumption and drop-out safety. With modern processors and a good OS scheduler like the Linux one, setting the fragment size to anything other than half the buffer size does not make much sense.

Your [Ekiga's ptlib driver, that is] ALSA output is configured to set the fragment size to the size of your codec audio frames. And that's a bad idea, because the codec frame size has not been chosen based on power consumption or drop-out safety reasoning. It has been chosen by the codec designers based on entirely different criteria, such as latency. You probably configured your backend this way because the ALSA library docs say that it is recommended to write to the sound card in multiples of the fragment size. However, deducing from this that you should configure the fragment size to the codec frame size is wrong!

The best way to implement playback with ALSA these days is to write as much as snd_pcm_avail() tells you to each time you wake up due to POLLOUT on the sound card. If that is not a multiple of your codec frame size, then you need to buffer the remainder of the decoded data yourself in system memory (see the sketches below). The ALSA fragment size you should normally set as large as your latency constraints allow, while making sure that your buffer contains at least two fragments.

I hope this explains a bit how frag_size/buffer_size should be chosen. If you have questions, just ask. (Oh, ALSA uses the term 'period' for what I call 'fragment' above. They're synonymous.)
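To make the configuration side of this concrete, here's a minimal sketch in C against alsa-lib: derive the buffer size from your latency constraint, then set the period (fragment) size to half of it, i.e. two fragments per buffer. The 50 ms latency target, the "default" device, and the S16/stereo format are arbitrary example values (not recommendations), and error handling is trimmed to the bare minimum.

```c
#include <alsa/asoundlib.h>

int setup_playback(snd_pcm_t **pcmp, unsigned rate) {
    snd_pcm_t *pcm;
    snd_pcm_hw_params_t *hw;
    snd_pcm_uframes_t buffer_size, period_size;
    int r;

    if ((r = snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, SND_PCM_NONBLOCK)) < 0)
        return r;

    snd_pcm_hw_params_alloca(&hw);
    snd_pcm_hw_params_any(pcm, hw);
    snd_pcm_hw_params_set_access(pcm, hw, SND_PCM_ACCESS_RW_INTERLEAVED);
    snd_pcm_hw_params_set_format(pcm, hw, SND_PCM_FORMAT_S16_LE);
    snd_pcm_hw_params_set_channels(pcm, hw, 2);
    snd_pcm_hw_params_set_rate_near(pcm, hw, &rate, NULL);

    /* The buffer size defines the latency: here 50 ms worth of frames. */
    buffer_size = rate / 20;
    snd_pcm_hw_params_set_buffer_size_near(pcm, hw, &buffer_size);

    /* The period (fragment) size defines the wakeup interval: half the
     * buffer, i.e. at least two fragments per buffer. */
    period_size = buffer_size / 2;
    snd_pcm_hw_params_set_period_size_near(pcm, hw, &period_size, NULL);

    if ((r = snd_pcm_hw_params(pcm, hw)) < 0) {
        snd_pcm_close(pcm);
        return r;
    }

    *pcmp = pcm;
    return 0;
}
```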
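And here's a sketch of the playback loop itself: wake up on POLLOUT, ask snd_pcm_avail() how many frames the device can take, write that much, and keep whatever doesn't fit into whole codec frames in a staging buffer in system memory. decode_one_frame() and CODEC_FRAME are hypothetical stand-ins for whatever your codec actually provides; draining the last staged frames at end of stream is omitted for brevity.

```c
#include <alsa/asoundlib.h>
#include <poll.h>
#include <stdint.h>
#include <string.h>

#define CHANNELS 2
#define CODEC_FRAME 160  /* hypothetical codec frame length, in PCM frames */

/* Hypothetical decoder: fills dst with one codec frame (CODEC_FRAME
 * frames of interleaved S16 stereo PCM), returns 0, or <0 at end of stream. */
extern int decode_one_frame(int16_t *dst);

void playback_loop(snd_pcm_t *pcm) {
    int16_t stage[8192 * CHANNELS]; /* staging buffer in system memory */
    snd_pcm_uframes_t staged = 0;   /* decoded frames waiting to be written */
    struct pollfd pfd[4];
    int nfds = snd_pcm_poll_descriptors(pcm, pfd, 4);

    for (;;) {
        unsigned short revents;
        poll(pfd, nfds, -1);
        snd_pcm_poll_descriptors_revents(pcm, pfd, nfds, &revents);
        if (!(revents & POLLOUT))
            continue;

        /* Write as much as the device can take right now. */
        snd_pcm_sframes_t avail = snd_pcm_avail(pcm);
        if (avail < 0) {            /* e.g. underrun: recover and retry */
            snd_pcm_recover(pcm, (int) avail, 1);
            continue;
        }

        /* Top up the staging buffer with whole codec frames until it
         * covers what the device asked for (or the buffer is full). */
        while (staged < (snd_pcm_uframes_t) avail &&
               staged + CODEC_FRAME <= sizeof(stage) / sizeof(stage[0]) / CHANNELS) {
            if (decode_one_frame(stage + staged * CHANNELS) < 0)
                return;             /* end of stream */
            staged += CODEC_FRAME;
        }

        snd_pcm_uframes_t n =
            staged < (snd_pcm_uframes_t) avail ? staged : (snd_pcm_uframes_t) avail;
        if (n == 0)
            continue;

        snd_pcm_sframes_t written = snd_pcm_writei(pcm, stage, n);
        if (written == -EAGAIN)
            continue;
        if (written < 0) {
            snd_pcm_recover(pcm, (int) written, 1);
            continue;
        }

        /* Keep the remainder (less than a codec frame of it, possibly
         * more) in system memory for the next wakeup. */
        staged -= (snd_pcm_uframes_t) written;
        memmove(stage, stage + written * CHANNELS,
                staged * CHANNELS * sizeof(int16_t));
    }
}
```

Note how the codec frame size and the fragment size are completely decoupled here: the fragment size only controls how often the loop wakes up, while the staging buffer absorbs the mismatch between codec frames and whatever snd_pcm_avail() reports.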