Different length audio files from remote recording

When a recording is created using two different computers, the audio files will almost always be different lengths. Why are they different lengths? And how big can the difference get?

The symptom: Different lengths

The following screenshot has two tracks. The tracks are simply two WAV files, imported, and they both are aligned to begin at exactly zero seconds. I didn’t clip or cut these; they are simply different lengths.

No waveforms are visible because both tracks are completely silent at their ends. The view is zoomed in to show about two seconds of time across the width.

The upper track is 22 minutes, ~32 seconds long. (It ends halfway between 31.8 and 32.2 seconds.)

The lower track is 22 minutes, ~32.6 seconds long. (It ends just a smidge beyond 32.6 seconds.)
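To put a number on that gap, here is a quick back-of-the-envelope sketch in Python. The two lengths are read off the screenshot, so they are approximate, and the function name `drift_ppm` is made up for illustration.

```python
def drift_ppm(length_a_s: float, length_b_s: float) -> float:
    """Relative length difference in parts per million, measured against the shorter track."""
    shorter, longer = sorted((length_a_s, length_b_s))
    return (longer - shorter) / shorter * 1_000_000


upper = 22 * 60 + 32.0  # ~22:32.0, read off the screenshot
lower = 22 * 60 + 32.6  # ~22:32.6, read off the screenshot

gap_s = lower - upper
ppm = drift_ppm(upper, lower)
print(f"gap: {gap_s:.1f} s, drift: about {ppm:.0f} ppm")

# If the drift stays constant, an hour-long recording ends up this far apart:
print(f"projected gap after 60 minutes: {ppm / 1_000_000 * 3600:.1f} s")
```

The projection assumes the drift is steady over the whole recording, which is a simplification.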

What does it mean to have different lengths

This turns out to be a real problem. If you were to simply listen to these tracks simultaneously, you’d hear us start out at zero seconds having a “Hello”, “hi” conversation with nice back and forth, perfectly in time.

But by the end of our conversation (22+ minutes later) the lower track is lagging behind. If this were a recording of us, clapping excitedly at a concert… after 22 minutes, one of us is on the beat, and one of us is on the back beat clapping more than a half second later.

How did we get this far off after just 20 minutes of recording?

How digital audio works

These are WAV files. What’s in a WAV file? It’s little more than a long string of numbers. Some device notes “how loud does the microphone say it is now?” as a number. It does that very quickly, but it’s simply a string of numbers, “how loud?” and “how loud now?” over and over and over…

How many numbers? You may have seen “sample rate” mentioned around podcasting. Sometimes you see “44k” or “48k” as suggested, or possible, sample rates. That’s the number of samples EVERY SECOND.

A one-second-long WAV file would have about 44,000 of those “how loud?” samples. Exactly how many isn’t so important. What is important is that it is a certain number of samples per second. (Okay fine, but exactly how many samples?)
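For anyone who wants to see those numbers, here is a minimal sketch using Python's standard `wave` module. It builds a one-second test tone in memory and reads it back. The 440 Hz tone, the 16-bit mono format, and the function name `one_second_tone` are all choices made for this example; "44k" is taken to mean the usual 44,100 samples per second.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 44_100  # the "44k" rate; 48_000 is the other common choice


def one_second_tone(freq_hz: float = 440.0) -> bytes:
    """Build a one-second, mono, 16-bit WAV file in memory and return its bytes."""
    samples = [
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        for n in range(SAMPLE_RATE)
    ]
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


wav_bytes = one_second_tone()
with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
    frames = wav.getnframes()
    rate = wav.getframerate()
    first_five = struct.unpack("<5h", wav.readframes(5))

print(frames, "samples at", rate, "per second =", frames / rate, "second(s)")
print("first few 'how loud?' numbers:", first_five)
```

Note that the length in seconds is never stored as a time. It is just the sample count divided by the rate written in the file header.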

44k is very slow for a computer. Computer CPUs (desktops, notebooks, phones, and dedicated recorders are all simply computers) do things millions (megahertz) or billions (gigahertz) of times per second.

A program tells the computer, I need to do something every 1/44000 of a second. (Or whatever the exact fraction is for the sample rate). The program then samples the volume reported by the microphone. And that’s one more number added to the WAV file.

How computers keep time

Computers know their nominal [what it is supposed to be] CPU speed. They can simply compute: how many of those CPU cycles do I count before I advance the clock? Time is kept at the microsecond level. (That’s 1/1,000,000 of a second.) A computer’s “clock” ticks ahead 1,000,000 times every second. The computer simply counts oscillations of its internal clock… 1, 2, 3, … up to the right [large!] number, and then bumps the clock ahead 1 microsecond.

Just think: counting counting counting TICK (the clock advances one microsecond) counting counting counting TICK … TICK … TICK When the clock advances 1/44,000 of a second the program checks how loud the microphone is and adds a number to our WAV file.

Now use two computers…

You and a friend are in separate rooms. The doors to the hallway are open. You hear a third person counting time. 1. 2. 3. 4. 1. 2. 3. 4. “you guys ready?” “you have the beat?” “you’re really good at keeping time?” …and the doors are closed. How long do you think your timekeeping stays in sync?

:thinking: Time synchronization among computers—all computers—is the single most important thing which keeps our entire global civilization working. I’m not exaggerating.

Now let’s run two recording programs.

Let’s say we’re using Zencastr, Riverside, whatever. So that means my Chrome web browser on my computer, and my guest’s Chrome web browser on their computer. Those are two programs, on two different computers.

We press record and the two programs start recording. One computer is tick tick ticking at its rate, and the program is writing numbers to that WAV file. The other computer is tick tick ticking at its rate, and the program is writing numbers to the second WAV file.

And when we stop, we see the difference in our computers’ timekeeping abilities. One computer recorded more “how loud?” samples than the other. Computer geeks would say: One clock was running faster than the other.
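Here is a toy model of that in Python. Both recorders run for the same stretch of real time, but one clock runs slightly fast. The 444 parts-per-million error is a made-up figure picked to land near the gap in the screenshot, and the function names are hypothetical.

```python
NOMINAL_RATE = 44_100  # samples per second each recorder believes it is taking


def samples_recorded(real_seconds: float, clock_error_ppm: float) -> int:
    """Samples captured in real_seconds by a clock running fast (+) or slow (-) by clock_error_ppm."""
    actual_rate = NOMINAL_RATE * (1 + clock_error_ppm / 1_000_000)
    return round(real_seconds * actual_rate)


def apparent_length(sample_count: int) -> float:
    """Length an editor would show, since it assumes the nominal rate."""
    return sample_count / NOMINAL_RATE


real_duration = 22 * 60 + 32  # seconds of real time both computers were recording

mine = samples_recorded(real_duration, clock_error_ppm=0)     # assumed perfect clock
guest = samples_recorded(real_duration, clock_error_ppm=444)  # assumed fast clock

print("extra samples on the guest track:", guest - mine)
print(f"my track looks {apparent_length(mine):.2f} s long")
print(f"guest track looks {apparent_length(guest):.2f} s long")
```

Neither program did anything wrong here. Each one faithfully took a sample every time its own clock said to.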

The problem is not with our programs. The problem is with our clocks!

:bookmark: Fun fact: Keeping accurate time on old-timey sailing ships was insanely hard. Check out the book Longitude by Dava Sobel.

How would you fix this?

Now the fun part.

How do I fix my progressively farther out of sync tracks?

How do I fix the actual problem [with the clocks]?


Fascinating, @Craig! Only time I’ve had this happen was when someone muted themselves on a Zoom call recording. Zoom just skips (or did–try not to use that now) when someone mutes, so it throws off the timing.

How would you fix this from your sample? Waiting to hear from folks!


I’ve been dealing with this from recording initial episodes over Zoom, double-enders but not recorded in programs beyond Zoom itself (recording locally). So the drift I’m dealing with is really Zoom/internet drift as opposed to clock drift (which is also present, but micro in comparison).

First off, it should be noted that recording at the same sampling rate and bit depth is a baseline necessity. Not doing so will cause additional drifting. I realize that is distinct from what you are pointing to, @craig; I mention it in case others are unaware.
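To show why a rate mismatch matters, here is a small Python sketch. It assumes the worst case, where software reads the samples at the wrong rate instead of converting them; many editors do convert on import, so treat this as an illustration only. The function name is made up.

```python
def playback_length(sample_count: int, assumed_rate: int) -> float:
    """Seconds of audio you get when sample_count samples are played at assumed_rate."""
    return sample_count / assumed_rate


real_seconds = 60
recorded_rate = 48_000
sample_count = recorded_rate * real_seconds  # one real minute recorded at 48k

right = playback_length(sample_count, 48_000)
wrong = playback_length(sample_count, 44_100)  # misread as a 44.1k file

print(f"read at 48k: {right:.2f} s, read at 44.1k: {wrong:.2f} s")
```

That is several seconds of error per minute, far larger than anything clock drift produces.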

I have focused my DAW learning in REAPER, so I can’t speak to what other programs can do or how they might do it, but chances are there is similar capability in programs like Hindenburg, Adobe, ProTools, etc. I would guess Audacity is not in that list, but I don’t know.

My work at solving this problem has been within a larger margin of error than the problem you are seeking to fix in your example. Fixing clock drift, in my understanding, is something that only really happens on a Hollywood budget - there are devices to manage synchrony between devices, and they are exceedingly expensive. Most folks simply tolerate clock drift.

I am curious about whether this is really only a matter of clock drift in your case - 30 seconds over 20 minutes seems like an awful lot. I have read reports comparing other sources of time drift cropping up in different cloud recording services, e.g. Zencastr, Ecastr, Riverside, etc. In my understanding clock drift is on a micro scale compared to other causes of drift.

What I have been doing has involved aligning transient peaks and stretching time. In REAPER this process can be automated to a degree. I’ll post a few links below which show how this is done in REAPER; hopefully this will give folks enough to go on to seek out comparable solutions in whichever software they are running.

I hope these help - I’m happy to try and answer any additional questions these resources may raise.


Craig, thank you for the detailed explanation of how sound works. Digitally, that is. It’s one of the things the Godin workshop didn’t cover at all. I understand why, but understanding fundamentals like this sure do help when it comes to problem-solving.

To me the solution is to cut the timeline and realign as needed. If you are recording over the Internet there are going to be sync problems, inherently.

This is because of reasons like you mentioned, but mostly because the two people are not actually hearing and recording in sync because of the delay that the Internet connection creates.

So I cut the timeline and realign as needed.

The very existence of this problem reminds me how amazing it is that we are able to record each other over such long distances in the first place!

What a time to be alive.

Thanks again for a great article.


Assuming most of the time we don’t have guests speaking at the same time, we can even slice the conversation into discrete clips and then position as needed - sometimes because of the lag effect from the internet an affirmative response arrives late and the guest has started speaking again - accidental overlap. Create splits, trim appropriately and then slide the “Yes, you are right” into the gap you created = no overlap.

It is a lot of work to create clips for the whole piece (some DAWs can automate the process) but for a time-drifting nightmare interview, it can be a way to save it.

Thanks for the @craig treatment of time-slip and the helpful reflections from everyone else.


@craig - I may be the only person who’s been editing for almost three years and hadn’t yet fully wrapped my head around this “sync slip” issue. I have dealt with times where tracks are slightly “off.” Your explanation (sampling rate was my “Aha!”) is extremely clear. Everyone’s replies (as always) also add to my understanding.

Bottom line, for me, is that moving through the file during the editing phase uncovers where editing is needed for alignment and the issue is taken care of there.

Even if I were at the point where I hit Stop Record and then Export, I would still insert that one step where I would scrub forward to each point where the speakers change or crosstalk, and make adjustments as needed. Then, Publish/Ship.

A great initial post and a very solid and helpful string of comments.
Good stuff!


I’ve not had this issue using double-ender services like riverside.fm for recording episodes…


@craig - for your enjoyment - Kilogram: Introduction | NIST


Hey @AnetteCarlisle this is a really good question. First off this is not about the clock speed issue that @craig is exploring. This is a Zoom thing (fail ? :grinning:).

Solutions I’ve found so far:

  1. Tell everyone on the Zoom call NOT TO MUTE EVER during the call. I know it’s counterintuitive but this avoids the problem.

In post production:

  1. If folks have muted, then use the master track that Zoom makes, with everyone on a single track, as your reference. Drop this track into your DAW and use it to synchronise the audio by lining up the out-of-sync tracks against it.


  1. Ditch Zoom and use the Zencastr free plan! Zencastr’s free plan is amazing and I recommend using it instead of Zoom. No sync issues, it’s a double-ender recorder and you get 128 kbps mp3. I don’t know how long they will offer this for free but use it while you can. No service is perfect but it’s better than Zoom.

[quote=“craig, post:1, topic:2288”]
How would you fix this?
[/quote]

Ooow nice puzzle @craig

I’d use the single muxed track as a reference to manually line up each out of sync audio track. Painful but effective.
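For the curious, lining a track up against a reference can also be done by machine. The following is a toy Python sketch of the idea (a brute-force cross-correlation), not what any particular DAW does. The function name and the tiny made-up "clap" are only for illustration, and a loop like this would be far too slow on real audio.

```python
def best_offset(reference: list[float], track: list[float], max_shift: int) -> int:
    """Return the shift in samples that best lines track up with reference.

    A positive result means track should be moved later by that many samples.
    """
    def score(shift: int) -> float:
        # Multiply overlapping samples and add them up; a good match scores high.
        total = 0.0
        for i, value in enumerate(track):
            j = i + shift
            if 0 <= j < len(reference):
                total += value * reference[j]
        return total

    return max(range(-max_shift, max_shift + 1), key=score)


clap = [1.0, -1.0, 0.5]                        # a made-up three-sample "clap"
reference = [0.0] * 10 + clap + [0.0] * 10     # clap starts at sample 10
track = [0.0] * 7 + clap + [0.0] * 13          # same clap, three samples early

print("move the track later by", best_offset(reference, track, max_shift=5), "samples")
```

This only finds one fixed offset. With drifting tracks you would have to repeat it at several points along the recording, which is the automated version of the painful manual job.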

I don’t think you can solve it for remote recordings. However, for face-to-face recordings, where you are physically present in the same room as the guest(s), the solution is to have a Bluetooth clock that links to the mics and syncs the clocks on each mic. Otherwise we can rely on the audio interface/recording device to keep time.

…aaaaand much later, I’m wondering if I could have simply grabbed the right edge of the audio clip (the entire track, imported, is just one loooong clip) … and streeeeeeetched the right edge. I’ll have to try that the next time I have Hindenburg open. I wonder if it would just repeat a few 1/44000 sec data points along the full width to stretch back my few seconds I needed . . . :thinking:
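As a rough idea of what stretching could look like under the hood, here is a minimal Python sketch that uses linear interpolation rather than repeating samples. I don't know what Hindenburg actually does; the function name `stretch` and the five-sample track are invented for this example.

```python
def stretch(samples: list[float], target_length: int) -> list[float]:
    """Resample samples to target_length points using linear interpolation."""
    if target_length < 2 or len(samples) < 2:
        raise ValueError("need at least two samples in and out")
    # How far to step through the original for each new sample.
    step = (len(samples) - 1) / (target_length - 1)
    out = []
    for n in range(target_length):
        position = n * step
        left = min(int(position), len(samples) - 2)
        frac = position - left
        # Blend the two neighbouring original samples.
        out.append(samples[left] * (1 - frac) + samples[left + 1] * frac)
    return out


short_track = [0.0, 1.0, 0.0, -1.0, 0.0]
stretched = stretch(short_track, 9)
print(stretched)
```

For the tracks in the first post, the target length would simply be the sample count of the longer file. Stretching this way also shifts the pitch by the same tiny ratio, which is one reason editors offer fancier time-stretch modes.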

Been there. Hated it. Tried to be clever, just made more work for myself! Ugh

Edit: should have read the whole thread before replying. Going to check out Zencastr’s free option for now.

Thanks to @craig for the post and everyone else for their contributions.


@Jey, do you ever have any issues with the guest shutting down the Zencastr window before the recording is complete? Any concerns with guests being anxious about a “new piece of software” when they receive the Zencastr link? I did sign up and viewed a couple of videos but not jumping over just yet. I always appreciate your thorough AND thoughtful approach that always comes down on the side of what works best (or at least very, very well) so your observations are valuable. Thanks for the post above.

I’ve pulled 100+ people through Zencastr now… mostly people who’ve never used it. I just mention it a lot … on the “how to be a guest” page I link people to, in the “thanks for signing up” email I tell them

I’ll send you a Zencastr.com call URL (Chrome web browser required) about 15 minutes before the scheduled time…

I’ve had a few tech problems where we couldn’t get it working. But mostly, it’s all gogogo.


Hi @David3560, it’s great to hear from you and thanks for your kind words. I’ve not had the situation of a guest cutting out too early on a Zencastr call - so I can’t really say. I’ve not used Zencastr in anger anywhere nearly as much as @craig

But you’ve prompted me to rehearse this contingency. I’ll practice this with my production assistant (i.e. my son! :grinning:)

The opposite actually. I like that the guest just has a link to click and there is no software for the guest to install, no settings on their side to really worry about. Internet speed/quality shouldn’t be an issue. It’s as robust as possible.

Though that being said I still plug in my Zoom P4 to my laptop to back up the conversation. My motto is, Trust but Backup.


I know this topic has been solved but can I just add one more thing?

Clock sync issues that @craig describes are also a danger if you decide you want to hook up more than one USB mic directly into your computer. It’s one of the things you must check if you decide to do this. And it’s the key reason I don’t recommend doing this for rookies.

As ever Apple Macs have fewer issues than Windows machines with multiple USB mics plugged directly into a computer. If you decide to go down this route then the best thing to do is to use the same brand and model of the mics, if possible.

I’m not saying that mixing up brands and models of mics will cause sync issues but you should definitely test before going live. The longer your recording duration the bigger the danger of sync issues. So test for as long as your usual record time (not publish time!).

Go nuts!


I have just been checking this out. Wow. I can’t wait to use it to record the next podcast.


Thanks so much @Jey and @craig!
