Why is reduce performing better in this use case of finding duplicates in an array [closed]
A recent problem stated that, given an array of very large size, we need to find the first duplicate value and return it (not the index but the value).
So given an example array of [1,2,3,4,5,3], we need to return 3, since it is the first value with a duplicate in the array.
Now the array in question could be 100K elements or much larger. Given those requirements, there are a few solutions that come to mind right away.
1. Using ES6 with a Set:
function firstDuplicate(a) {
  var r = new Set();
  for (var e of a) {
    if (r.has(e)) return e; // already seen: this is the first duplicate
    else r.add(e);
  }
}
2. Using a for loop with lastIndexOf:
const firstDuplicate = (arr) => {
  for (var i = 0; i < arr.length; i++) {
    if (arr.lastIndexOf(arr[i]) > i)
      return arr[i] // a later occurrence exists, so arr[i] is the first duplicate
  }
}
3. Using reduce & lastIndexOf:
Note: I am aware that this approach mutates the input via the splice call, which is used to exit early from the reduce.
const firstDuplicate = (arr) => {
  let result = null
  arr.reduce((r, c, i, a) => {
    let index = a.lastIndexOf(c)
    if (index > i) {
      result = c  // we got a duplicate
      a.splice(1) // truncate the array to break out of the reduce
    }
  })
  return result
}
... Please feel free to add/suggest other solutions as well ...
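4. (Added as a sketch only, in the spirit of the request above.) A Map-based variant is the obvious sibling of the Set approach; it was not part of the benchmarks, and the name firstDuplicateWithMap is purely illustrative:
const firstDuplicateWithMap = (arr) => {
  const seen = new Map() // value -> index of its first occurrence
  for (let i = 0; i < arr.length; i++) {
    if (seen.has(arr[i])) return arr[i] // second occurrence found: return the value
    seen.set(arr[i], i)
  }
  return -1 // no duplicate at all
}
console.log(firstDuplicateWithMap([1, 2, 3, 4, 5, 3])) // 3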
Benchmarks:
Update:
As per the comments, I validated that if the input has a duplicate very close to the beginning (like in the first 5 values), then the Set approach is faster. After that, the reduce approach wins consistently.
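To make this position effect easier to reproduce locally, here is a rough sketch of how the duplicate position could be swept. It assumes the findDubWithSet / findDubWithForLoop / findWithReduce definitions from the snippets further down; generateInput, the array size and the positions are purely illustrative and are not the benchmark setup linked above:
const generateInput = (size, dupIndex) => {
  // size unique values 0..size-1, then make a[dupIndex] a duplicate of its neighbour
  const a = Array.from({ length: size }, (_, i) => i)
  a[dupIndex] = a[dupIndex - 1]
  return a
}

const time = (label, fn, input) => {
  const t0 = performance.now()  // available in browsers and recent Node versions
  const out = fn(input.slice()) // pass a copy so the splice-based version cannot skew later runs
  console.log(label, out, (performance.now() - t0).toFixed(3) + ' ms')
}

for (const dupIndex of [5, 100, 1000, 9999]) {
  const input = generateInput(10000, dupIndex)
  time(`Set    (dup @ ${dupIndex})`, findDubWithSet, input)
  time(`for    (dup @ ${dupIndex})`, findDubWithForLoop, input)
  time(`reduce (dup @ ${dupIndex})`, findWithReduce, input)
}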
Question:
Why is the performance of the reduce & lastIndexOf approach better in this situation? Why does reduce perform better with this specific input?
Update: A valid comment pointed out that the results depend on the input. I have updated the question accordingly: with the given input, what makes reduce perform better?
As you can see from the benchmarks, both show the reduce approach as the fastest. I assume this is due to ES6 optimizations etc., but that seems too vague and light on detail. I was hoping for a more in-depth analysis of why this is more performant, since I would not have guessed it would be to that extent.
It does iterate through every item and calls lastIndexOf on the array passed in. It also breaks out with the arr.splice(1) hack :). The for loop goes through each element as well, but more surprisingly the Set option (and, I would guess, a similar Map option such as the sketch in 4. above) did not perform faster!
Here are the code snippets, just so you can get an idea of the output of them all.
Note: the array in the benchmark links above is much larger:
const arr = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 3]

const findDubWithSet = a => {
  var r = new Set();
  for (var e of a) {
    if (r.has(e)) return e;
    else r.add(e);
  }
  return -1;
}

const findDubWithForLoop = arr => {
  for (var i = 0; i < arr.length; i++) {
    if (arr.lastIndexOf(arr[i]) > i)
      return arr[i]
  }
}

const findWithReduce = (arr) => {
  var result = null
  arr.reduce((r, c, i, a) => {
    var index = a.lastIndexOf(c)
    if (index > i) {
      result = c
      a.splice(1)
    }
  })
  return result
}

console.log(findDubWithSet(arr))     // 3
console.log(findDubWithForLoop(arr)) // 3
console.log(findWithReduce(arr))     // 3
PS. I am aware of the general critique of benchmarking as a practice, but in this case it does show a pattern which raises a question: given a huge array, a method that is not expected to be very performant turns out to be. I do not agree that this is the same question as the one linked.
@HugoTeixeira Ok I validated your comment. If I have the duplicate values really close by, like [1,3,4,5,1], then Set ends up being faster! Good call. – Akrion, Aug 12 at 3:33
Possible duplicate of What is microbenchmarking? – feeling unwelcome, Aug 12 at 3:52
@feelingunwelcome Is it? I get the overall critique of benchmarking, but are you implying that it is useless as a practice to ask questions based on its results? Do you think I would not get similar results locally, which would again make me wonder why reduce is beating the Set/for loop options? – Akrion, Aug 12 at 4:29
What do you mean by "first duplicate value"? For instance, in [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 3, 2], is 2 or 3 the first duplicate? We can't tell which you mean, because the 3 functions you've provided produce different results. – Makyen, Aug 12 at 10:34
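(For what it's worth, this disagreement is easy to confirm with the snippet definitions above; the expected outputs in the comments are just from tracing the code by hand:)
const sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 3, 2]
console.log(findDubWithSet(sample))     // 3 (value whose second occurrence comes first)
console.log(findDubWithForLoop(sample)) // 2 (earliest value that has a later occurrence)
console.log(findWithReduce(sample))     // 2 (same rule as the for loop)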
1 Answer
Looks like this is faster:
var arr = [1, 2, 3, 4, 5, 3];

const firstDuplicateInArray = array => {
  while (array.length > 0) {
    var f = array.shift();    // take the first remaining element
    var i = array.indexOf(f); // look for another occurrence in the rest
    if (~i) return array[i];  // ~(-1) === 0, so this only returns when a match was found
  }
  return -1;
};

console.log(firstDuplicateInArray(arr))
It appears so. Thanks! Still wondering, however (as per the question), why the reduce approach is more performant. – Akrion, Aug 12 at 4:08
@Akrion, as @HugoTeixeira stated, reduce is not more performant. The performance of your examples is input-dependent. So your question looks incorrect to me. – Kosh Very, Aug 12 at 5:13
The actual question is why, in this case and with this input, reduce is more performant. – Akrion, Aug 12 at 5:14
The speed of the reduce() call is really impressive. But I should note that the performance of these algorithms depends on the input. For example, if the duplicate is at the very beginning of the array (e.g., [1, 1, ...]), then findDubWithSet is going to be the fastest. Ideally you should run several benchmarks and check how each function performs as the duplicate moves through the array. This would give a better view of the average case. – HugoTeixeira, Aug 12 at 3:27