Day 1: Report Repair

This is my attempt to solve Day 1.

library(tidyverse)

sample <- read_lines("samples/day_01_sample.txt") %>% as.integer()
actual <- read_lines("inputs/day_01_input.txt") %>% as.integer()
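
For reference, the sample file contains the six example values from the puzzle text:

sample
## [1] 1721  979  366  299  675 1456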

1.1 Part 1

The naive approach to solving today’s problem is to simply loop through the list twice, checking every pair to see if it sums to the target (an O(n²) approach). If a pair does, we immediately return its product. If we reach the end of both loops without finding a solution, we return NULL to indicate that no result was found.

part_1_naive <- function(input, target) {
  if (length(input) == 0) stop("empty list")
  
  # check every pair (i, j) with i < j
  for (i in 1:(length(input) - 1)) {
    for (j in (i + 1):length(input)) {
      if (input[[i]] + input[[j]] == target)
        return(input[[i]] * input[[j]])
    }
  }
  NULL
}

We can test that this method works:

part_1_naive(sample, 2020) == 514579
## [1] TRUE

This approach can be improved: sort the list, then keep two pointers, one at the start and one at the end. If the two numbers they point at add up to more than the target, we move the end pointer down; if they add up to less than the target, we move the start pointer up.

part_1_improved <- function(input, target) {
  if (length(input) == 0) stop("empty list")
  
  input <- sort(input)
  
  # two pointers: i at the smallest value, j at the largest
  i <- 1
  j <- length(input)
  
  # i < j ensures we never pair an element with itself
  while (i < j) {
    v <- input[[i]] + input[[j]]
    if (v == target) {
      return(input[[i]] * input[[j]])
    } else if (v < target) {
      i <- i + 1
    } else {
      j <- j - 1
    }
  }
  
  NULL
}

Again, we can test that this method works.

part_1_improved(sample, 2020) == 514579
## [1] TRUE

We can now use our improved algorithm to get the result for part 1:

part_1_improved(actual, 2020)
## [1] 793524

1.1.1 Is the improved algorithm any better?

bench::mark(part_1_naive(sample, 2020),
            part_1_improved(sample, 2020))
## # A tibble: 2 x 6
##   expression                         min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr>                    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 part_1_naive(sample, 2020)      2.85µs   3.51µs   255336.        0B     25.5
## 2 part_1_improved(sample, 2020)   61.5µs  65.94µs    14364.        0B     17.1

For me, the improved algorithm actually takes longer on the sample data! This is because it has to sort the input first, and with only six values that up-front cost dwarfs the handful of comparisons the naive double loop needs.
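
If you want to see how much of that gap is the sort itself, you can time it in isolation (output not shown, as the numbers vary from machine to machine):

bench::mark(sort(sample))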

However, when we have more data, these extra steps lead to big improvements, as can be seen when running with the actual data.

bench::mark(part_1_naive(actual, 2020),
            part_1_improved(actual, 2020))
## # A tibble: 2 x 6
##   expression                         min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr>                    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 part_1_naive(actual, 2020)     681.5µs  711.3µs     1178.        0B      0  
## 2 part_1_improved(actual, 2020)   84.5µs   90.2µs    10028.    1.66KB     14.9

The improved algorithm was roughly 8x faster for me on the actual data.

1.2 Part 2

We can alter the naive approach from part 1 by adding a third nested for loop, taking us from checking every pair to checking every triple (O(n³)).

part_2_naive <- function(input, target) {
  if (length(input) == 0) stop("empty list")
  
  # check every triple (i, j, k) with i < j < k
  for (i in 1:(length(input) - 2)) {
    for (j in (i + 1):(length(input) - 1)) {
      for (k in (j + 1):length(input)) {
        if (input[[i]] + input[[j]] + input[[k]] == target)
          return(prod(input[c(i, j, k)]))
      }
    }
  }
  NULL
}

Again, we can test that this method works on the sample.

part_2_naive(sample, 2020) == 241861950
## [1] TRUE

It’s much harder to adapt the improved algorithm though. My best approach involves a binary search.

This function takes a sorted array, a target value, a current index into the array, and the current min/max extents to search. If the target is less than the value at position i in the array, we look in the left half of the search space, chopping it in half by reducing max. Likewise, if the target is greater than the value at position i, we look in the right half, chopping it in half by increasing min.

binary_search <- function(arr, target, i, min, max) {
  if (arr[[i]] == target) {
    return(i)
  } else if (target < arr[[i]]) {
    # target must be in the left half: lower the upper extent
    max <- i
    ni <- (min + i) %/% 2
  } else {
    # target must be in the right half: raise the lower extent
    min <- i
    ni <- (max + i) %/% 2
  }
  
  # if the next index to probe is unchanged, the search space is exhausted
  if (ni == i) {
    NULL
  } else {
    binary_search(arr, target, ni, min, max)
  }
}
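
As a quick illustration (an example of my own, not part of the puzzle), we can search a small sorted vector. Note that the starting index is the midpoint of the extents, and that the function never probes the max index itself, so the extents behave like [min, max):

primes <- c(2L, 3L, 5L, 7L, 11L, 13L)
binary_search(primes, 7, 3, 1, 6)   # returns 4: primes[[4]] is 7
binary_search(primes, 6, 3, 1, 6)   # returns NULL: 6 is not present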

Our improved algorithm for part 2 uses two for loops to search from the end of the (sorted) input, and removes the third loop by using a binary search to find the remaining value more quickly.

In the two for loops we first calculate the sum of the two chosen items, quickly checking whether a solution is still possible (and skipping to the next iteration if not). Then we perform a binary search over the items below position i for the amount needed to bring v up to the target (2020). If we find that value, we return the product of the three items; otherwise we continue looping.

part_2_improved <- function(input, target) {
  if (length(input) == 0) stop("empty list")
  
  input <- sort(input)
  
  for (j in length(input):3) {
    for (i in (j - 1):2) {
      v <- sum(input[c(i, j)])
      # early termination: v is already too big, or even the largest
      # remaining candidate can't reach the target
      if (v >= target) next
      if (v + input[[i - 1]] < target) next
      
      # binary search for a suitable value; the upper extent is effectively
      # exclusive, so passing i searches indices 1..(i - 1) without ever
      # probing i itself
      k <- binary_search(input, target - v, (i + 1) %/% 2, 1, i)
      if (!is.null(k)) {
        return(prod(input[c(i, j, k)]))
      }
    }
  }
  
  NULL
}

We can once again test that our method works on the sample data:

part_2_improved(sample, 2020) == 241861950
## [1] TRUE

We can now run both approaches to see what the result is:

part_2_naive(actual, 2020)
## [1] 61515678
part_2_improved(actual, 2020)
## [1] 61515678

1.2.1 Is our improved algorithm better this time?

Once again, we can benchmark the two approaches.

bench::mark(part_2_naive(actual, 2020),
            part_2_improved(actual, 2020))
## # A tibble: 2 x 6
##   expression                         min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr>                    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 part_2_naive(actual, 2020)       111ms  113.7ms      8.64        0B     2.16
## 2 part_2_improved(actual, 2020)     15ms   15.2ms     65.5     1.66KB   480.

On my machine the improved approach is again roughly 7x quicker.

1.3 Extra: Implementing in Python

I wanted to have a go at implementing my improved algorithm in Python, but there we can take advantage of sets to get even better performance, doing away with the need to sort the list and perform a binary search: asking whether a value is in a set is a constant-time operation in Python, so the membership check is near instantaneous.

library(reticulate)

def part_2_py(input_values, target):
  # convert the input to a set for constant-time membership tests
  s = { i for i in input_values }
  
  # outer loop: we need to have at least 3 items to check
  while len(s) > 2:
    # remove a value from the set
    j = s.pop()
    # take a copy of the set: if we don't find a solution for j then we need to
    # continue with the set as it was at this point
    t = s.copy()
    while len(t) > 1:
      # remove a value from the copied set
      i = t.pop()
      # the value needed to reach the target
      k = target - i - j
      # is k in the set? I.e. does i + j + k == target?
      if k in t:
        return i * j * k
  # no solution, return None
  return None
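
Before benchmarking, we can sanity-check the Python version against the sample; this call should print TRUE:

py$part_2_py(sample, 2020) == 241861950
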
bench::mark(part_2_improved(actual, 2020),
            py$part_2_py(actual, 2020))
## # A tibble: 2 x 6
##   expression                         min   median `itr/sec` mem_alloc `gc/sec`
##   <bch:expr>                    <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
## 1 part_2_improved(actual, 2020)   14.9ms  15.13ms      54.0    1.66KB    31.7 
## 2 py$part_2_py(actual, 2020)       1.2ms   1.26ms     773.    17.45KB     2.02
