Day 4 Passport Processing

This is my attempt to solve Day 4.

sample <- read_lines("samples/day_04_sample.txt")
actual <- read_lines("inputs/day_04_input.txt")

4.1 Part 1

The fields for today’s puzzle are:

byr (Birth Year)
iyr (Issue Year)
eyr (Expiration Year)
hgt (Height)
hcl (Hair Color)
ecl (Eye Color)
pid (Passport ID)
cid (Country ID)

For a record to be treated as valid, all of the fields must be present, except for the Country ID (cid), which is optional.

Records are separated by blank lines. I am going to first collapse the data into a single string, with the lines joined by “,”. Each blank line then shows up as “,,”, so we can simply split our input there. This gives us one string per record. We then need to sort out each record by splitting its string at either a space or a comma, then using the unglue_data function to extract a data frame with a column “key” for the left-hand part and a column “value” for the right-hand part.

By using map_dfr we can combine each record’s key/value pairs into a single data frame, adding a column “record” (via the .id argument) to keep track of which record each row belongs to.

process_data <- function(input) {
  input %>%
    paste(collapse = ",") %>%
    str_split(",,") %>%
    # str_split returns a list with one item holding the results, so take
    # just that first item
    pluck(1) %>%
    str_split("[, ]") %>%
    map_dfr(unglue::unglue_data, "{key}:{value}", .id = "record")
}
process_data(sample)

##    record key     value
## 1       1 ecl       gry
## 2       1 pid 860033327
## 3       1 eyr      2020
## 4       1 hcl   #fffffd
## 5       1 byr      1937
## 6       1 iyr      2017
## 7       1 cid       147
## 8       1 hgt     183cm
## 9       2 iyr      2013
## 10      2 ecl       amb
## 11      2 cid       350
## 12      2 eyr      2023
## 13      2 pid 028048884
## 14      2 hcl   #cfa07d
## 15      2 byr      1929
## 16      3 hcl   #ae17e1
## 17      3 iyr      2013
## 18      3 eyr      2024
## 19      3 ecl       brn
## 20      3 pid 760753108
## 21      3 byr      1931
## 22      3 hgt     179cm
## 23      4 hcl   #cfa07d
## 24      4 eyr      2025
## 25      4 pid 166559648
## 26      4 iyr      2011
## 27      4 ecl       brn
## 28      4 hgt      59in
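
To make the collapse-and-split steps concrete, here is a minimal sketch on a made-up four-line input (the toy vector is hypothetical, not part of the puzzle data). It uses base R’s strsplit in place of str_split and pluck so that it runs without any packages; the idea is the same.

```r
# Hypothetical toy input: two records separated by one blank line
toy <- c("ecl:gry pid:860033327", "byr:1937", "", "iyr:2013 ecl:amb")

# Joining the lines with "," turns the blank line into ",,"
collapsed <- paste(toy, collapse = ",")

# Splitting at ",," gives one string per record
records <- strsplit(collapsed, ",,", fixed = TRUE)[[1]]

# Splitting each record at a space or a comma gives its key:value pairs
fields <- strsplit(records, "[, ]")
fields
```

The first element of fields holds the three pairs of the first record and the second element holds the two pairs of the second record, which is the shape that unglue_data then turns into key and value columns.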

get_valid_records <- function(input) {
  input %>%
    process_data() %>%
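    # cid is optional, so drop it before counting fields
    filter(key != "cid") %>%
    group_by(record) %>%
    # a valid record has all 7 remaining fields (assumes no duplicated keys)
    filter(n() == 7) %>%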
    ungroup()
}

part_1 <- function(input) {
  input %>%
    get_valid_records() %>%
    distinct(record) %>%
    nrow()
}
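
Counting rows with n() == 7 relies on no key appearing twice within a record. A slightly stricter check, sketched here in base R under the assumption that we have a record’s keys as a character vector, is to compare against the required field names directly. The names required_fields and has_all_required are made up for this sketch and are not used elsewhere.

```r
# The seven mandatory fields; cid is deliberately left out
required_fields <- c("byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid")

# TRUE when every required field appears among the record's keys
has_all_required <- function(keys) {
  length(setdiff(required_fields, keys)) == 0
}

# Keys of sample record 1 (complete) and sample record 2 (no hgt)
has_all_required(c("ecl", "pid", "eyr", "hcl", "byr", "iyr", "cid", "hgt"))  # TRUE
has_all_required(c("iyr", "ecl", "cid", "eyr", "pid", "hcl", "byr"))         # FALSE
```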

We can test our function on the sample:

part_1(sample) == 2

## [1] TRUE

Now we can run our function on the actual data:

part_1(actual)

## [1] 233

4.2 Part 2

We now need to validate the data in the passports:

byr (Birth Year) - four digits; at least 1920 and at most 2002.
iyr (Issue Year) - four digits; at least 2010 and at most 2020.
eyr (Expiration Year) - four digits; at least 2020 and at most 2030.
hgt (Height) - a number followed by either cm or in:
- If cm, the number must be at least 150 and at most 193.
- If in, the number must be at least 59 and at most 76.
hcl (Hair Color) - a # followed by exactly six characters 0-9 or a-f.
ecl (Eye Color) - exactly one of: amb blu brn gry grn hzl oth.
pid (Passport ID) - a nine-digit number, including leading zeroes.
cid (Country ID) - ignored, missing or not.

The approach I am going to take for part 2 is to build some helper validation functions for the years and the height parts of the record, and then filter the rows to just retain valid records. To make it easier to do this I will first pivot the data from long format to wide, so each passport is one row of data.

part_2 <- function(input) {
  validate_years <- function(y, min, max) {
    yi <- suppressWarnings(as.integer(y))
    ifelse(is.na(yi), FALSE, min <= yi & yi <= max)
  }
  
  validate_height <- function(h) {
    hv <- suppressWarnings(as.integer(str_sub(h, 1, -3)))
    ht <- str_sub(h, -2, -1)
    case_when(is.na(hv) ~ FALSE,
              ht == "cm" ~ 150 <= hv & hv <= 193,
              ht == "in" ~  59 <= hv & hv <=  76,
              TRUE ~ FALSE)
  }
  
  input %>%
    get_valid_records() %>%
    pivot_wider(names_from = key, values_from = value) %>%
    filter(validate_years(byr, 1920, 2002),
           validate_years(iyr, 2010, 2020),
           validate_years(eyr, 2020, 2030),
           validate_height(hgt),
           str_detect(hcl, "^#[0-9a-f]{6}$"),
           ecl %in% c("amb", "blu", "brn", "gry", "grn", "hzl", "oth"),
           str_detect(pid, "^\\d{9}$"))
}
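
Because the helpers live inside part_2, they cannot be tried out on their own. As a rough standalone sketch of the height rule, here is a regex-based variant in base R. It is an alternative to the str_sub approach above, not the same code, and the name validate_height_sketch is made up for illustration.

```r
validate_height_sketch <- function(h) {
  # Capture the number and the unit; anything else fails to match
  parts <- regmatches(h, regexec("^([0-9]+)(cm|in)$", h))
  vapply(parts, function(p) {
    if (length(p) != 3) return(FALSE)  # no match: wrong shape or unit
    value <- suppressWarnings(as.integer(p[2]))
    if (p[3] == "cm") {
      isTRUE(value >= 150 && value <= 193)
    } else {
      isTRUE(value >= 59 && value <= 76)
    }
  }, logical(1))
}

# A few hand-picked values: two valid, one out of range, one without a unit
validate_height_sketch(c("60in", "190cm", "190in", "190"))
# TRUE TRUE FALSE FALSE
```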

The provided test cases don’t use the initial sample data, so let’s just run the function on the sample and check that it does not error.

part_2(sample)

## # A tibble: 2 x 8
##   record ecl   pid       eyr   hcl     byr   iyr   hgt  
##   <chr>  <chr> <chr>     <chr> <chr>   <chr> <chr> <chr>
## 1 1      gry   860033327 2020  #fffffd 1937  2017  183cm
## 2 3      brn   760753108 2024  #ae17e1 1931  2013  179cm

It seems to work, so let’s run it on our actual data:

nrow(part_2(actual))

## [1] 111

4.3 Extra: Solving with regular expressions

This could be reduced to simply solving with regular expressions. First let’s create a function to convert the input into one string per record.

records_as_strings <- function(input) {
  input %>%
    str_replace("^$", "\n") %>%
    paste(collapse = " ") %>%
    str_split(" \n ") %>%
    pluck(1)
}
records_as_strings(sample)

## [1] "ecl:gry pid:860033327 eyr:2020 hcl:#fffffd byr:1937 iyr:2017 cid:147 hgt:183cm"
## [2] "iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884 hcl:#cfa07d byr:1929"          
## [3] "hcl:#ae17e1 iyr:2013 eyr:2024 ecl:brn pid:760753108 byr:1931 hgt:179cm"        
## [4] "hcl:#cfa07d eyr:2025 pid:166559648 iyr:2011 ecl:brn hgt:59in"

Now, for part 1 we just need to run a regular expression for each of the different fields on each record. Using map gives us one list element per field, each holding the match results for every record, so we transpose to get the results of the regexes for each record. We can then flatten each of these lists and run the all function to check whether every regex matched for that record.

actual %>%
  records_as_strings() %>%
  map(c("byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid"),
      str_detect,
      string = .) %>%
  transpose() %>%
  map_lgl(compose(all, flatten_lgl)) %>%
  sum()

## [1] 233
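
The same “one result per field, then regroup per record” idea can be sketched in base R with a logical matrix instead of a transposed list. This is only an illustration on two of the sample records, assuming more than one record is passed in (with a single record sapply would return a plain vector rather than a matrix).

```r
# Two records taken from the sample: the first is complete, the second lacks hgt
records <- c(
  "ecl:gry pid:860033327 eyr:2020 hcl:#fffffd byr:1937 iyr:2017 cid:147 hgt:183cm",
  "iyr:2013 ecl:amb cid:350 eyr:2023 pid:028048884 hcl:#cfa07d byr:1929"
)
fields <- c("byr", "iyr", "eyr", "hgt", "hcl", "ecl", "pid")

# One column per field, one row per record
hits <- sapply(fields, grepl, x = records, fixed = TRUE)

# A record is valid when every field in its row was found
valid <- apply(hits, 1, all)
sum(valid)
```

Here the rows of hits play the role of the transposed lists, and apply(hits, 1, all) replaces compose(all, flatten_lgl).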

Part 2 is similar, but we need to match the value after the field name.

actual %>%
  records_as_strings() %>%
  map(c("byr:(19[2-9][0-9]|200[0-2])",
        "iyr:20(1[0-9]|20)",
        "eyr:20(2[0-9]|30)",
        "hgt:(1([5-8][0-9]|9[0-3])cm|(59|6[0-9]|7[0-6])in)",
        "hcl:#[0-9a-f]{6}",
        "ecl:(amb|blu|brn|gry|grn|hzl|oth)",
        # negative lookahead: the character after the 9th digit must not be
        # a digit
        "pid:\\d{9}(?!\\d)"),
      str_detect,
      string = .) %>%
  transpose() %>%
  map_lgl(compose(all, flatten_lgl)) %>%
  sum()

## [1] 111
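
To see what the negative lookahead buys us, here is a small base R sketch. Base R needs perl = TRUE for \\d and lookaheads, whereas str_detect supports them out of the box. It also shows that the other patterns are not guarded in the same way: they gave the right count on my input, but a value with an extra trailing digit would slip through unless a similar lookahead is appended. The test strings are made up for illustration.

```r
# With the lookahead, a ten-digit pid is rejected
pid_pattern <- "pid:\\d{9}(?!\\d)"
grepl(pid_pattern, c("pid:000000001", "pid:0123456789"), perl = TRUE)
# TRUE FALSE

# Without a guard, the byr pattern accepts a five-digit value
byr_pattern <- "byr:(19[2-9][0-9]|200[0-2])"
grepl(byr_pattern, "byr:20021", perl = TRUE)
# TRUE

# Appending the same kind of lookahead closes that gap
byr_guarded <- paste0(byr_pattern, "(?!\\d)")
grepl(byr_guarded, c("byr:20021", "byr:2002 iyr:2015"), perl = TRUE)
# FALSE TRUE
```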

Elapsed Time: 3.216s