Using the General T-Q SOM in R#

If you prefer working in R and all you need is to classify C–Q loops using the General T–Q SOM (learn about the General T-Q SOM in our research paper and this tuturial) you can do it without leaving R. This is how you can do it:

Note that to classify a set of loops using the General T-Q SOM you simply need:

  1. The set of prototypes (loop types in the General T-Q SOM)

  2. The distance function (Dynamic Time Warping) to compute the similarity between your loops and each prototype.

Here’s how you can get both:

1. How to get the prototypes in R#

The prototypes that represent the loop types of a trained SOM are stored as a multidimensional array. The General T-Q SOM has dimensions: \(8 \times 8 \times 100 \times 2\) (has 8 rows and 8 columns for the map and 100 rows and 2 columns for each loop). You can download this multidimensional array as a netcdf file from this link. So, first, download the file and save it in your computer.

Next, make sure you have installed ncdf4. If not you can install it with:

install.packages("ncdf4")

Then open the file you just downloaded. See below how to do it. Note that in the code below, we use the function aperm to re-arrange the array so the first two indices refer to the SOM’s rows and columns (map_row, map_col) and the next two indices refer to the loop’s rows and columns (loop_row, loop_col). Also, make sure to replace the path in netcdf_file with the path where you saved the file you downloaded.

 1library(ncdf4)
 2# open netcdf file
 3netcdf_file <- "path/to/your/file/GTQSOM_prots.nc" #replace with your path
 4nc_data <- nc_open(netcdf_file)
 5
 6#Get data as array
 7varname <- "General_TQSOM_prots"
 8data_array <- ncvar_get(nc_data, varname)
 9
10# get current dimensions in data_array
11current_dims <- sapply(nc_data$var[[varname]]$dim, function(d) d$name)
12
13# permute array to match desired dimensions
14desired_order <- c("map_row", "map_col", "loop_row", "loop_col")
15perm_vector <- match(desired_order, current_dims)
16prototypes <- aperm(data_array, perm_vector)
17
18nc_close(nc_data)

At this point, you should have the prototypes stored in the variable prototypes as a multidimensional array with dimensions: \(8 \times 8 \times 100 \times 2\).

2. How to get the distance function in R#

The next thing you need is the distance function to compute the similarity between your loops and each prototype. The distance function used to train the general T-Q SOM was Dynamic Time Warping (DTW), and I suggest using the same distance function for classification. You can use whatever DTW implementation you like, just keep in mind that it should be compatible with two-dimensional sequences and that it implements the dependent warping step pattern as explained in shokoohi-yekta et al. 2017. Or just use the next implementation which is equivalent to the one included in HySOM and was used to train the General T-Q SOM.

sqr_dist <- function(x1, x2) {
   sum((x1 - x2)^2)
}

dtw <- function(x, x_prime) {

   # Get lengths of the sequences
   n <- dim(x)[1]
   m <- dim(x_prime)[1]

   # Initialize the cost matrix R with zeros
   R <- matrix(0, nrow = n, ncol = m)

   for (i in 1:n) {
      for (j in 1:m) {
         R[i, j] <- sqr_dist(x[i, ], x_prime[j, ])

         if (i > 1 || j > 1) {
         # Calculate possible predecessors
         prev_i  <- if (i > 1) R[i - 1, j]     else Inf
         prev_j  <- if (j > 1) R[i, j - 1]     else Inf
         prev_ij <- if (i > 1 && j > 1) R[i - 1, j - 1] else Inf
         R[i, j] <- R[i, j] + min(prev_i, prev_j, prev_ij)
         }
      }
   }

   # Return the square root of the final accumulated cost
   return(sqrt(R[n, m]))
}

3. Classify your loops using the prototypes and distance function#

Now you have both the prototypes and the distance function, so you can classify your loops by computing the distance between each loop and each prototype and assigning each loop to the closest prototype. To demonstrate how to do this, I’ll use a loop extracted directly from the prototypes and compute the distance from that loop to each prototype.

compute_dtw_distances <- function(prototypes, loop) {
distances <- matrix(NA, nrow = 8, ncol = 8)
for (i in 1:8) {
   for (j in 1:8) {
      distances[i, j] <- dtw(loop, prototypes[i, j , , ])
   }
}
return(distances)
}

get_BMU <- function(prototypes, loop  ) {
distances <- compute_dtw_distances(prototypes, loop)
return(which(distances == min(distances), arr.ind = TRUE))
}

get_distance_to_bmu <- function(prototypes, loop) {
distances <- compute_dtw_distances(prototypes, loop)
return(min(distances))
}

# Let's test it with a loop extracted directly from the prototypes
loop <- prototypes[2, 3, , ]

distances <- compute_dtw_distances(prototypes, loop)
bmu <- get_BMU(prototypes, loop)
distance_to_bmu <- get_distance_to_bmu(prototypes, loop)

# Print results
cat("Distance Matrix:\n")
print(round(distances, 3))
cat("\nBMU:\n")
print(get_BMU(prototypes, loop))
cat("\nDistance to BMU:\n")
print(get_distance_to_bmu(prototypes, loop))

The output of the code above should look like this:

Distance Matrix:
      [,1]  [,2]  [,3]  [,4]  [,5]  [,6]  [,7]  [,8]
[1,] 1.155 0.653 0.624 0.794 1.075 1.647 2.943 3.905
[2,] 0.842 0.457 0.000 0.695 0.966 1.405 2.074 3.369
[3,] 1.132 0.717 0.559 0.728 0.954 1.260 1.954 2.814
[4,] 1.294 1.208 1.048 1.034 1.103 1.352 1.769 2.471
[5,] 1.620 1.515 1.380 1.219 1.334 1.380 1.911 2.516
[6,] 2.117 1.737 1.723 1.616 1.609 1.791 1.940 2.192
[7,] 3.473 2.671 2.077 1.948 2.167 2.140 2.163 2.608
[8,] 4.728 3.794 2.693 2.316 2.906 2.731 2.518 2.929

BMU:
   row col
[1,]   2   3

Distance to BMU:
[1] 0

In this example, the loop we tested was extracted from the prototype located in row 2 and column 3 of the map, so it makes sense that the closest prototype (BMU) is located in row 2 and column 3 and that the distance to the BMU is 0.