Interpolate product attributes

copycodes 2020. 12. 13. 09:51

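I have a dataset from a series of discrete choice tasks that included two alternatives with three attributes (brand, price, performance). From this data, I have taken 1000 draws from the posterior distribution, which I will use to calculate the utility and eventually the preference share for each individual and each draw.

Price and performance were tested at discrete levels of (-.2, 0, .2) and (-.25, 0, .25) respectively. I need to be able to interpolate the utility between the attribute levels tested. Let's assume for now that a linear interpolation is a reasonable thing to do statistically. In other words, if I wanted to test a scenario with price 10% lower, what is the most efficient way to interpolate the utility for price? I haven't been able to think of a slick or efficient way to do the interpolation. I've resorted to an mapply()-type approach using the mdply function from plyr.

Here is some data and my current approach: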

library(plyr)
#draws from posterior, 2 respondents, 2 draws each
draw <- list(structure(c(-2.403, -2.295, 3.198, 1.378, 0.159, 1.531, 
1.567, -1.716, -4.244, 0.819, -1.121, -0.622, 1.519, 1.731, -1.779, 
2.84), .Dim = c(2L, 8L), .Dimnames = list(NULL, c("brand_1", 
"brand_2", "price_1", "price_2", "price_3", "perf_1", "perf_2", 
"perf_3"))), structure(c(-4.794, -2.147, -1.912, 0.241, 0.084, 
0.31, 0.093, -0.249, 0.054, -0.042, 0.248, -0.737, -1.775, 1.803, 
0.73, -0.505), .Dim = c(2L, 8L), .Dimnames = list(NULL, c("brand_1", 
"brand_2", "price_1", "price_2", "price_3", "perf_1", "perf_2", 
"perf_3")))) 

#define attributes for each brand: brand constant, price, performance
b1 <- c(1, .15, .25)
b2 <- c(2, .1, .2)

#Create data.frame out of attribute lists. Will use mdply to go through each row
interpolateCombos <- data.frame(xout = c(b1,b2), 
                                atts = rep(c("Brand", "Price", "Performance"), 2),
                                i = rep(1:2, each = 3),
                                stringsAsFactors = FALSE)

#Find point along line. Tried approx(), but too slow

findInt <- function(x1,x2,y1,y2,reqx) {
  range <- x2 - x1
  diff <- reqx - x1
  out <- y1 + ((y2 - y1)/range) * diff
  return(out)
}


calcInterpolate <- function(xout, atts, i){
  if (atts == "Brand") {
    breaks <- 1:2
    cols <- 1:2
  } else if (atts == "Price"){
    breaks <- c(-.2, 0, .2)
    cols <- 3:5
  } else {
    breaks <- c(-.25, 0, .25)
    cols <- 6:8
  }

  utils <- draw[[i]][, cols]

  if (atts == "Brand" || xout %in% breaks) { # Brand can't be interpolated; no interpolation needed if the level matches a break
    out <- data.frame(out = utils[, match(xout, breaks)])
  } else { # Must interpolate between the two breaks that bracket xout
    mi <- max(which(breaks <= xout)) # nearest tested level at or below xout
    ma <- min(which(breaks >= xout)) # nearest tested level at or above xout
    out <- data.frame(out = findInt(breaks[mi], breaks[ma], utils[, mi], utils[, ma], xout))
  }
  out$draw <- 1:nrow(utils)
  return(out)
}
out <- mdply(interpolateCombos, calcInterpolate)
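A vectorized alternative to the row-by-row approach might look like the sketch below. It assumes linear interpolation between the two tested levels that bracket the requested value, and it treats the whole draws matrix at once as a weighted sum of two columns. The helper name interp_utils is hypothetical; findInterval() is base R. The price utilities are the first respondent's price_1 to price_3 columns from the example data.

```r
# Hypothetical helper: linear interpolation across columns of a draws matrix.
# utils:  draws x levels matrix of utilities for one attribute
# breaks: tested attribute levels, in increasing order
# xout:   requested level, assumed to lie within range(breaks)
interp_utils <- function(utils, breaks, xout) {
  stopifnot(xout >= min(breaks), xout <= max(breaks))
  # Index of the tested level at or below xout (capped so lo + 1 is valid)
  lo <- findInterval(xout, breaks, rightmost.closed = TRUE)
  # Weight placed on the upper bracketing level
  w <- (xout - breaks[lo]) / (breaks[lo + 1] - breaks[lo])
  (1 - w) * utils[, lo] + w * utils[, lo + 1]
}

# Price utilities for respondent 1 (2 draws x 3 levels)
price_utils <- matrix(c(0.159, 1.531, 1.567, -1.716, -4.244, 0.819), nrow = 2)
res <- interp_utils(price_utils, breaks = c(-0.2, 0, 0.2), xout = 0.15)
print(res)
```

Because the work is a single vectorized expression per attribute, no per-row data frames are created, which may help with memory at 10k draws.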

My real case has 10 products with 8 attributes each. At 10k draws, my 8 GB of RAM are crapping out, but I can't get out of this rabbit hole I've dug myself into. Any nudge in the right direction would be appreciated.

For context, here is how I compute preference shares when no attribute levels need to be interpolated. Note that the products are now defined in terms of column references: p1 and p2 hold each product's level index for brand, price, and performance; u1 and u2 are the utilities; and s1 and s2 are the preference shares for that draw.

p1 <- c(1,2,1)
p2 <- c(2,1,2)


FUN <- function(x, p1, p2) {
  bases <- c(0,2,5)

  u1 <- rowSums(x[, bases + p1])
  u2 <- rowSums(x[, bases + p2])
  sumExp <- exp(u1) + exp(u2)
  s1 <- exp(u1) / sumExp
  s2 <- exp(u2) / sumExp
  return(cbind(s1,s2))
}
lapply(draw, FUN, p1 = p1, p2 = p2)

[[1]]
                s1        s2
[1,] 0.00107646039 0.9989235
[2,] 0.00009391749 0.9999061

[[2]]
              s1        s2
[1,] 0.299432858 0.7005671
[2,] 0.004123175 0.9958768
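The two-product share calculation generalizes to any number of products as a row-wise softmax over a draws-by-products utility matrix. The sketch below is one way to write that; the helper name shares is hypothetical, and subtracting each row's maximum before exponentiating is an added numerical-stability step that does not change the result. The utilities in U are the u1 and u2 values implied by the first respondent's draws above.

```r
# Hypothetical helper: preference shares for any number of products.
# U: draws x products matrix of total utilities
shares <- function(U) {
  # Subtract each row's maximum so exp() cannot overflow; shares are unchanged
  E <- exp(U - apply(U, 1, max))
  E / rowSums(E)
}

# Utilities for respondent 1: rows are draws, columns are products
U <- cbind(s1 = c(-1.957, -4.633), s2 = c(4.876, 4.640))
S <- shares(U)
print(S)
```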

A somewhat unconventional way to get what you desire is to build a global ranking of all your products using your 10k draws.

Use each draw as a source of binary contests between the 10 products, and sum the results of these contests over all draws.

This will give you a final "leader-board" for your 10 products. From this you have relative utility across all consumers, or you can assign an absolute value based on the number of wins (and optionally, the "strength" of the alternative in each contest) for each product.
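One possible reading of the contest idea is sketched below: within each draw, product i "wins" against product j when its utility is higher, and wins are summed over all draws. The utility values are toy numbers made up for illustration, and counting simple wins (rather than weighting by strength) is an assumption of this sketch.

```r
# Toy utilities (illustrative only): rows are draws, columns are products
U <- matrix(c(1.0, 0.2, 0.5,
              0.3, 0.9, 0.1,
              0.8, 0.4, 0.6,
              0.2, 0.1, 0.7),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("A", "B", "C")))

n <- ncol(U)
# wins[i, j] = number of draws in which product i beats product j
wins <- matrix(0, n, n, dimnames = list(colnames(U), colnames(U)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    if (i != j) wins[i, j] <- sum(U[, i] > U[, j])
  }
}

# Total wins per product, highest first
leaderboard <- sort(rowSums(wins), decreasing = TRUE)
print(leaderboard)
```

With these toy numbers, A collects 6 wins, C collects 4, and B collects 2.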

When you want to test a new product with a different attribute profile, find its sparse(st) representation as a (weighted) vector sum of other sample products. You can then run the contest again with the win probabilities weighted by the contribution weights of the component attribute vectors.

The advantage of this is that simulating the contest is efficient, and the global ranking combined with representing new products as sparse vector sums of existing data allows much pondering and interpretation of the results, which is useful when you're considering strategies to beat the competition's product attributes.

To find a sparse (descriptive) representation of your new product y, solve Ax = y, where A is your matrix of existing products (columns as their attribute vectors), y is the new product's attribute vector, and x is a vector of weights of contributions from your existing products. You want to minimize the number of non-zero entries in x. Check out D. L. Donoho's article on the fast homotopy method (like a power iteration), which solves the l0-l1 minimization quickly to find sparse representations.
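As a rough illustration of sparse recovery, the sketch below uses iterative soft-thresholding (ISTA) on an l1-penalized least-squares objective. This is a swapped-in, simpler technique, not the homotopy method mentioned above. It assumes the columns of A hold the existing products' attribute vectors, so that the weights x satisfy Ax ≈ y; the function names, the toy matrix, and the lambda and iters settings are all hypothetical.

```r
# Soft-thresholding operator: shrinks each entry toward zero by t
soft <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# ISTA for: minimize 0.5 * ||A x - y||^2 + lambda * ||x||_1
ista <- function(A, y, lambda = 0.001, iters = 5000) {
  # Step size 1 / L, where L is the largest eigenvalue of t(A) %*% A
  L <- max(eigen(crossprod(A), symmetric = TRUE, only.values = TRUE)$values)
  x <- numeric(ncol(A))
  for (k in seq_len(iters)) {
    grad <- drop(crossprod(A, A %*% x - y))  # gradient of the quadratic term
    x <- soft(x - grad / L, lambda / L)      # gradient step, then shrink
  }
  x
}

# Toy example: 3 attributes, 4 existing products stored as columns
A <- matrix(c(1, 0, 0,
              0, 1, 0,
              0, 0, 1,
              1, 1, 0), nrow = 3)
y <- c(0.5, 0.5, 0)  # new product's attribute vector

x <- ista(A, y)
print(round(x, 3))
```

Here y could be written as half of product 1 plus half of product 2, but the l1 penalty prefers the sparser answer of roughly half of product 4 alone.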

When you have this (or a weighted average of sparse representations) you can reason usefully about the performance of your new product based on the model set up by your existing preference draws.

The advantage of sparseness as a representation is that it allows you to reason usefully. In addition, the more features or products you have, the better, since the new product is then more likely to be sparsely representable by them. So you can scale to big matrices and get really useful results with a quick algorithm.

Reference URL: https://stackoverflow.com/questions/7898833/interpolate-product-attributes
