
Cosine similarity calculator code
You can get rid of one call to sqrt by taking a single square root of the product, sqrt(d_a * d_b), as in cosine_similarity2(double *A, double *B, unsigned int size). Note that sqrt(d_a * d_b) = 0 only when at least one of the input vectors is a zero vector.

As 200_success advised, you can omit the entire exception-throwing code and just proceed to division by zero: that will return a special NaN value ("Not a Number"). However, cosine similarity assumes that the two input vectors are not zero-vectors, so it makes sense to me to throw an exception when it happens.

Usually people don't put a space before the semicolon. However, a single space is usually put straight after for, as in: for (int i = 0.

Declaring and initializing several variables in one statement is less readable and maintainable than putting each initialization on a separate line, starting with: double mul = 0.0;

Direct Addressing Versus Indirect Addressing

A potentially more efficient way to write the function is to use direct addressing rather than indirect addressing. Since the code is already passing pointers, as in cosine_similarity(double *A, double *B, unsigned int size), use the pointers rather than indexing them.

A version that takes vectors, cosine_similarity_vectors(std::vector<double> A, std::vector<double> B), can do the same thing with iterators. It first compares A.size() with B.size() and throws std::logic_error("Vector A and Vector B are not the same size") when they differ. It then sets A_iter = A.begin() and B_iter = B.begin() and advances both in one loop: for ( ; A_iter != A.end(); A_iter++, B_iter++).

Adjusted cosine similarity and the dot product both take into account differences in magnitude. The adjusted cosine similarity subtracts the mean before calculating cosine similarity. The dot product doesn't use the mean in its calculation.

If documents have unit length, then cosine similarity is the same as the dot product: $\operatorname{cosine}(d_1, d_2) = \frac{d_1^T d_2}{\|d_1\| \, \|d_2\|} = d_1^T d_2$, because both norms in the denominator equal 1. Thus we can unit-normalize each document vector, $d' = d / \|d\|$, and then compute the dot product of the normalized vectors to get the cosine. This step is known as unit-length normalization.
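The review points about the single sqrt call, walking the pointers, and letting a zero vector produce NaN can be combined roughly as follows. This is only a sketch: the name cosine_similarity_ptr and the use of const pointers with std::size_t are illustrative choices, not taken from the original code, and the NaN result assumes IEEE 754 floating point.

```cpp
#include <cmath>
#include <cstddef>

// Hypothetical name and signature, chosen for illustration only.
// Advances both pointers directly instead of indexing A[i] and B[i],
// and calls sqrt once on the product of the two squared norms.
double cosine_similarity_ptr(const double *a, const double *b, std::size_t size)
{
    double mul = 0.0;  // running dot product
    double d_a = 0.0;  // running squared norm of a
    double d_b = 0.0;  // running squared norm of b

    for (const double *end = a + size; a != end; ++a, ++b) {
        mul += *a * *b;
        d_a += *a * *a;
        d_b += *b * *b;
    }

    // d_a and d_b are sums of squares, so their product is non-negative.
    // If either vector is all zeros (or size is 0) this evaluates
    // 0.0 / 0.0, which gives NaN on IEEE 754 platforms. That behaviour is
    // an assumption about the platform, not a C++ language guarantee.
    return mul / std::sqrt(d_a * d_b);
}
```

A caller that prefers the NaN approach over exceptions would check the result with std::isnan before using it.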


You can use adjusted cosine similarity or dot product (as referenced in the answer you linked).

There are a number of possible errors in the code:

1. The variable size may be zero, which will lead to division by zero.
2. The vectors A and B may not be the same length. If the size variable is the size of the shorter vector this is not a problem, but if the size variable is the size of the larger vector this can lead to unknown behavior.
3. Either d_a or d_b can add up to zero, which can lead to division by zero.

C++ provides the exception class std::exception, which has multiple subclasses. Two of the classes are std::runtime_error and std::logic_error. Items 1 and 2 above belong in the std::logic_error class; item 3 belongs in the std::runtime_error class.
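One way the error cases listed above might be reported with the two exception classes is sketched here. The function name cosine_similarity_checked and the message strings (other than the size-mismatch message) are hypothetical, and treating the zero-magnitude case as std::runtime_error is an assumption based on it being the other class the text names.

```cpp
#include <cmath>
#include <stdexcept>
#include <vector>

// Hypothetical function name; a sketch, not the original poster's code.
double cosine_similarity_checked(const std::vector<double> &a,
                                 const std::vector<double> &b)
{
    // Empty input: a caller mistake, reported as logic_error.
    if (a.empty() || b.empty()) {
        throw std::logic_error("Vector A and Vector B must not be empty");
    }
    // Mismatched lengths: also a caller mistake.
    if (a.size() != b.size()) {
        throw std::logic_error("Vector A and Vector B are not the same size");
    }

    double mul = 0.0;
    double d_a = 0.0;
    double d_b = 0.0;

    auto b_iter = b.begin();
    for (auto a_iter = a.begin(); a_iter != a.end(); ++a_iter, ++b_iter) {
        mul += *a_iter * *b_iter;
        d_a += *a_iter * *a_iter;
        d_b += *b_iter * *b_iter;
    }

    // A zero vector depends on the data seen at run time (assumed here to
    // map to runtime_error).
    if (d_a == 0.0 || d_b == 0.0) {
        throw std::runtime_error("cosine similarity is undefined for a zero vector");
    }

    return mul / std::sqrt(d_a * d_b);
}
```

Because both classes derive from std::exception, a caller that does not care about the distinction can catch const std::exception & and read what().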
