I have quite a few unique strings that I need to map to the numbers 1, 2, 3.
How can I compute a number 1, 2, or 3 from an arbitrary ASCII string, so that approximately 33.3% of strings return 1, 33.3% return 2, and 33.3% return 3?
The string can be anything, like XYZDA, JKLLK, DSAJKL, DASDSA.
For example:
DASDSA could always return 1.
JKLLK could always return 2.
etc.
Could the ASCII values of the characters be summed in a for loop and the total taken MOD 3, or is there a faster hash algorithm available? What is the most efficient way to do this in PHP?
Each unique string must always return the same number 1,2,3 when the function is called again.
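To make the loop idea concrete, here is a rough sketch of what I have in mind. The function name `bucketByByteSum` is made up for illustration, and I am assuming the result of MOD 3 (0..2) is shifted by one to get 1..3:

```php
<?php
// Sketch only: sum the byte values of the string and reduce modulo 3.
// bucketByByteSum is a hypothetical name, not an existing PHP function.
function bucketByByteSum(string $s): int
{
    $sum = 0;
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $sum += ord($s[$i]); // ASCII value of the i-th character
    }
    // $sum % 3 is 0, 1 or 2; add 1 to get 1, 2 or 3.
    return $sum % 3 + 1;
}
```

The same string always gives the same number, but any two strings made of the same letters in a different order also give the same number, so I am not sure how even the split would be on real data.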
2 Answers
So the ASCII alphabet has 52 letters (A-Z and a-z). If we split these 52 letters into 3 sections, each section covers roughly a third of the alphabet (52 is not divisible by 3, so the sections hold 17, 17, and 18 letters).
Let's call the 3 sections A, B, and C.
A random letter then falls in section A, B, or C with a probability of roughly 33.33% each.
So given a random string, count the characters that belong to sections A, B, and C (let's say section A has 2 characters, section B has 1, and section C has 4) and take the section with the highest count (here C, with 4). Each section should win with a probability close to 33.33%. If two sections tie, pick one by a fixed rule so the result stays the same on every call.
Now we map A -> number 1, B -> number 2, C -> number 3.
This random string (as it had 4 characters in section C) would be mapped to 3.
Obviously, this depends on the dataset, but I think it's a good approach.
Let me know if I'm wrong at some point or if I didn't explain it well enough.
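A minimal sketch of the section-counting idea, under a few assumptions that are mine rather than fixed by the description: letters are ordered A-Z then a-z, the sections are the first 17, next 17, and last 18 letters, non-letters are ignored, and ties go to the lowest-numbered section. The function name `bucketBySection` is made up for illustration.

```php
<?php
// Sketch only: count letters per section and return the section with most hits.
// bucketBySection is a hypothetical name; section boundaries are an assumption.
function bucketBySection(string $s): int
{
    $counts = [1 => 0, 2 => 0, 3 => 0];
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $o = ord($s[$i]);
        if ($o >= 65 && $o <= 90) {
            $pos = $o - 65;          // 'A'..'Z' -> 0..25
        } elseif ($o >= 97 && $o <= 122) {
            $pos = $o - 97 + 26;     // 'a'..'z' -> 26..51
        } else {
            continue;                // ignore anything that is not a letter
        }
        // Sections: positions 0..16 -> 1, 17..33 -> 2, 34..51 -> 3.
        $section = $pos < 17 ? 1 : ($pos < 34 ? 2 : 3);
        $counts[$section]++;
    }
    // Pick the section with the highest count; on a tie the lower number wins.
    $best = 1;
    foreach ($counts as $section => $n) {
        if ($n > $counts[$best]) {
            $best = $section;
        }
    }
    return $best;
}
```

With these boundaries, uppercase-only strings such as the ones in the question can only land in sections 1 and 2 (A-Q and R-Z), so the sections would need to be drawn over the characters that actually occur in the data to get anywhere near an even split.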
A simple solution is to compute the MD5, take the first byte and apply a modulo:
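A minimal sketch of that idea, assuming the modulo result (0..2) is shifted by one to give 1..3. The function name `bucketMd5` is made up for illustration. Since a byte has 256 possible values and 256 is not a multiple of 3, one of the three results is very slightly more likely than the other two (86 of 256 byte values versus 85).

```php
<?php
// Sketch only: hash the string, take the first digest byte, reduce modulo 3.
// bucketMd5 is a hypothetical name, not an existing PHP function.
function bucketMd5(string $s): int
{
    // md5($s, true) returns the digest as 16 raw bytes instead of hex text.
    $digest = md5($s, true);
    $firstByte = ord($digest[0]); // 0..255
    // $firstByte % 3 is 0, 1 or 2; add 1 to get 1, 2 or 3.
    return $firstByte % 3 + 1;
}

// The same string always yields the same number.
foreach (['XYZDA', 'JKLLK', 'DSAJKL', 'DASDSA'] as $s) {
    echo $s, ' => ', bucketMd5($s), "\n";
}
```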
Output: