How does Amazon Personalize select interactions when the dataset exceeds the 500M training limit?

miladheidari
May 25, 2023

According to the docs, the maximum number of interactions that a model considers during training is 500M. What happens if my interactions dataset has more than 500M records? How does Amazon Personalize choose which 500M interactions to train on? Does it use the latest 500M?

Tags: amazon-personalize amazon-web-services

Answers

- HassanSerhan
- May 24, 2023 at 12:03 pm
As far as I know, Amazon has not explicitly disclosed how it handles datasets larger than the maximum size during training. However, here are some generally accepted machine learning strategies for dealing with this kind of situation:

Random Sampling: One approach could be to randomly select 500 million interactions from your dataset. This would provide a broad, if not comprehensive, sample of the data.

Temporal Sampling: Another approach could be to select the most recent 500 million interactions, under the assumption that more recent data is more relevant to current and future predictions.

Stratified Sampling: You could also stratify the data, ensuring that the sample of 500 million interactions is representative of the various categories or types of interactions in the dataset.
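The three strategies above can be sketched in a few lines of Python. This is only an illustration on a small in-memory list, not how Amazon Personalize works internally. The function names are made up for this example, and it assumes each interaction is a dict with `TIMESTAMP` and `EVENT_TYPE` keys (field names from the Personalize interactions schema).

```python
import random
from collections import defaultdict


def random_sample(interactions, limit, seed=0):
    """Pick `limit` interactions uniformly at random."""
    if len(interactions) <= limit:
        return list(interactions)
    return random.Random(seed).sample(interactions, limit)


def temporal_sample(interactions, limit):
    """Keep the `limit` most recent interactions, newest first."""
    newest_first = sorted(interactions, key=lambda r: r["TIMESTAMP"], reverse=True)
    return newest_first[:limit]


def stratified_sample(interactions, limit, key="EVENT_TYPE", seed=0):
    """Sample each group in proportion to its share of the dataset.

    Quotas are rounded down, so the result can be slightly smaller than `limit`.
    """
    if len(interactions) <= limit:
        return list(interactions)
    rng = random.Random(seed)
    groups = defaultdict(list)
    for row in interactions:
        groups[row[key]].append(row)
    total = len(interactions)
    sample = []
    for rows in groups.values():
        quota = limit * len(rows) // total  # proportional share, floored
        sample.extend(rng.sample(rows, quota))
    return sample
```

At real scale (hundreds of millions of rows) you would run the same logic in a distributed or streaming job rather than on a Python list.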

However, it’s crucial to consider that these strategies may introduce bias or exclude potentially useful information.

Since you appear to be asking about Amazon Personalize specifically, it would be best to consult the product documentation or contact AWS directly for a precise answer.


- JamesJ
- May 25, 2023 at 1:44 am
The most recent 500M interactions are used for training based on the TIMESTAMP column in the interactions dataset. This limit is also adjustable.

https://docs.aws.amazon.com/personalize/latest/dg/limits.html
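If you want to see in advance which rows would fall inside the limit, you can approximate the "most recent N by TIMESTAMP" rule locally. The sketch below is not the service's implementation. `most_recent_rows` is a hypothetical helper, and it assumes a CSV with a `TIMESTAMP` column holding Unix epoch seconds. It streams the file and keeps only the newest `limit` rows in a min-heap.

```python
import csv
import heapq
import io


def most_recent_rows(csv_file, limit):
    """Return the `limit` rows with the largest TIMESTAMP, newest first."""
    if limit <= 0:
        return []
    heap = []  # min-heap of (timestamp, sequence number, row)
    for seq, row in enumerate(csv.DictReader(csv_file)):
        ts = int(row["TIMESTAMP"])
        if len(heap) < limit:
            heapq.heappush(heap, (ts, seq, row))
        elif ts > heap[0][0]:
            # Newer than the oldest row kept so far: swap it in.
            heapq.heappushpop(heap, (ts, seq, row))
    # The sequence number is unique, so the row dicts are never compared.
    return [row for _, _, row in sorted(heap, reverse=True)]


# Tiny made-up dataset for illustration only.
sample_csv = io.StringIO(
    "USER_ID,ITEM_ID,TIMESTAMP\n"
    "u1,i1,100\n"
    "u2,i2,300\n"
    "u1,i3,200\n"
    "u3,i1,400\n"
)
kept = most_recent_rows(sample_csv, 2)
```

The heap holds `limit` rows in memory, so for a limit in the hundreds of millions you would do the equivalent sort-and-truncate in a distributed job instead.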
