
Using the API to analyze a Twitter stream, I am getting very similar results for openness for pretty much everybody. How can I train a corpus to generate a different output?

2 Answers


  1. Unfortunately, you can’t. Also, I’m afraid Twitter is not the best source for this kind of analysis, since each tweet contains only a small piece of text. Watson Personality Insights works better with large text samples, and tweets are most probably too short to provide enough information for this kind of analysis (even if you concatenate several tweets into the same text sample).

    But if you’re getting meaningful results for the other dimensions, I’d suggest ignoring the openness score and calculating it with another algorithm (your own?), or checking whether simply dropping this dimension still gives you good enough results.

    There are some nice tips here: https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/personality-insights/science.shtml, along with references to papers that can help you understand the algorithm internals.

  2. You cannot train Watson Personality Insights in the current version. But there may be alternatives.

    From your message, it is not clear to me whether you are getting overly similar results for individual tweets or for entire Twitter streams. In the first case, as Leo pointed out in a different answer, please note that you should provide enough text for any analysis to be meaningful (that means 3,000+ words, not just a tweet). In the second case, I would be a bit surprised if your scores were still so similar with that much text (how many tweets per user?), but this may still happen depending on the domain.

    If you are analyzing individual tweets, you may also benefit from using Tone Analyzer (in beta as of today). Its “social tone” is basically the same model as Personality Insights, and it gives some raw scores even for small texts. (By the way, you also get other measures, such as emotions and writing style.)

    In any case (small or large inputs), we encourage users to look at the raw scores within their own data corpus. For example, say you are analyzing a set of IT support calls (I am making this up): you will likely find that some traits tend to be all the same, because the jargon and writing style are similar in all of them. However, within your domain there may be small differences you want to focus on, i.e. there is still a 90th percentile and a lowest 10% in each trait. So you might want to do some data analysis on the Personality Insights raw_score (API reference), or just the score in Tone Analyzer (API reference), and draw your own conclusions.
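
The "look at raw scores within your own corpus" idea from the second answer could be sketched roughly as follows. This is only an illustration: it assumes you have already collected one raw openness score per user from the API responses, and the user names, score values, and the `percentile_ranks` helper are all made up for the example, not part of any Watson SDK.

```python
from bisect import bisect_left, bisect_right


def percentile_ranks(raw_scores):
    """Map each user to the percentile rank (0-100) of their raw score
    relative to the other scores in the same corpus.

    Ties share the same rank (mid-rank convention).
    """
    ordered = sorted(raw_scores.values())
    n = len(ordered)
    ranks = {}
    for user, score in raw_scores.items():
        below = bisect_left(ordered, score)           # scores strictly lower
        equal = bisect_right(ordered, score) - below  # scores equal (incl. self)
        ranks[user] = 100.0 * (below + 0.5 * equal) / n
    return ranks


# Hypothetical raw openness scores: tightly clustered in absolute terms,
# as described in the question.
raw_openness = {
    "user_a": 0.71,
    "user_b": 0.73,
    "user_c": 0.72,
    "user_d": 0.78,
    "user_e": 0.70,
}

ranks = percentile_ranks(raw_openness)

# List users from most to least open *relative to this corpus*.
for user, rank in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{user}: raw={raw_openness[user]:.2f}, corpus percentile={rank:.0f}")
```

Even though every raw score here falls between 0.70 and 0.78, the corpus-relative ranks spread the users out, which is the kind of within-domain difference the answer suggests focusing on. With only a handful of users the ranks are very coarse, so this is more useful on a larger corpus.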
