
I have been working on a sentiment analysis project on emojis. I only want tweets that contain emojis, and I don't want to filter them manually. Is there a way to change the code below so that it returns only tweets that have emojis in them? For example, if I scrape 100 tweets, all 100 should contain at least one emoji along with some text. Any help will be highly appreciated.

For example, I only want tweets like this:

when is @McDonalds_SA gonna let us add spicy sauce on our veg burgers when we order on MrD or Uber eats 😭😭😭😭

Code:

get_token() # Connects with Twitter API
Uber <- search_tweets("uber", n = 100, lang = "en")

2 Answers


  1. Just a simple solution based on a regex of all emojis. Let me know if this works.

    library(rtweet)
    library(dplyr)
    library(stringr)
    
    get_token()
    uber <- search_tweets("uber", n = 2000, lang = "en")
    emoji_regex <- "๐Ÿ˜„|๐Ÿ˜ƒ|๐Ÿ˜€|๐Ÿ˜Š|โ˜บ๏ธ|๐Ÿ˜‰|๐Ÿ˜|๐Ÿ˜˜|๐Ÿ˜š|๐Ÿ˜—|๐Ÿ˜™|๐Ÿ˜œ|๐Ÿ˜|๐Ÿ˜›|๐Ÿ˜ณ|๐Ÿ˜|๐Ÿ˜”|๐Ÿ˜Œ|๐Ÿ˜’|๐Ÿ˜ž|๐Ÿ˜ฃ|๐Ÿ˜ข|๐Ÿ˜‚|๐Ÿ˜ญ|๐Ÿ˜ช|๐Ÿ˜ฅ|๐Ÿ˜ฐ|๐Ÿ˜…|๐Ÿ˜“|๐Ÿ˜ฉ|๐Ÿ˜ซ|๐Ÿ˜จ|๐Ÿ˜ฑ|๐Ÿ˜ |๐Ÿ˜ก|๐Ÿ˜ค|๐Ÿ˜–|๐Ÿ˜†|๐Ÿ˜†|๐Ÿ˜‹|๐Ÿ˜ท|๐Ÿ˜Ž|๐Ÿ˜ด|๐Ÿ˜ต|๐Ÿ˜ฒ|๐Ÿ˜Ÿ|๐Ÿ˜ฆ|๐Ÿ˜ง|๐Ÿ˜ˆ|๐Ÿ‘ฟ|๐Ÿ˜ฎ|๐Ÿ˜ฌ|๐Ÿ˜|๐Ÿ˜•|๐Ÿ˜ฏ|๐Ÿ˜ถ|๐Ÿ˜‡|๐Ÿ˜|๐Ÿ˜‘|๐Ÿ‘ฒ|๐Ÿ‘ณ|๐Ÿ‘ฎ|๐Ÿ‘ท|๐Ÿ’‚|๐Ÿ‘ถ|๐Ÿ‘ฆ|๐Ÿ‘ง|๐Ÿ‘จ|๐Ÿ‘ฉ|๐Ÿ‘ด|๐Ÿ‘ต|๐Ÿ‘ฑ|๐Ÿ‘ผ|๐Ÿ‘ธ|๐Ÿ˜บ|๐Ÿ˜ธ|๐Ÿ˜ป|๐Ÿ˜ฝ|๐Ÿ˜ผ|๐Ÿ™€|๐Ÿ˜ฟ|๐Ÿ˜น|๐Ÿ˜พ|๐Ÿ‘น|๐Ÿ‘บ|๐Ÿ™ˆ|๐Ÿ™‰|๐Ÿ™Š|๐Ÿ’€|๐Ÿ‘ฝ|๐Ÿ’ฉ|๐Ÿ’ฉ|๐Ÿ’ฉ|๐Ÿ”ฅ|โœจ|๐ŸŒŸ|๐Ÿ’ซ|๐Ÿ’ฅ|๐Ÿ’ฅ|๐Ÿ’ข|๐Ÿ’ฆ|๐Ÿ’ง|๐Ÿ’ค|๐Ÿ’จ|๐Ÿ‘‚|๐Ÿ‘€|๐Ÿ‘ƒ|๐Ÿ‘…|๐Ÿ‘„|๐Ÿ‘|๐Ÿ‘|๐Ÿ‘Ž|๐Ÿ‘Ž|๐Ÿ‘Œ|๐Ÿ‘Š|๐Ÿ‘Š|โœŠ|โœŒ๏ธ|๐Ÿ‘‹|โœ‹|โœ‹|๐Ÿ‘|๐Ÿ‘†|๐Ÿ‘‡|๐Ÿ‘‰|๐Ÿ‘ˆ|๐Ÿ™Œ|๐Ÿ™|โ˜๏ธ|๐Ÿ‘|๐Ÿ’ช|๐Ÿšถ|๐Ÿƒ|๐Ÿƒ|๐Ÿ’ƒ|๐Ÿ‘ซ|๐Ÿ‘ช|๐Ÿ‘ฌ|๐Ÿ‘ญ|๐Ÿ’|๐Ÿ’‘|๐Ÿ‘ฏ|๐Ÿ™†|๐Ÿ™…|๐Ÿ’|๐Ÿ™‹|๐Ÿ’†|๐Ÿ’‡|๐Ÿ’…|๐Ÿ‘ฐ|๐Ÿ™Ž|๐Ÿ™|๐Ÿ™‡|๐ŸŽฉ|๐Ÿ‘‘|๐Ÿ‘’|๐Ÿ‘Ÿ|๐Ÿ‘ž|๐Ÿ‘ž|๐Ÿ‘ก|๐Ÿ‘ |๐Ÿ‘ข|๐Ÿ‘•|๐Ÿ‘•|๐Ÿ‘”|๐Ÿ‘š|๐Ÿ‘—|๐ŸŽฝ|๐Ÿ‘–|๐Ÿ‘˜|๐Ÿ‘™|๐Ÿ’ผ|๐Ÿ‘œ|๐Ÿ‘|๐Ÿ‘›|๐Ÿ‘“|๐ŸŽ€|๐ŸŒ‚|๐Ÿ’„|๐Ÿ’›|๐Ÿ’™|๐Ÿ’œ|๐Ÿ’š|โค๏ธ|๐Ÿ’”|๐Ÿ’—|๐Ÿ’“|๐Ÿ’•|๐Ÿ’–|๐Ÿ’ž|๐Ÿ’˜|๐Ÿ’Œ|๐Ÿ’‹|๐Ÿ’|๐Ÿ’Ž|๐Ÿ‘ค|๐Ÿ‘ฅ|๐Ÿ’ฌ|๐Ÿ‘ฃ|๐Ÿ’ญ|๐Ÿถ|๐Ÿบ|๐Ÿฑ|๐Ÿญ|๐Ÿน|๐Ÿฐ|๐Ÿธ|๐Ÿฏ|๐Ÿจ|๐Ÿป|๐Ÿท|๐Ÿฝ|๐Ÿฎ|๐Ÿ—|๐Ÿต|๐Ÿ’|๐Ÿด|๐Ÿ‘|๐Ÿ˜|๐Ÿผ|๐Ÿง|๐Ÿฆ|๐Ÿค|๐Ÿฅ|๐Ÿฃ|๐Ÿ”|๐Ÿ|๐Ÿข|๐Ÿ›|๐Ÿ|๐Ÿ|๐Ÿœ|๐Ÿž|๐ŸŒ|๐Ÿ™|๐Ÿš|๐Ÿ |๐ŸŸ|๐Ÿฌ|๐Ÿฌ|๐Ÿณ|๐Ÿ‹|๐Ÿ„|๐Ÿ|๐Ÿ€|๐Ÿƒ|๐Ÿ…|๐Ÿ‡|๐Ÿ‰|๐ŸŽ|๐Ÿ|๐Ÿ“|๐Ÿ•|๐Ÿ–|๐Ÿ|๐Ÿ‚|๐Ÿฒ|๐Ÿก|๐ŸŠ|๐Ÿซ|๐Ÿช|๐Ÿ†|๐Ÿˆ|๐Ÿฉ|๐Ÿพ|๐Ÿพ|๐Ÿ’|๐ŸŒธ|๐ŸŒท|๐Ÿ€|๐ŸŒน|๐ŸŒป|๐ŸŒบ|๐Ÿ|๐Ÿƒ|๐Ÿ‚|๐ŸŒฟ|๐ŸŒพ|๐Ÿ„|๐ŸŒต|๐ŸŒด|๐ŸŒฒ|๐ŸŒณ|๐ŸŒฐ|๐ŸŒฑ|๐ŸŒผ|๐ŸŒ|๐ŸŒž|๐ŸŒ|๐ŸŒš|๐ŸŒ‘|๐ŸŒ’|๐ŸŒ“|๐ŸŒ”|๐ŸŒ”|๐ŸŒ•|๐ŸŒ–|๐ŸŒ—|๐ŸŒ˜|๐ŸŒœ|๐ŸŒ›|๐ŸŒ™|๐ŸŒ|๐ŸŒŽ|๐ŸŒ|๐ŸŒ‹|๐ŸŒŒ|๐ŸŒ |โญ|โ˜€๏ธ|โ›…|โ˜๏ธ|โšก|โ˜”|โ„๏ธ|โ›„|๐ŸŒ€|๐ŸŒ|๐ŸŒˆ|๐ŸŒŠ|๐ŸŽ|๐Ÿ’|๐ŸŽŽ|๐ŸŽ’|๐ŸŽ“|๐ŸŽ|๐ŸŽ†|๐ŸŽ‡|๐ŸŽ|๐ŸŽ‘|๐ŸŽƒ|๐Ÿ‘ป|๐ŸŽ…|๐ŸŽ„|๐ŸŽ|๐ŸŽ‹|๐ŸŽ‰|๐ŸŽŠ|๐ŸŽˆ|๐ŸŽŒ|๐Ÿ”ฎ|๐ŸŽฅ|๐Ÿ“ท|๐Ÿ“น|๐Ÿ“ผ|๐Ÿ’ฟ|๐Ÿ“€|๐Ÿ’ฝ|๐Ÿ’พ|๐Ÿ’ป|๐Ÿ“ฑ|โ˜Ž๏ธ|โ˜Ž๏ธ|๐Ÿ“ž|๐Ÿ“Ÿ|๐Ÿ“ 
|๐Ÿ“ก|๐Ÿ“บ|๐Ÿ“ป|๐Ÿ”Š|๐Ÿ”‰|๐Ÿ”ˆ|๐Ÿ”‡|๐Ÿ””|๐Ÿ”•|๐Ÿ“ข|๐Ÿ“ฃ|โณ|โŒ›|โฐ|โŒš|๐Ÿ”“|๐Ÿ”’|๐Ÿ”|๐Ÿ”|๐Ÿ”‘|๐Ÿ”Ž|๐Ÿ’ก|๐Ÿ”ฆ|๐Ÿ”†|๐Ÿ”…|๐Ÿ”Œ|๐Ÿ”‹|๐Ÿ”|๐Ÿ›|๐Ÿ›€|๐Ÿšฟ|๐Ÿšฝ|๐Ÿ”ง|๐Ÿ”ฉ|๐Ÿ”จ|๐Ÿšช|๐Ÿšฌ|๐Ÿ’ฃ|๐Ÿ”ซ|๐Ÿ”ช|๐Ÿ”ช|๐Ÿ’Š|๐Ÿ’‰|๐Ÿ’ฐ|๐Ÿ’ด|๐Ÿ’ต|๐Ÿ’ท|๐Ÿ’ถ|๐Ÿ’ณ|๐Ÿ’ธ|๐Ÿ“ฒ|๐Ÿ“ง|๐Ÿ“ฅ|๐Ÿ“ค|โœ‰๏ธ|โœ‰๏ธ|๐Ÿ“ฉ|๐Ÿ“จ|๐Ÿ“ฏ|๐Ÿ“ซ|๐Ÿ“ช|๐Ÿ“ฌ|๐Ÿ“ญ|๐Ÿ“ฎ|๐Ÿ“ฆ|๐Ÿ“|๐Ÿ“|๐Ÿ“„|๐Ÿ“ƒ|๐Ÿ“‘|๐Ÿ“Š|๐Ÿ“ˆ|๐Ÿ“‰|๐Ÿ“œ|๐Ÿ“‹|๐Ÿ“…|๐Ÿ“†|๐Ÿ“‡|๐Ÿ“|๐Ÿ“‚|โœ‚๏ธ|๐Ÿ“Œ|๐Ÿ“Ž|โœ’๏ธ|โœ๏ธ|๐Ÿ“|๐Ÿ“|๐Ÿ“•|๐Ÿ“—|๐Ÿ“˜|๐Ÿ“™|๐Ÿ““|๐Ÿ“”|๐Ÿ“’|๐Ÿ“š|๐Ÿ“–|๐Ÿ“–|๐Ÿ”–|๐Ÿ“›|๐Ÿ”ฌ|๐Ÿ”ญ|๐Ÿ“ฐ|๐ŸŽจ|๐ŸŽฌ|๐ŸŽค|๐ŸŽง|๐ŸŽผ|๐ŸŽต|๐ŸŽถ|๐ŸŽน|๐ŸŽป|๐ŸŽบ|๐ŸŽท|๐ŸŽธ|๐Ÿ‘พ|๐ŸŽฎ|๐Ÿƒ|๐ŸŽด|๐Ÿ€„|๐ŸŽฒ|๐ŸŽฏ|๐Ÿˆ|๐Ÿ€|โšฝ|โšพ๏ธ|๐ŸŽพ|๐ŸŽฑ|๐Ÿ‰|๐ŸŽณ|โ›ณ|๐Ÿšต|๐Ÿšด|๐Ÿ|๐Ÿ‡|๐Ÿ†|๐ŸŽฟ|๐Ÿ‚|๐ŸŠ|๐Ÿ„|๐ŸŽฃ|โ˜•|๐Ÿต|๐Ÿถ|๐Ÿผ|๐Ÿบ|๐Ÿป|๐Ÿธ|๐Ÿน|๐Ÿท|๐Ÿด|๐Ÿ•|๐Ÿ”|๐ŸŸ|๐Ÿ—|๐Ÿ–|๐Ÿ|๐Ÿ›|๐Ÿค|๐Ÿฑ|๐Ÿฃ|๐Ÿฅ|๐Ÿ™|๐Ÿ˜|๐Ÿš|๐Ÿœ|๐Ÿฒ|๐Ÿข|๐Ÿก|๐Ÿณ|๐Ÿž|๐Ÿฉ|๐Ÿฎ|๐Ÿฆ|๐Ÿจ|๐Ÿง|๐ŸŽ‚|๐Ÿฐ|๐Ÿช|๐Ÿซ|๐Ÿฌ|๐Ÿญ|๐Ÿฏ|๐ŸŽ|๐Ÿ|๐ŸŠ|๐Ÿ‹|๐Ÿ’|๐Ÿ‡|๐Ÿ‰|๐Ÿ“|๐Ÿ‘|๐Ÿˆ|๐ŸŒ|๐Ÿ|๐Ÿ|๐Ÿ |๐Ÿ†|๐Ÿ…|๐ŸŒฝ|๐Ÿ |๐Ÿก|๐Ÿซ|๐Ÿข|๐Ÿฃ|๐Ÿฅ|๐Ÿฆ|๐Ÿช|๐Ÿฉ|๐Ÿจ|๐Ÿ’’|โ›ช|๐Ÿฌ|๐Ÿค|๐ŸŒ‡|๐ŸŒ†|๐Ÿฏ|๐Ÿฐ|โ›บ|๐Ÿญ|๐Ÿ—ผ|๐Ÿ—พ|๐Ÿ—ป|๐ŸŒ„|๐ŸŒ…|๐ŸŒƒ|๐Ÿ—ฝ|๐ŸŒ‰|๐ŸŽ |๐ŸŽก|โ›ฒ|๐ŸŽข|๐Ÿšข|โ›ต|โ›ต|๐Ÿšค|๐Ÿšฃ|โš“|๐Ÿš€|โœˆ๏ธ|๐Ÿ’บ|๐Ÿš|๐Ÿš‚|๐ŸšŠ|๐Ÿš‰|๐Ÿšž|๐Ÿš†|๐Ÿš„|๐Ÿš…|๐Ÿšˆ|๐Ÿš‡|๐Ÿš|๐Ÿš‹|๐Ÿšƒ|๐ŸšŽ|๐ŸšŒ|๐Ÿš|๐Ÿš™|๐Ÿš˜|๐Ÿš—|๐Ÿš—|๐Ÿš•|๐Ÿš–|๐Ÿš›|๐Ÿšš|๐Ÿšจ|๐Ÿš“|๐Ÿš”|๐Ÿš’|๐Ÿš‘|๐Ÿš|๐Ÿšฒ|๐Ÿšก|๐ŸšŸ|๐Ÿš |๐Ÿšœ|๐Ÿ’ˆ|๐Ÿš|๐ŸŽซ|๐Ÿšฆ|๐Ÿšฅ|โš ๏ธ|๐Ÿšง|๐Ÿ”ฐ|โ›ฝ|๐Ÿฎ|๐Ÿฎ|๐ŸŽฐ|โ™จ๏ธ|๐Ÿ—ฟ|๐ŸŽช|๐ŸŽญ|๐Ÿ“|๐Ÿšฉ|๐Ÿ‡ฏ๐Ÿ‡ต|๐Ÿ‡ฐ๐Ÿ‡ท|๐Ÿ‡ฉ๐Ÿ‡ช|๐Ÿ‡จ๐Ÿ‡ณ|๐Ÿ‡บ๐Ÿ‡ธ|๐Ÿ‡ซ๐Ÿ‡ท|๐Ÿ‡ช๐Ÿ‡ธ|๐Ÿ‡ฎ๐Ÿ‡น|๐Ÿ‡ท๐Ÿ‡บ|๐Ÿ‡ฌ๐Ÿ‡ง|๐Ÿ‡ฌ๐Ÿ‡ง|1๏ธโƒฃ|2๏ธโƒฃ|3๏ธโƒฃ|4๏ธโƒฃ|5๏ธโƒฃ|6๏ธโƒฃ|7๏ธโƒฃ|8๏ธโƒฃ|9๏ธโƒฃ|0๏ธโƒฃ|๐Ÿ”Ÿ|๐Ÿ”ข|#๏ธโƒฃ|๐Ÿ”ฃ|โฌ†๏ธ|โฌ‡๏ธ|โฌ…๏ธ|โžก๏ธ|๐Ÿ” 
|๐Ÿ”ก|๐Ÿ”ค|โ†—๏ธ|โ†–๏ธ|โ†˜๏ธ|โ†™๏ธ|โ†”๏ธ|โ†•๏ธ|๐Ÿ”„|โ—€๏ธ|โ–ถ๏ธ|๐Ÿ”ผ|๐Ÿ”ฝ|โ†ฉ๏ธ|โ†ช๏ธ|โ„น๏ธ|โช|โฉ|โซ|โฌ|โคต๏ธ|โคด๏ธ|๐Ÿ†—|๐Ÿ”€|๐Ÿ”|๐Ÿ”‚|๐Ÿ†•|๐Ÿ†™|๐Ÿ†’|๐Ÿ†“|๐Ÿ†–|๐Ÿ“ถ|๐ŸŽฆ|๐Ÿˆ|๐Ÿˆฏ|๐Ÿˆณ|๐Ÿˆต|๐Ÿˆด|๐Ÿˆฒ|๐Ÿ‰|๐Ÿˆน|๐Ÿˆบ|๐Ÿˆถ|๐Ÿˆš|๐Ÿšป|๐Ÿšน|๐Ÿšบ|๐Ÿšผ|๐Ÿšพ|๐Ÿšฐ|๐Ÿšฎ|๐Ÿ…ฟ๏ธ|โ™ฟ|๐Ÿšญ|๐Ÿˆท๏ธ|๐Ÿˆธ|๐Ÿˆ‚๏ธ|โ“‚๏ธ|๐Ÿ›‚|๐Ÿ›„|๐Ÿ›…|๐Ÿ›ƒ|๐Ÿ‰‘|ใŠ™๏ธ|ใŠ—๏ธ|๐Ÿ†‘|๐Ÿ†˜|๐Ÿ†”|๐Ÿšซ|๐Ÿ”ž|๐Ÿ“ต|๐Ÿšฏ|๐Ÿšฑ|๐Ÿšณ|๐Ÿšท|๐Ÿšธ|โ›”|โœณ๏ธ|โ‡๏ธ|โŽ|โœ…|โœด๏ธ|๐Ÿ’Ÿ|๐Ÿ†š|๐Ÿ“ณ|๐Ÿ“ด|๐Ÿ…ฐ๏ธ|๐Ÿ…ฑ๏ธ|๐Ÿ†Ž|๐Ÿ…พ๏ธ|๐Ÿ’ |โžฟ|โ™ป๏ธ|โ™ˆ|โ™‰|โ™Š|โ™‹|โ™Œ|โ™|โ™Ž|โ™|โ™|โ™‘|โ™’|โ™“|โ›Ž|๐Ÿ”ฏ|๐Ÿง|๐Ÿ’น|๐Ÿ’ฒ|๐Ÿ’ฑ|ยฉ๏ธ|ยฎ๏ธ|โ„ข๏ธ|โŒ|โ€ผ๏ธ|โ‰๏ธ|โ—|โ—|โ“|โ•|โ”|โญ•|๐Ÿ”|๐Ÿ”š|๐Ÿ”™|๐Ÿ”›|๐Ÿ”œ|๐Ÿ”ƒ|๐Ÿ•›|๐Ÿ•ง|๐Ÿ•|๐Ÿ•œ|๐Ÿ•‘|๐Ÿ•|๐Ÿ•’|๐Ÿ•ž|๐Ÿ•“|๐Ÿ•Ÿ|๐Ÿ•”|๐Ÿ• |๐Ÿ••|๐Ÿ•–|๐Ÿ•—|๐Ÿ•˜|๐Ÿ•™|๐Ÿ•š|๐Ÿ•ก|๐Ÿ•ข|๐Ÿ•ฃ|๐Ÿ•ค|๐Ÿ•ฅ|๐Ÿ•ฆ|โœ–๏ธ|โž•|โž–|โž—|โ™ ๏ธ|โ™ฅ๏ธ|โ™ฃ๏ธ|โ™ฆ๏ธ|๐Ÿ’ฎ|๐Ÿ’ฏ|โœ”๏ธ|โ˜‘๏ธ|๐Ÿ”˜|๐Ÿ”—|โžฐ|ใ€ฐ๏ธ|ใ€ฝ๏ธ|๐Ÿ”ฑ|โ—ผ๏ธ|โ—ป๏ธ|โ—พ|โ—ฝ|โ–ช๏ธ|โ–ซ๏ธ|๐Ÿ”บ|๐Ÿ”ฒ|๐Ÿ”ณ|โšซ|โšช|๐Ÿ”ด|๐Ÿ”ต|๐Ÿ”ป|โฌœ|โฌ›|๐Ÿ”ถ|๐Ÿ”ท|๐Ÿ”ธ|๐Ÿ”น"
    filter(uber, str_detect(text, emoji_regex))
    
  2. Note: I assume you're not looking for all emoji, since the Unicode emoji list includes quite common characters such as "#", "*", and the digits 0 to 9 (from https://unicode.org/Public/UNIDATA/emoji/emoji-data.txt).

    Unicode package

    To get the Unicode block of each character in a string, we can use the Unicode package:

    library("Unicode") # install.packages("Unicode")
    

    A few examples:

    > "😎" |> utf8ToInt() |> u_char_properties("Block")
    
                Block
    U+1F60E Emoticons
    
    > "👨‍👨‍👦" |> utf8ToInt() |> u_char_properties("Block")
    
                                      Block
    1 Miscellaneous Symbols and Pictographs
    2                   General Punctuation
    3 Miscellaneous Symbols and Pictographs
    4                   General Punctuation
    5 Miscellaneous Symbols and Pictographs
    
    > "🤐" |> utf8ToInt() |> u_char_properties("Block")
    
                                           Block
    U+1F910 Supplemental Symbols and Pictographs
    
    > "✅" |> utf8ToInt() |> u_char_properties("Block")
    
              Block
    U+2705 Dingbats
    
    > "☝️" |> utf8ToInt() |> u_char_properties("Block")
    
                           Block
    U+261D Miscellaneous Symbols
    U+FE0F   Variation Selectors
    
    > "☎️" |> utf8ToInt() |> u_char_properties("Block")
    
                           Block
    U+260E Miscellaneous Symbols
    U+FE0F   Variation Selectors
    
    > "♍" |> utf8ToInt() |> u_char_properties("Block")
    
                           Block
    U+264D Miscellaneous Symbols
    
    > "🫃🏽" |> utf8ToInt() |> u_char_properties("Block")
    
                                            Block
    U+1FAC3    Symbols and Pictographs Extended-A
    U+1F3FD Miscellaneous Symbols and Pictographs
    
    > "🚂" |> utf8ToInt() |> u_char_properties("Block")
    
                                Block
    U+1F682 Transport and Map Symbols
    

    Matching all emoji-like characters could be done like this:

    blocks <- c("Emoticons",
                "Miscellaneous Symbols and Pictographs",
                "Supplemental Symbols and Pictographs",
                "Dingbats",
                "Miscellaneous Symbols",
                "Symbols and Pictographs Extended-A",
                "Transport and Map Symbols")
    
    > "😎" |> utf8ToInt() |> u_char_properties("Block") |> unlist() |> intersect(blocks) |> length() > 0
    [1] TRUE
    
    > "👨‍👨‍👦" |> utf8ToInt() |> u_char_properties("Block") |> unlist() |> intersect(blocks) |> length() > 0
    [1] TRUE
    
    > "🤐" |> utf8ToInt() |> u_char_properties("Block") |> unlist() |> intersect(blocks) |> length() > 0
    [1] TRUE
    
    > "☎️" |> utf8ToInt() |> u_char_properties("Block") |> unlist() |> intersect(blocks) |> length() > 0
    [1] TRUE
    
    > "♍" |> utf8ToInt() |> u_char_properties("Block") |> unlist() |> intersect(blocks) |> length() > 0
    [1] TRUE
    
    > "🫃🏽" |> utf8ToInt() |> u_char_properties("Block") |> unlist() |> intersect(blocks) |> length() > 0
    [1] TRUE
    
    > "🚂" |> utf8ToInt() |> u_char_properties("Block") |> unlist() |> intersect(blocks) |> length() > 0
    [1] TRUE
    
    > "#" |> utf8ToInt() |> u_char_properties("Block") |> unlist() |> intersect(blocks) |> length() > 0
    [1] FALSE
    

    Integrating into your code

    library(rtweet)
    library(dplyr)
    library(Unicode)
    
    get_token() # Connects with Twitter API
    Uber <- search_tweets("uber", n = 100, lang = "en")
    
    # Keep tweets with at least one character in the `blocks` vector defined above
    Uber_filtered <- Uber %>%
      rowwise() %>%
      filter(text |> utf8ToInt() |> u_char_properties("Block") |> unlist() |> intersect(blocks) |> length() > 0) %>%
      ungroup()
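If installing the Unicode package is not an option, the same block-based idea can be sketched in base R by comparing code points against the published ranges of the seven blocks listed above. This is only a rough sketch: `has_emoji()` and `emoji_block_ranges` are hypothetical names, not part of rtweet or Unicode, and the small `tweets` data frame merely stands in for a `search_tweets()` result. Like the block approach, it is an approximation, since those blocks also contain a few symbols that are not emoji.

```r
# Code point ranges of the seven Unicode blocks named in the answer above.
emoji_block_ranges <- rbind(
  c(0x2600,  0x26FF),   # Miscellaneous Symbols
  c(0x2700,  0x27BF),   # Dingbats
  c(0x1F300, 0x1F5FF),  # Miscellaneous Symbols and Pictographs
  c(0x1F600, 0x1F64F),  # Emoticons
  c(0x1F680, 0x1F6FF),  # Transport and Map Symbols
  c(0x1F900, 0x1F9FF),  # Supplemental Symbols and Pictographs
  c(0x1FA70, 0x1FAFF)   # Symbols and Pictographs Extended-A
)

# Hypothetical helper: TRUE for each string that contains at least one
# code point inside any of the ranges above; NA or empty strings give FALSE.
has_emoji <- function(x) {
  lo <- emoji_block_ranges[, 1]
  hi <- emoji_block_ranges[, 2]
  vapply(x, function(s) {
    if (is.na(s)) return(FALSE)
    cp <- utf8ToInt(enc2utf8(s))
    # One row per code point, one column per block range
    hits <- outer(cp, lo, ">=") & outer(cp, hi, "<=")
    isTRUE(any(hits))
  }, logical(1), USE.NAMES = FALSE)
}

# Stand-in for the `text` column of a search_tweets() result (assumed data).
tweets <- data.frame(
  text = c("no emoji here #uber",
           "spicy sauce please \U0001F62D",
           "all aboard \U0001F682"),
  stringsAsFactors = FALSE
)

tweets_with_emoji <- tweets[has_emoji(tweets$text), , drop = FALSE]
print(tweets_with_emoji$text)
```

With real data, the same filter would be applied as `Uber[has_emoji(Uber$text), ]`, assuming the tweets are in a column named `text` as in the code above.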
    