How to remove duplicate entries from a JSON string in SQL Server

Huso
December 19, 2022
2 Answers

I’ve checked Stack Overflow as well as Google but could not find a solution. What would be the proper way to remove duplicate entries from an nvarchar field that contains a JSON string in SQL Server? In my case, let’s say I have an nvarchar ‘People’ field on my table which contains the following data.

[
 {
  "name":"Jon",
  "age": 30
 },
 {
  "name":"Bob",
  "age": 30
 },
 {
  "name":"Nick",
  "age": 40
 },
 {
  "name":"Bob",
  "age": 40
 }
]

I need to remove the entries that have duplicate names, which would be ‘Bob’ in this case. So after executing the query I am expecting this result:

[
 {
  "name":"Jon",
  "age": 30
 },
 {
  "name":"Bob",
  "age": 30
 },
 {
  "name":"Nick",
  "age": 40
 }
]

What would be the proper SQL query to do that? To be clear, I am trying to achieve no duplicate names rather than no duplicate entries, which is why the two Bobs have different ages in the example above. More specifically, I need to keep only the first item among duplicates; in this example, that is the first Bob, with age 30. Using ROW_NUMBER() with PARTITION BY would be a solution, but it breaks the existing order, and I need to achieve this without breaking that order.

My table has Id and PeopleJson fields. The following query does what I want, except that it breaks the order in PeopleJson:

SELECT Id, (
    SELECT [Name],[Age] FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY [Name] ORDER BY (select NULL)) as row_num
        FROM OPENJSON(PeopleJson) WITH ([Name] NVARCHAR(1000), [Age]  int)
    ) t WHERE t.row_num = 1
    FOR JSON PATH, INCLUDE_NULL_VALUES
) as [People]
FROM [TestTable]

Answers

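One option is to apply DISTINCT to the rows returned by OPENJSON, which removes array elements that are exact duplicates of each other:

DECLARE @json NVARCHAR(MAX) = N'[{"id": 1, "name": "Alice"}, {"id": 2, "name": "Bob"}, {"id": 1, "name": "Alice"}]';

WITH cte AS (
    SELECT DISTINCT value
    FROM OPENJSON(@json)
)
SELECT * FROM cte;

To return the result as a JSON array again, read the properties as typed columns and wrap the query in FOR JSON:

SELECT (
    SELECT *
    FROM (
        -- Typed columns let FOR JSON emit objects instead of escaped strings
        SELECT DISTINCT id, name
        FROM OPENJSON(@json) WITH (id INT, name NVARCHAR(100))
    ) cte
    ORDER BY id
    FOR JSON AUTO
) AS result;

Result:

[{"id":1,"name":"Alice"},{"id":2,"name":"Bob"}]

Note that DISTINCT only removes entries whose properties are all identical, so it would not remove the second Bob from the question, whose age differs.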

- DayByDay
- December 19, 2022 at 7:11 pm
Can you provide some more information, please? I understand what you’re trying to do, but I have some questions.

Each Bob has a different age, so those aren’t duplicate entries, only duplicate names. Either way, it would be hard to decide which entry to remove if each one is different.

You can achieve no duplicate "Bob" entries, but the issue comes in when deciding which Bob record you want to keep.
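
Since the question states that the first entry for each name should be kept, one possible way to do that without losing the array order is to call OPENJSON without a WITH clause. In that form it returns a [key] column, which for a JSON array holds each element's zero-based index as text. The sketch below uses that index both to pick the first entry per name and to restore the original order. It assumes the property names are lowercase ("name", "age") as in the sample data, and it uses a table variable, @TestTable, as a stand-in for the asker's [TestTable].

```sql
-- Sample data mirroring the question; @TestTable is a stand-in for [TestTable].
DECLARE @TestTable TABLE (Id INT, PeopleJson NVARCHAR(MAX));
INSERT INTO @TestTable (Id, PeopleJson)
VALUES (1, N'[{"name":"Jon","age":30},{"name":"Bob","age":30},{"name":"Nick","age":40},{"name":"Bob","age":40}]');

SELECT t.Id,
       (
           SELECT d.[name], d.[age]
           FROM (
               SELECT JSON_VALUE(j.[value], '$.name') AS [name],
                      CAST(JSON_VALUE(j.[value], '$.age') AS INT) AS [age],
                      -- [key] is the array index, returned as text, so cast it
                      CAST(j.[key] AS INT) AS idx,
                      -- Number the entries of each name in array order
                      ROW_NUMBER() OVER (
                          PARTITION BY JSON_VALUE(j.[value], '$.name')
                          ORDER BY CAST(j.[key] AS INT)
                      ) AS row_num
               FROM OPENJSON(t.PeopleJson) AS j
           ) AS d
           WHERE d.row_num = 1   -- keep only the first entry per name
           ORDER BY d.idx        -- restore the original array order
           FOR JSON PATH, INCLUDE_NULL_VALUES
       ) AS People
FROM @TestTable AS t;
```

For the sample row this should return [{"name":"Jon","age":30},{"name":"Bob","age":30},{"name":"Nick","age":40}]. Keeping the last entry per name instead would only require ordering the ROW_NUMBER() window by the index in descending order.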
