
I’ve switched from Memcached to Hazelcast. After a while I noticed that the size of the cache was bigger than usual; I saw this in the Hazelcast Management Center.

So I did the following:
1. Before calling IMap.set(key, value), I serialized the value (an ArrayList) to a file, which came out at 128 KB.
2. After IMap.set() was called, I did an IMap.get() on the same key and serialized the result again; this file was suddenly 6 MB.

The value in question is a structure in which many objects are referenced multiple times.
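
For illustration, a minimal sketch of that kind of structure, with a hypothetical Data class standing in for the real one:

    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;

    public class SharedReferenceExample {

        // Hypothetical stand-in for the real cached class
        static class Data implements Serializable { }

        public static void main(String[] args) {
            // One instance referenced 100 times: plain Java serialization
            // writes the object once plus 99 cheap back-references
            Data shared = new Data();
            List<Data> value = new ArrayList<>();
            for (int i = 0; i < 100; i++) {
                value.add(shared);
            }
        }
    }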

I opened the two binary files and saw that the 6 MB file contains a lot of duplicated data: the serialization used by Hazelcast somehow turns the repeated references into full copies of the objects.

  • All the classes instantiated for the cache are Serializable, except the enums.

  • With Memcached the value size is 128 KB in both cases.

  • I tried Kryo with Hazelcast and there was no real difference; the value was still over 6 MB.

Has anyone had a similar problem with Hazelcast? If so, how did you solve it without changing the cache provider?

I could provide the object structure and try to reproduce it with non-sensitive data if someone needs it.

2 Answers


  1. Chosen as BEST ANSWER

    I am not claiming this is the full explanation, but after a lost day I finally came up with a solution that works around this. I cannot say whether it is a feature or a problem to report.

    Anyway, in Hazelcast, if you put a value into an IMap as an ArrayList, it is serialized entry by entry. This means that if we have 100 entries referencing the same 6 KB instance A, we end up with 600 KB in Hazelcast. Below is a short raw test that demonstrates this.

    To work around or avoid this with Java serialization, wrap the ArrayList in an object; that does the trick (see the wrapper sketch after the test code below).

    (This applies only to plain Java Serializable, not to the other serialization implementations.)

    import com.hazelcast.core.HazelcastInstance;
    import org.apache.commons.lang3.SerializationUtils;
    import org.junit.Test;

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.util.ArrayList;
    import java.util.stream.IntStream;

    @Test
    public void start() throws Exception {

        HazelcastInstance client = produceHazelcastClient();

        Data data = new Data();
        ArrayList<Data> datas = new ArrayList<>();

        // 1000 entries, all pointing to the SAME instance
        IntStream.range(0, 1000).forEach(i -> datas.add(data));

        // Plain Java serialization before the put: shared references are
        // written as back-references, so the file stays small
        writeFile(datas, "DataLeoBefore", "1");

        client.getMap("data").put("LEO", datas);

        // After the round trip through Hazelcast the shared references have
        // been expanded into copies, so the file is much bigger
        Object redeserialized = client.getMap("data").get("LEO");
        writeFile(redeserialized, "DataLeoAfter", "1");
    }

    public void writeFile(Object value, String key, String fileName) {
        try {
            Files.write(Paths.get("./" + fileName + "_" + key),
                    SerializationUtils.serialize((ArrayList) value));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
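
    Below is a minimal sketch of the wrapper workaround described above, assuming plain Java serialization; SerializableListWrapper is a hypothetical name and Data stands in for the real cached class:

        import java.io.Serializable;
        import java.util.ArrayList;

        // Hypothetical wrapper: the wrapper is serialized as a single
        // Java-serialized blob, so repeated references inside the list are
        // preserved as back-references instead of being copied
        public class SerializableListWrapper implements Serializable {

            private static final long serialVersionUID = 1L;

            private final ArrayList<Data> list;

            public SerializableListWrapper(ArrayList<Data> list) {
                this.list = list;
            }

            public ArrayList<Data> getList() {
                return list;
            }
        }

    With the wrapper, the put/get from the test above becomes:

        client.getMap("data").put("LEO", new SerializableListWrapper(datas));
        ArrayList<Data> back =
                ((SerializableListWrapper) client.getMap("data").get("LEO")).getList();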
    

  2. Hazelcast can be configured to use several different serialization schemes; Java serialization (the default) is the least efficient in terms of both time and space. Typically choosing the right serialization strategy gives a bigger payoff than almost any other optimization you could do.

    The reference manual gives a good overview of the different serialization schemes and the tradeoffs involved.
    IMDG Reference Manual v3.11 – Serialization

    I typically would go with IdentifiedDataSerializable if my application is all Java, or Portable if I needed to support cross-language clients or object versioning.
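
    For reference, a minimal sketch of what an IdentifiedDataSerializable class looks like with the 3.x API referenced above; the Data class, its payload field, and the ID constants are hypothetical:

        import com.hazelcast.nio.ObjectDataInput;
        import com.hazelcast.nio.ObjectDataOutput;
        import com.hazelcast.nio.serialization.IdentifiedDataSerializable;
        import java.io.IOException;

        public class Data implements IdentifiedDataSerializable {

            public static final int FACTORY_ID = 1; // hypothetical IDs
            public static final int CLASS_ID = 1;

            private String payload;

            @Override
            public int getFactoryId() {
                return FACTORY_ID;
            }

            @Override
            public int getId() {
                return CLASS_ID;
            }

            @Override
            public void writeData(ObjectDataOutput out) throws IOException {
                out.writeUTF(payload);
            }

            @Override
            public void readData(ObjectDataInput in) throws IOException {
                payload = in.readUTF();
            }
        }

    The factory is registered once on the configuration, for example:

        config.getSerializationConfig()
              .addDataSerializableFactory(Data.FACTORY_ID, id -> new Data());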

    If you need to use Java serialization for some reason, you might check and verify that the SharedObject property is set to true to avoid creating multiple copies of the same object. (That property can be set via the enable-shared-object element in hazelcast.xml, or programmatically through the SerializationConfig object; see the sketch below.)
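
    A minimal programmatic sketch, assuming the 3.x API:

        import com.hazelcast.config.Config;
        import com.hazelcast.core.Hazelcast;
        import com.hazelcast.core.HazelcastInstance;

        public class SharedObjectConfigExample {
            public static void main(String[] args) {
                Config config = new Config();
                // Preserve back-references to already-written objects during
                // Java serialization instead of expanding them into copies
                config.getSerializationConfig().setEnableSharedObject(true);
                HazelcastInstance hz = Hazelcast.newHazelcastInstance(config);
            }
        }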
