
I have been working on this code for hours without finding a solution.
The program does not find the path in the file, so it tries to create a new dataset. However, it then throws the following TypeError:

TypeError: "Incompatible object (Dataset) already exists"

If I instead try to update the value directly via dset = file[hdf5_path], it throws this error:

Unable to open object (message type not found)

This code generates the problem above:

    hdf5_path = "path/to/my/dataset"
    with h5py.File(hdf5_path, "r+") as file:
        if hdf5_path in file:
            dset = file[hdf5_path]
            dset[...] = new_data_set
        else:
            file.create_dataset(hdf5_path, data=pixel_count)

The following code instead fails on the second iteration of the loop, i.e. when creating the group "my_path/to_another", with the error "Unable to create group (message type not found)":

    import h5py
    data_set = [1,2,3,4]
    fname = "temp.h5"
    h5_path = "my_path"
    with h5py.File(fname, "r+") as file:
        if h5_path in file:
            dset = file[h5_path] 
            dset[...] = data_set
        else:
            file.create_dataset(h5_path, data=data_set)

    h5_path = "my_path/to_another/dest"
    with h5py.File(fname, "r+") as file:
        current_path = ""
        for part in h5_path.split('/')[:-1]:
            current_path = f"{current_path}/{part}" if current_path else part
            if current_path not in file:
                file.create_group(current_path)
        if h5_path in file:
            dset = file[h5_path] 
            dset[...] = data_set
        else:
            file.create_dataset(h5_path, data=data_set)

Could it be that the files are corrupted?


Answers


  1. Chosen as BEST ANSWER

    Ok, I finally found the purported issue:

    import h5py
    
    pixel_count = [i for i in range(17)]
    fname = "copyfile.h5"
    dset_tag = "post/cams/thermal/pixels"
    with h5py.File(fname, "r+") as file:
        if dset_tag in file:
            del file[dset_tag]
            print("Dataset deleted")
        file.create_dataset(dset_tag, data=pixel_count)
        
    

    The problem is that thermal was a dataset in the original file, not a group! Thus "post/cams/thermal/pixels" is not in the file and is not deleted, but I cannot create the dataset either, because it would have to be a child of thermal, and only groups (not datasets) can have children.

    The error message,

    "ValueError: Unable to create dataset (name already exists)"

    did not help me see what the problem was until this answer put me on the right track.
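    A minimal sketch that reproduces the conflict on a throwaway file (the temporary path is an assumption for illustration). Depending on the h5py version, the failure shows up either as the TypeError "Incompatible object (Dataset) already exists" from the question or as the ValueError quoted above, so the sketch catches both rather than relying on one message.

```python
import os
import tempfile

import h5py

# Hypothetical throwaway file; the dataset names mirror the answer above.
fname = os.path.join(tempfile.mkdtemp(), "conflict_demo.h5")

with h5py.File(fname, "w") as f:
    # "thermal" is created as a DATASET, as in the original file.
    f.create_dataset("post/cams/thermal", data=list(range(10)))

with h5py.File(fname, "r+") as f:
    print(isinstance(f["post/cams/thermal"], h5py.Dataset))  # True
    try:
        # "pixels" would have to live inside "thermal", which is not a group.
        f.create_dataset("post/cams/thermal/pixels", data=list(range(17)))
        created = True
    except (TypeError, ValueError, KeyError) as err:
        # The exception type and message vary with the h5py/HDF5 version.
        created = False
        print(f"create_dataset failed: {err}")
```

    The existing thermal dataset is left untouched; the only ways forward are to pick a different name for pixels or to delete (or rename) thermal first.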


  2. You have errors in both methods, so most likely the file is not corrupted. I will explain, then provide working code.
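    One quick way to check this, sketched below under the assumption of a throwaway file that mimics your 2nd example, is to walk the file with visititems() and list whether each name is a group or a dataset. If the walk completes, the file structure is at least readable (this is not a full integrity check), and the listing shows directly which names are datasets and therefore cannot have children. The helper name describe() is hypothetical.

```python
import os
import tempfile

import h5py


def describe(h5file):
    """Return a {path: 'Group' | 'Dataset'} map of every object in the file."""
    kinds = {}

    def visitor(name, obj):
        # visititems() calls this once per object with its path and the object.
        kinds[name] = type(obj).__name__
        # Returning None lets the traversal continue.

    h5file.visititems(visitor)
    return kinds


# Hypothetical file that reproduces the question's 2nd example.
fname = os.path.join(tempfile.mkdtemp(), "temp.h5")
with h5py.File(fname, "w") as f:
    f.create_dataset("my_path", data=[1, 2, 3, 4])

with h5py.File(fname, "r") as f:
    for path, kind in describe(f).items():
        print(f"{path}: {kind}")  # my_path: Dataset
```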

    In the 1st example, you use hdf5_path as both the H5 file name and the name of the dataset path. Also, the create_dataset() method references dset_tag as the dataset name. Are those typos or cut-n-paste errors?

    In the 2nd example, the 1st with/as block creates a DATASET named my_path. That triggers the error in the 2nd with/as block: you are trying to create a dataset with h5_path = "my_path/to_another/dest", which means my_path must now be a GROUP (but it was just created as a DATASET). That’s why you get an error. Also, you don’t need to create the intermediate groups; the create_dataset() method will do that for you. I changed your indexing to the h5py-preferred format (from [...] to [()]) and removed the dset = file[h5_path] lines to shorten the code.
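    To illustrate the point about intermediate groups, here is a minimal sketch on a throwaway file (the temporary path is an assumption): a single create_dataset() call with a nested name, and no create_group() calls, leaves both parent names as groups.

```python
import os
import tempfile

import h5py

# Hypothetical throwaway file in a temporary directory.
fname = os.path.join(tempfile.mkdtemp(), "intermediate_demo.h5")

with h5py.File(fname, "w") as f:
    # No create_group() calls: h5py creates "my_path" and "my_path/to_another".
    f.create_dataset("my_path/to_another/dest", data=[1, 2, 3, 4])

with h5py.File(fname, "r") as f:
    parent_is_group = isinstance(f["my_path"], h5py.Group)
    middle_is_group = isinstance(f["my_path/to_another"], h5py.Group)
    values = f["my_path/to_another/dest"][()].tolist()

print(parent_is_group, middle_is_group, values)  # True True [1, 2, 3, 4]
```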

    Your code doesn’t test modification of data in an existing dataset, so I added a third segment to do that. Note: this only works if the new data matches the type and shape of the existing dataset. You will get an error if the shape changes or if you change from ints to strings. Worse, if the new data changes from ints to floats, the error goes undetected: the floats are silently saved as ints.

    Modified code below:

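    import h5py

    data_set = [1, 2, 3, 4]
    fname = "temp.h5"

    # Segment 1: creates the group my_path and the dataset my_path/dest1.
    # Mode "a" opens read/write and creates the file if missing ("r+" fails on a new file).
    # Assumes temp.h5 does not already hold my_path as a DATASET (e.g. from the question).
    h5_path = "my_path/dest1"
    with h5py.File(fname, "a") as file:
        if h5_path in file:
            file[h5_path][()] = data_set
        else:
            file.create_dataset(h5_path, data=data_set)

    # Segment 2: my_path is already a group, so dest2 is added next to dest1.
    h5_path = "my_path/dest2"
    with h5py.File(fname, "a") as file:
        if h5_path in file:
            file[h5_path][()] = data_set
        else:
            file.create_dataset(h5_path, data=data_set)

    # Segment 3: dest2 now exists, so its values are overwritten in place
    # (this works only because the new data has the same shape and dtype).
    data_set = [11, 12, 13, 14]
    with h5py.File(fname, "a") as file:
        if h5_path in file:
            file[h5_path][()] = data_set
        else:
            file.create_dataset(h5_path, data=data_set)
        print(file[h5_path][()])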
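    To avoid the shape and dtype pitfalls described above, one option is a small helper that writes in place only when the layout matches and otherwise deletes and recreates the dataset. This is a sketch under my own assumptions: write_or_replace() is a hypothetical name, and it compares shape and dtype via numpy.asarray(), which is also how h5py converts a list passed as data.

```python
import os
import tempfile

import h5py
import numpy as np


def write_or_replace(h5file, path, data):
    """Hypothetical helper: overwrite `path` in place when the new data has the
    same shape and dtype, otherwise delete the old dataset and create a new one."""
    arr = np.asarray(data)
    if path in h5file:
        dset = h5file[path]
        if not isinstance(dset, h5py.Dataset):
            raise TypeError(f"{path} exists but is a {type(dset).__name__}")
        if dset.shape == arr.shape and dset.dtype == arr.dtype:
            dset[...] = arr  # same layout: write in place
            return dset
        del h5file[path]  # layout changed: unlink the old dataset
    return h5file.create_dataset(path, data=arr)


# Usage on a hypothetical throwaway file.
fname = os.path.join(tempfile.mkdtemp(), "temp.h5")
with h5py.File(fname, "a") as f:
    write_or_replace(f, "my_path/dest2", [1, 2, 3, 4])      # created
    write_or_replace(f, "my_path/dest2", [11, 12, 13, 14])  # same layout: in place
    write_or_replace(f, "my_path/dest2", [1.5, 2.5, 3.5])   # new shape/dtype: recreated
    result = f["my_path/dest2"][()]
    print(result)  # [1.5 2.5 3.5]
```

    Deleting a dataset only unlinks it; HDF5 does not shrink the file, so frequent delete-and-recreate cycles can leave unused space that a tool such as h5repack can reclaim.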
    
  3. This answer is a follow-up to @hamo’s answer with the "purported issue". As you discovered, you have to check not only whether file[dset_tag] is a dataset, but also that every name along the path (except the last one, which is used for the dataset) is a group. That makes the logic a little trickier. I addressed it by writing a simple function that checks that none of the names along the path are datasets, which you can then use in the original logic. I modified your example to create a file that mimics your problem: the file is 1st created with a dataset named post/cams/thermal, then the code tries to create a dataset named post/cams/thermal/pixels.

    Code below:

    import h5py


    def group_path_ok(h5file, dset_path):
        """Return True if no name along dset_path (except the last) is a dataset."""
        names = dset_path.split('/')
        group_path = ''
        for name in names[:-1]:
            group_path += '/' + name
            if group_path in h5file and isinstance(h5file[group_path], h5py.Dataset):
                print(f'group name: {group_path} in path is a dataset')
                return False
        return True


    fname = "copyfile.h5"
    pixel_count = [i for i in range(10)]
    dset_tag = "post/cams/thermal"

    # Mimic the original file: "thermal" is a DATASET, not a group.
    with h5py.File(fname, "w") as file:
        file.create_dataset(dset_tag, data=pixel_count)

    pixel_count = [i for i in range(17)]
    dset_tag = "post/cams/thermal/pixels"
    with h5py.File(fname, "r+") as file:
        # Only delete/create when every parent name is a group (or does not exist yet).
        if group_path_ok(file, dset_tag):
            if dset_tag in file:
                del file[dset_tag]
                print("Dataset deleted")
            file.create_dataset(dset_tag, data=pixel_count)
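    A variant of the same check, sketched with a hypothetical helper find_blocking_dataset(), uses Group.get(name, getclass=True), which looks up only the class of each object and returns the default (None) for names that do not exist. Instead of True/False it returns the offending path, so you can decide whether to delete that dataset or choose a different name. The throwaway file path is an assumption for illustration.

```python
import os
import tempfile

import h5py


def find_blocking_dataset(h5file, dset_path):
    """Hypothetical helper: return the first parent of dset_path that is a
    dataset (e.g. '/post/cams/thermal'), or None if the path is usable."""
    prefix = ""
    for name in dset_path.strip("/").split("/")[:-1]:
        prefix = f"{prefix}/{name}"
        # getclass=True returns the object's class (Group, Dataset, ...);
        # a missing name returns the default, None.
        if h5file.get(prefix, getclass=True) is h5py.Dataset:
            return prefix
    return None


fname = os.path.join(tempfile.mkdtemp(), "copyfile.h5")
with h5py.File(fname, "w") as f:
    f.create_dataset("post/cams/thermal", data=list(range(10)))

with h5py.File(fname, "r") as f:
    blocker = find_blocking_dataset(f, "post/cams/thermal/pixels")
    free = find_blocking_dataset(f, "post/cams/visible/pixels")

print(blocker, free)  # /post/cams/thermal None
```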
    