
I’m new to Python and I’m trying to perform a simple task: read a .csv file and store it in a specific data structure. I’m using NumPy to load the data and I get an ndarray of ndarrays, which is not exactly what I want.

My code:

import numpy
filename = '../HTRU2/HTRU_2.csv'
# loadtxt accepts a path directly; the 'rU' open mode was removed in Python 3.11
data = numpy.loadtxt(filename, delimiter=',')

The data structure I’m looking for is a list of tuples. Each tuple is a pair (x, y) of ndarrays: x is an ndarray of shape (nx - 1, 1) filled with floats, where nx is the number of elements on each line of the file; y is an ndarray of shape (1, 1) that holds the last element of the line (also a float).

You might think this is some crazy data structure I’ve made up, but it’s actually quite useful, since my end goal is to feed it to a neural network (if you know about NNs, you probably guessed that each tuple is an input/output pair, where both are column matrices). I must not change the data structure.

File sample:

140.5625,55.68378214,-0.234571412,-0.699648398,3.199832776,19.11042633,7.975531794,74.24222492,0
102.5078125,58.88243001,0.465318154,-0.515087909,1.677257525,14.86014572,10.57648674,127.3935796,0

Each tuple would look like this:

#     x                      y
[[140.5625]               
[55.68378214]
[-0.234571412]
[-0.699648398]
[3.199832776]
[19.11042633]
[7.975531794]
[74.24222492]]     ,      [[0]]

3 Answers


  1. Chosen as BEST ANSWER

    I was able to figure out a solution:

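    import numpy
    filename = '../HTRU2/test.csv'
    # Pass the path directly; the 'rU' open mode was removed in Python 3.11.
    data = numpy.loadtxt(filename, delimiter=',')
    training_data = list()
    for test in data:
        # x: the first 8 values as a column; y: the last value as a 1x1 array
        training_data.append((test[:-1].reshape(8, 1), test[-1].reshape(1, 1)))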
    

    Here the number of input neurons is 8 and the number of output neurons is 1.
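
A variant of the loop above that avoids hard-coding 8, as a minimal sketch: `reshape(-1, 1)` lets NumPy infer the number of rows from the data. The function name `to_training_pairs` and the in-memory sample (standing in for the CSV file) are assumptions for illustration, not part of the original answer.

```python
import io
import numpy as np

def to_training_pairs(data):
    """Turn a 2-D array into a list of (x, y) column-vector pairs.

    Hypothetical helper: the last column is treated as the output y,
    every other column as the input x.
    """
    pairs = []
    for row in data:
        x = row[:-1].reshape(-1, 1)   # shape (n_features, 1), size inferred
        y = row[-1:].reshape(1, 1)    # shape (1, 1); slicing keeps an array
        pairs.append((x, y))
    return pairs

# Two lines from the question's file sample, read from memory instead of disk.
sample = io.StringIO(
    "140.5625,55.68378214,-0.234571412,-0.699648398,3.199832776,"
    "19.11042633,7.975531794,74.24222492,0\n"
    "102.5078125,58.88243001,0.465318154,-0.515087909,1.677257525,"
    "14.86014572,10.57648674,127.3935796,0\n"
)
data = np.loadtxt(sample, delimiter=',')
training_data = to_training_pairs(data)
print(training_data[0][0].shape, training_data[0][1].shape)  # (8, 1) (1, 1)
```

Because the shapes are derived from each row, the same helper should work unchanged if the number of input columns changes.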


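  2. With pandas:

    import pandas as pd
    filename = '../HTRU2/HTRU_2.csv'
    # header=None: the sample file has no header row, so the first line is data
    df = pd.read_csv(filename, header=None, encoding="utf-8")
    

    The encoding argument may be unnecessary, since UTF-8 is the default.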

    Finally:

    df = [tuple(x) for x in df.values]
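
The list comprehension above yields one flat tuple of values per row. If the (x, y) pairs of column vectors from the question are needed, a sketch along these lines may work. It assumes the file has no header row and that the last column is the output; the in-memory sample and the names `rows` and `pairs` are illustrative only.

```python
import io
import pandas as pd

# Two lines from the question's file sample, read from memory instead of disk.
sample = io.StringIO(
    "140.5625,55.68378214,-0.234571412,-0.699648398,3.199832776,"
    "19.11042633,7.975531794,74.24222492,0\n"
    "102.5078125,58.88243001,0.465318154,-0.515087909,1.677257525,"
    "14.86014572,10.57648674,127.3935796,0\n"
)

# header=None keeps the first line as data rather than as column names.
df = pd.read_csv(sample, header=None)

# Convert to a float ndarray, then split each row into input and output columns.
rows = df.to_numpy(dtype=float)
pairs = [(r[:-1].reshape(-1, 1), r[-1:].reshape(1, 1)) for r in rows]

print(len(pairs), pairs[0][0].shape, pairs[0][1].shape)  # 2 (8, 1) (1, 1)
```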
    
  3. In an IPython session (with numpy imported as np):

    In [60]: data = np.arange(12).reshape(3,4)
    In [61]: data
    Out[61]: 
    array([[ 0,  1,  2,  3],
           [ 4,  5,  6,  7],
           [ 8,  9, 10, 11]])
    In [62]: data.tolist()
    Out[62]: [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
    In [63]: [tuple(l) for l in _]
    Out[63]: [(0, 1, 2, 3), (4, 5, 6, 7), (8, 9, 10, 11)]
    In [64]: [tuple(np.array(i) for i in l) for l in __]
    Out[64]: 
    [(array(0), array(1), array(2), array(3)),
     (array(4), array(5), array(6), array(7)),
     (array(8), array(9), array(10), array(11))]
    In [65]: [tuple(np.array(i) for i in l) for l in data]
    Out[65]: 
    [(array(0), array(1), array(2), array(3)),
     (array(4), array(5), array(6), array(7)),
     (array(8), array(9), array(10), array(11))]
    

    But do you really need a tuple layer? Why not just add a dimension:

    In [67]: data.reshape(3,4,1)
    Out[67]: 
    array([[[ 0],
            [ 1],
            [ 2],
            [ 3]],
    
           [[ 4],
            [ 5],
            [ 6],
            [ 7]],
    
           [[ 8],
            [ 9],
            [10],
            [11]]])
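
If the consuming code still expects a list of (x, y) tuples, the added-dimension idea can be combined with `zip`. This is a sketch under the assumption that the last column of each row is the output; the names `X`, `Y`, and `pairs` are illustrative.

```python
import numpy as np

# Same toy array as in the session above, as floats.
data = np.arange(12, dtype=float).reshape(3, 4)

# Split inputs and outputs once, adding a trailing axis so each sample
# becomes a column vector.
X = data[:, :-1, np.newaxis]   # shape (3, 3, 1): all columns but the last
Y = data[:, -1:, np.newaxis]   # shape (3, 1, 1): the last column only

# Iterating over the first axis pairs each (3, 1) input with its (1, 1) output.
pairs = list(zip(X, Y))
print(pairs[0][0].shape, pairs[0][1].shape)   # (3, 1) (1, 1)
```

The arrays in each pair are views into `data`, so no per-row copying takes place; call `.copy()` on them if the pairs must be independent of the original array.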
    