
I am new to Python and I need to iterate over 3 main variables to find the best mean error across a set of artificial intelligence models.

The 3 models are Gradient Boosting, Random Forest and XGBoost.

Each model is fitted to the data separately. At the end I need to ensemble them, but the iteration is exhausting because there are 27 iterations to make.

The equation is as follows:

y_predict = x*gradientBoosterPredict + y*randomForest + z*XGBooster

Where

  1. x, y and z are between 0 and 1 (with 0.1 as step for each of them)
  2. x + y + z should be always equal to 1

I tried the following:

rmse = []
for (gbrCount in np.arange(0, 1.0, 0.1)):
    for(xgbCount in np.arange(0, 1.0, 0.1)):
        for(regCount in np.arange(0, 1.0, 0.1)):
            y_p = (xgbCount*xgb.predict(testset)+ gbrCount*gbr.predict(testset)+regCount*regressor.predict(testset))
            testset['SalePrice']=np.expm1(y_p)
            y_train_p = xgb.predict(dataset)
            y_train_p = np.expm1(y_train_p)
            rmse.append(np.sqrt(mean_squared_error(y, y_train_p)))
            rmse.append(xgbCount)
            rmse.append(gbrCount)
            rmse.append(regCount)

But I am getting the following error:

SyntaxError: unexpected EOF while parsing
for gbrCount in np.arange(0, 1.0, 0.1):

4 Answers


  1. Build the range of values in one of the following ways.

    np.linspace(0,1,11)
    

    or

    np.arange(0.0, 1.0, 0.1)
    

    or

    numpy.arange(0, 1.0, 0.1)
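
The two functions differ in whether the end value is included, which matters here because a weight of exactly 1 is only reachable with linspace. A small sketch to check this; the variable names are illustrative only:

```python
import numpy as np

# linspace takes the number of points and includes both endpoints.
with_endpoint = np.linspace(0, 1, 11)

# arange takes a step and stops before the end value, so 1.0 is left out.
without_endpoint = np.arange(0.0, 1.0, 0.1)

print(len(with_endpoint), with_endpoint[-1])  # 11 points, the last one is 1.0
print(len(without_endpoint))                  # 10 points, the last one is below 1.0
```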
    
  2. This is just a Python syntax error.
    Remove the parentheses so that the line reads:

    for gbrCount in np.arange(0, 1.0, 0.1):

    and also in the other for lines.

    That will solve your stated problem. Note also that the arange docs recommend using linspace when the step parameter is not an integer.

    As to making the sum equal 1:

    You already have if int(gbrCount+xgbCount+regCount) == 1: in your code. Doesn’t that work? If not, note that floating-point numbers are not exact, so a sum that looks like it should be 1.0 might actually be 0.9999, and int() then gives 0. Use linspace, or use np.arange(0, 10, 1) so that everything is an integer and divide each value by 10 inside the loop.
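
The truncation problem and the integer workaround can be sketched as follows. This is only an illustration: the `*_tenths` names are made up, and range(11) is used on the assumption that a weight of exactly 1.0 should be allowed (np.arange(0, 10, 1) would stop at 0.9).

```python
# Truncation makes int(total) == 1 unreliable in both directions.
print(int(sum([0.1] * 10)))  # ten steps of 0.1 add up to just under 1.0
print(int(1.5))              # a sum that is clearly too large still truncates to 1

# Integer tenths avoid the problem: compare exactly, then divide by 10.
weights = []
for gbr_tenths in range(11):
    for xgb_tenths in range(11):
        for reg_tenths in range(11):
            if gbr_tenths + xgb_tenths + reg_tenths == 10:
                weights.append((gbr_tenths / 10, xgb_tenths / 10, reg_tenths / 10))

print(len(weights))  # number of valid weight combinations
```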

  3. Your code will work with the following syntax for the for loops:

    import numpy as np
    for gbrCount in np.arange(0, 1.0, 0.1):
        for xgbCount in np.arange(0, 1.0, 0.1):
            for regCount in np.arange(0, 1.0, 0.1):
                y_p = (xgbCount*xgb.predict(testset)+ gbrCount*gbr.predict(testset)+regCount*regressor.predict(testset))
                testset['SalePrice']=np.expm1(y_p)
                y_train_p = xgb.predict(dataset)
                y_train_p = np.expm1(y_train_p)
                rmse.append(np.sqrt(mean_squared_error(y, y_train_p)))
                rmse.append(xgbCount)
                rmse.append(gbrCount)
                rmse.append(regCount)
    

    To keep only the combinations whose sum is 1, add a check inside the loop:

    import numpy as np
    for gbrCount in np.arange(0, 1.0, 0.1):
        for xgbCount in np.arange(0, 1.0, 0.1):
            for regCount in np.arange(0, 1.0, 0.1):
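                # keep only combinations that sum to 1; np.isclose tolerates
                # floating-point error, whereas int() would also accept 1.5
                if np.isclose(gbrCount + xgbCount + regCount, 1.0):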
    
                    y_p = (xgbCount*xgb.predict(testset)+ gbrCount*gbr.predict(testset)+regCount*regressor.predict(testset))
                    testset['SalePrice']=np.expm1(y_p)
                    y_train_p = xgb.predict(dataset)
                    y_train_p = np.expm1(y_train_p)
                    rmse.append(np.sqrt(mean_squared_error(y, y_train_p)))
                    rmse.append(xgbCount)
                    rmse.append(gbrCount)
                    rmse.append(regCount)
    

    To store each result as one row instead of appending each value separately:

    import numpy as np
    for gbrCount in np.arange(0, 1.0, 0.1):
        for xgbCount in np.arange(0, 1.0, 0.1):
            for regCount in np.arange(0, 1.0, 0.1):
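                # keep only combinations that sum to 1; np.isclose tolerates
                # floating-point error, whereas int() would also accept 1.5
                if np.isclose(gbrCount + xgbCount + regCount, 1.0):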
    
                    y_p = (xgbCount*xgb.predict(testset)+ gbrCount*gbr.predict(testset)+regCount*regressor.predict(testset))
                    testset['SalePrice']=np.expm1(y_p)
                    y_train_p = xgb.predict(dataset)
                    y_train_p = np.expm1(y_train_p)
    
                    rmse.append([np.sqrt(mean_squared_error(y, y_train_p)), xgbCount, gbrCount, regCount ])
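
Note that in the loops above y_train_p comes from xgb.predict(dataset) alone, so the stored RMSE is the same for every weight combination. A minimal sketch of scoring the blend itself follows. It assumes all three predictions and the target refer to the same rows, uses made-up arrays in place of the real model outputs, skips the expm1 step, and computes the RMSE with NumPy directly instead of mean_squared_error; every name in it is hypothetical.

```python
import numpy as np

# Hypothetical stand-ins for gbr.predict(X), regressor.predict(X), xgb.predict(X)
# and the true target on the same rows; replace them with real predictions.
rng = np.random.default_rng(0)
y_true = rng.normal(size=200)
pred_gbr = y_true + rng.normal(scale=0.3, size=200)
pred_reg = y_true + rng.normal(scale=0.5, size=200)
pred_xgb = y_true + rng.normal(scale=0.4, size=200)

grid = np.linspace(0, 1, 11)
results = []  # one row per combination: [rmse, xgb weight, gbr weight, reg weight]
for gbr_w in grid:
    for xgb_w in grid:
        for reg_w in grid:
            # skip combinations whose weights do not sum to 1
            if not np.isclose(gbr_w + xgb_w + reg_w, 1.0):
                continue
            blend = xgb_w * pred_xgb + gbr_w * pred_gbr + reg_w * pred_reg
            rmse = np.sqrt(np.mean((y_true - blend) ** 2))
            results.append([rmse, xgb_w, gbr_w, reg_w])

# the row with the lowest RMSE holds the best weights found on this grid
best = min(results, key=lambda row: row[0])
print(best)
```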
    
  4. Simplest approach I can think of: loop over two of the variables, and determine the necessary value of the third (if it isn’t in range, just continue; or better yet, specify the range for the second variable in terms of the first, in a way that ensures the third can be in range).

    Example, with integers in 0..10 summing to 10:

    for i in range(11):
        # when i == 0, j may be 0..10, which we get from range(11).
        # when i == 10, j should only be 0, which we get from range(1).
        for j in range(11-i):
            k = 10 - i - j
            # proceed to do math with (i, j, k)
    

    (For the floating-point case, this may require some adjustment due to the imprecision of floating-point arithmetic.)
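
One possible adjustment for the floating-point case is to keep the loops on integers and divide only when the weights are produced. The sketch below wraps the loops above in a generator; simplex_weights and steps are hypothetical names chosen for illustration.

```python
def simplex_weights(steps=10):
    """Yield (x, y, z) weights in increments of 1/steps that sum to 1."""
    for i in range(steps + 1):
        for j in range(steps + 1 - i):
            k = steps - i - j  # the third value is fixed by the first two
            yield i / steps, j / steps, k / steps


combos = list(simplex_weights())
print(len(combos))   # number of weight triples generated
print(combos[:3])    # the first few triples
```

Each triple can then be used directly as (x, y, z) in the ensemble formula, with no sum check needed inside the loop.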
