Values updating automatically inside the for loop




I am trying to implement the value iteration algorithm.
I have a grid:


grid = [[0, 0, 0, +1],
        [0, "W", 0, -1],
        [0, 0, 0, 0]]



An action list:


actlist = {UP: 1, DOWN: 2, LEFT: 3, RIGHT: 4}



And a reward function


reward = [[0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]



I wrote a function T, which returns a tuple of three (row, column, probability) tuples.


def T(i, j, actions):
    # 1st row
    if i == 0 and j == 0:
        if actions == UP:
            return (i,j,0.8),(i,j,0.1),(i,j+1,0.1)
        elif actions == DOWN:
            return (i+1,j,0.8),(i,j,0.1),(i,j+1,0.1)
        elif actions == LEFT:
            return (i,j,0.8),(i,j,0.1),(i+1,j,0.1)
        elif actions == RIGHT:
            return (i,j+1,0.8),(i,j,0.1),(i+1,j,0.1)
    elif i == 0 and j == 1:
        if actions == UP:
            return (i,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
        elif actions == DOWN:
            return (i,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
        elif actions == LEFT:
            return (i,j-1,0.8),(i,j,0.1),(i,j,0.1)
        elif actions == RIGHT:
            return (i,j+1,0.8),(i,j,0.1),(i,j,0.1)
    elif i == 0 and j == 2:
        if actions == UP:
            return (i,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
        elif actions == DOWN:
            return (i+1,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
        elif actions == LEFT:
            return (i,j-1,0.8),(i,j,0.1),(i+1,j,0.1)
        elif actions == RIGHT:
            return (i,j+1,0.8),(i,j,0.1),(i+1,j,0.1)
    elif i == 0 and j == 3:
        if actions == UP:
            return (-1,-1,0.8),(-1,-1,0.1),(-1,-1,0.1)
        elif actions == DOWN:
            return (-1,-1,0.8),(-1,-1,0.1),(-1,-1,0.1)
        elif actions == LEFT:
            return (-1,-1,0.8),(-1,-1,0.1),(-1,-1,0.1)
        elif actions == RIGHT:
            return (-1,-1,0.8),(-1,-1,0.1),(-1,-1,0.1)
    # 2nd row
    elif i == 1 and j == 0:
        if actions == UP:
            return (i-1,j,0.8),(i,j,0.1),(i,j,0.1)
        elif actions == DOWN:
            return (i+1,j,0.8),(i,j,0.1),(i,j,0.1)
        elif actions == LEFT:
            return (i,j,0.8),(i-1,j,0.1),(i+1,j,0.1)
        elif actions == RIGHT:
            return (i,j,0.8),(i-1,j,0.1),(i+1,j,0.1)
    elif i == 1 and j == 1:
        if actions == UP:
            return (i,j,0.8),(i,j,0.1),(i,j,0.1)
        elif actions == DOWN:
            return (i,j,0.8),(i,j,0.1),(i,j,0.1)
        elif actions == LEFT:
            return (i,j,0.8),(i,j,0.1),(i,j,0.1)
        elif actions == RIGHT:
            return (i,j,0.8),(i,j,0.1),(i,j,0.1)
    elif i == 1 and j == 2:
        if actions == UP:
            return (i-1,j,0.8),(i,j,0.1),(i,j+1,0.1)
        elif actions == DOWN:
            return (i+1,j,0.8),(i,j,0.1),(i,j+1,0.1)
        elif actions == LEFT:
            return (i,j,0.8),(i-1,j,0.1),(i+1,j,0.1)
        elif actions == RIGHT:
            return (i,j+1,0.8),(i-1,j,0.1),(i+1,j,0.1)
    elif i == 1 and j == 3:
        if actions == UP:
            return (-2,-2,0.8),(-2,-2,0.1),(-2,-2,0.1)
        elif actions == DOWN:
            return (-2,-2,0.8),(-2,-2,0.1),(-2,-2,0.1)
        elif actions == LEFT:
            return (-2,-2,0.8),(-2,-2,0.1),(-2,-2,0.1)
        elif actions == RIGHT:
            return (-2,-2,0.8),(-2,-2,0.1),(-2,-2,0.1)
    # 3rd row
    elif i == 2 and j == 0:
        if actions == UP:
            return (i-1,j,0.8),(i,j,0.1),(i,j+1,0.1)
        elif actions == DOWN:
            return (i,j,0.8),(i,j,0.1),(i,j+1,0.1)
        elif actions == LEFT:
            return (i,j,0.8),(i-1,j,0.1),(i,j,0.1)
        elif actions == RIGHT:
            return (i,j+1,0.8),(i-1,j,0.1),(i,j,0.1)
    elif i == 2 and j == 1:
        if actions == UP:
            return (i,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
        elif actions == DOWN:
            return (i,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
        elif actions == LEFT:
            return (i,j-1,0.8),(i,j,0.1),(i,j,0.1)
        elif actions == RIGHT:
            return (i,j+1,0.8),(i,j,0.1),(i,j,0.1)
    elif i == 2 and j == 2:
        if actions == UP:
            return (i-1,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
        elif actions == DOWN:
            return (i,j,0.8),(i,j-1,0.1),(i,j+1,0.1)
        elif actions == LEFT:
            return (i,j-1,0.8),(i-1,j,0.1),(i,j,0.1)
        elif actions == RIGHT:
            return (i,j+1,0.8),(i-1,j,0.1),(i,j,0.1)
    elif i == 2 and j == 3:
        if actions == UP:
            return (i-1,j,0.8),(i,j-1,0.1),(i,j,0.1)
        elif actions == DOWN:
            return (i,j,0.8),(i,j-1,0.1),(i,j,0.1)
        elif actions == LEFT:
            return (i,j-1,0.8),(i-1,j,0.1),(i,j,0.1)
        elif actions == RIGHT:
            return (i,j,0.8),(i-1,j,0.1),(i,j,0.1)



This function is called in the value iteration function:


def value_iteration():
    U1 = [[0, 0, 0, 0],
          [0, 0, 0, 0],
          [0, 0, 0, 0]]
    while True:
        U = U1.copy()
        delta = 0
        for i in range(len(grid)):
            for j in range(len(grid[i])):
                U1[i][j] = max(sum(p*(R(k,l)+gamma*U[k][l]) for (k,l,p) in T(i,j,a)) for a in actlist)
                print(i,j,U1[i][j],U[i][j])
                delta = max(delta, abs(U1[i][j] - U[i][j]))
        if delta <= epsilon*(1 - gamma)/gamma:
            return U



I refresh U once per pass, at the top of the while loop, with


U=U1.copy()





The problem is, the output looks like this:


0 0 0.0 0.0
0 1 0.0 0.0
0 2 0.0 0.0
0 3 1.0 1.0
1 0 0.0 0.0
1 2 0.0 0.0
1 3 -1.0 -1.0
2 0 0.0 0.0
2 1 0.0 0.0
2 2 0.7000000000000001 0.7000000000000001
2 3 0.9630000000000001 0.9630000000000001



But I never assign to U inside the for loops. U was supposed to stay unchanged (that is, all zeros) while only U1 changed. Why does U automatically take on the values of U1 inside the for loop?




2 Answers



U1 (and U) is a list of lists, really a list of references to lists.



You're (shallow) copying the outer list, but the contents of the copy are still references to the same inner lists.
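
To make the sharing visible, here is a minimal sketch on a hypothetical 2x2 grid standing in for U1 (the variable names below are illustrative only and not part of the question's code):

```python
# Hypothetical 2x2 grid standing in for U1 from the question.
U1 = [[0, 0], [0, 0]]
U = U1.copy()  # shallow copy: a new outer list holding the same inner lists

outer_is_new = U is not U1        # the outer lists are distinct objects
rows_are_shared = U[0] is U1[0]   # but row 0 is one and the same list

U1[0][1] = 5                      # write in place through U1 ...
seen_through_U = U[0][1]          # ... and read the same cell through U

print(outer_is_new, rows_are_shared, seen_through_U)
```

Because both names reach the same row objects, an assignment like U1[i][j] = ... changes what U[i][j] returns as well.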



Try:


import copy
U = copy.deepcopy(U1)



and see what happens instead. deepcopy recursively copies the inner lists as well, so U and U1 no longer share any rows.
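
As a rough check, assuming a small placeholder grid and a dummy update in place of the real Bellman backup (both are hypothetical, chosen only to keep the example short), the snapshot taken with deepcopy stays at zero while U1 is overwritten:

```python
import copy

# Placeholder 2x3 grid; the question's grid is 3x4.
U1 = [[0, 0, 0], [0, 0, 0]]
U = copy.deepcopy(U1)  # independent snapshot of the previous values

for i in range(len(U1)):
    for j in range(len(U1[i])):
        # Dummy update standing in for the max-over-actions expression.
        U1[i][j] = i + j + 1

snapshot_unchanged = all(v == 0 for row in U for v in row)
rows_independent = all(a is not b for a, b in zip(U, U1))
print(snapshot_unchanged, rows_independent, U1)
```

The design point is that the copy is taken once per sweep, before the for loops, so every update in that sweep reads only values from the previous sweep.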







Thank you. But there is still a problem. The values of U1[2][2] and U1[2][3] were supposed to be zero, but they got updated to other values.
– Shifat E Arman
Aug 7 at 2:23



nimish's answer is the right fix for a nested list. If you only need a copy of the outer list, you can also slice it:


U = U1[:]



This creates a new outer list, but like U1.copy() it is a shallow copy: the inner lists are still shared. Rebinding a whole row of the copy leaves the original untouched:


mylist = [[1,1,1],[2,2,2],[3,3,3]]
otherlist= mylist[:]
otherlist[0] = [5,5,5]
mylist
# [[1, 1, 1], [2, 2, 2], [3, 3, 3]]
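
One caveat worth checking: rebinding otherlist[0] replaces a whole row, which is different from writing into a row in place the way U1[i][j] = ... does. A minimal sketch contrasting a slice copy with a row-by-row copy (the names sliced and rowwise are hypothetical):

```python
mylist = [[1, 1, 1], [2, 2, 2], [3, 3, 3]]

sliced = mylist[:]      # new outer list, same inner lists
sliced[0][0] = 99       # in-place write into a shared row
leaked = mylist[0][0]   # the original sees the write

rowwise = [row[:] for row in mylist]  # copies every row as well
rowwise[1][1] = -7      # in-place write into a private row
kept = mylist[1][1]     # the original is unaffected

print(leaked, kept)
```

For a list of flat lists, the row-by-row comprehension gives the same independence as copy.deepcopy; deepcopy remains the more general choice for deeper nesting.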





