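Probably not the fastest but maybe educational: The operation you are describing can be thought of as matrix multiplication with a certain adjacency matrix. Row i of that matrix has a one in every column listed in row i of E (ignoring the -1 padding), so multiplying it by M sums the corresponding rows of M: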
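import numpy as np
from scipy import sparse

# construct adjacency matrix in CSR form; -1 entries in E are padding
indices = E[E != -1]
indptr = np.concatenate([[0], np.count_nonzero(E != -1, axis=1).cumsum()])
# one weight per stored index (data must match indices in length, not indptr)
data = np.ones(indices.size, dtype=indptr.dtype)
# pass the shape explicitly so the column count always matches M's row count
aux = sparse.csr_matrix((data, indices, indptr), shape=(E.shape[0], M.shape[0]))

# multiply
aux @ M
# array([[5, 7, 9],
#        [7, 8, 9],
#        [1, 2, 3]], dtype=int64)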
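To check the idea, here is a minimal self-contained sketch that wraps the construction in a function and compares it against a plain Python loop. The function name `gather_sum` and the sample `E` and `M` are assumptions made for illustration (the inputs were picked so that they reproduce the output shown above); they are not taken from the question.

```python
import numpy as np
from scipy import sparse


def gather_sum(E, M):
    """Sum the rows of M listed in each row of E; -1 in E is treated as padding."""
    E = np.asarray(E)
    M = np.asarray(M)
    mask = E != -1
    # CSR layout: column indices of the ones, row by row
    indices = E[mask]
    # indptr[i]:indptr[i+1] delimits the entries belonging to row i
    indptr = np.concatenate([[0], mask.sum(axis=1).cumsum()])
    data = np.ones(indices.size, dtype=M.dtype)
    aux = sparse.csr_matrix((data, indices, indptr),
                            shape=(E.shape[0], M.shape[0]))
    return aux @ M


# hypothetical inputs, not from the original question
E = np.array([[0, 1], [2, -1], [0, -1]])
M = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

result = gather_sum(E, M)

# reference implementation: explicit loop over the rows of E
reference = np.array([M[row[row != -1]].sum(axis=0) for row in E])

print(result)
print(np.array_equal(result, reference))
```

Both versions should agree on any input where the valid entries of E are row indices into M. An index that appears twice in the same row of E is counted twice by both.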