An important preprocessing step before running ICE is to remove (or set to zero) all bins with fewer than some number of reads (e.g. 50), and all bins with fewer than some number of non-zero entries (e.g. 10). If those bins are not removed, they create “standalone” pixels in rows of the map that are almost completely zero. Those “standalone” pixels can lead to non-convergence, and even if the algorithm converges, the matrix sometimes ends up being incorrect.
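The filtering step above can be sketched as follows (a minimal NumPy illustration; the function name and the exact thresholds are my own, not part of mirnylib):

```python
import numpy as np

def filter_bins(matrix, min_sum=50, min_nonzero=10):
    """Zero out bins (rows and their matching columns) that have fewer
    than `min_sum` total reads or fewer than `min_nonzero` non-zero
    entries. Thresholds are illustrative; tune them for your data."""
    m = np.asarray(matrix, dtype=float).copy()
    too_few_reads = m.sum(axis=1) < min_sum
    too_sparse = (m > 0).sum(axis=1) < min_nonzero
    bad = too_few_reads | too_sparse
    # Zero both the row and the column, keeping the matrix symmetric.
    m[bad, :] = 0.0
    m[:, bad] = 0.0
    return m
```

Doing this before iterative correction removes exactly the near-empty rows that produce the “standalone” pixels described above.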

Can you try feeding your matrix at 1 Mb or 100 kb resolution to the mirnylib.numutils.completeIC function of the mirnylib package? It will do these filtering steps for you. If you send the matrix to me, I can run it for you as well.

We are also developing a package called cooler that can do ICE on sparse matrices. You can try it as well.

Cooler: https://github.com/mirnylab/cooler/tree/master/cooler

Mirnylib: https://www.google.com/search?q=mirnylib&oq=mirnylib&aqs=chrome..69i57j0.1546j0j7&sourceid=chrome&ie=UTF-8

To add on, I have tried both 100 kb and 1 Mb resolutions.

I have been trying to apply ICE normalization to my mouse Hi-C data (700 million paired-end reads) digested with DpnII (a 4-base cutter). Mapping and fragment filtering were done with HiCUP, while the raw interaction matrix was computed using Homer. The observed matrix was fed into the “normICE” function of the HiTC package with a maximum of 500 iterations. I was surprised to find that it did not converge.

Dear Maxim,

Many thanks for taking the time to comment on the post. I appreciate your reply. Thanks again,

Lior

Reply to Jonathan: Hi-C matrices (whole-genome and single-chromosome contact maps) are symmetric, so there is no need to alternate between rows and columns.

Reply to Martin: Our algorithm converges to a matrix in which every row/column sums to the same constant (this roughly preserves the overall sum of the matrix, which is useful for certain analyses). The matrix can then be divided by this constant, after which it is normalized to one.
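To make the constant-row-sum behaviour concrete, here is a toy sketch of iterative correction (my own illustration, not the mirnylib code): after enough iterations all row sums equal the same constant $c$, and dividing by $c$ then gives unit row sums.

```python
import numpy as np

def iterative_correction(O, n_iter=200):
    """Toy iterative correction: repeatedly divide each entry by the
    relative deviation of its row/column sums from the mean row sum.
    Returns the corrected matrix W and the accumulated biases."""
    W = np.asarray(O, dtype=float).copy()
    bias = np.ones(W.shape[0])
    for _ in range(n_iter):
        s = W.sum(axis=1)
        db = s / s.mean()          # relative row-sum deviation
        db[db == 0] = 1.0          # leave empty bins untouched
        W /= db[:, None] * db[None, :]
        bias *= db
    return W, bias
```

On a dense positive symmetric matrix the row sums of W all converge to one constant c, and W / c has unit row sums, as described above.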

Let the real symmetric non-negative matrix $O$ have all row sums equal to $c$ for some positive constant $c \ne 1$. Then $S_i = c$ for all $i$, and hence $\overline{S} = c$. Thus $\Delta B_i^k = 1$, so $B_i^k = 1$ and $W^k = O$ for all $k$. This implies that $W^{\infty} = O$. Since $O$ does not have row sums equal to $1$, this trivial limit differs from any existing solution $T$.

For example, for $O = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix}$ we get $B^k = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$ and $W^k = \begin{pmatrix} 0 & 2 \\ 2 & 0 \end{pmatrix}$ for all $k$, which does not have unit row sums. However, IPF would converge to the (inverse of the) correct solution, namely $B^k = \begin{pmatrix} \sqrt{2} & 0 \\ 0 & \sqrt{2} \end{pmatrix}$ and $T = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, yielding $O = B T B$.

This flaw probably stems from the fact that the IPF procedure *alternately* maps row sums and column sums to their intended values, whereas the update rule for the $\epsilon_{ij}$ in your post only considers the row sums. It seems as if the Hi-C authors wanted a multiplicative, symmetrized version of the IPF procedure that yields unit row sums, but their specific approach fails.
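The stalling behaviour in this counterexample is easy to verify numerically (a quick NumPy check of the update rule as I read it from the post):

```python
import numpy as np

# Counterexample: O already has constant row sums c = 2.
O = np.array([[0.0, 2.0], [2.0, 0.0]])
W = O.copy()
B = np.ones(2)
for _ in range(10):
    S = W.sum(axis=1)        # S_i
    dB = S / S.mean()        # Delta B_i^k = S_i / S-bar
    W /= dB[:, None] * dB[None, :]
    B *= dB

# The update stalls: the biases stay 1 and W^k stays equal to O.
assert np.allclose(B, 1.0)
assert np.allclose(W, O)

# The correct balanced form uses B = sqrt(2)*I, giving T = O / 2
# with unit row sums, and O = B T B.
T = O / 2.0
assert np.allclose(np.sqrt(2.0) * T * np.sqrt(2.0), O)
```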
