Trajectory.atom_slice
uses different orders for coordinates and topology #1937
Description
Trajectory.atom_slice
method attempts to generate a subtrajectory for the atom slice. However, if the atom_indices
parameters are not sorted, this can lead to an error. What the method does is it takes coordinates for atom_indices
without sorting:
xyz = np.array(self.xyz[:, atom_indices], order="C")
and then creates a topology for a subset:
topology = self._topology.subset(atom_indices)
Topology.subset
method is a wrapper over Topology._topology_from_subset
method. This method iterates over all atoms in sorted way and checks if atom index is in atom indices. Thus, if provided atom_indices is not a sorted array, the order of coordinates in xyz
and atoms in topology
does not match. The real use case for this to happen was application of the following function to 3QAP
cif structurte downloaded from rcsb:
def cif_to_pdb(path_to_cif: str, ligand_names_list: list[str] = []) -> str:
"""
Path to cif file.
"""
base = os.path.basename(path_to_cif)
dir_name = os.path.dirname(path_to_cif)
name, _ = os.path.splitext(base)
out = os.path.join(dir_name, f"{name}.pdb")
st = md.load(path_to_cif)
prot = st.topology.select("protein").astype(int)
water = st.topology.select("water").astype(int)
ligands = (st.topology.select("resname " + " ".join(ligand_names_list)) if ligand_names_list else np.array([])).astype(int)
#sel = np.sort(np.concatenate([prot, water, ligands]))
sel = np.concatenate([prot, water, ligands])
st_slice = st.atom_slice(sel)
st_slice.save(out)
return out
cif_to_pdb("3QAP.cif", ["RET"])
In the structure ligand is before water molecules. So when I do not sort the final selection sel
, the order of coordinates is prot-water-ligands, while the order of atoms is prot-ligands-water. Sorting solves the problem