Question

Parsing a PDB file with the Biopython parser: how to get all atoms, including those flagged as duplicates?

5.2 years ago

movchinar • 0

Hello everyone,

I am new to bioinformatics. When I tried to parse a PDB file with the Biopython library, I ran into this problem: atoms that are reported as "defined twice" in a residue are skipped, so I cannot get their coordinates, id, name, etc.

How can I get all atoms, including the ones flagged as duplicates?

Here is my code:

from Bio.PDB.PDBParser import PDBParser
parser = PDBParser(PERMISSIVE=1)

structure = parser.get_structure("test", "/home/chinar/Downloads/Serinthreonine_ protein kinase, PIM 2/1_doc.pdb")


for model in structure:
    for chain in model:
        for residue in chain:
            for atom in residue:
                print(atom)

Output:

Warning: PDBConstructionException: Atom C defined twice in residue <Residue UNK het=H_UNK resseq=0 icode= > at line 31.
Exception ignored.
Some atoms or residues may be missing in the data structure.
  % message, PDBConstructionWarning)
/usr/local/lib/python3.6/dist-packages/Bio/PDB/PDBParser.py:291: PDBConstructionWarning: PDBConstructionException: Atom C defined twice in residue <Residue UNK het=H_UNK resseq=0 icode= > at line 32.
Exception ignored.
Some atoms or residues may be missing in the data structure.
  % message, PDBConstructionWarning)
<Atom C>
<Atom N>

sequence software error assembly biopython • 6.3k views

ADD COMMENT • link updated 11 months ago by IkramInf ▴ 20 • written 5.2 years ago by movchinar • 0

Is this a PDB file you have created?

Is the file definitely correctly formed?

ADD REPLY • link 5.2 years ago by Joe 21k

The docking run produced a file in PDBQT format. I converted it from PDBQT to PDB with a UNIX command (cut -c-66 my_docking.pdbqt > my_docking.pdb).

Here is my PDB file:

MODEL 1
REMARK VINA RESULT:     -11.8      0.000      0.000
REMARK  1 active torsions:
REMARK  status: ('A' for Active; 'I' for Inactive)
REMARK    1  A    between atoms: C_18  and  C_23
ROOT
HETATM    1  C   UNK     0      25.880   2.303   4.352  0.00  0.00
HETATM    2  C   UNK     0      26.930   7.122  -0.622  0.00  0.00
HETATM    3  C   UNK     0      27.007   7.975  -1.745  0.00  0.00
HETATM    4  C   UNK     0      26.792   7.469  -3.046  0.00  0.00
HETATM    5  C   UNK     0      26.498   6.104  -3.241  0.00  0.00
HETATM    6  C   UNK     0      26.639   5.755  -0.818  0.00  0.00
HETATM    7  C   UNK     0      26.423   5.271  -2.112  0.00  0.00
HETATM    8  N   UNK     0      26.153   3.962  -2.056  0.00  0.00
HETATM    9  N   UNK     0      26.501   4.704   0.026  0.00  0.00
HETATM   10  C   UNK     0      26.195   3.640  -0.750  0.00  0.00
HETATM   11  C   UNK     0      25.974   2.283  -0.221  0.00  0.00
HETATM   12  C   UNK     0      25.681   2.278   1.308  0.00  0.00
HETATM   13  C   UNK     0      26.620   4.618   1.394  0.00  0.00
HETATM   14  C   UNK     0      26.212   3.442   2.083  0.00  0.00
HETATM   15  C   UNK     0      26.293   3.445   3.503  0.00  0.00
HETATM   16  N   UNK     0      26.782   4.546   4.132  0.00  0.00
HETATM   17  C   UNK     0      27.213   5.646   3.465  0.00  0.00
HETATM   18  N   UNK     0      27.120   5.656   2.111  0.00  0.00
ENDROOT
BRANCH  17  19
HETATM   19  C   UNK     0      27.743   6.758   4.154  0.00  0.00
HETATM   20  C   UNK     0      27.954   7.982   3.476  0.00  0.00
HETATM   21  C   UNK     0      28.451   9.104   4.164  0.00  0.00
HETATM   22  C   UNK     0      28.735   9.018   5.539  0.00  0.00
HETATM   23  C   UNK     0      28.546   7.801   6.221  0.00  0.00
HETATM   24  C   UNK     0      28.061   6.673   5.531  0.00  0.00
ENDBRANCH  17  19
TORSDOF 1
ENDMDL
.
.
.
MODEL 20
.
.
.

ADD REPLY • link updated 5.2 years ago by Joe 21k • written 5.2 years ago by movchinar • 0

score 0 · Answer 1 · 2023-11-15

from Bio.PDB.PDBParser import PDBParser

# PERMISSIVE=1 reports construction problems as warnings instead of raising
parser = PDBParser(PERMISSIVE=1)

structure = parser.get_structure("test", "/home/chinar/Downloads/Serinthreonine_ protein kinase, PIM 2/1_doc.pdb")

# Walk model -> chain -> residue -> atom and collect every atom the parser kept
atoms = set()
for model in structure:
    for chain in model:
        for residue in chain:
            for atom in residue:
                atoms.add(atom)

print(atoms)
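
Note that a set can only hold atoms the parser actually kept. Biopython indexes the atoms of a residue by name, and in the posted file every carbon is named C and every nitrogen N inside the single UNK residue, so with PERMISSIVE=1 the repeats are skipped with the warning shown in the question. One possible workaround, sketched here under assumptions, is to give each atom a unique name before parsing. The helper uniquify_atom_names is hypothetical (it is not part of Biopython), uses only the standard library, and assumes fixed-column PDB records with one-letter element symbols, as in the file above.

```python
from collections import defaultdict


def uniquify_atom_names(pdb_lines):
    """Return PDB lines with a running number appended to each atom name,
    so that no two atoms in the same residue share a name.

    Hypothetical helper; assumes fixed-column ATOM/HETATM records and
    one-letter element symbols (C, N, O, ...).
    """
    counts = defaultdict(int)  # (residue key, original name) -> atoms seen so far
    fixed = []
    for line in pdb_lines:
        if line.startswith("MODEL"):
            counts.clear()  # number the atoms of each model independently
        if line.startswith(("ATOM", "HETATM")):
            name = line[12:16].strip()  # columns 13-16: atom name
            residue_key = line[17:27]   # columns 18-27: residue name, chain, number, icode
            counts[residue_key, name] += 1
            new_name = f"{name}{counts[residue_key, name]}"
            if len(new_name) > 4:
                raise ValueError(f"atom name {new_name!r} does not fit in 4 columns")
            # Names shorter than 4 characters start in column 14 for one-letter elements.
            field = new_name if len(new_name) == 4 else f" {new_name:<3}"
            line = line[:12] + field + line[16:]
        fixed.append(line)
    return fixed
```

The returned lines could then be written to a new file and passed to parser.get_structure as before; the atoms would appear as C1, C2, ..., N1, N2, ... rather than under their original names.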