Module: bio_mod

The module bio_mod is a general-purpose module for manipulating biological data, especially for creating contact maps and fixing some errors in SimRNA output.

Examples

Nucleotide Sequence and Length

import BioHelpers_FABER.bio_mod as bm

sequ = bm.sequFromPDB("file.pdb")
print("Sequence: ", sequ)
print("Length: ", len(sequ))
#--- Alternative ---
print("Length: ", bm.numberOfResidues("file.pdb"))

Contact Map from PDB

import BioHelpers_FABER.bio_mod as bm
import Bio.PDB as pdb
import numpy as np
import matplotlib.pyplot as plt

# Load chain 'A' of the first model and create the contact map
chain = pdb.PDBParser().get_structure("id", "file.pdb")[0]['A']
num_res = bm.numberOfResidues("file.pdb")
contactMatrix = bm.calc_contact_matrix(chain, chain, 9.5)
# Keep only the (x, y) positions whose entry marks a contact
contactList = np.array([[e[0], e[1]] for e in bm.arrToList(contactMatrix) if e[2] == 1])

# Plot contact map
fig, ax = plt.subplots(figsize=(7.5, 7.5))
ax.set_aspect(1)
ax.set_xlim([0, num_res])
ax.set_ylim([0, num_res])
ax.scatter(contactList[:, 0], contactList[:, 1], marker='o')
plt.show()

Sometimes the PDB file predicted by SimRNA contains an additional oxygen atom, even if the file was created with the native PDB as structure input. In this case the RMSD cannot be calculated because the total numbers of atoms do not match. The function fixAdditionalOxygen fixes this specific error automatically:

Fix additional atom error for a bunch of files

import BioHelpers_FABER.bio_mod as bm
import os

pathToNativePDB = "../RNA_Testset/PDB"

# For every native PDB, fix all predicted PDB files in its "largest" cluster folder
for nat_file in os.listdir(pathToNativePDB):
    if not nat_file.endswith(".pdb"):
        continue
    pathToExPDB = os.path.join("Clustering", os.path.splitext(nat_file)[0], "largest")
    for ex_file in os.listdir(pathToExPDB):
        if ex_file.endswith(".pdb"):
            bm.fixAdditionalOxygen(os.path.join(pathToExPDB, ex_file),
                                   os.path.join(pathToNativePDB, nat_file))

Members

BioHelpers_FABER.bio_mod.arrToList(array: ndarray) → list

Converts contact map from numpy.array to list format

Parameters:: array (np.array) – Contact map as numpy array, format \((L\times L)\) with specific value as entry
Returns:: List with entries of the form [x,y,val] for every position (\((L\times L)\) elements); positions start at 0!
Return type:: list
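
The list format described above can be illustrated with a minimal sketch. The function name matrix_to_entries is hypothetical and is not part of bio_mod; the row-major ordering of the entries is an assumption.

```python
import numpy as np


def matrix_to_entries(matrix: np.ndarray) -> list:
    """Hypothetical stand-in: one [x, y, val] entry per matrix cell, 0-based."""
    n_rows, n_cols = matrix.shape
    # Assumption: entries are emitted row by row (x is the row index).
    return [[x, y, matrix[x, y].item()] for x in range(n_rows) for y in range(n_cols)]


contact_map = np.array([[0, 1], [1, 0]])
entries = matrix_to_entries(contact_map)
print(entries)
```

A 2 x 2 map yields four entries, matching the \((L\times L)\) elements mentioned in the return description.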

BioHelpers_FABER.bio_mod.arr_to_contact_list(array: ndarray) → list[tuple[int, int]]

Converts contact map from numpy.array to contact list format

Parameters:: array (np.ndarray) – Contact map as numpy array, format \((L\times L)\) with specific value as entry
Returns:: List with entries of the form [x,y] for every position (\((L\times L)\) elements); positions start at 0!
Return type:: list[tuple[int, int]]

BioHelpers_FABER.bio_mod.average_coord(residue: Residue) → ndarray

Average coordinates for a residue (NOT center of mass)

Parameters:: residue (Bio.PDB.Residue.Residue()) – input residue object
Returns:: Average coordinates \((\tilde{x}^1, \tilde{x}^2, \tilde{x}^3)^T\) of all \(n\) atoms \(\vec{x}_i\) from \(i=1,...,n\) from input residue: \(\tilde{x}^j = \frac{1}{n} \sum_{i=1}^{n} x^j_i\)
Return type:: np.array
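
As a worked instance of the formula above, the following sketch averages plain coordinate arrays with NumPy. It operates on an \((n \times 3)\) array instead of a Bio.PDB residue, and the name average_position is hypothetical.

```python
import numpy as np


def average_position(atom_coords: np.ndarray) -> np.ndarray:
    """Unweighted mean over n atom positions, array shape (n, 3)."""
    # Every atom has weight 1/n, so this is not the center of mass.
    return atom_coords.mean(axis=0)


coords = np.array([[0.0, 0.0, 0.0],
                   [2.0, 0.0, 0.0],
                   [1.0, 3.0, 0.0]])
center = average_position(coords)
print(center)
```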

BioHelpers_FABER.bio_mod.calc_contact_matrix(chain_one: Chain, chain_two: Chain, dist: float, res_type: str = 'rna', reference: str = 'nitrogen') → ndarray

Calculates contact matrix

Parameters:

chain_one (Chain) – Chain1
chain_two (Chain) – Chain2
dist (float) – Threshold distance
res_type (str, optional) – Type of biomolecule, defaults to “rna”
reference (str, optional) – Representative atom, defaults to “nitrogen”

Returns:

Contact matrix

Return type:

np.ndarray
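
The thresholding step can be sketched on a precomputed distance matrix. This is only an illustration: the name threshold_contacts is hypothetical, and treating a pair as a contact when its distance is strictly below the threshold is an assumption, not a documented property of calc_contact_matrix.

```python
import numpy as np


def threshold_contacts(dist_matrix: np.ndarray, dist: float) -> np.ndarray:
    """Turn a distance matrix into a 0/1 contact matrix."""
    # Assumption: contact means distance strictly below the threshold.
    return (dist_matrix < dist).astype(int)


distances = np.array([[0.0, 4.0, 12.0],
                      [4.0, 0.0, 9.5],
                      [12.0, 9.5, 0.0]])
contacts = threshold_contacts(distances, 9.5)
print(contacts)
```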

BioHelpers_FABER.bio_mod.calc_dist_matrix(chain_one: Chain, chain_two: Chain, res_type: str = 'rna', reference: str = 'nitrogen', exclude_het: bool = True) → ndarray

Distance matrix \(D_{i,j} = \|\vec{x}_i - \vec{y}_j\|_2\) between two chains, where \(\vec{x}_i\) is the representative of residue \(i\) from chain one and \(\vec{y}_j\) the representative of residue \(j\) from chain two (for a contact map, chain one = chain two)

Parameters:

chain_one (Bio.PDB.Chain.Chain()) – Chain one
chain_two (Bio.PDB.Chain.Chain()) – Chain two

Returns:

Distance matrix \(D\)

Return type:

np.array
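
The definition of \(D\) above can be sketched with NumPy broadcasting. The sketch takes arrays of representative coordinates directly instead of Bio.PDB chains; the name pairwise_distances is hypothetical.

```python
import numpy as np


def pairwise_distances(xs: np.ndarray, ys: np.ndarray) -> np.ndarray:
    """D[i, j] = Euclidean distance between xs[i] and ys[j]."""
    # Broadcasting yields an array of shape (len(xs), len(ys), 3).
    diff = xs[:, None, :] - ys[None, :, :]
    return np.linalg.norm(diff, axis=-1)


representatives = np.array([[0.0, 0.0, 0.0],
                            [3.0, 4.0, 0.0]])
# For a contact map, both arguments are the same set of representatives.
D = pairwise_distances(representatives, representatives)
print(D)
```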

BioHelpers_FABER.bio_mod.calc_residue_dist(residue_one: Residue, residue_two: Residue, res_type: str = 'rna', reference: str = 'nitrogen') → float

Distance between the representatives of two residues

Parameters:

residue_one (Residue) – Residue1
residue_two (Residue) – Residue2
res_type (str, optional) – Type of biomolecule, defaults to “rna”
reference (str, optional) – Which atom should represent the residue, defaults to “nitrogen”

Returns:

Distance of the two representatives

Return type:

float

BioHelpers_FABER.bio_mod.cleanup_chain(chain) → None

Deletes everything that is neither an amino acid nor RNA (e.g. GTP, Mg, and so on) from the chain

Parameters:: chain (Bio.PDB.Chain.Chain()) – Chain to clean up

BioHelpers_FABER.bio_mod.comparePDB(simrna_file: str, experiment_file: str) → str

Compares all atoms of all residues, prints out mismatches! A function specific for SimRNA.

Parameters:

simrna_file (str) – Filename of SimRNA PDB
experiment_file (str) – Filename of structural PDB

Returns:

String with information about the mismatches.

Return type:

str

BioHelpers_FABER.bio_mod.conMatFromFile(filename: str, L: int, noc: int, neigh=0) → ndarray

Creates a contact matrix from a prediction. The predicted contacts must have the format res_i \t res_j \t score, one contact per line. Lines beginning with # are ignored.

Parameters:

filename (str) – Filename of the predicted contacts, for format see above.
L (int) – Size of the molecule.
noc (int) – Number of contacts. Contacts are sorted by their score, and the noc contacts with the highest scores are included.
neigh (int, optional) – Number of off-diagonals to exclude, defaults to 0

Returns:

\(L\times L\) matrix with \(0\) for no contact and \(1\) for contact.

Return type:

np.array
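
The selection of the top-scoring contacts can be sketched as follows. This is not the implementation of conMatFromFile: the name contacts_to_matrix is hypothetical, the input is a list of lines rather than a file, residue indices are assumed to be 0-based, the matrix is assumed to be symmetric, and the neigh parameter is left out.

```python
import numpy as np


def contacts_to_matrix(lines, size: int, noc: int) -> np.ndarray:
    """Build a 0/1 matrix from 'res_i<TAB>res_j<TAB>score' lines."""
    scored = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        res_i, res_j, score = line.split("\t")
        scored.append((float(score), int(res_i), int(res_j)))
    # Highest score first, then keep only the noc best contacts.
    scored.sort(key=lambda item: item[0], reverse=True)
    matrix = np.zeros((size, size), dtype=int)
    for _, i, j in scored[:noc]:
        # Assumptions: 0-based indices and a symmetric contact matrix.
        matrix[i, j] = 1
        matrix[j, i] = 1
    return matrix


example_lines = ["# res_i res_j score", "0\t2\t0.9", "1\t3\t0.2", "0\t3\t0.7"]
predicted = contacts_to_matrix(example_lines, size=4, noc=2)
print(predicted)
```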

BioHelpers_FABER.bio_mod.delAtom(res: Residue, i: int) → None

Deletes the atom at position i of a given residue

Parameters:

res (Bio.PDB.Residue.Residue()) – Residue
i (int) – Atom number in the chosen residue.

BioHelpers_FABER.bio_mod.deleteNeighbours(matrix: ndarray, neighbour_number: int) → ndarray

Deletes (sets all entries to 0) the main diagonal and the neighbour_number nearest off-diagonals

Parameters:

matrix (np.array) – input matrix
neighbour_number (int) – off-diagonals to delete (if 0 only the main diagonal will be set to 0)

Returns:

output matrix

Return type:

np.array
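
One way to zero out the diagonal band is sketched below. The name zero_near_diagonal is hypothetical, and removing the off-diagonals on both sides of the main diagonal is an assumption about the intended behaviour.

```python
import numpy as np


def zero_near_diagonal(matrix: np.ndarray, neighbour_number: int) -> np.ndarray:
    """Return a copy with all entries where |i - j| <= neighbour_number set to 0."""
    rows, cols = np.indices(matrix.shape)
    result = matrix.copy()
    # neighbour_number = 0 clears only the main diagonal.
    result[np.abs(rows - cols) <= neighbour_number] = 0
    return result


full = np.ones((4, 4), dtype=int)
only_diagonal_removed = zero_near_diagonal(full, 0)
band_removed = zero_near_diagonal(full, 1)
print(band_removed)
```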

BioHelpers_FABER.bio_mod.findFirstMismatch(simrna: Chain, experiment: Chain) → tuple[Residue, int]

Finds first atom mismatching with the experimental chain

Parameters:

simrna (Bio.PDB.Chain.Chain()) – Chain of the SimRNA prediction, with possible additional atom
experiment (Bio.PDB.Chain.Chain()) – Chain of the native PDB (e.g. from the Protein Data Bank)

Returns:

Tuple with Residue and position

Return type:

tuple[Bio.PDB.Residue.Residue(), int]

BioHelpers_FABER.bio_mod.fixAdditionalOxygen(corruptedFile: str, structFile: str) → None

Deletes the additional atom in the corrupted file. A function specific to SimRNA.

Parameters:

corruptedFile (str) – Filename/Path of corrupted File
structFile (str) – Filename/Path of reference File

Raises:

Exception – If Files do not exist!

BioHelpers_FABER.bio_mod.get_reference_coordinates(res1: Residue, res2: Residue, reference: str = 'nitrogen', res_type='rna') → tuple[ndarray, ndarray]

Returns two sets of reference coordinates for each residue.

Parameters:

res1 (Residue) – Residue 1
res2 (Residue) – Residue 2
reference (str, optional) – Which reference atom to use, either “nitrogen” or “nearest”, defaults to “nitrogen”
res_type (str, optional) – Type of biomolecule, defaults to “rna”

Returns:

Reference coordinates for both residues

Return type:

tuple[np.ndarray, np.ndarray]

BioHelpers_FABER.bio_mod.get_reference_coordinates_nearest(res1: Residue, res2: Residue) → tuple[ndarray, ndarray]

Returns reference coordinates for each of the residues. Point of reference: nearest heavy atom pair.

Parameters:

res1 (Residue) – Residue1
res2 (Residue) – Residue2

Returns:

Reference coordinates for both residues

Return type:

tuple[np.ndarray, np.ndarray]
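
The idea of a nearest atom pair can be sketched on plain coordinate arrays. The name nearest_pair is hypothetical, and the sketch assumes the arrays already contain only heavy-atom positions; it does not use Bio.PDB residues.

```python
import numpy as np


def nearest_pair(coords_one: np.ndarray, coords_two: np.ndarray):
    """Return the two positions, one per residue, that are closest to each other."""
    diff = coords_one[:, None, :] - coords_two[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    # Index of the smallest entry in the (n x m) distance table.
    i, j = np.unravel_index(np.argmin(dist), dist.shape)
    return coords_one[i], coords_two[j]


atoms_one = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
atoms_two = np.array([[5.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
ref_one, ref_two = nearest_pair(atoms_one, atoms_two)
print(ref_one, ref_two)
```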

BioHelpers_FABER.bio_mod.get_reference_coordinates_nitrogen(res1: Residue, res2: Residue) → tuple[ndarray, ndarray]

Returns reference coordinates for each of the residues. Point of reference: Nitrogen atom.

Parameters:

res1 (Residue) – Residue1
res2 (Residue) – Residue2

Returns:

Reference coordinates for both residues

Return type:

tuple[np.ndarray, np.ndarray]

BioHelpers_FABER.bio_mod.get_sequence_position(filename: str) → tuple

Shows all non-het residues in a given PDB file and their associated positions in the native molecule.

Parameters:: filename (str) – Filename of PDB
Returns:: Tuple of list with positions and sequence as string
Return type:: tuple

BioHelpers_FABER.bio_mod.is_rnaRes(residue) → bool

Checks whether a residue is an RNA residue, i.e. not an amino acid or any other kind of residue

Parameters:: residue (Bio.PDB.Residue.Residue()) – Input residue
Returns:: Check
Return type:: bool

BioHelpers_FABER.bio_mod.listToArr(c: list) → ndarray

Converts contact map from list format to numpy.array

Parameters:: c (list) – List with entries [x,y,val]; it does not need to have \((L \times L)\) entries!
Returns:: Numpy Array of size \(\left( (\max_c(x)+1) \times (\max_c(y)+1)\right)\)
Return type:: np.array
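
The size rule above can be illustrated with a small sketch. The name entries_to_matrix is hypothetical, and filling positions that are missing from the list with 0 is an assumption.

```python
import numpy as np


def entries_to_matrix(entries: list) -> np.ndarray:
    """Build a (max(x)+1) x (max(y)+1) array from [x, y, val] entries."""
    n_rows = max(entry[0] for entry in entries) + 1
    n_cols = max(entry[1] for entry in entries) + 1
    # Assumption: positions not listed in entries stay 0.
    matrix = np.zeros((n_rows, n_cols))
    for x, y, val in entries:
        matrix[x, y] = val
    return matrix


sparse_entries = [[0, 1, 1], [2, 0, 1]]
dense = entries_to_matrix(sparse_entries)
print(dense.shape)
```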

BioHelpers_FABER.bio_mod.numberOfResidues(file: str) → int

Counts the number of residues for a given PDB file

Parameters:: file (str) – Input Filename in PDB format
Returns:: Number of residues
Return type:: int

BioHelpers_FABER.bio_mod.renumber_pdb(file: str) → None

Renumbers all present residues, starting from one. Caveat: only use for cleaned chains!

Parameters:: file (str) – Input file, Output will be saved under file+”renum.pdb”

BioHelpers_FABER.bio_mod.residue_coord(residue, res_type: str = 'rna', reference: str = 'nitrogen') → ndarray

Representative coordinate of a residue: for proteins the C alpha atom, for RNA the nitrogen atom. Deprecated!

Parameters:

residue (Bio.PDB.Residue.Residue()) – input residue object
res_type (str, optional) – residue type, defaults to “rna”

Raises:

RuntimeError – if residue is neither RNA nor protein

Returns:

coordinate of the representative

Return type:

numpy.array

BioHelpers_FABER.bio_mod.sequFromPDB(filename: str) → str

Returns the residue sequence from a PDB file. Only residues present in the PDB file are included!

Parameters:: filename (str) – Filename/Path of the PDB File
Returns:: RNA Sequence of the given PDB File
Return type:: str

BioHelpers_FABER.bio_mod.triangularMatrix(matrix: ndarray) → ndarray

Cuts the matrix off at the diagonal, leaving a triangular matrix

Parameters:: matrix (np.array) – input matrix
Returns:: output matrix
Return type:: np.array