Module: bio_mod
The module bio_mod is an universal module for manipulation of biological Data, specially in creation of contact maps and fixing some mistakes in SimRNA Rocks!.
Examples
1import BioHelpers_FABER.bio_mod as bm
2
3sequ = bm.sequFromPDB("file.pdb")
4print("Sequence: ", sequ)
5print("Length: ", len(sequ))
6#--- Alternative ---
7print("Length: ", bm.numberOfResidues("file.pdb"))
1import BioHelpers_FABER.bio_mod as bm
2import Bio.PDB as pdb
3import numpy as np
4import matplotlib.pyplot as plt
5
6# Load and create contact map
7chain = pdb.PDBParser().get_structure("id", "file.pdb")[0]['A']
8l = bm.numberOfResidues("file.pdb")
9contactMatrix = bm.calc_contact_matrix(chain, chain, 9.5)
10contactList = np.array([[e[0], e[1]] for e in bm.arrToList(contactMatrix) if e[2] == 1])
11
12# Plot contact map
13fig, ax = plt.subplots(figsize=(7.5,7.5))
14ax.set_aspect(1)
15ax.set_xlim([0,l])
16ax.set_ylim([0,l])
17ax.scatter(contactList[:,0], contactList[:,1],marker='o')
18plt.show()
Sometimes there is an additional oxygen atom in the predicted SimRNA PDB file, even if the file was created with the native PDB as structure input. If this is the case it is not possible to calculate the RMSD because the total number of atoms mismatches. With the command fixAdditionalOxygen
this specific error can automatically be fixed:
1import BioHelpers_FABER.bio_mod as bm
2import os
3
4pathToNativePDB = "../RNA_Testset/PDB"
5nat_files = os.listdir(pathToNativePDB)
6
7
8for f in nat_files:
9 if ".pdb" in f:
10 pathToExPDB = "Clustering/"+f.replace(".pdb", "")+"/largest"
11 exp_pdb = os.listdir(pathToExPDB)
12 for pdb in exp_pdb:
13 if ".pdb" in pdb:
14 bm.fixAdditionalOxygen(pathToExPDB + '/' + pdb, pathToNativePDB + '/' + f)
Members
- BioHelpers_FABER.bio_mod.arrToList(array: ndarray) list
Converts contact map from numpy.array to list format
- Parameters:
array (np.array) – Contact map as numpy array, format \((L\times L)\) with specífic value as entry
- Returns:
List w/ entries of the form [x,y,val] for every position (\((L\times L)\) elements), the position starts with 0!
- Return type:
list
- BioHelpers_FABER.bio_mod.arr_to_contact_list(array: ndarray) list[tuple[int, int]]
Converts contact map from numpy.array to contact list format
- Parameters:
array (np.ndarray) – Contact map as numpy array, format \((L\times L)\) with specífic value as entry
- Returns:
List w/ entries of the form [x,y] for every position (\((L\times L)\) elements), the position starts with 0!
- Return type:
list[tuple[int, int]]
- BioHelpers_FABER.bio_mod.average_coord(residue: Residue) ndarray
Average coordinates for a residue (NOT center of mass)
- Parameters:
residue (Bio.PDB.Residue.Residue()) – input residue object
- Returns:
Average coordinates \((\tilde{x}^1, \tilde{x}^2, \tilde{x}^3)^T\) of all \(n\) atoms \(\vec{x}_i\) from \(i=1,...,n\) from input residue: \(\tilde{x}^j = \frac{1}{n} \sum_{i=1}^{n} x^j_i\)
- Return type:
np.array
- BioHelpers_FABER.bio_mod.calc_contact_matrix(chain_one: Chain, chain_two: Chain, dist: float, res_type: str = 'rna', reference: str = 'nitrogen') ndarray
Calculates contact matrix
- Parameters:
chain_one (Chain) – Chain1
chain_two (Chain) – Chain2
dist (float) – Threshold distance
res_type (str, optional) – Type of biomolecule, defaults to “rna”
reference (str, optional) – Representative atom, defaults to “nitrogen”
- Returns:
Contact matrix
- Return type:
np.ndarray
- BioHelpers_FABER.bio_mod.calc_dist_matrix(chain_one: Chain, chain_two: Chain, res_type: str = 'rna', reference: str = 'nitrogen', exclude_het: bool = True) ndarray
Distance matrix \(D_{i,j} = \|\vec{x}_i - \vec{y}_j\|_2\) between two chains, where \(\vec{x}_i\) is the representatives of residue \(i\) from chain one and \(\vec{y}_j\) the representative of residue \(j\) from chain two (for contact map chain one = chain two)
- Parameters:
chain_one (Bio.PDB.Chain.Chain()) – Chain one
chain_two (Bio.PDB.Chain.Chain()) – Chain two
- Returns:
Distance matrix \(D\)
- Return type:
np.array
- BioHelpers_FABER.bio_mod.calc_residue_dist(residue_one: Residue, residue_two: Residue, res_type: str = 'rna', reference: str = 'nitrogen') float
Distance between the representatives between two residues
- Parameters:
residue_one (Residue) – Residue1
residue_two (Residue) – Residue2
res_type (str, optional) – Type of biomolecule, defaults to “rna”
reference (str, optional) – Which atom should represent the residue, defaults to “nitrogen”
- Returns:
Distance of the two representatives
- Return type:
float
- BioHelpers_FABER.bio_mod.cleanup_chain(chain) None
Delete all non AminoAcids or RNA, like GTP, Mg aso. from chain
- Parameters:
chain (Bio.PDB.Chain.Chain()) – Chain to clean up
- BioHelpers_FABER.bio_mod.comparePDB(simrna_file: str, experiment_file: str) str
Compares all atoms of all residues, prints out mismatches! A function specific for SimRNA.
- Parameters:
simrna_file (str) – Filename of SimRNA PDB
experiment_file (str) – Filename of structural PDB
- Returns:
String with various informations.
- Return type:
str
- BioHelpers_FABER.bio_mod.conMatFromFile(filename: str, L: int, noc: int, neigh=0) ndarray
Creation of a contact matrix from a prediction. The predictive contacts have to have the format
res_i \t res_j \t score
for each line. Lines beginning with#
will be neglected.- Parameters:
filename (str) – Filename of the predicted contacts, for format see above.
L (int) – Size of the molecule.
noc (int) – Number of contacts, contacts will be listes by their score. The
noc
with highest score will be included.neigh (int, optional) – Number of off-diagonals to exclude, defaults to 0
- Returns:
\(L\times L\) matrix with \(0\) for no contact and \(1\) for contact.
- Return type:
np.array
- BioHelpers_FABER.bio_mod.delAtom(res: Residue, i: int) None
Deletes Atom for a given Residue on Pos i
- Parameters:
res (Bio.PDB.Residue.Residue()) – Residue
i (int) – Atom number in the choosen residue.
- BioHelpers_FABER.bio_mod.deleteNeighbours(matrix: ndarray, neighbour_number: int) ndarray
Deletes (set all entries to 0) diagonal and all off-diagonals (count neighbour_number)
- Parameters:
matrix (np.array) – input matrix
neighbour_number (int) – off-diagonals to delete (if 0 only the main diagonal will be set to 0)
- Returns:
output matrix
- Return type:
np.array
- BioHelpers_FABER.bio_mod.findFirstMismatch(simrna: Chain, experiment: Chain) tuple[Residue, int]
Finds first atom mismatching with the experimental chain
- Parameters:
simrna (Bio.PDB.Chain.Chain()) – Chain of the SimRNA prediction, with possible additional atom
experiment (Bio.PDB.Chain.Chain()) – Chain of the native PDB (e.q. from ProteinDatabase)
- Returns:
Tuple with Residue and position
- Return type:
tuple[Bio.PDB.Residue.Residue(), int]
- BioHelpers_FABER.bio_mod.fixAdditionalOxygen(corruptedFile: str, structFile: str) None
Delete additional Atom in corrupted File, SimRNA specific function
- Parameters:
corruptedFile (str) – Filename/Path of corrupted File
structFile (str) – Filename/Path of reference File
- Raises:
Exception – If Files do not exist!
- BioHelpers_FABER.bio_mod.get_reference_coordinates(res1: Residue, res2: Residue, reference: str = 'nitrogen', res_type='rna') tuple[ndarray, ndarray]
Returns two sets of reference coordinates for each residue.
- Parameters:
res1 (Residue) – Residue 1
res2 (Residue) – Residue 2
reference (str, optional) – Which reference atom shall be taken? possible: “nitrogen” or “nearest, defaults to “nitrogen”
res_type (str, optional) – Type of biomolecule, defaults to “rna”
- Returns:
Reference coordinates for both residues
- Return type:
tuple[np.ndarray, np.ndarray]
- BioHelpers_FABER.bio_mod.get_reference_coordinates_nearest(res1: Residue, res2: Residue) tuple[ndarray, ndarray]
Returns reference coordinates for each of the residues. Point of reference: nearest heavy atom pair.
- Parameters:
res1 (Residue) – Residue1
res2 (Residue) – Residue2
- Returns:
Reference coordinates for both residues
- Return type:
tuple[np.ndarray, np.ndarray]
- BioHelpers_FABER.bio_mod.get_reference_coordinates_nitrogen(res1: Residue, res2: Residue) tuple[ndarray, ndarray]
Returns reference coordinates for each of the residues. Point of reference: Nitrogen atom.
- Parameters:
res1 (Residue) – Residue1
res2 (Residue) – Residue2
- Returns:
Reference coordinates for both residues
- Return type:
tuple[np.ndarray, np.ndarray]
- BioHelpers_FABER.bio_mod.get_sequence_position(filename: str) tuple
Show all the non-het Residues in a given pdb file and the associated position in the native molecule.
- Parameters:
filename (str) – Filename of PDB
- Returns:
Tuple of list with positions and sequence as string
- Return type:
tuple
- BioHelpers_FABER.bio_mod.is_rnaRes(residue) bool
Checks if a residue is not an AminoAcid and other stuff
- Parameters:
residue (Bio.PDB.Residue.Residue()) – Input residue
- Returns:
Check
- Return type:
bool
- BioHelpers_FABER.bio_mod.listToArr(c: list) ndarray
Converts contact map from list format to numpy.array
- Parameters:
list (list) – List with entries [x,y,val], do not need to have \((L \times L)\) entries!
- Returns:
Numpy Array of size \(\left( (\max_c(x)+1) \times (\max_c(y)+1)\right)\)
- Return type:
np.array
- BioHelpers_FABER.bio_mod.numberOfResidues(file: str) int
Counts the number of residues for a given PDB file
- Parameters:
file (str) – Input Filename in PDB format
- Returns:
Number of residues
- Return type:
int
- BioHelpers_FABER.bio_mod.renumber_pdb(file: str) None
Renumbers all present residues starting by one Caveat: Only use for cleaned chains!
- Parameters:
file (str) – Input file, Output will be saved under file+”renum.pdb”
- BioHelpers_FABER.bio_mod.residue_coord(residue, res_type: str = 'rna', reference: str = 'nitrogen') ndarray
Representative Coordinate of a residue, for proteins C alpha for RNA the coordinate of the nitrogen atom Deprecated!
- Parameters:
residue (Bio.PDB.Residue.Residue()) – input residue object
res_type (str, optional) – residue type, defaults to “rna”
- Raises:
RuntimeError – if residue is neither rna or protein
- Returns:
coordinate of the representative
- Return type:
numpy.array
- BioHelpers_FABER.bio_mod.sequFromPDB(filename: str) str
Returns residues Sequence from PDB File, only residues in PDB!
- Parameters:
filename (str) – Filename/Path of the PDB File
- Returns:
RNA Sequence of the given PDB File
- Return type:
str
- BioHelpers_FABER.bio_mod.triangularMatrix(matrix: ndarray) ndarray
Cuts off from the diagonal
- Parameters:
matrix (np.array) – input matrix
- Returns:
output matrix
- Return type:
np.array