Module: bio_mod

The module bio_mod is an universal module for manipulation of biological Data, specially in creation of contact maps and fixing some mistakes in SimRNA Rocks!.

Examples

Nucleotide Sequence and Length
1import BioHelpers_FABER.bio_mod as bm
2
3sequ = bm.sequFromPDB("file.pdb")
4print("Sequence: ", sequ)
5print("Length: ", len(sequ))
6#--- Alternative ---
7print("Length: ", bm.numberOfResidues("file.pdb"))
Contact Map from PDB
 1import BioHelpers_FABER.bio_mod as bm
 2import Bio.PDB as pdb
 3import numpy as np
 4import matplotlib.pyplot as plt
 5
 6# Load and create contact map
 7chain = pdb.PDBParser().get_structure("id", "file.pdb")[0]['A']
 8l = bm.numberOfResidues("file.pdb")
 9contactMatrix = bm.calc_contact_matrix(chain, chain, 9.5)
10contactList = np.array([[e[0], e[1]] for e in bm.arrToList(contactMatrix) if e[2] == 1])
11
12# Plot contact map
13fig, ax = plt.subplots(figsize=(7.5,7.5))
14ax.set_aspect(1)
15ax.set_xlim([0,l])
16ax.set_ylim([0,l])
17ax.scatter(contactList[:,0], contactList[:,1],marker='o')
18plt.show()

Sometimes there is an additional oxygen atom in the predicted SimRNA PDB file, even if the file was created with the native PDB as structure input. If this is the case it is not possible to calculate the RMSD because the total number of atoms mismatches. With the command fixAdditionalOxygen this specific error can automatically be fixed:

Fix additional atom error for a bunch of files
 1import BioHelpers_FABER.bio_mod as bm
 2import os
 3
 4pathToNativePDB = "../RNA_Testset/PDB"
 5nat_files = os.listdir(pathToNativePDB)
 6
 7
 8for f in nat_files:
 9        if ".pdb" in f:
10                pathToExPDB = "Clustering/"+f.replace(".pdb", "")+"/largest"
11                exp_pdb = os.listdir(pathToExPDB)
12                for pdb in exp_pdb:
13                        if ".pdb" in pdb:
14                                bm.fixAdditionalOxygen(pathToExPDB + '/' + pdb, pathToNativePDB + '/' + f)

Members

BioHelpers_FABER.bio_mod.arrToList(array: ndarray) list

Converts contact map from numpy.array to list format

Parameters:

array (np.array) – Contact map as numpy array, format \((L\times L)\) with specífic value as entry

Returns:

List w/ entries of the form [x,y,val] for every position (\((L\times L)\) elements), the position starts with 0!

Return type:

list

BioHelpers_FABER.bio_mod.arr_to_contact_list(array: ndarray) list[tuple[int, int]]

Converts contact map from numpy.array to contact list format

Parameters:

array (np.ndarray) – Contact map as numpy array, format \((L\times L)\) with specífic value as entry

Returns:

List w/ entries of the form [x,y] for every position (\((L\times L)\) elements), the position starts with 0!

Return type:

list[tuple[int, int]]

BioHelpers_FABER.bio_mod.average_coord(residue: Residue) ndarray

Average coordinates for a residue (NOT center of mass)

Parameters:

residue (Bio.PDB.Residue.Residue()) – input residue object

Returns:

Average coordinates \((\tilde{x}^1, \tilde{x}^2, \tilde{x}^3)^T\) of all \(n\) atoms \(\vec{x}_i\) from \(i=1,...,n\) from input residue: \(\tilde{x}^j = \frac{1}{n} \sum_{i=1}^{n} x^j_i\)

Return type:

np.array

BioHelpers_FABER.bio_mod.calc_contact_matrix(chain_one: Chain, chain_two: Chain, dist: float, res_type: str = 'rna', reference: str = 'nitrogen') ndarray

Calculates contact matrix

Parameters:
  • chain_one (Chain) – Chain1

  • chain_two (Chain) – Chain2

  • dist (float) – Threshold distance

  • res_type (str, optional) – Type of biomolecule, defaults to “rna”

  • reference (str, optional) – Representative atom, defaults to “nitrogen”

Returns:

Contact matrix

Return type:

np.ndarray

BioHelpers_FABER.bio_mod.calc_dist_matrix(chain_one: Chain, chain_two: Chain, res_type: str = 'rna', reference: str = 'nitrogen', exclude_het: bool = True) ndarray

Distance matrix \(D_{i,j} = \|\vec{x}_i - \vec{y}_j\|_2\) between two chains, where \(\vec{x}_i\) is the representatives of residue \(i\) from chain one and \(\vec{y}_j\) the representative of residue \(j\) from chain two (for contact map chain one = chain two)

Parameters:
  • chain_one (Bio.PDB.Chain.Chain()) – Chain one

  • chain_two (Bio.PDB.Chain.Chain()) – Chain two

Returns:

Distance matrix \(D\)

Return type:

np.array

BioHelpers_FABER.bio_mod.calc_residue_dist(residue_one: Residue, residue_two: Residue, res_type: str = 'rna', reference: str = 'nitrogen') float

Distance between the representatives between two residues

Parameters:
  • residue_one (Residue) – Residue1

  • residue_two (Residue) – Residue2

  • res_type (str, optional) – Type of biomolecule, defaults to “rna”

  • reference (str, optional) – Which atom should represent the residue, defaults to “nitrogen”

Returns:

Distance of the two representatives

Return type:

float

BioHelpers_FABER.bio_mod.cleanup_chain(chain) None

Delete all non AminoAcids or RNA, like GTP, Mg aso. from chain

Parameters:

chain (Bio.PDB.Chain.Chain()) – Chain to clean up

BioHelpers_FABER.bio_mod.comparePDB(simrna_file: str, experiment_file: str) str

Compares all atoms of all residues, prints out mismatches! A function specific for SimRNA.

Parameters:
  • simrna_file (str) – Filename of SimRNA PDB

  • experiment_file (str) – Filename of structural PDB

Returns:

String with various informations.

Return type:

str

BioHelpers_FABER.bio_mod.conMatFromFile(filename: str, L: int, noc: int, neigh=0) ndarray

Creation of a contact matrix from a prediction. The predictive contacts have to have the format res_i \t res_j \t score for each line. Lines beginning with # will be neglected.

Parameters:
  • filename (str) – Filename of the predicted contacts, for format see above.

  • L (int) – Size of the molecule.

  • noc (int) – Number of contacts, contacts will be listes by their score. The noc with highest score will be included.

  • neigh (int, optional) – Number of off-diagonals to exclude, defaults to 0

Returns:

\(L\times L\) matrix with \(0\) for no contact and \(1\) for contact.

Return type:

np.array

BioHelpers_FABER.bio_mod.delAtom(res: Residue, i: int) None

Deletes Atom for a given Residue on Pos i

Parameters:
  • res (Bio.PDB.Residue.Residue()) – Residue

  • i (int) – Atom number in the choosen residue.

BioHelpers_FABER.bio_mod.deleteNeighbours(matrix: ndarray, neighbour_number: int) ndarray

Deletes (set all entries to 0) diagonal and all off-diagonals (count neighbour_number)

Parameters:
  • matrix (np.array) – input matrix

  • neighbour_number (int) – off-diagonals to delete (if 0 only the main diagonal will be set to 0)

Returns:

output matrix

Return type:

np.array

BioHelpers_FABER.bio_mod.findFirstMismatch(simrna: Chain, experiment: Chain) tuple[Residue, int]

Finds first atom mismatching with the experimental chain

Parameters:
  • simrna (Bio.PDB.Chain.Chain()) – Chain of the SimRNA prediction, with possible additional atom

  • experiment (Bio.PDB.Chain.Chain()) – Chain of the native PDB (e.q. from ProteinDatabase)

Returns:

Tuple with Residue and position

Return type:

tuple[Bio.PDB.Residue.Residue(), int]

BioHelpers_FABER.bio_mod.fixAdditionalOxygen(corruptedFile: str, structFile: str) None

Delete additional Atom in corrupted File, SimRNA specific function

Parameters:
  • corruptedFile (str) – Filename/Path of corrupted File

  • structFile (str) – Filename/Path of reference File

Raises:

Exception – If Files do not exist!

BioHelpers_FABER.bio_mod.get_reference_coordinates(res1: Residue, res2: Residue, reference: str = 'nitrogen', res_type='rna') tuple[ndarray, ndarray]

Returns two sets of reference coordinates for each residue.

Parameters:
  • res1 (Residue) – Residue 1

  • res2 (Residue) – Residue 2

  • reference (str, optional) – Which reference atom shall be taken? possible: “nitrogen” or “nearest, defaults to “nitrogen”

  • res_type (str, optional) – Type of biomolecule, defaults to “rna”

Returns:

Reference coordinates for both residues

Return type:

tuple[np.ndarray, np.ndarray]

BioHelpers_FABER.bio_mod.get_reference_coordinates_nearest(res1: Residue, res2: Residue) tuple[ndarray, ndarray]

Returns reference coordinates for each of the residues. Point of reference: nearest heavy atom pair.

Parameters:
  • res1 (Residue) – Residue1

  • res2 (Residue) – Residue2

Returns:

Reference coordinates for both residues

Return type:

tuple[np.ndarray, np.ndarray]

BioHelpers_FABER.bio_mod.get_reference_coordinates_nitrogen(res1: Residue, res2: Residue) tuple[ndarray, ndarray]

Returns reference coordinates for each of the residues. Point of reference: Nitrogen atom.

Parameters:
  • res1 (Residue) – Residue1

  • res2 (Residue) – Residue2

Returns:

Reference coordinates for both residues

Return type:

tuple[np.ndarray, np.ndarray]

BioHelpers_FABER.bio_mod.get_sequence_position(filename: str) tuple

Show all the non-het Residues in a given pdb file and the associated position in the native molecule.

Parameters:

filename (str) – Filename of PDB

Returns:

Tuple of list with positions and sequence as string

Return type:

tuple

BioHelpers_FABER.bio_mod.is_rnaRes(residue) bool

Checks if a residue is not an AminoAcid and other stuff

Parameters:

residue (Bio.PDB.Residue.Residue()) – Input residue

Returns:

Check

Return type:

bool

BioHelpers_FABER.bio_mod.listToArr(c: list) ndarray

Converts contact map from list format to numpy.array

Parameters:

list (list) – List with entries [x,y,val], do not need to have \((L \times L)\) entries!

Returns:

Numpy Array of size \(\left( (\max_c(x)+1) \times (\max_c(y)+1)\right)\)

Return type:

np.array

BioHelpers_FABER.bio_mod.numberOfResidues(file: str) int

Counts the number of residues for a given PDB file

Parameters:

file (str) – Input Filename in PDB format

Returns:

Number of residues

Return type:

int

BioHelpers_FABER.bio_mod.renumber_pdb(file: str) None

Renumbers all present residues starting by one Caveat: Only use for cleaned chains!

Parameters:

file (str) – Input file, Output will be saved under file+”renum.pdb”

BioHelpers_FABER.bio_mod.residue_coord(residue, res_type: str = 'rna', reference: str = 'nitrogen') ndarray

Representative Coordinate of a residue, for proteins C alpha for RNA the coordinate of the nitrogen atom Deprecated!

Parameters:
  • residue (Bio.PDB.Residue.Residue()) – input residue object

  • res_type (str, optional) – residue type, defaults to “rna”

Raises:

RuntimeError – if residue is neither rna or protein

Returns:

coordinate of the representative

Return type:

numpy.array

BioHelpers_FABER.bio_mod.sequFromPDB(filename: str) str

Returns residues Sequence from PDB File, only residues in PDB!

Parameters:

filename (str) – Filename/Path of the PDB File

Returns:

RNA Sequence of the given PDB File

Return type:

str

BioHelpers_FABER.bio_mod.triangularMatrix(matrix: ndarray) ndarray

Cuts off from the diagonal

Parameters:

matrix (np.array) – input matrix

Returns:

output matrix

Return type:

np.array