forgi.threedee.classification.aminor module

class forgi.threedee.classification.aminor.AMinorClassifier(kernel='linear', bandwidth=0.3, symmetric=True, p_I=0.05)

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

A classifier that predicts A-Minor interactions based on the following formula:

\[P(I \mid geo) = \frac{f(geo \mid I)}{f(geo)} \cdot P(I)\]

where \(f\) is a probability density, \(P\) is a probability and \(I\) denotes an interaction.

We condition on two facts: no interactions occur at distances of more than 30 Angstrom, and an interaction requires an A in the loop sequence. In addition, estimating the numerator and the denominator separately could lead to probabilities greater than 1. We therefore use the following formula:

\[P(I \mid geo, d<30, A \in seq) = \frac{num}{denom}\]

where

\[num = f(geo \mid I) \cdot P(I \mid d<30, A \in seq)\]

\[denom = num + f(geo \mid \not I, d<30) \cdot \left(1 - P(I \mid d<30, A \in seq)\right)\]
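As a minimal sketch, num/denom can be evaluated for one geometry once the two density values and the prior are known. The function name `interaction_probability` and the choice to return 0 when both densities vanish are our own assumptions, not part of forgi. Because num also appears in denom, the result always lies between 0 and 1, however large \(f(geo \mid I)\) is.

```python
def interaction_probability(f_geo_given_i, f_geo_given_not_i, prior):
    """Posterior P(I | geo, d<30, A in seq) computed as num / denom.

    f_geo_given_i:     density f(geo | I) evaluated at the query geometry
    f_geo_given_not_i: density f(geo | not I, d<30) evaluated at the query geometry
    prior:             P(I | d<30, A in seq)
    """
    if not 0.0 <= prior <= 1.0:
        raise ValueError("prior must be a probability")
    num = f_geo_given_i * prior
    denom = num + f_geo_given_not_i * (1.0 - prior)
    if denom == 0.0:
        # Assumption: a geometry outside the support of both densities is a non-interaction.
        return 0.0
    return num / denom


# A very large f(geo | I) still yields a probability below 1.
print(interaction_probability(50.0, 0.1, 0.2))  # ≈ 0.992
```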

We estimate the probability densities with kernel density estimates. For \(f(geo \mid I)\) we use annotations created by FR3D searches for all 4 types of A-minor motifs. We assume that \(f(geo \mid \not I, A \in seq) = f(geo \mid \not I, A \notin seq)\). Since FR3D might miss some true interactions, loop-stem pairs with an A that FR3D did not annotate are not a clean sample of non-interactions, whereas loops without an A cannot form an A-minor interaction. Thus we estimate \(f(geo \mid \not I, A \in seq, d<30)\) as \(f(geo \mid \not I, A \notin seq, d<30)\).

Since both densities are normalized, it does not matter that we use different numbers of data points to estimate them.
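The normalization argument can be illustrated with a minimal kernel density estimate in NumPy. This is a stand-in, not forgi's implementation: it uses an isotropic Gaussian kernel, whereas AMinorClassifier defaults to kernel='linear'. The function name `gaussian_kde_3d` is hypothetical. Because the estimate averages N normalized kernels, it integrates to 1 for any N.

```python
import numpy as np


def gaussian_kde_3d(train, query, bandwidth=0.3):
    """Evaluate a Gaussian kernel density estimate at the query points.

    train: (N, 3) array of feature vectors (distance/10, angle1, angle2)
    query: (M, 3) array of points at which the density is evaluated
    """
    train = np.asarray(train, dtype=float)
    query = np.asarray(query, dtype=float)
    dim = train.shape[1]
    # Squared Euclidean distance between every query point and every training point: (M, N).
    sq_dist = ((query[:, None, :] - train[None, :, :]) ** 2).sum(axis=-1)
    # Normalization constant of an isotropic Gaussian with standard deviation `bandwidth`.
    norm = (2.0 * np.pi * bandwidth ** 2) ** (-dim / 2.0)
    # Averaging (not summing) over the N kernels keeps the estimate normalized for any N.
    return norm * np.exp(-sq_dist / (2.0 * bandwidth ** 2)).mean(axis=1)


rng = np.random.default_rng(0)
interactions = rng.normal(size=(40, 3))
query = np.zeros((1, 3))
# Doubling the number of data points (here by duplication) leaves the density unchanged.
print(gaussian_kde_3d(interactions, query))
print(gaussian_kde_3d(np.vstack([interactions, interactions]), query))
```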

We estimate \(P(I \mid d<30, A \in seq)\) as the number of loop-stem pairs in which a loop containing an A forms an A-minor interaction, divided by the number of all loop-stem pairs that are less than 30 Angstrom apart and have an A in the loop sequence.
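The counting estimate above can be sketched as follows. The tuple input format and the name `estimate_prior` are hypothetical; forgi derives these counts from its own training data.

```python
def estimate_prior(pairs, max_dist=30.0):
    """Estimate P(I | d < max_dist, A in seq) by counting loop-stem pairs.

    pairs: iterable of (distance_in_angstrom, loop_has_A, is_interaction) tuples,
           one per loop-stem pair (hypothetical input format).
    """
    # Only pairs that are close enough and have an A in the loop enter the ratio.
    candidates = [interacts for dist, has_a, interacts in pairs
                  if dist < max_dist and has_a]
    if not candidates:
        raise ValueError("no loop-stem pairs with d < max_dist and an A in the loop")
    return sum(candidates) / len(candidates)


pairs = [
    (12.0, True, True),
    (25.0, True, False),
    (18.0, True, False),
    (40.0, True, False),   # too far apart: not counted
    (10.0, False, False),  # no A in the loop: not counted
    (22.0, True, True),
]
print(estimate_prior(pairs))  # 2 interactions out of 4 candidate pairs -> 0.5
```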

fit(X, y)

Train the model.

Parameters:


X – An Nx3 array whose features are distance (Angstrom)/10, angle1 (rad) and angle2 (rad). The distance is the closest distance between the two line segments (i.e. the coarse-grained elements). angle1 is the angle between the line along the stem vector and the line along the shortest connection between the two elements. As an angle between two straight lines, it is defined between 0 and 90 degrees (0 to π/2 rad). angle2 is the angle between the connecting vector (pointing from the stem to the loop), projected onto the plane normal to the stem direction, and the twist vector (location of the minor groove) at the point closest to the interaction. As an angle between two vectors, it is defined between 0 and 180 degrees (0 to π rad).
y – An array of length N. 0 means no interaction, 1 means interaction.
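A minimal sketch of assembling X and y in the documented layout, assuming the raw distances (in Angstrom) and angles (in radians) are already available. The helper `build_features` and its range checks are hypothetical, not part of forgi.

```python
import numpy as np


def build_features(distances_angstrom, angle1_rad, angle2_rad):
    """Stack raw geometry into the Nx3 layout described for fit(X, y)."""
    X = np.column_stack([
        np.asarray(distances_angstrom, dtype=float) / 10.0,  # distance in units of 10 Angstrom
        np.asarray(angle1_rad, dtype=float),
        np.asarray(angle2_rad, dtype=float),
    ])
    # Sanity checks for the documented angle ranges.
    if np.any(X[:, 1] < 0) or np.any(X[:, 1] > np.pi / 2):
        raise ValueError("angle1 is an angle between lines and must lie in [0, pi/2]")
    if np.any(X[:, 2] < 0) or np.any(X[:, 2] > np.pi):
        raise ValueError("angle2 is an angle between vectors and must lie in [0, pi]")
    return X


X = build_features([12.0, 27.5], [0.3, 1.2], [2.0, 0.5])
y = np.array([1, 0])  # 1 = interaction, 0 = no interaction
print(X.shape)  # (2, 3)
```

Arrays of this shape are what AMinorClassifier().fit(X, y) expects according to the parameter description above.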

predict(X)

predict_proba(X)

score(X, y)

Return the average of specificity and sensitivity.
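The score can be recomputed by hand from true and predicted labels, as in the sketch below (the function name is hypothetical). Averaging sensitivity and specificity keeps a classifier that always predicts "no interaction" at 0.5, even though interactions are rare among loop-stem pairs.

```python
def sensitivity_specificity_mean(y_true, y_pred):
    """Mean of sensitivity (true positive rate) and specificity (true negative rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


y_true = [1, 1, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1]
print(sensitivity_specificity_mean(y_true, y_pred))  # (0.5 + 0.75) / 2 = 0.625
```

For binary labels this quantity is what scikit-learn calls balanced accuracy.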

set_params(**kwargs)

forgi.threedee.classification.aminor.ANGLEWEIGHT = 10

This module contains code for classifying a coarse-grained geometry as an A-Minor interaction.

Warning

This is intended for low-resolution data, as it does not take the orientation of individual bases into account. If you have all-atom data, dedicated tools like FR3D will be more accurate.

If you just want to classify interactions in a given structure, you only need the functions classify_interaction or all_interactions.

To train your own version of the classifier, modify its parameters or perform cross-validation, use the AMinorClassifier.

Access the default training data with the get_trainings_data(loop_type) function.

forgi.threedee.classification.aminor.all_interactions(cg, clfs=None)

Get a list of all predicted A-Minor interactions in a cg-object.

This is more efficient than using classify_interaction iteratively, because it uses vectorization.

Parameters:	clfs – A dictionary {loop_type: AMinorClassifier} where loop_type is one of “i”, “h”, “m”. If clfs is None or a key is missing, uses the default pretrained classifier.
Returns:	A list of tuples (loop, stem)

forgi.threedee.classification.aminor.classify_interaction(cg, loop, stem=None, clf=None)

Return the interaction pair (loop, stem) as a tuple, or False if no interaction exists.

Parameters:


cg – The CoarseGrainRNA
loop – The loop name, e.g. “i0”
stem – A stem name, e.g. “s0”. If given, an interaction of loop with any other stem is treated as False and only an interaction with this stem as True.

Warning

Our statistical model allows at most one interaction per loop. This means that we have to calculate the interaction probability of this loop with all stems, even if stem is given. If another stem has a higher interaction probability than the given stem, this function returns False, regardless of the interaction probability of the given stem-loop pair.

clf – A trained AMinorClassifier or None (use the default classifier for the loop type)

Note

all_interactions is more efficient if you are interested in all loop-stem pairs.

Returns:	A tuple (loop, stem) or False
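The "at most one interaction per loop" rule from the warning above can be sketched on precomputed probabilities. This is an illustration of the documented behaviour, not forgi's code: `classify_loop` is hypothetical, and the decision threshold of 0.5 is an assumption.

```python
def classify_loop(loop, probabilities, stem=None, threshold=0.5):
    """Apply the one-interaction-per-loop rule to precomputed probabilities.

    loop:          loop name, e.g. "i0"
    probabilities: dict {stem_name: interaction probability} for this loop
    stem:          optional stem of interest
    threshold:     hypothetical decision threshold
    Returns (loop, stem) or False.
    """
    if not probabilities:
        return False
    # Only the most probable stem can interact with this loop.
    best_stem = max(probabilities, key=probabilities.get)
    if probabilities[best_stem] < threshold:
        return False
    if stem is not None and stem != best_stem:
        # Another stem is more likely, so the requested pair counts as no interaction.
        return False
    return (loop, best_stem)


probs_i0 = {"s0": 0.7, "s1": 0.9, "s2": 0.1}
print(classify_loop("i0", probs_i0))             # ('i0', 's1')
print(classify_loop("i0", probs_i0, stem="s0"))  # False, although P(s0) = 0.7
```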

forgi.threedee.classification.aminor.df_to_data_labels(df, loop_type)

Create the training data as two arrays X and y (or data and labels) from the initial dataframe.

Returns:	X, y

forgi.threedee.classification.aminor.get_loop_flexibility(cg, loop)

Unused. We tried to see whether the length of the loop relative to its number of bases had an effect on interaction probability.

forgi.threedee.classification.aminor.get_relative_orientation(cg, loop, stem)

Return how loop is related to stem in terms of three parameters.

The stem is the receptor of a potential A-Minor interaction, whereas the loop is the donor.

The 3 parameters are:

Distance between the closest points of the two elements

The angle between the stem and the vector between the two elements

The angle between the minor groove of the stem and the projection of the vector between stem and loop onto the plane normal to the stem direction.
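The two angles can be sketched with NumPy from the definitions above, assuming the stem direction, the twist vector (minor groove) at the closest point and the stem-to-loop connecting vector are already known. All function names are hypothetical, and the distance between the line segments is not computed here.

```python
import numpy as np


def angle_between_lines(u, v):
    """Angle between two undirected lines, in [0, pi/2]."""
    cos = abs(np.dot(u, v)) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, 0.0, 1.0))


def angle_between_vectors(u, v):
    """Angle between two directed vectors, in [0, pi]."""
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.arccos(np.clip(cos, -1.0, 1.0))


def orientation_angles(stem_vec, twist_vec, connection):
    """angle1 and angle2 for a stem direction, its twist vector and the stem-to-loop vector.

    The projection is undefined (zero vector) if connection is parallel to the stem.
    """
    stem_unit = stem_vec / np.linalg.norm(stem_vec)
    angle1 = angle_between_lines(stem_vec, connection)
    # Remove the component along the stem to project onto the plane normal to it.
    projected = connection - np.dot(connection, stem_unit) * stem_unit
    angle2 = angle_between_vectors(projected, twist_vec)
    return angle1, angle2


stem_dir = np.array([0.0, 0.0, 1.0])
twist = np.array([1.0, 0.0, 0.0])
conn = np.array([0.0, 1.0, 1.0])
a1, a2 = orientation_angles(stem_dir, twist, conn)
print(np.degrees(a1), np.degrees(a2))  # approximately 45 and 90
```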

forgi.threedee.classification.aminor.get_trainings_data(loop)

forgi.threedee.classification.aminor.loop_potential_interactions(cg, loop, domain=None)

Iterate over all stems and return those loop-stem pairs that are not ruled out beforehand and will therefore be passed to the A-Minor classification.
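Based on the two criteria stated for the classifier (a distance below 30 Angstrom and an A in the loop sequence), a rule-out step might look like the sketch below. We have not verified that loop_potential_interactions applies exactly these criteria; `passes_prefilter` and the dictionary input format are hypothetical.

```python
def passes_prefilter(distance_angstrom, loop_sequence, max_dist=30.0):
    """Keep a loop-stem pair only if it is closer than max_dist and the loop contains an A."""
    return distance_angstrom < max_dist and "A" in loop_sequence.upper()


# Hypothetical input: {(loop, stem): (distance in Angstrom, loop sequence)}
candidates = {
    ("i0", "s0"): (14.2, "GAAC"),
    ("i0", "s3"): (41.0, "GAAC"),  # too far apart
    ("h1", "s2"): (9.8, "UUCG"),   # no A in the loop
}
kept = [pair for pair, (dist, seq) in candidates.items() if passes_prefilter(dist, seq)]
print(kept)  # [('i0', 's0')]
```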

forgi.threedee.classification.aminor.potential_interactions(cg, loop_type, domain=None)

Returns:	A tuple `geos`, `labels`. `geos` is an Nx3 array, where N is the number of potential interactions and the inner dimension is dist, angle1, angle2. `geos` can be passed to the AMinor classifier. `labels` is an Nx2 array, where the inner dimension is loopname, stemname

forgi 2.0.0 documentation
