forgi.graph.bulge_graph module¶
bulge_graph.py: A graph representation of RNA secondary structure based on its decomposition into primitive structure types: stems, hairpins, interior loops, multiloops, etc…
-
class
forgi.graph.bulge_graph.
BulgeGraph
(graph_construction, seq_obj, name=None, infos=None, _dont_split=False)[source]¶ Bases:
forgi.graph._basegraph.BaseGraph
A bulge graph object.
Users of the forgi library should use the provided factory functions instead of invoking BulgeGraph() directly!
Parameters: - graph_construction – A forgi.graph._graph_construction.Graph-construction instance that contains defines and edges.
- seq_obj – A forgi.graph.sequence.Sequence instance
- name – A string
- infos – A dict of lists
-
adjacent_stem_pairs_iterator
()[source]¶ Iterate over all pairs of stems which are separated by some element.
This will always yield triples of the form (s1, e1, s2) where s1 and s2 are the stem identifiers and e1 denotes the element that separates them.
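The triple format can be illustrated with a minimal standalone sketch. It does not use forgi; the `edges` dictionary (element name to set of neighbouring element names) and the helper name `adjacent_stem_pairs` are assumptions made for illustration only.

```python
from itertools import combinations


def adjacent_stem_pairs(edges):
    """Yield (s1, e1, s2) for every non-stem element e1 that touches two stems.

    `edges` is a hypothetical adjacency mapping, not forgi's internal structure.
    """
    for elem in sorted(edges):
        if elem.startswith("s"):
            continue  # only non-stem elements can separate two stems
        stems = sorted(n for n in edges[elem] if n.startswith("s"))
        for s1, s2 in combinations(stems, 2):
            yield (s1, elem, s2)


# Toy graph: stem s0, interior loop i0, stem s1, hairpin h0
edges = {
    "s0": {"i0"},
    "i0": {"s0", "s1"},
    "s1": {"i0", "h0"},
    "h0": {"s1"},
}
triples = list(adjacent_stem_pairs(edges))
```

Here the hairpin `h0` touches only one stem, so it contributes no triple.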
-
are_adjacent_stems
(s1, s2, multiloops_count=True)[source]¶ Are two stems separated by only one element? If multiloops should not count as edges, set multiloops_count to False.
Parameters: - s1 – The name of the first stem
- s2 – The name of the second stem
- multiloops_count – Whether to count multiloops as an edge linking two stems
-
backbone_breaks_after
¶
-
buildorder_of
(element)[source]¶ Returns the index into build_order where the element FIRST appears.
Parameters: element – Element name, a string. e.g. “m0” or “s0” Returns: An index into self.build_order or None, if the element is not part of the build_order (e.g. hairpin loops)
-
connected
(n1, n2)[source]¶ Are the nucleotides n1 and n2 connected?
Parameters: - n1 – A node in the BulgeGraph
- n2 – Another node in the BulgeGraph
Returns: True or False indicating whether they are connected.
-
connection_ends
(connection_type)[source]¶ Find out which ends of the stems are connected by a particular angle type.
Parameters: connection_type – The angle type, as determined by which corners of a stem are connected Returns: (s1e, s2b) 0 means the side of the stem with the lowest nucleotide, 1 the other side
-
connection_type
(define, connections)[source]¶ Classify the way that two stems are connected according to the type of bulge that separates them.
Potential angle types for single stranded segments, and the ends of the stems they connect:
1 2 (1, 1) #pseudoknot
1 0 (1, 0)
3 2 (0, 1)
3 0 (0, 0)
Parameters: - define – The name of the bulge separating the two stems
- connections – The two stems and their separation
Returns: INT connection type
Positive values mean forward (from the connected stem starting at the lower nucleotide number to the one starting at the higher nucleotide number); negative values mean backwards.
- 1 – interior loop
- 2 – first multi-loop segment of normal multiloops and most pseudoknots
- 3 – middle segment of a normal multiloop
- 4 – last segment of normal multiloops and most pseudoknots
- 5 – middle segments of pseudoknots
-
define_residue_num_iterator
(node, adjacent=False, seq_ids=False)[source]¶ Iterate over the residue numbers that belong to this node.
Parameters: node – The name of the node
-
defines
= None¶ The coarse-grained element definitions. Keys are element names, for example ‘s1’, ‘m2’, ‘h3’, ‘f1’ or ‘t1’. Values are the positions in the sequence (1D coordinates) of start, end, …
-
describe_multiloop
(multiloop)[source]¶ Parameters: multiloop – An iterable of nodes (only “m”, “t” and “f” elements)
-
element_length
(key)[source]¶ Get the number of residues that are contained within this element.
Parameters: key – The name of the element.
-
elements_to_nucleotides
(elements)[source]¶ Convert a list of element names to a list of nucleotide numbers.
Remove redundant entries.
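As a rough illustration of the conversion, the sketch below expands element definitions into nucleotide numbers. It is standalone and does not call forgi. It assumes that each define is a flat list of 1-based, inclusive start/end pairs, and the sorted return order is also an assumption.

```python
def elements_to_nucleotides(defines, elements):
    """Expand element names into a sorted list of unique nucleotide numbers.

    Assumes each define is a flat list [start1, end1, start2, end2, ...]
    of 1-based, inclusive ranges (an assumption made for this sketch).
    """
    nucleotides = set()
    for name in elements:
        define = defines[name]
        for start, end in zip(define[0::2], define[1::2]):
            nucleotides.update(range(start, end + 1))
    return sorted(nucleotides)


# Hypothetical defines for the structure (((....)))
defines = {"s0": [1, 3, 8, 10], "h0": [4, 7]}
stem_nucleotides = elements_to_nucleotides(defines, ["s0"])
all_nucleotides = elements_to_nucleotides(defines, ["s0", "h0", "s0"])
```

Passing `"s0"` twice does not duplicate its nucleotides, because the values are collected in a set.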
-
find_bulge_loop
(vertex, max_length=4)[source]¶ Find a set of nodes that form a loop containing the given vertex and being no greater than max_length nodes long.
Parameters: - vertex – The vertex to start the search from.
- max_length – Only find loops that contain no more than this many elements
Returns: A list of the nodes in the loop.
-
flanking_nuc_at_stem_side
(s, side)[source]¶ Return the nucleotide number that is next to the stem at the given stem side.
Parameters: side – 0, 1, 2 or 3, as returned by self._get_sides_plus Returns: The nucleotide position. If the stem has no neighbor at that side, 0 or self.seq_length+1 is returned instead.
-
floop_iterator
()[source]¶ Yield the name of the 5’ unpaired region if it is present in the structure.
-
classmethod
from_bg_file
(bg_file)[source]¶ Load a BulgeGraph from a file containing a text-based representation.
Parameters: bg_file – The filename. Returns: A bulge Graph.
-
classmethod
from_bg_string
(bg_str)[source]¶ Create a BulgeGraph from the string created by the method to_bg_string.
Parameters: bg_str – The string representation of a BulgeGraph. Returns: A BulgeGraph object
-
classmethod
from_bpseq_str
(bpseq_str, breakpoints=(), name=None, dissolve_length_one_stems=False, remove_pseudoknots=False)[source]¶ Create the graph from a string listing the base pairs.
The string should be formatted like so:
1 G 115
2 A 0
3 A 0
4 U 0
5 U 112
6 G 111
Parameters: - bpseq_str – The string, containing newline characters.
- breakpoints – A list of positions, after which there is a backbone break.
Returns: A new BulgeGraph object.
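The BPSEQ layout (index, base, pairing partner, with 0 for unpaired) can be read with a few lines of plain Python. The parser below is a standalone sketch, not forgi's implementation; the function name `parse_bpseq` and the choice to skip `#` comment lines are assumptions.

```python
def parse_bpseq(bpseq_str):
    """Parse BPSEQ text into a list of (index, base, partner) records."""
    records = []
    for line in bpseq_str.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines (assumed convention)
        index, base, partner = line.split()
        records.append((int(index), base, int(partner)))
    return records


# A small, complete example: a two-base-pair stem with a two-nucleotide loop
example = "1 G 6\n2 C 5\n3 A 0\n4 A 0\n5 G 2\n6 C 1\n"
records = parse_bpseq(example)
sequence = "".join(base for _, base, _ in records)
pairs = [(index, partner) for index, _, partner in records]
```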
-
classmethod
from_ct_string
(ct_string, dissolve_length_one_stems=False, remove_pseudoknots=False)[source]¶ Create the graph from a string holding a connectivity table. See http://x3dna.org/highlights/dssr-derived-secondary-structure-in-ct-format
-
classmethod
from_dotbracket
(dotbracket_str, seq=None, name=None, dissolve_length_one_stems=False, remove_pseudoknots=False)[source]¶ Create a BulgeGraph object from a dotbracket string.
Parameters: - dotbracket_str – A string
- seq – A string, with the same length as the dotbracket string, a forgi.graph.sequence.Sequence instance or None. If it is None, the sequence will be all ‘N’s
- name – Optional string to use as molecule name.
-
classmethod
from_fasta
(filename, dissolve_length_one_stems=False)[source]¶ Return a list of BulgeGraphs from a fasta file.
-
classmethod
from_fasta_text
(fasta_text, dissolve_length_one_stems=False, remove_pseudoknots=False)[source]¶ Create one or more Bulge Graphs from some fasta text.
Returns: A list of BulgeGraphs
-
get_angle_type
(bulge, allow_broken=False)[source]¶ Return what type of angle this bulge is, based on the way this would be built using a breadth-first traversal along the minimum spanning tree.
Parameters: allow_broken – How to treat broken multiloop segments.
- False (default): Return None
- True: Return the angle type according to the build-order (i.e. from the first built stem to the last-built stem)
-
get_bulge_dimensions
(bulge, with_missing=False)[source]¶ Return the dimensions of the bulge.
If it is single stranded it will be (x, -1) for h,t,f or (x, 1000) for m. Otherwise it will be (x, y).
Parameters: bulge – The name of the bulge. Returns: A pair containing its dimensions
-
get_connected_residues
(s1, s2, bulge=None)[source]¶ Get the nucleotides which are connected by the element separating s1 and s2. They should be adjacent stems.
Parameters: - s1, s2 – Two adjacent stems
- bulge – Optional: The bulge separating the two stems. If s1 and s2 are connected by more than one element, this has to be given, or a ValueError will be raised. (useful for pseudoknots)
The connected nucleotides are those which are spanned by a single interior loop or multiloop. In the case of an interior loop, this function will return a list of two tuples, and in the case of a multiloop it will be a list of one tuple.
If the two stems are not separated by a single element, then return an empty list.
-
get_define_seq_str
(elem, adjacent=False)[source]¶ Get a list containing the sequences for the given define.
Parameters: - elem – The element name for which to get the sequences
- adjacent – Boolean. Include adjacent nucleotides (for single stranded RNA only)
Returns: A list containing the sequence(s) corresponding to the defines
-
get_domains
()[source]¶ Get secondary structure domains.
- Currently domains found are:
- multiloops (without any connected stems)
- rods: stretches of stems + interior loops (without branching), with trailing hairpins
- pseudoknots
-
get_elem
(position)[source]¶ Get the secondary structure element from a nucleotide position
Parameters: position – An integer or a fgr.RESID instance, describing the nucleotide number.
-
get_flanking_handles
(bulge_name, side=0)[source]¶ Get the indices of the residues for fitting bulge regions.
So if there is a loop like so (between residues 7 and 16):
(((...))))
7890123456
  ^   ^
Then residues 9 and 13 will be used as the handles against which to align the fitted region.
In the fitted region, the residues (2,6) will be the ones that will be aligned to the handles.
Returns: (orig_chain_res1, orig_chain_res2, flanking_res1, flanking_res2)
-
get_flanking_region
(bulge_name, side=0)[source]¶ If a bulge is flanked by stems, return the lowest residue number of the previous stem and the highest residue number of the next stem.
Parameters: - bulge_name – The name of the bulge
- side – The side of the bulge (indicating the strand)
-
get_flanking_sequence
(bulge_name, side=0)[source]¶ Return the sequence of a bulge and the adjacent strand of the adjacent stems.
Parameters: - bulge_name – The name of the bulge, e.g. ‘h0’
- side – Used for interior loops: The strand of interest (0=forward, 1=backward)
-
get_length
(vertex)[source]¶ Get the minimum length of a vertex.
If it’s a stem, then the result is its length (in base pairs).
If it’s a bulge, then the length is the smaller of its dimensions.
Parameters: vertex – The name of the vertex.
-
get_link_direction
(stem1, stem2, bulge=None)[source]¶ Get the direction in which stem1 and stem2 are linked (by the bulge)
Returns: 1 if the bulge connects stem1 with stem2 in forward direction (5’ to 3’) -1 otherwise
-
get_mst
()[source]¶ Create a minimum spanning tree from this BulgeGraph. This is useful for constructing a structure where each section of a multiloop is sampled independently and we want to introduce a break at the largest multiloop section.
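The idea of breaking a multiloop at its largest section can be sketched with Kruskal's algorithm on a toy graph. This is an illustration under stated assumptions, not the procedure forgi uses: the `segments` tuples, the use of segment length as the edge weight, and the function name are all hypothetical.

```python
def spanning_tree_breaks(segments):
    """Split segments into (kept, broken) using Kruskal's algorithm.

    `segments` is a hypothetical list of (name, stem_a, stem_b, length).
    Shorter segments are kept first, so a cycle is broken at its longest one.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept, broken = [], []
    for name, a, b, _length in sorted(segments, key=lambda seg: (seg[3], seg[0])):
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            broken.append(name)  # adding this segment would close a cycle
        else:
            parent[root_a] = root_b
            kept.append(name)
    return kept, broken


# Three stems joined in a ring by multiloop segments of length 2, 5 and 1
segments = [("m0", "s0", "s1", 2), ("m1", "s1", "s2", 5), ("m2", "s2", "s0", 1)]
kept, broken = spanning_tree_breaks(segments)
```

In this toy ring the longest segment, `m1`, is the one left out of the tree.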
-
get_multiloop_nucleotides
(multiloop_loop)[source]¶ Return a list of nucleotides which make up a particular multiloop.
Parameters: multiloop_loop – The elements which make up this multiloop Returns: A list of nucleotides
-
get_multiloop_side
(m)[source]¶ Find out which strand a multiloop is on. An example of a situation in which the loop can be on both sides can be seen in the three-stemmed structure below:
(.().().)
In this case, the first multiloop section comes off of the 5’ strand of the first stem (the prior stem is always the one with a lower numbered first residue). The second multiloop section comes off the 3’ strand of the second stem and the third loop comes off the 3’ strand of the third stem.
-
get_next_ml_segment
(ml_segment)[source]¶ Get the adjacent multiloop-segment (or 3’ loop) next to the 3’ side of ml_segment.
If there is no other single stranded RNA after the stem, the backbone must end there. In that case return None.
-
get_node_dimensions
(node, with_missing=False)[source]¶ Return the dimensions of a node.
If the node is a stem, then the dimensions will be l where l is the length of the stem.
Otherwise, see get_bulge_dimensions(node)
Parameters: node – The name of the node Returns: A pair containing its dimensions
-
get_position_in_element
(resnum)[source]¶ Return the position of the residue in the cg-element and the length of the element.
Parameters: resnum – An integer. The 1-based position in the total sequence. Returns: A tuple (p,l) where p is the position of the residue in the cg-element (0-based for stems, 1-based for loops) and p/l gives a measure for the position of the residue along the cg-element’s axis (0 means at cg.coords[elem][0], 1 at cg.coords[elem][1] and 0.5 exactly in the middle of these two.)
-
get_resseqs
(define, seq_ids=True)[source]¶ Return the pdb ids of the nucleotides in this define.
Parameters: define – The name of this element. Returns: A tuple of two arrays containing the residue ids on each strand
-
get_side_nucleotides
(stem, side)[source]¶ Get the nucleotide numbers on the given side of the stem. Side 0 corresponds to the 5’ end of the stem whereas side 1 corresponds to the 3’ side of the stem.
Parameters: - stem – The name of the stem
- side – Either 0 or 1, indicating the 5’ or 3’ end of the stem
Returns: A tuple of the nucleotide numbers on the given side of the stem.
-
get_sides
(s1, b)[source]¶ Get the side of s1 that is next to b.
s1e -> s1b -> b
Parameters: - s1 – The stem.
- b – The bulge.
Returns: A tuple indicating which side is the one next to the bulge and which is away from the bulge.
-
get_stem_edge
(stem, pos)[source]¶ Returns the side (strand) of the stem that position is on.
Side 0 corresponds to the 5’ pairing residues in the stem whereas side 1 corresponds to the 3’ pairing residues in the stem.
Parameters: - stem – The name of the stem
- pos – A position in the stem
Returns: 0 if pos is on the 5’ edge of the stem
-
get_strand
(multiloop)[source]¶ Get the strand on which this multiloop is located.
Parameters: multiloop – The name of the multiloop Returns: 0 for being on the lower numbered strand and 1 for being on the higher numbered strand.
-
is_loop_pseudoknot
(loop)[source]¶ Is a particular loop a pseudoknot?
Parameters: loop – A list of elements that are part of the loop (only m, f and t elements). Returns: Either True or False
-
is_single_stranded
(node)[source]¶ Does this node represent a single-stranded region?
Single stranded regions are five-prime and three-prime unpaired regions, multiloops, and hairpins
Warning
Interior loops are never considered single stranded by this function.
Parameters: node – The name of the node Returns: True if yes, False if no
-
iter_elements_along_backbone
(startpos=1)[source]¶ Iterate all coarse grained elements along the backbone.
Note that stems are yielded twice (for forward and backward strand). Interior loops may be yielded twice or once (if one side has no nucleotide)
0-length multiloop-segments are correctly yielded.
Parameters: startpos – The nucleotide position at which to start Yields: Coarse grained element names, like “s0”, “i0”
-
iterate_over_seqid_range
(start_id, end_id)[source]¶ Iterate over the seq_ids between the start_id and end_id.
-
junctions
¶ Get all regular multiloops of this structure.
Returns: A list of tuples of multiloop segments. Each tuple contains the segments of one regular (i.e. not pseudoknotted) multiloop.
-
min_max_bp_distance
(e1, e2)[source]¶ Get the minimum and maximum base pair distance between these two elements.
If they are connected, the minimum distance will be 1. The maximum will be 1 + length(e1) + length(e2)
Parameters: - e1 – The name of the first element
- e2 – The name of the second element
Returns: A tuple containing the minimum and maximum distance between the two elements.
-
nucleotides_to_elements
(nucleotides)[source]¶ Convert a list of nucleotides (nucleotide numbers) to element names.
Remove redundant entries and return a set.
Note
Use self.get_node_from_residue_num if you have only a single nucleotide number.
-
pairing_partner
(nucleotide_number)[source]¶ Return the base pairing partner of the nucleotide at position nucleotide_number. If this nucleotide is unpaired, return None.
Parameters: nucleotide_number – The position of the query nucleotide in the sequence or a RESID instance. Returns: The number of the nucleotide base paired with the one at position nucleotide_number.
-
pseudoknotted_basepairs
(ignore_basepairs=())[source]¶ Return a list of base-pairs that will be removed to remove pseudoknots using the knotted2nested.py script.
Parameters: ignore_basepairs – An optional list of basepairs that knotted2nested will not consider present in the structure. Returns: A list of base-pairs that can be removed.
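To make the notion of a pseudoknotted base pair concrete, the sketch below lists pairs that cross each other. It only detects crossings; it is not the knotted2nested algorithm and does not decide which pairs should be removed. The function name is hypothetical.

```python
def crossing_pairs(pairs):
    """Return all couples of base pairs (i, j), (k, l) with i < k < j < l."""
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    crossings = []
    for idx, (i, j) in enumerate(ordered):
        for k, l in ordered[idx + 1:]:
            # Later pairs start at or after i, so this test is sufficient
            if i < k < j < l:
                crossings.append(((i, j), (k, l)))
    return crossings


nested = crossing_pairs([(1, 8), (2, 7)])   # properly nested stem
knotted = crossing_pairs([(1, 5), (3, 8)])  # the two pairs interleave
```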
-
random_subgraph
(subgraph_length=None)[source]¶ Return a random subgraph of this graph.
Returns: A list containing the nodes comprising a random subgraph
-
rods
¶
-
seq
¶
-
seq_id_to_pos
(seq_id)[source]¶ Convert a pdb seq_id to a 1-based nucleotide position
Parameters: seq_id – An instance of RESID
-
seq_length
¶
-
shortest_bg_loop
(vertex)[source]¶ Find a shortest loop containing this node. The vertex should be a multiloop.
Parameters: vertex – The name of the vertex to find the loop. Returns: A list containing the elements in the shortest cycle.
-
shortest_path
(e1, e2)[source]¶ Determine the shortest path between two elements (e1, e2) along the secondary structure.
Parameters: - e1 – The name of the first element
- e2 – The name of the second element
Returns: A list of the element names along the shortest path
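A shortest path over coarse-grained elements can be found with a breadth-first search. The sketch below is standalone and is not taken from forgi; the `edges` adjacency dictionary is a hypothetical stand-in for the graph's connectivity.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search returning the element names from start to goal."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == goal:
            # Walk the predecessor links back to the start
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for neighbor in sorted(edges[current]):
            if neighbor not in previous:
                previous[neighbor] = current
                queue.append(neighbor)
    return None  # goal is not reachable from start


# Toy graph: s0 - i0 - s1 - h0
edges = {
    "s0": {"i0"},
    "i0": {"s0", "s1"},
    "s1": {"i0", "h0"},
    "h0": {"s1"},
}
path = shortest_path(edges, "s0", "h0")
```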
-
sorted_edges_for_mst
()[source]¶ Keep track of all linked nodes. Used for the generation of the minimal spanning tree.
-
sorted_element_iterator
()[source]¶ Iterate over a list of the coarse grained elements sorted by the lowest numbered nucleotide in each stem. Multiloops with no nucleotide coordinates come last.
-
sorted_stem_iterator
()[source]¶ Iterate over a list of the stems sorted by the lowest numbered nucleotide in each stem.
-
ss_distance
(e1, e2)[source]¶ Calculate the distance between two elements (e1, e2) along the secondary structure. The distance only starts at the edge of each element, and is the closest distance between the two elements.
Parameters: - e1 – The name or nucleotide number of the first element
- e2 – The name or nucleotide number of the second element
Returns: The integer distance between the two elements / residues along the secondary structure. (if an element is given, we use its corner for the distance, otherwise the exact nucleotide)
-
stem_length
(key)[source]¶ Get the length of a particular element. If it’s a stem, it’s equal to the number of paired bases. If it’s an interior loop, it’s equal to the number of unpaired bases on the strand with fewer unpaired bases. If it’s a multiloop, then it’s the number of unpaired bases.
-
stem_side_vres_to_resn
(stem, side, vres)[source]¶ Return the residue number given the stem name, the strand (side) it’s on and the virtual residue number.
-
tloop_iterator
()[source]¶ Yield the name of the 3’ unpaired region if it is present in the structure.
-
to_dotbracket_string
(include_missing=False)[source]¶ Convert the BulgeGraph representation to a dot-bracket string and return it.
Returns: A dot-bracket representation of this BulgeGraph
-
to_element_string
(with_numbers=False)[source]¶ Create a string similar to dotbracket notation that identifies what type of element is present at each location.
For example the following dotbracket:
..((..))..
Should yield the following element string:
ffsshhsstt
Indicating that it begins with a fiveprime region, continues with a stem, has a hairpin after the stem, the stem continues and it is terminated by a threeprime region.
Parameters: with_numbers – Show the last digit of the element id in a second line. For example:
(((.(((...))))))
Could result in:
sssissshhhssssss
0000111000111000
Indicating that the first stem is named ‘s0’, followed by ‘i0’, ‘s1’, ‘h0’, the second strand of ‘s1’ and the second strand of ‘s0’
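The element letters in the examples above can be reproduced with a short standalone sketch. It is not forgi's implementation and makes several assumptions: the input is pseudoknot-free and uses only `(`, `)` and `.`; unpaired stretches between stems in the exterior loop are labelled `m`; a structure without any base pair is labelled `f` throughout. Element numbering (`with_numbers`) is not covered.

```python
def pair_table(dotbracket):
    """Return a 1-based pair table; index 0 holds the sequence length."""
    table = [0] * (len(dotbracket) + 1)
    table[0] = len(dotbracket)
    stack = []
    for pos, char in enumerate(dotbracket, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            partner = stack.pop()
            table[pos], table[partner] = partner, pos
    return table


def scan(table, start, stop):
    """Collect unpaired positions in start..stop and count enclosed branches."""
    unpaired, branches, pos = [], 0, start
    while pos <= stop:
        if table[pos] > pos:
            branches += 1
            pos = table[pos] + 1  # jump over the enclosed helix
        else:
            unpaired.append(pos)
            pos += 1
    return unpaired, branches


def element_string(dotbracket):
    table = pair_table(dotbracket)
    n = table[0]
    labels = ["s" if table[pos] else "?" for pos in range(1, n + 1)]
    # Loops closed by a base pair: hairpin, interior loop or multiloop
    for i in range(1, n + 1):
        j = table[i]
        if j > i:
            unpaired, branches = scan(table, i + 1, j - 1)
            kind = "h" if branches == 0 else "i" if branches == 1 else "m"
            for pos in unpaired:
                labels[pos - 1] = kind
    # Exterior loop: 5' region, 3' region, or segment between stems
    paired = [pos for pos in range(1, n + 1) if table[pos]]
    first = paired[0] if paired else n + 1
    last = paired[-1] if paired else n + 1
    for pos in scan(table, 1, n)[0]:
        labels[pos - 1] = "f" if pos < first else "t" if pos > last else "m"
    return "".join(labels)


simple = element_string("..((..))..")
nested = element_string("(((.(((...))))))")
```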
-
to_fasta_string
(include_missing=False)[source]¶ Output the BulgeGraph representation as a FASTA string of the format:
>id
AACCCAA
((...))
Parameters: include_missing – Whether or not residues for which no structure information is present should be included in the output.
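The three-line layout can be assembled as shown in the following sketch. It is a standalone helper rather than the forgi method; its name and signature are hypothetical, and whether the real output ends with a trailing newline is not specified here.

```python
def fasta_string(name, sequence, dotbracket):
    """Build a header line, a sequence line and a dot-bracket line."""
    if len(sequence) != len(dotbracket):
        raise ValueError("sequence and structure must have the same length")
    return ">{}\n{}\n{}".format(name, sequence, dotbracket)


text = fasta_string("id", "AACCCAA", "((...))")
```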
-
to_networkx
()[source]¶ Convert this graph to a networkx representation. This representation will contain all of the nucleotides as nodes and all of the base pairs as edges as well as the adjacent nucleotides.
-
to_pair_table
()[source]¶ Create a pair table from the list of elements.
The first element in the returned list indicates the number of nucleotides in the structure.
For example, the structure ((.)) yields [5,5,4,0,2,1].
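The pair table layout can be derived from a dot-bracket string with a stack. The function below is a minimal standalone sketch, assuming a pseudoknot-free structure written with `(`, `)` and `.` only; it is not forgi's own code.

```python
def pair_table(dotbracket):
    """Return a pair table: index 0 is the length, index i the partner of i."""
    table = [0] * (len(dotbracket) + 1)
    table[0] = len(dotbracket)
    stack = []
    for pos, char in enumerate(dotbracket, start=1):
        if char == "(":
            stack.append(pos)  # remember the opening position
        elif char == ")":
            partner = stack.pop()
            table[pos], table[partner] = partner, pos
    return table


table = pair_table("((.))")
```

Unpaired positions keep the value 0, which is why 1-based numbering is used for the nucleotides.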
-
to_pair_tuples
(remove_basepairs=None)[source]¶ Create a list of tuples corresponding to all of the base pairs in the structure. Unpaired bases will be shown as being paired with a nucleotide numbered 0.
For example: [(1,5),(2,4),(3,0),(4,2),(5,1)]
Parameters: remove_basepairs – A list of 2-tuples containing basepairs that should be removed
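The tuple format follows directly from a pair table. The sketch below is standalone and hypothetical; in particular, treating a removed base pair as two unpaired nucleotides (partner 0) is an assumption about how `remove_basepairs` behaves.

```python
def pair_tuples(table, remove_basepairs=None):
    """Convert a pair table into (nucleotide, partner) tuples.

    Assumption: removed base pairs are reported as unpaired (partner 0).
    """
    removed = set()
    for i, j in remove_basepairs or ():
        removed.add((i, j))
        removed.add((j, i))
    tuples = []
    for pos in range(1, table[0] + 1):
        partner = table[pos]
        if (pos, partner) in removed:
            partner = 0
        tuples.append((pos, partner))
    return tuples


table = [5, 5, 4, 0, 2, 1]  # pair table of ((.))
all_pairs = pair_tuples(table)
without_inner = pair_tuples(table, remove_basepairs=[(2, 4)])
```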
-
transformed
¶