forgi 2.0.0 documentation


forgi.graph.bulge_graph module

bulge_graph.py: A graph representation of RNA secondary structure based on its decomposition into primitive structure types: stems, hairpins, interior loops, multiloops, etc…

class forgi.graph.bulge_graph.BulgeGraph(graph_construction, seq_obj, name=None, infos=None, _dont_split=False)[source]

Bases: forgi.graph._basegraph.BaseGraph

A bulge graph object.

Users of the forgi library should use the provided factory functions instead of invoking BulgeGraph() directly!

Parameters:
  • graph_construction – A forgi.graph._graph_construction.Graph-construction instance that contains defines and edges.
  • seq_obj – A forgi.graph.sequence.Sequence instance
  • name – A string
  • infos – A dict of lists
add_info(key, value)[source]
adjacent_stem_pairs_iterator()[source]

Iterate over all pairs of stems which are separated by some element.

This will always yield triples of the form (s1, e1, s2) where s1 and s2 are the stem identifiers and e1 denotes the element that separates them.

are_adjacent_stems(s1, s2, multiloops_count=True)[source]

Are two stems separated by only one element? If multiloops should not count as edges, set multiloops_count to False.

Parameters:
  • s1 – The name of the first stem
  • s2 – The name of the second stem
  • multiloops_count – Whether to count multiloops as an edge linking two stems
backbone_breaks_after
buildorder_of(element)[source]

Returns the index into build_order where the element FIRST appears.

Parameters:element – Element name, a string. e.g. “m0” or “s0”
Returns:An index into self.build_order or None, if the element is not part of the build_order (e.g. hairpin loops)
connected(n1, n2)[source]

Are the nucleotides n1 and n2 connected?

Parameters:
  • n1 – A node in the BulgeGraph
  • n2 – Another node in the BulgeGraph
Returns:

True or False indicating whether they are connected.

connected_stem_iterator()[source]

Iterate over all pairs of connected stems.

connection_ends(connection_type)[source]

Find out which ends of the stems are connected by a particular angle type.

Parameters:connection_type – The angle type, as determined by which corners of a stem are connected
Returns:(s1e, s2b), where 0 means the side of the stem with the lowest nucleotide and 1 means the other side
connection_type(define, connections)[source]

Classify the way that two stems are connected according to the type of bulge that separates them.

Potential angle types for single stranded segments, and the ends of the stems they connect:

1 2 (1, 1) #pseudoknot
1 0 (1, 0)  
3 2 (0, 1)  
3 0 (0, 0)  
Parameters:
  • define – The name of the bulge separating the two stems
  • connections – The two stems and their separation
Returns:

INT connection type

positive values mean forward (from the connected stem starting at the lower nucleotide number to the one starting at the higher nuc. number)
negative values mean backwards.
1 interior loop
2 first multi-loop segment of normal multiloops and most pseudoknots
3 middle segment of a normal multiloop
4 last segment of normal multiloops and most pseudoknots
5 middle segments of pseudoknots

define_a(elem)[source]
define_residue_num_iterator(node, adjacent=False, seq_ids=False)[source]

Iterate over the residue numbers that belong to this node.

Parameters:node – The name of the node
defines = None

The coarse-grained element definitions. Keys are element names, for example ‘s1’, ‘m2’, ‘h3’, ‘f1’ or ‘t1’. Values are the positions in the sequence (1D coordinates) of start, end, …

describe_multiloop(multiloop)[source]
Parameters:multiloop – An iterable of nodes (only “m”, “t” and “f” elements)
element_length(key)[source]

Get the number of residues that are contained within this element.

Parameters:key – The name of the element.
elements_to_nucleotides(elements)[source]

Convert a list of element names to a list of nucleotide numbers.

Remove redundant entries.

find_bulge_loop(vertex, max_length=4)[source]

Find a set of nodes that form a loop containing the given vertex and being no greater than max_length nodes long.

Parameters:
  • vertex – The vertex to start the search from.
  • max_length – Only find loops that contain no more than this many elements
Returns:

A list of the nodes in the loop.

find_mlonly_multiloops()[source]
flanking_nuc_at_stem_side(s, side)[source]

Return the nucleotide number that is next to the stem at the given stem side.

Parameters:side – 0, 1, 2 or 3, as returned by self._get_sides_plus
Returns:The nucleotide position. If the stem has no neighbor at that side, 0 or self.seq_length+1 is returned instead.
floop_iterator()[source]

Yield the name of the 5’ unpaired region if it is present in the structure.

classmethod from_bg_file(bg_file)[source]

Load a BulgeGraph from a file containing a text-based representation.

Parameters:bg_file – The filename.
Returns:A BulgeGraph.
classmethod from_bg_string(bg_str)[source]

Create a BulgeGraph from the string created by the method to_bg_string.

Parameters:bg_str – The string representation of a BulgeGraph.
Returns:A BulgeGraph object
classmethod from_bpseq_str(bpseq_str, breakpoints=(), name=None, dissolve_length_one_stems=False, remove_pseudoknots=False)[source]

Create the graph from a string listing the base pairs.

The string should be formatted like so:

1 G 115
2 A 0
3 A 0
4 U 0
5 U 112
6 G 111
Parameters:
  • bpseq_str – The string, containing newline characters.
  • breakpoints – A list of positions, after which there is a backbone break.
Returns:

A new BulgeGraph object.
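To make the bpseq layout concrete, here is a minimal standalone sketch that does not use forgi. The helper name `parse_bpseq` is hypothetical, and the sketch assumes one whitespace-separated record per line (index, base, pairing partner), with a partner of 0 marking an unpaired nucleotide.

```python
def parse_bpseq(bpseq_str):
    """Parse bpseq text into a list of (index, base, partner) tuples.

    Hypothetical helper for illustration only; it is not part of forgi.
    """
    records = []
    for line in bpseq_str.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        index, base, partner = line.split()
        records.append((int(index), base, int(partner)))
    return records


example = "1 G 6\n2 A 5\n3 A 0\n4 U 0\n5 U 2\n6 C 1\n"
records = parse_bpseq(example)
# A partner of 0 marks an unpaired nucleotide.
unpaired = [index for index, _base, partner in records if partner == 0]
print(records)
print(unpaired)
```

Real bpseq files may carry header lines, which this sketch does not handle.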

classmethod from_ct_string(ct_string, dissolve_length_one_stems=False, remove_pseudoknots=False)[source]

Create the graph from a string holding a connectivity table. See http://x3dna.org/highlights/dssr-derived-secondary-structure-in-ct-format
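As a rough illustration of the connectivity table layout, the following standalone sketch does not use forgi, and `parse_ct` is a hypothetical helper name. It assumes the common CT layout: a header line starting with the number of bases, then one line per base with six columns (index, base, previous index, next index, pairing partner or 0, natural numbering).

```python
def parse_ct(ct_string):
    """Return (index, base, partner) tuples from a CT-formatted string.

    Hypothetical helper for illustration only; it is not part of forgi.
    """
    lines = [line for line in ct_string.splitlines() if line.strip()]
    n_bases = int(lines[0].split()[0])  # the header starts with the base count
    records = []
    for line in lines[1:1 + n_bases]:
        fields = line.split()
        # Column 1: index, column 2: base, column 5: pairing partner (0 if none)
        records.append((int(fields[0]), fields[1], int(fields[4])))
    return records


example_ct = """5 example
1 G 0 2 5 1
2 A 1 3 0 2
3 A 2 4 0 3
4 A 3 5 0 4
5 C 4 0 1 5
"""
print(parse_ct(example_ct))
```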

classmethod from_dotbracket(dotbracket_str, seq=None, name=None, dissolve_length_one_stems=False, remove_pseudoknots=False)[source]

Create a BulgeGraph object from a dotbracket string.

Parameters:
  • dotbracket_str – A string
  • seq – A string, with the same length as the dotbracket string, a forgi.graph.sequence.Sequence instance or None. If it is None, the sequence will be all ‘N’s
  • name – Optional string to use as molecule name.
classmethod from_fasta(filename, dissolve_length_one_stems=False)[source]

Return a list of BulgeGraphs from a fasta file.

classmethod from_fasta_text(fasta_text, dissolve_length_one_stems=False, remove_pseudoknots=False)[source]

Create one or more Bulge Graphs from some fasta text.

Returns:A list of BulgeGraphs
get_angle_type(bulge, allow_broken=False)[source]

Return what type of angle this bulge is, based on the way this would be built using a breadth-first traversal along the minimum spanning tree.

Parameters:allow_broken

How to treat broken multiloop segments.

  • False (default): Return None
  • True: Return the angle type according to the build-order (i.e. from the first built stem to the last-built stem)
get_bulge_dimensions(bulge, with_missing=False)[source]

Return the dimensions of the bulge.

If it is single stranded it will be (x, -1) for h,t,f or (x, 1000) for m. Otherwise it will be (x, y).

Parameters:bulge – The name of the bulge.
Returns:A pair containing its dimensions
get_connected_residues(s1, s2, bulge=None)[source]

Get the nucleotides which are connected by the element separating s1 and s2. They should be adjacent stems.

Parameters:
  • s1, s2 – Two adjacent stems
  • bulge – Optional: The bulge separating the two stems. If s1 and s2 are connected by more than one element, this has to be given, or a ValueError will be raised. (useful for pseudoknots)

The connected nucleotides are those which are spanned by a single interior loop or multiloop. In the case of an interior loop, this function will return a list of two tuples; in the case of a multiloop, it will return a list of one tuple.

If the two stems are not separated by a single element, then return an empty list.

get_define_seq_str(elem, adjacent=False)[source]

Get a list containing the sequences for the given define.

Parameters:
  • elem – The element name for which to get the sequences
  • adjacent – Boolean. Include adjacent nucleotides (for single stranded RNA only)
Returns:

A list containing the sequence(s) corresponding to the defines

get_domains()[source]

Get secondary structure domains.

Currently domains found are:
  • multiloops (without any connected stems)
  • rods: stretches of stems + interior loops (without branching), with trailing hairpins
  • pseudoknots
get_elem(position)[source]

Get the secondary structure element from a nucleotide position

Parameters:position – An integer or a fgr.RESID instance, describing the nucleotide number.
get_flanking_handles(bulge_name, side=0)[source]

Get the indices of the residues for fitting bulge regions.

So if there is a loop like so (between residues 7 and 16):

(((...))))
7890123456
  ^   ^

Then residues 9 and 13 will be used as the handles against which to align the fitted region.

In the fitted region, the residues (2,6) will be the ones that will be aligned to the handles.

Returns:(orig_chain_res1, orig_chain_res2, flanking_res1, flanking_res2)
get_flanking_region(bulge_name, side=0)[source]

If a bulge is flanked by stems, return the lowest residue number of the previous stem and the highest residue number of the next stem.

Parameters:
  • bulge_name – The name of the bulge
  • side – The side of the bulge (indicating the strand)
get_flanking_sequence(bulge_name, side=0)[source]

Return the sequence of a bulge and the adjacent strand of the adjacent stems.

Parameters:
  • bulge_name – The name of the bulge, e.g. ‘h0’
  • side – Used for interior loops: The strand of interest (0=forward, 1=backward)
get_length(vertex)[source]

Get the minimum length of a vertex.

If it’s a stem, then the result is its length (in base pairs).

If it’s a bulge, then the length is the smaller of its dimensions.

Parameters:vertex – The name of the vertex.

Get the direction in which stem1 and stem2 are linked (by the bulge)

Returns:1 if the bulge connects stem1 with stem2 in forward direction (5’ to 3’), -1 otherwise
get_mst()[source]

Create a minimum spanning tree from this BulgeGraph. This is useful for constructing a structure where each section of a multiloop is sampled independently and we want to introduce a break at the largest multiloop section.

get_multiloop_nucleotides(multiloop_loop)[source]

Return a list of nucleotides which make up a particular multiloop.

Parameters:multiloop_loop – The elements which make up this multiloop
Returns:A list of nucleotides
get_multiloop_side(m)[source]

Find out which strand a multiloop is on. An example of a situation in which the loop can be on both sides can be seen in the three-stemmed structure below:

(.().().)

In this case, the first multiloop section comes off of the 5’ strand of the first stem (the prior stem is always the one with a lower numbered first residue). The second multiloop section comes off the 3’ strand of the second stem and the third loop comes off the 3’ strand of the third stem.

get_next_ml_segment(ml_segment)[source]

Get the adjacent multiloop-segment (or 3’ loop) next to the 3’ side of ml_segment.

If there is no other single stranded RNA after the stem, the backbone must end there. In that case return None.

get_node_dimensions(node, with_missing=False)[source]

Return the dimensions of a node.

If the node is a stem, then the dimensions will be l where l is the length of the stem.

Otherwise, see get_bulge_dimensions(node)

Parameters:node – The name of the node
Returns:A pair containing its dimensions
get_node_from_residue_num(base_num)[source]

USE get_elem instead.

get_position_in_element(resnum)[source]

Return the position of the residue in the cg-element and the length of the element.

Parameters:resnum – An integer. The 1-based position in the total sequence.
Returns:A tuple (p,l) where p is the position of the residue in the cg-element (0-based for stems, 1-based for loops) and p/l gives a measure for the position of the residue along the cg-element’s axis (0 means at cg.coords[elem][0], 1 at cg.coords[elem][1] and 0.5 exactly in the middle of these two).
get_resseqs(define, seq_ids=True)[source]

Return the pdb ids of the nucleotides in this define.

Parameters:define – The name of this element.
Returns:A tuple of two arrays containing the residue ids on each strand
get_side_nucleotides(stem, side)[source]

Get the nucleotide numbers on the given side of the stem. Side 0 corresponds to the 5’ end of the stem, whereas side 1 corresponds to the 3’ side of the stem.

Parameters:
  • stem – The name of the stem
  • side – Either 0 or 1, indicating the 5’ or 3’ end of the stem
Returns:

A tuple of the nucleotide numbers on the given side of the stem.

get_sides(s1, b)[source]

Get the side of s1 that is next to b.

s1e -> s1b -> b

Parameters:
  • s1 – The stem.
  • b – The bulge.
Returns:

A tuple indicating which side is the one next to the bulge and which is away from the bulge.

get_stem_edge(stem, pos)[source]

Returns the side (strand) of the stem that position is on.

Side 0 corresponds to the 5’ pairing residues in the stem, whereas side 1 corresponds to the 3’ pairing residues in the stem.

Parameters:
  • stem – The name of the stem
  • pos – A position in the stem
Returns:

0 if pos is on the 5’ edge of the stem

get_strand(multiloop)[source]

Get the strand on which this multiloop is located.

Parameters:multiloop – The name of the multiloop
Returns:0 for being on the lower numbered strand and 1 for being on the higher numbered strand.
has_connection(v1, v2)[source]

Is there an edge between these two nodes?

hloop_iterator()[source]

Iterator over all of the hairpins in the structure.

iloop_iterator()[source]

Iterator over all of the interior loops in the structure.

is_loop_pseudoknot(loop)[source]

Is a particular loop a pseudoknot?

Parameters:loop – A list of elements that are part of the loop (only m,f and t elements).
Returns:Either True or False
is_single_stranded(node)[source]

Does this node represent a single-stranded region?

Single stranded regions are five-prime and three-prime unpaired regions, multiloops, and hairpins

Warning

Interior loops are never considered single stranded by this function.

Parameters:node – The name of the node
Returns:True if yes, False if no
iter_elements_along_backbone(startpos=1)[source]

Iterate all coarse grained elements along the backbone.

Note that stems are yielded twice (for forward and backward strand). Interior loops may be yielded twice or once (if one side has no nucleotide)

0-length multiloop-segments are correctly yielded.

Parameters:startpos – The nucleotide position at which to start
Yields:Coarse grained element names, like “s0”, “i0”
iterate_over_seqid_range(start_id, end_id)[source]

Iterate over the seq_ids between the start_id and end_id.

junctions

Get all regular multiloops of this structure.

Returns:A list of tuples of multiloop segments. Each tuple contains the segments of one regular (i.e. not pseudoknotted) multiloop.
length_one_stem_basepairs()[source]

Return a list of basepairs that correspond to length-1 stems.
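One way to picture a length-1 stem is as a base pair (i, j) that is not stacked on either (i-1, j+1) or (i+1, j-1). The standalone sketch below applies that rule to a dot-bracket string. It does not use forgi, the helper names are hypothetical, and it assumes a balanced, pseudoknot-free structure; forgi's own handling (for example around backbone breaks) may differ.

```python
def pair_table(dotbracket):
    """1-based partner list; index 0 holds the structure length."""
    table = [len(dotbracket)] + [0] * len(dotbracket)
    stack = []
    for pos, char in enumerate(dotbracket, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            opening = stack.pop()
            table[opening], table[pos] = pos, opening
    return table


def isolated_basepairs(dotbracket):
    """Return base pairs that have no stacked neighbour pair."""
    table = pair_table(dotbracket)
    n = table[0]
    isolated = []
    for i in range(1, n + 1):
        j = table[i]
        if j <= i:
            continue  # unpaired, or the 3' partner of a pair already handled
        # Is the pair (i-1, j+1) present directly outside this one?
        outer = i > 1 and j < n and table[i - 1] == j + 1
        # Is the pair (i+1, j-1) present directly inside this one?
        inner = j - 1 > i + 1 and table[i + 1] == j - 1
        if not outer and not inner:
            isolated.append((i, j))
    return isolated


print(isolated_basepairs("((.(..).))"))
```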

log(level=10)[source]
min_max_bp_distance(e1, e2)[source]

Get the minimum and maximum base pair distance between these two elements.

If they are connected, the minimum distance will be 1. The maximum will be 1 + length(e1) + length(e2)

Parameters:
  • e1 – The name of the first element
  • e2 – The name of the second element
Returns:

A tuple containing the minimum and maximum distance between the two elements.

mloop_iterator()[source]

Iterator over all of the multiloops in the structure.

nucleotides_to_elements(nucleotides)[source]

Convert a list of nucleotides (nucleotide numbers) to element names.

Remove redundant entries and return a set.

Note

Use self.get_node_from_residue_num if you have only a single nucleotide number.
pairing_partner(nucleotide_number)[source]

Return the base pairing partner of the nucleotide at position nucleotide_number. If this nucleotide is unpaired, return None.

Parameters:nucleotide_number – The position of the query nucleotide in the sequence or a RESID instance.
Returns:The number of the nucleotide base paired with the one at position nucleotide_number.
pseudoknotted_basepairs(ignore_basepairs=())[source]

Return a list of base-pairs that will be removed to remove pseudoknots using the knotted2nested.py script.

Parameters:ignore_basepairs – An optional list of basepairs that knotted2nested will not consider present in the structure.
Returns:A list of base-pairs that can be removed.
random_subgraph(subgraph_length=None)[source]

Return a random subgraph of this graph.

Returns:A list containing the nodes comprising a random subgraph
rods
seq
seq_id_to_pos(seq_id)[source]

Convert a pdb seq_id to a 1-based nucleotide position

Parameters:seq_id – An instance of RESID
seq_length
set_angle_types()[source]

Fill in the angle types based on the build order

shortest_bg_loop(vertex)[source]

Find a shortest loop containing this node. The vertex should be a multiloop.

Parameters:vertex – The name of the vertex to find the loop.
Returns:A list containing the elements in the shortest cycle.
shortest_mlonly_multiloop(ml_segment)[source]
shortest_path(e1, e2)[source]

Determine the shortest path between two elements (e1, e2) along the secondary structure.

Parameters:
  • e1 – The name of the first element
  • e2 – The name of the second element
Returns:

A list of the element names along the shortest path
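The idea of a shortest path over coarse-grained elements can be sketched with a plain breadth-first search. This is a generic, unweighted illustration and is not necessarily how forgi computes the path. The `edges` dictionary below is a hypothetical element graph (stem, interior loop, stem, hairpin) written out by hand.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an adjacency dict; returns a list of names."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in sorted(edges[node]):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no path between the two elements


# Hypothetical element graph, assumed for this example only.
edges = {
    "s0": {"i0"},
    "i0": {"s0", "s1"},
    "s1": {"i0", "h0"},
    "h0": {"s1"},
}
print(shortest_path(edges, "s0", "h0"))
```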

sorted_edges_for_mst()[source]

Keep track of all linked nodes. Used for the generation of the minimal spanning tree.

sorted_element_iterator()[source]

Iterate over a list of the coarse grained elements sorted by the lowest numbered nucleotide in each stem. Multiloops with no nucleotide coordinates come last.

sorted_stem_iterator()[source]

Iterate over a list of the stems sorted by the lowest numbered nucleotide in each stem.

ss_distance(e1, e2)[source]

Calculate the distance between two elements (e1, e2) along the secondary structure. The distance only starts at the edge of each element, and is the closest distance between the two elements.

Parameters:
  • e1 – The name or nucleotide number of the first element
  • e2 – The name or nucleotide number of the second element
Returns:

The integer distance between the two elements / residues along the secondary structure. (if an element is given, we use its corner for the distance, otherwise the exact nucleotide)

stem_bp_iterator(stem, seq_ids=False)[source]

Iterate over all the base pairs in the stem.

stem_iterator()[source]

Iterator over all of the stems in the structure.

stem_length(key)[source]

Get the length of a particular element. If it’s a stem, it’s equal to the number of paired bases. If it’s an interior loop, it’s equal to the number of unpaired bases on the strand with fewer unpaired bases. If it’s a multiloop, then it’s the number of unpaired bases.

stem_resn_to_stem_vres_side(stem, res)[source]
stem_side_vres_to_resn(stem, side, vres)[source]

Return the residue number given the stem name, the strand (side) it’s on and the virtual residue number.

tloop_iterator()[source]

Yield the name of the 3’ unpaired region if it is present in the structure.

to_bg_string()[source]

Output a string representation that can be stored and reloaded.

to_bpseq_string()[source]

Create a bpseq string from this structure.

to_dotbracket_string(include_missing=False)[source]

Convert the BulgeGraph representation to a dot-bracket string and return it.

Returns:A dot-bracket representation of this BulgeGraph
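For orientation, the mapping from base pairs to dot-bracket characters can be sketched as follows. This standalone example does not use forgi, `pairs_to_dotbracket` is a hypothetical helper, and it assumes a nested structure without pseudoknots, given as (position, partner) tuples with partner 0 for unpaired bases.

```python
def pairs_to_dotbracket(pair_tuples):
    """Convert (position, partner) tuples into a dot-bracket string."""
    chars = []
    for pos, partner in sorted(pair_tuples):
        if partner == 0:
            chars.append(".")  # unpaired
        elif partner > pos:
            chars.append("(")  # 5' side of a base pair
        else:
            chars.append(")")  # 3' side of a base pair
    return "".join(chars)


print(pairs_to_dotbracket([(1, 5), (2, 4), (3, 0), (4, 2), (5, 1)]))  # ((.))
```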
to_element_string(with_numbers=False)[source]

Create a string similar to dotbracket notation that identifies what type of element is present at each location.

For example the following dotbracket:

..((..))..

Should yield the following element string:

ffsshhsstt

Indicating that it begins with a fiveprime region, continues with a stem, has a hairpin after the stem, the stem continues and it is terminated by a threeprime region.

Parameters:with_numbers

Show the last digit of the element id in a second line. For example:

(((.(((...))))))

Could result in:

sssissshhhssssss
0000111000111000

Indicating that the first stem is named ‘s0’, followed by ‘i0’, ‘s1’, ‘h0’, the second strand of ‘s1’ and the second strand of ‘s0’.
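The labelling rule can be sketched without forgi for pseudoknot-free structures. In this simplified illustration, paired bases are ‘s’, and an unpaired base is labelled by the loop that encloses it: ‘h’ if the closing pair encloses no other pair, ‘i’ if it encloses exactly one, ‘m’ otherwise. Unpaired bases outside all pairs become ‘f’ or ‘t’ at the ends; labelling those between two exterior stems as ‘m’ is an assumption of this sketch. The function name is hypothetical.

```python
def element_string(dotbracket):
    """Label each position of a nested dot-bracket string (illustrative only)."""
    n = len(dotbracket)
    partner = [0] * (n + 1)  # 1-based; 0 means unpaired
    stack = []
    for pos, char in enumerate(dotbracket, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            opening = stack.pop()
            partner[opening], partner[pos] = pos, opening

    def enclosing_pair(pos):
        """Opening position of the closest pair around pos, or 0 if none."""
        depth = 0
        for j in range(pos - 1, 0, -1):
            if dotbracket[j - 1] == ")":
                depth += 1
            elif dotbracket[j - 1] == "(":
                if depth == 0:
                    return j
                depth -= 1
        return 0

    def inner_pair_count(opening):
        """Number of pairs directly enclosed by the pair starting at opening."""
        count, k = 0, opening + 1
        while k < partner[opening]:
            if partner[k] > k:
                count += 1
                k = partner[k] + 1  # jump over the enclosed helix
            else:
                k += 1
        return count

    labels = []
    for pos in range(1, n + 1):
        if partner[pos]:
            labels.append("s")
            continue
        opening = enclosing_pair(pos)
        if opening == 0:  # not enclosed by any base pair
            if "(" not in dotbracket[:pos - 1]:
                labels.append("f")
            elif ")" not in dotbracket[pos:]:
                labels.append("t")
            else:
                labels.append("m")
        else:
            labels.append("him"[min(inner_pair_count(opening), 2)])
    return "".join(labels)


print(element_string("..((..)).."))        # ffsshhsstt
print(element_string("(((.(((...))))))"))  # sssissshhhssssss
```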

to_fasta_string(include_missing=False)[source]

Output the BulgeGraph representation as a fasta string of the format:

>id
AACCCAA
((...))
Parameters:include_missing – Whether or not residues for which no structure information is present should be included in the output.
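The three-line layout above (header, sequence, dot-bracket) is easy to produce and read back by hand. The following standalone sketch does not use forgi; both helper names are hypothetical, and it assumes every record consists of exactly three non-empty lines.

```python
def to_fasta_text(name, sequence, dotbracket):
    """Format one record as '>name', sequence and structure lines."""
    if len(sequence) != len(dotbracket):
        raise ValueError("sequence and structure must have the same length")
    return ">{}\n{}\n{}".format(name, sequence, dotbracket)


def parse_fasta_text(fasta_text):
    """Split text into (name, sequence, dotbracket) records."""
    lines = [line.strip() for line in fasta_text.splitlines() if line.strip()]
    records = []
    for i in range(0, len(lines), 3):
        header, sequence, dotbracket = lines[i:i + 3]
        records.append((header[1:], sequence, dotbracket))  # drop the '>'
    return records


text = to_fasta_text("id", "AACCCAA", "((...))")
print(text)
print(parse_fasta_text(text))
```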
to_file(filename)[source]
to_neato_string()[source]
to_networkx()[source]

Convert this graph to a networkx representation. This representation will contain all of the nucleotides as nodes and all of the base pairs as edges as well as the adjacent nucleotides.

to_pair_table()[source]

Create a pair table from the list of elements.

The first element in the returned list indicates the number of nucleotides in the structure.

i.e. [5,5,4,0,2,1]
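The pair table shown above corresponds to the dot-bracket string `((.))`. A minimal standalone sketch of how such a table can be built with a stack follows. It does not use forgi, the function name is hypothetical, and it only handles plain parentheses (no pseudoknot brackets).

```python
def dotbracket_to_pair_table(dotbracket):
    """Build a pair table: index 0 is the length, index i the partner of i."""
    table = [len(dotbracket)] + [0] * len(dotbracket)
    stack = []
    for pos, char in enumerate(dotbracket, start=1):
        if char == "(":
            stack.append(pos)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position {}".format(pos))
            opening = stack.pop()
            table[opening], table[pos] = pos, opening
    if stack:
        raise ValueError("unbalanced '(' at position {}".format(stack[-1]))
    return table


print(dotbracket_to_pair_table("((.))"))  # [5, 5, 4, 0, 2, 1]
```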

to_pair_tuples(remove_basepairs=None)[source]

Create a list of tuples corresponding to all of the base pairs in the structure. Unpaired bases will be shown as being paired with a nucleotide numbered 0.

i.e. [(1,5),(2,4),(3,0),(4,2),(5,1)]

Parameters:remove_basepairs – A list of 2-tuples containing basepairs that should be removed
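The relation between a pair table and the list of pair tuples can be sketched as below. This standalone example does not use forgi, and `pair_table_to_tuples` is a hypothetical helper. Showing removed base pairs as unpaired (partner 0) is an assumption about what "removed" means here.

```python
def pair_table_to_tuples(pair_table, remove_basepairs=None):
    """Turn a pair table into (position, partner) tuples."""
    removed = set()
    for a, b in (remove_basepairs or []):
        removed.add((a, b))
        removed.add((b, a))  # a base pair is symmetric
    tuples = []
    for pos in range(1, pair_table[0] + 1):
        partner = pair_table[pos]
        if (pos, partner) in removed:
            partner = 0  # assumption: a removed pair is reported as unpaired
        tuples.append((pos, partner))
    return tuples


table = [5, 5, 4, 0, 2, 1]
print(pair_table_to_tuples(table))
print(pair_table_to_tuples(table, remove_basepairs=[(2, 4)]))
```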
transformed
traverse_graph()[source]

Traverse the graph to get the angle types. The angle type depends on which corners of the stem are connected by the multiloop or internal loop.

Returns:A list of triples (stem, loop, stem)
forgi.graph.bulge_graph.print_brackets(brackets)[source]

Print the brackets and a numbering, for debugging purposes

Parameters:brackets – A string with the dotplot passed as input to this script.
forgi.graph.bulge_graph.profile(x)[source]
