DNA sequences, Computer Engineering


The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, such a dataset is called imbalanced. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to the biological data. It is a good idea to consult relevant publications to see how effective classifiers for motif recognition can be built.
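As one illustration of the sampling idea mentioned above, the following is a minimal sketch of random undersampling of the majority (non-binding) class. The function name `undersample`, the `ratio` parameter, and the toy kmers and labels are assumptions made for illustration; the assignment does not prescribe a particular sampling method.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Keep every positive example and a random subset of the negatives.

    `ratio` is the number of negatives kept per positive (hypothetical
    parameter). Labels are assumed to be 1 (binding site) or 0 (non-binding).
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    # Never ask for more negatives than are available.
    n_keep = min(len(neg), int(round(ratio * len(pos))))
    kept = sorted(pos + rng.sample(neg, n_keep))
    return [examples[i] for i in kept], [labels[i] for i in kept]


# Toy data: 2 positives among 20 kmers (made up for the example).
toy_kmers = ["ACACGGGA", "ACACGGGA"] + ["ccttacac"] * 18
toy_labels = [1, 1] + [0] * 18

balanced_kmers, balanced_labels = undersample(toy_kmers, toy_labels, ratio=2.0)
print(len(balanced_kmers), sum(balanced_labels))
```

Undersampling should normally be applied to the training portion only, so that the test data keeps its original class distribution.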

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a testing dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
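The partitioning and the four measures can be sketched as follows. This is only an illustrative outline: the stratified split, the 1/0 label encoding, the function names, and the reading of "recognition rate" as overall accuracy are assumptions, not requirements stated in the assignment.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices), splitting each class separately
    so the rare positive class appears in both parts."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def evaluate(y_true, y_pred):
    """Precision, recall, F-measure and accuracy for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {"precision": precision, "recall": recall,
            "f_measure": f_measure, "accuracy": accuracy}


# Toy labels: 1 = binding site, 0 = non-binding site (made up for the example).
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
print(evaluate(y_true, y_pred))
```

On imbalanced data, accuracy alone can look high even when most binding sites are missed, which is why precision, recall and F-measure are reported alongside it.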

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. We then count both as correct predictions, because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because each has at least 50% overlap with a true binding site. Now take another 8mer in (1), GAgaatta. If a learner model classifies it as a binding site, it is counted as misclassified, because it has only 25% overlap with the true binding site ACACGGGA.
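The overlap rule can be checked with a short sketch. It assumes that overlap is measured as the fraction of the kmer's positions lying inside a single true binding site, and that sites are given as 0-based (start, length) pairs; the helper names are hypothetical.

```python
def overlap_fraction(kmer_start, k, site_start, site_len):
    """Fraction of the kmer's k positions that lie inside one true site."""
    kmer_end = kmer_start + k
    site_end = site_start + site_len
    shared = max(0, min(kmer_end, site_end) - max(kmer_start, site_start))
    return shared / k


def best_overlap(kmer_start, k, sites):
    """Largest overlap of the kmer with any true site."""
    return max(overlap_fraction(kmer_start, k, s, n) for s, n in sites)


def is_motif_instance(kmer_start, k, sites, threshold=0.5):
    """True if the kmer overlaps some true site by at least `threshold`."""
    return best_overlap(kmer_start, k, sites) >= threshold


# Sequence (1); upper-case letters mark the two true binding sites.
sequence = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
sites = [(10, 8), (26, 8)]  # 0-based start and length of each ACACGGGA

for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = sequence.find(kmer)  # case-sensitive, so each kmer is unique here
    frac = best_overlap(start, len(kmer), sites)
    print(f"{kmer}: {frac:.0%} overlap, motif instance: "
          f"{is_motif_instance(start, len(kmer), sites)}")
```

For the three 8mers discussed above this gives overlaps of 50%, 75% and 25%, so only the first two count as motif instances under the 50% threshold.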


