DNA sequences, Computer Engineering


The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to the biological data. It is a good idea to consult relevant publications to see how effective classifiers for motif recognition can be built.
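As one illustration of the sampling idea mentioned above, the following is a minimal sketch of random undersampling of the majority (non-binding) class. The function name `undersample`, the `ratio` parameter and the toy kmers are assumptions made for illustration; the assignment does not prescribe a particular sampling scheme.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Keep every positive example and a random subset of the negatives.

    ratio is the (assumed) number of negatives kept per positive.
    Labels are 1 for binding sites and 0 for non-binding sites.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    n_keep = min(len(neg), int(ratio * len(pos)))
    keep = sorted(pos + rng.sample(neg, n_keep))  # preserve original order
    return [examples[i] for i in keep], [labels[i] for i in keep]


# Toy data (hypothetical): one binding site among ten 8mers.
kmers = ["ACACGGGA"] + ["ttttaaaa"] * 9
labels = [1] + [0] * 9
balanced_kmers, balanced_labels = undersample(kmers, labels, ratio=1.0)
```

Undersampling should normally be applied to the training dataset only, so that the test dataset keeps the original class distribution.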

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a test dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated by recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
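The four measures named above can be computed from the confusion counts. The sketch below assumes that "recognition rate" means overall accuracy and that F-measure is the balanced F1 score; both are interpretations, not definitions given in the assignment. The function name `evaluate` and the toy labels are hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return recall, precision, F-measure and recognition rate.

    Labels are 1 (binding site) or 0 (non-binding site).
    Recognition rate is taken here to be overall accuracy (an assumption).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {"recall": recall, "precision": precision,
            "f_measure": f_measure, "recognition_rate": rate}


# Toy example (hypothetical): 2 true sites among 10 kmers.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
```

In this toy case the recognition rate is high even though only half of the true sites are found, which is why recall, precision and F-measure are reported alongside it for imbalanced data.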

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. We then count them as correct predictions because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because each overlaps a true binding site by at least 50%. Now take another 8mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as a misclassification because it has only 25% overlap with the true binding site ACACGGGA.
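The overlap rule in this example can be checked with a short sketch. It assumes 0-based positions and measures overlap as a fraction of the true site's length; because the kmers and the sites here are all 8 bases long, measuring against the kmer length gives the same values. The helper names `overlap_fraction` and `is_motif_instance` are hypothetical.

```python
SEQ = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
SITE = "ACACGGGA"


def overlap_fraction(kmer_start, k, site_start, site_len):
    """Fraction of the true site covered by the kmer window (0-based starts)."""
    shared = min(kmer_start + k, site_start + site_len) - max(kmer_start, site_start)
    return max(0, shared) / site_len


def is_motif_instance(kmer_start, k, sites, threshold=0.5):
    """True if the kmer overlaps any true site by at least the threshold."""
    return any(overlap_fraction(kmer_start, k, s, n) >= threshold
               for s, n in sites)


# Locate the true sites; upper case marks them in sequence (1),
# so a case-sensitive search is enough for this example.
sites = []
pos = SEQ.find(SITE)
while pos != -1:
    sites.append((pos, len(SITE)))
    pos = SEQ.find(SITE, pos + 1)

# Best overlap of each 8mer from the text with any true site.
overlaps = {}
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    start = SEQ.find(kmer)
    overlaps[kmer] = max(overlap_fraction(start, len(kmer), s, n)
                         for s, n in sites)
    print(kmer, overlaps[kmer], is_motif_instance(start, len(kmer), sites))
```

The first two 8mers reach the 50% threshold and are therefore treated as true motif instances, while GAgaatta is not.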

