DNA sequences, Computer Engineering


The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, this is known as an imbalanced dataset. Techniques for imbalanced data classification, such as sampling or filtering, can be applied to the biological data. It is a good idea to find relevant publications to see how effective classifiers for motif recognition can be built.
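As one illustration of the sampling idea, the sketch below randomly undersamples the negative (non-binding) class. It is only a minimal example under stated assumptions: the function name undersample, the ratio parameter, and the fixed seed are hypothetical choices, and the assignment does not prescribe this or any other particular technique.

```python
import random


def undersample(examples, labels, ratio=1.0, seed=0):
    """Randomly drop negatives so that #negatives <= ratio * #positives.

    Assumption: labels are truthy for binding sites, falsy otherwise.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y]
    neg = [i for i, y in enumerate(labels) if not y]
    # Keep every positive and a random subset of the negatives.
    n_keep = min(len(neg), int(ratio * len(pos)))
    kept = sorted(pos + rng.sample(neg, n_keep))
    return [examples[i] for i in kept], [labels[i] for i in kept]
```

Undersampling should normally be applied to the training dataset only, so that the test dataset keeps the original class distribution.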

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a testing dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
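The four measures can be computed from the counts of true and false positives and negatives. The sketch below assumes that "recognition rate" means overall accuracy and that "F-measure" is the balanced F1 score; both are interpretations, not definitions given in the assignment. The function name classification_metrics is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return precision, recall, F-measure (F1) and recognition rate.

    Assumption: labels are truthy for binding sites, falsy otherwise.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    tn = sum(1 for t, p in pairs if not t and not p)

    # Guard against division by zero when a class is never predicted/present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f_measure = 2 * precision * recall / denom if denom else 0.0
    rate = (tp + tn) / len(pairs) if pairs else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_measure": f_measure,
        "recognition_rate": rate,
    }
```

On an imbalanced dataset the recognition rate can be high even when few binding sites are found, which is why precision, recall and F-measure are reported alongside it.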

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a kmer is classified as a motif instance if its location has at least 50% overlap with a true binding site in the DNA sequences. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. We then count them as correct predictions because they have 50% and 75% overlap with the true binding sites in sequence (1), respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions because they have at least 50% overlap with the true binding sites. Now take another 8mer, GAgaatta, in (1). If a learner model classifies it as a binding site, it is counted as misclassified because it has only 25% overlap with the true binding site ACACGGGA.
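The overlap rule can be sketched in code as follows. This is an illustrative sketch only, and it rests on two assumptions that the assignment does not state: true binding sites are marked in upper case, as in sequence (1), and overlap is measured as a fraction of the kmer length. The function names are hypothetical.

```python
import re


def find_sites(sequence):
    """Return (start, end) spans of upper-case runs.

    Assumption: true binding sites are written in upper case.
    """
    return [m.span() for m in re.finditer(r"[A-Z]+", sequence)]


def overlap_fraction(start, k, site_start, site_end):
    """Fraction of the kmer [start, start + k) lying inside the site."""
    shared = min(start + k, site_end) - max(start, site_start)
    return max(0, shared) / k


def label_kmers(sequence, k=8, threshold=0.5):
    """Label every kmer True if it overlaps some site by >= threshold."""
    sites = find_sites(sequence)
    labels = []
    for start in range(len(sequence) - k + 1):
        best = max(
            (overlap_fraction(start, k, s, e) for s, e in sites),
            default=0.0,
        )
        labels.append(best >= threshold)
    return labels


seq = "ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa"
labels = label_kmers(seq)
for kmer in ("acaaACAC", "ACGGGAtc", "GAgaatta"):
    print(kmer, labels[seq.find(kmer)])
```

The resulting labels serve as the ground truth against which a learner model's predictions for each kmer are compared.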

