DNA Sequences, Computer Engineering

The dataset provided in this assignment contains a collection of real DNA sequences. The number of true binding sites is quite limited, which makes the problem challenging. In the machine learning community, such data is known as an imbalanced dataset. Techniques for classifying imbalanced data, such as sampling or filtering, can be applied to the biological data. It is a good idea to look up relevant publications to see how effective classifiers for motif recognition can be built.
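As one illustration of the sampling techniques mentioned above, the sketch below applies random undersampling: it keeps every positive (binding-site) example and a random subset of the negatives. The function name `random_undersample`, the 1/0 label encoding and the `neg_per_pos` parameter are illustrative assumptions, not part of the assignment. Undersampling should be applied to the training data only; the test data should keep its natural class distribution.

```python
import random
from typing import List, Sequence, Tuple


def random_undersample(
    samples: Sequence[str],
    labels: Sequence[int],
    neg_per_pos: float = 1.0,
    seed: int = 0,
) -> Tuple[List[str], List[int]]:
    """Keep every positive sample (label 1) and a random subset of negatives (label 0).

    Hypothetical helper: neg_per_pos sets how many negatives are kept per
    positive, so the default of 1.0 produces a balanced training set.
    """
    rng = random.Random(seed)  # seeded for reproducible subsets
    pos = [s for s, y in zip(samples, labels) if y == 1]
    neg = [s for s, y in zip(samples, labels) if y == 0]
    # Never ask for more negatives than actually exist.
    n_keep = min(len(neg), int(round(neg_per_pos * len(pos))))
    kept_neg = rng.sample(neg, n_keep)
    combined = [(s, 1) for s in pos] + [(s, 0) for s in kept_neg]
    rng.shuffle(combined)  # avoid all positives appearing first
    xs = [s for s, _ in combined]
    ys = [y for _, y in combined]
    return xs, ys
```

Oversampling the minority class (for example, duplicating positive k-mers) is the mirror-image alternative; which works better for motif recognition is something the suggested literature search should help decide.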

The whole dataset should be partitioned into a training dataset, used to build the learner models, and a testing dataset, used to evaluate the generalization capability of the classification systems. System performance will be evaluated using recall, precision, F-measure and recognition rate on both the training dataset and the test dataset.
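A minimal sketch of the four measures follows, assuming labels are encoded as 1 (binding site) and 0 (non-binding site) and that "recognition rate" means overall accuracy; the assignment does not define it more precisely. The names `evaluate` and `Scores` are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Scores:
    precision: float
    recall: float
    f_measure: float
    recognition_rate: float


def evaluate(y_true: Sequence[int], y_pred: Sequence[int]) -> Scores:
    """Binary-classification scores, treating label 1 as 'binding site'."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    # Guard every ratio against a zero denominator.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall) if precision + recall else 0.0
    )
    total = tp + fp + fn + tn
    recognition_rate = (tp + tn) / total if total else 0.0
    return Scores(precision, recall, f_measure, recognition_rate)
```

On imbalanced data the recognition rate alone can be misleading: a model that labels every k-mer as non-binding can still reach a high recognition rate while its recall is 0. This is why precision, recall and F-measure are reported alongside it.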

It is very important to note that, unlike the traditional way of evaluating a classifier's performance, here a k-mer is regarded as a motif instance if its location overlaps a true binding site in the DNA sequence by at least 50%. For example, consider the two true binding sites, both ACACGGGA, in the following DNA sequence.

ccttacacaaACACGGGAgaattaatACACGGGAtcagatcaataaa (1)

Suppose that the 8-mers acaaACAC and ACGGGAtc are classified as binding sites by a learner model. We count both as correct predictions, because they overlap the true binding sites in sequence (1) by 50% and 75%, respectively. Conversely, if a classifier labels them as non-binding sites, we count them as incorrect predictions, because each overlaps a true binding site by at least 50%. Now take another 8-mer in (1), GAgaatta. If a learner model classifies it as a binding site, it is counted as misclassified, because it overlaps the true binding site ACACGGGA by only 25%.
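The overlap rule can be sketched as follows. This assumes, as in sequence (1), that true binding sites are marked by upper-case letters, and that the overlap percentage is measured relative to the k-mer length k; both are assumptions, and the helper names `site_spans`, `overlap_fraction` and `label_kmers` are hypothetical. Under the rule, each k-mer gets a reference label of 1 when its overlap is at least 50% and 0 otherwise, and a prediction is correct exactly when it matches that label.

```python
import re
from typing import List, Tuple


def site_spans(annotated: str) -> List[Tuple[int, int]]:
    """Return [start, end) spans of true binding sites.

    Assumption: sites are written in upper case, as in sequence (1).
    """
    return [m.span() for m in re.finditer(r"[ACGT]+", annotated)]


def overlap_fraction(start: int, k: int, sites: List[Tuple[int, int]]) -> float:
    """Largest overlap of the k-mer [start, start + k) with any site, as a fraction of k."""
    best = 0
    for s, e in sites:
        # A negative value means the intervals do not overlap; max() discards it.
        best = max(best, min(start + k, e) - max(start, s))
    return best / k


def label_kmers(
    annotated: str, k: int = 8, threshold: float = 0.5
) -> List[Tuple[int, str, int]]:
    """Slide a width-k window over the sequence and label each k-mer.

    Returns (start, k-mer, label) triples, where label is 1 if the k-mer
    overlaps a true site by at least `threshold` of its length, else 0.
    """
    sites = site_spans(annotated)
    seq = annotated.lower()  # the case marking is only used to locate sites
    labelled = []
    for i in range(len(seq) - k + 1):
        frac = overlap_fraction(i, k, sites)
        labelled.append((i, seq[i:i + k], 1 if frac >= threshold else 0))
    return labelled
```

Applied to sequence (1), this gives acaaACAC an overlap of 0.5, ACGGGAtc 0.75 and GAgaatta 0.25, matching the worked example. Adjacent upper-case sites would merge into a single span, so for a real dataset with separately annotated site coordinates it is safer to pass those coordinates to `overlap_fraction` directly.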

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd