What is the range of planting dates in these data


Reference no: EM132347945

Raw Data Final Project -

The raw data used to create the data sets for both the midterm and final projects were in ArcGIS shapefile format, with one file for seeding data and one file for harvest data per field. Shapefile data are commonly tagged with GPS coordinates, so collecting them requires GPS-enabled machinery, in this case a seeder and a harvester, respectively.

I read each file using QGIS and exported the tables containing GPS coordinates and the associated seeding rate and harvest observations to CSV files, then anonymized the data by projecting the GPS coordinates onto a Cartesian grid, with the origin in the lower left (southwest) corner of each field. I then selected some of the columns to be exported and saved as the files uploaded to D2L. The original GPS coordinate columns were labelled Latitude and Longitude; the projections are LatM and LonM. The harvest data files include the fields Moisture (percent), DISTANCE (ft traveled from previous point) and VRYIELDVOL (yield in bu/acre).

The data were collected using two different machines on at least two different dates, so they cannot be directly superimposed. Instead, I used a process called kriging to interpolate VRYIELDVOL samples onto the GPS coordinates in the seed*.csv files, attached the estimated values as Yield, and saved the combined data as field*.csv, which were uploaded to D2L for the midterm project.

Instructions -

The final project will be a continuation of the midterm project. We will continue working with data from four corn fields, but will be looking at some issues with times and dates.

For each field, I've uploaded two files, seed*.csv and harvest*.csv, corresponding to seeding rate and yield data, respectively. Both sets of files have a column Timestamp with strings of the form 2018-05-20T13:20:08.201Z

Your task will be to extract the date and time values from these strings, and answer these questions. The text before "T" is the date string, and the text between "T" and "Z" is the time string, in universal time.
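As a minimal sketch of the extraction step, the following splits one Timestamp string into its date and time parts and also converts it to a datetime. The function names split_timestamp and parse_timestamp are hypothetical, and the code assumes every string has fractional seconds and a trailing "Z", as in the example above.

```python
from datetime import datetime, timezone

def split_timestamp(ts):
    """Return (date string, time string) from e.g. 2018-05-20T13:20:08.201Z."""
    date_part, rest = ts.split("T", 1)   # text before "T" is the date
    time_part = rest.rstrip("Z")         # text between "T" and "Z" is the time
    return date_part, time_part

def parse_timestamp(ts):
    """Convert the string to a timezone-aware datetime in universal time (UTC)."""
    date_part, time_part = split_timestamp(ts)
    # Assumption: fractional seconds are always present, so %f always matches.
    parsed = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)

example = "2018-05-20T13:20:08.201Z"
date_string, time_string = split_timestamp(example)
moment = parse_timestamp(example)
```

Working with the parsed datetime rather than the raw strings makes later date and time arithmetic simpler.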

1. What is the range of planting dates in these data?

2. Was each field harvested entirely in one day, and were they each harvested at approximately the same time of day?
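One possible way to approach both questions is to reduce each file's Timestamp column to its earliest and latest calendar dates. This is only a sketch: the helper names are hypothetical, the column name Timestamp comes from the description above, and the dates are in universal time, so a harvest that crosses midnight UTC will show two dates even if it was a single local day.

```python
import csv
from datetime import datetime

def to_datetime(ts):
    # Assumes the form 2018-05-20T13:20:08.201Z with fractional seconds present.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")

def read_timestamps(path, column="Timestamp"):
    """Read the Timestamp column from one CSV file as a list of strings."""
    with open(path, newline="") as f:
        return [row[column] for row in csv.DictReader(f)]

def date_range(timestamps):
    """Return (earliest date, latest date) in UTC for an iterable of strings."""
    dates = [to_datetime(ts).date() for ts in timestamps]
    return min(dates), max(dates)

def single_day(timestamps):
    """True if every row in the file falls on the same UTC calendar date."""
    first, last = date_range(timestamps)
    return first == last
```

Calling date_range(read_timestamps(path)) on each seed*.csv file and taking the overall minimum and maximum gives the planting date range; single_day on each harvest*.csv file addresses the first half of question 2.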

It might be enough to process only the first and last rows in the data. Each file should be sampled at one-second intervals, so the time difference between the first and last rows, in seconds, should be almost equal to the number of rows. I would not be too worried about gaps in the data on the order of seconds, but I would want to know about hour-long gaps in the data, and where they occur. Note that the time will reset to 0 after 23:59:59 and the date will increment, so account for this in the analysis.
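The checks described above can be sketched as follows. The function name summarize and the one-hour default threshold are illustrative choices, and the code assumes rows are already in chronological order. Subtracting full datetimes, rather than time-of-day values alone, accounts for the reset at midnight.

```python
from datetime import datetime, timedelta

def to_datetime(ts):
    # Assumes the form 2018-05-20T13:20:08.201Z with fractional seconds present.
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ")

def summarize(timestamps, gap=timedelta(hours=1)):
    """Compare elapsed time with row count and list gaps of at least `gap`."""
    times = [to_datetime(ts) for ts in timestamps]
    # Elapsed seconds between first and last rows; date and time are combined,
    # so crossing midnight does not produce a negative difference.
    elapsed = (times[-1] - times[0]).total_seconds()
    gaps = [
        (i, times[i - 1], times[i])          # row index where the gap ends
        for i in range(1, len(times))
        if times[i] - times[i - 1] >= gap
    ]
    return {"rows": len(times), "elapsed_seconds": elapsed, "gaps": gaps}
```

If elapsed_seconds is much larger than rows, the gaps list shows where the long interruptions occur.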

Some additional thoughts.

There is a relationship between planting date and yield. We could review the literature to get a more precise estimate (and you can do that, if you wish), but we'll start with a rough back-of-the-envelope calculation.

Suppose we are working with 100-day corn - we expect corn to reach maturity in roughly 100 days. Let's suppose that the difference between the first field planting date and the last field planting date is 5 days. That's 5 percent of the growing period, so let %Diff = 5. What is the standard deviation for Yield, with regard to planting date? Again, we can look to the literature, but we'll use a simplifying assumption that it will be similar to the sd for yield vs planting rate, as determined in the midterm project. Convert that to a coefficient of variation (CV), the sd expressed as a percentage of the mean.

Now we have a first approximation for effect size (%Diff/CV). Is this effect size large enough that we need to worry that the effect of planting date will confound our analysis of the relationship between yield and planting rate? When I gather data from more fields, do I need to be careful and only include fields in a narrow range of dates? What is the first approximation for the number of fields required to test the relationship between planting date and yield?
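The back-of-the-envelope calculation above can be sketched in a few lines. The sd and mean below are placeholder values only, not results from the midterm data, and the sample-size step uses Lehr's rule of thumb (about 16 / d^2 per group for 80% power at a two-sided 5% level), which is a swapped-in approximation that treats early and late planting as two groups.

```python
import math

def percent_diff(days_between, days_to_maturity=100):
    """Planting date spread as a percentage of the growing period."""
    return 100.0 * days_between / days_to_maturity

def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return 100.0 * sd / mean

def effect_size(pct_diff, cv):
    """First approximation of effect size: %Diff / CV."""
    return pct_diff / cv

def fields_per_group(d):
    """Lehr's rule of thumb: roughly 16 / d^2 observations per group."""
    return math.ceil(16.0 / d ** 2)

# Placeholder values, NOT taken from the project data: substitute the sd and
# mean yield found in the midterm analysis.
sd_yield = 20.0     # bu/acre, hypothetical
mean_yield = 200.0  # bu/acre, hypothetical

pct = percent_diff(5)                                 # 5 days out of 100
cv = coefficient_of_variation(sd_yield, mean_yield)
d = effect_size(pct, cv)
n = fields_per_group(d)
```

A smaller effect size drives the required number of fields up quickly, since the count scales with 1 / d^2.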

Now, about gaps in harvest. I've been arguing that analyzing yield monitor data should be a two-step process. First, analyze the yield as a time series (as grain moves through the harvester), then analyze it as spatial data (as the harvester moves over the field). There will be auto-correlated errors in each process that should be analyzed independently.

In particular, the value for yield as reported in these data is not the yield of the grain as it is measured going through the harvester. The grain moving through the harvester will be of varying degrees of maturity, and thus of grain moisture - less mature grain will have more water content, thus more weight. Yield values are standardized to a defined percent moisture, so the harvester has a moisture sensor, and the percent moisture reading is used to normalize yield.
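For illustration, one common form of moisture normalization rescales the wet yield so that the dry matter is held constant. This is a sketch under stated assumptions: the source does not say which formula or standard moisture the harvester uses, and the 15.5 percent default below is a commonly used convention for corn, not a value taken from these data.

```python
def normalize_yield(wet_yield, measured_moisture, standard_moisture=15.5):
    """Rescale yield measured at one moisture percent to a standard moisture.

    Dry matter is held constant:
        wet_yield * (100 - measured) = normalized * (100 - standard)
    The 15.5 default is an assumption, not a value from the assignment.
    """
    return wet_yield * (100.0 - measured_moisture) / (100.0 - standard_moisture)

# Grain wetter than the standard is adjusted downward,
# grain at the standard moisture is unchanged.
wetter = normalize_yield(200.0, 20.0)
at_standard = normalize_yield(200.0, 15.5)
```

Because the adjustment depends on the moisture reading paired with each yield value, a stale or mismatched moisture reading changes the normalized yield directly.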

But yield is measured at one second intervals, while moisture can only be measured at approximately 10-15 second intervals. I'm curious if gaps in the harvest record will affect how yield is normalized by moisture - there may be some cases where the moisture reading is uncorrelated with yield, because of this difference in measurement (I suspect it will be very small, but it's an interesting problem, to me).

I'm also interested in how percent moisture changes over the course of a day, and whether there is an effect when fields are harvested at different times of day.
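A first look at moisture over the course of a day could group the Moisture readings by hour. The sketch below assumes the data have been read into (Timestamp, Moisture) pairs; the function name is hypothetical, and the hours are in universal time, so they would need shifting to compare against local time of day.

```python
from collections import defaultdict
from datetime import datetime

def mean_moisture_by_hour(records):
    """Average percent moisture for each UTC hour.

    `records` is an iterable of (timestamp string, moisture) pairs.
    """
    sums = defaultdict(lambda: [0.0, 0])   # hour -> [running total, count]
    for ts, moisture in records:
        hour = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S.%fZ").hour
        sums[hour][0] += float(moisture)
        sums[hour][1] += 1
    return {hour: total / count for hour, (total, count) in sorted(sums.items())}
```

Comparing these hourly means across fields is one way to see whether fields harvested at different times of day differ in moisture.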

Attachment:- Raw Data Final Project Assignment Files.rar
