Fill missing values on a table
I have a list of dates and prices:
Date Price
1/3/2000 10.00
1/5/2000 10.45
1/7/2000 10.25
... ...
I have a separate list of dates with all dates:
Date
1/1/2000
1/2/2000
1/3/2000
...
I need to have them combined so that the prior price is filled in for the dates that are missing prices:
Date Price
1/1/2000 10.00
1/2/2000 10.00
1/3/2000 10.00
1/4/2000 10.00
1/5/2000 10.45
1/6/2000 10.45
1/7/2000 10.25
... ...
I am currently trying to loop through array lists holding the data but can't line the dates up correctly, especially at the beginning and开发者_如何学Python end. I am using Java/Mysql/JDBC right now but am open to R also. Thanks for any suggestions.
Thanks to everyone for your help. Here's what I ended up doing:
-I created a list of all indexes where the dates matched.
-I then inserted the prices into an array with the same number of elements as the full time list.
-I then created 3 loops, one for the elements before the first matching time, one for the elements after the last matching element and finally one for everything in between.
-These three filled in the prices that were missing.
Just though I'd share. Thanks for all your help.
public static void checkLengths(ArrayList<String> masterTimes, ArrayList<String> testTimes, ArrayList<Double> prices){
ArrayList<Double> temp = new ArrayList<Double>();
ArrayList<Integer> matches = new ArrayList<Integer>();
Double[] temp2 = new Double [masterTimes.size()];
int mt = masterTimes.size();
int tt = testTimes.size();
if(mt == tt){
return;
}else{
int mast = 0;
int test = 0;
String mt1 = masterTimes.get(0);
String tt1 = testTimes.get(0);
test = 0;
for(int i = 0; i < masterTimes.size(); i++){
mt1 = masterTimes.get(i);
tt1 = testTimes.get(test);
System.out.println(" | mt1: " + mt1 + " | tt1: " + tt1);
if(mt1.equals(tt1)){
matches.add(i);
System.out.println("Inserting: " + i);
if(test < testTimes.size()){
test++;
}
if(test == testTimes.size()){
break;
}
}
}
System.out.println("Matches:");
printAL(matches);
// puts in known prices.
for(int i = 0; i < matches.size(); i++){
int g = matches.get(i);
temp2[g] = prices.get(i);
}
System.out.println("FirstPrices:");
printAR(temp2);
// Finds index of first and last matching times.
int matcher1 = matches.get(0);
int ind = matches.size() - 1;
int matcher2 = matches.get(ind);
System.out.println("Matcher1:" + matcher1 + " | Matcher2: " + matcher2);
// If a price is empty/null, it puts the prior price in it.
for(int i = matcher1; i < matcher2; i ++){
System.out.println(i + " | " + temp2[i]);
if(temp2[i] == null){
System.out.println(temp2[i] + " | " + temp2[i-1]);
temp2[i] = temp2[i-1];
}
}
System.out.println("SecondPrices:");
printAR(temp2);
// Deals with start.
for(int i = matcher1; i >= 0; i--){
if(temp2[i] == null){
temp2[i] = temp2[i+1];
}
}
System.out.println("ThirdPrices:");
printAR(temp2);
// Deals with end.
for(int i = matcher2; i < temp2.length; i++){
if(temp2[i] == null){
temp2[i] = temp2[i-1];
}
}
System.out.println("FourthPrices:");
printAR(temp2);
prices.clear();
System.out.println("Final Check:");
for (int i = 0; i < masterTimes.size(); i++){
System.out.println(i + " | " + masterTimes.get(i) + " | " + temp2[i]);
}
}
}
It is difficult to help without looking at the code but it seems like your indexes are not matching up or something is wrong with your looping logic.
Consider using a HashTable or a HashMap using the date strings as keys and price as values.
Loop through your date range one day at a time look up the price in the HashTable, if not found then use the previous price.
This sort of problem does take a bit of doing to do correctly. Sometimes using a flow chart helps if you get stuck.
Try using the following sample code:
import java.sql.*;
import java.util.*;
public class FillDates
{
public static void fillUnknownDates(Connection c) throws SQLException
{
// Loads in a Vector of Strings of all the dates
Statement state = c.createStatement();
ResultSet results = state.executeQuery("SELECT d FROM Dates ORDER BY d;");
Vector<String> dates = new Vector<String>();
while (results.next())
{
dates.add(results.getString("d"));
}
// Load in a list of all date/price combinations
Vector<DatePrice> pairs = new Vector<DatePrice>();
state = c.createStatement();
results = state.executeQuery("SELECT d, p FROM DatePrices ORDER BY d;");
while (results.next())
{
pairs.add(new DatePrice(results.getString("d"), results.getString("p")));
}
// Now go through the two lists and add missing prices
state = c.createStatement();
int dateIndex = 0;
DatePrice last = pairs.get(0), current;
for (int pairIndex = 1; pairIndex < pairs.size(); pairIndex++)
{
current = pairs.get(pairIndex);
while (dateIndex < dates.size() && dates.get(dateIndex).compareTo(current.getDate()) < 0)
{
// Batch things up so it takes less time to run
state.addBatch("INSERT INTO DatePrices VALUES (\""+dates.get(dateIndex)+"\", \""+current.getPrice+"\");");
dateIndex ++;
}
last = current;
}
state.executeBatch();
}
// A convenience class
public static class DatePrice
{
private String date, price;
public DatePrice(String date, String price)
{
this.date = date;
this.price = price;
}
public String getDate()
{
return date;
}
public String getPrice()
{
return price;
}
}
}
Note that it's not complete, and you'll need to change the names of your tables and columns before trying it out.
Okay... I just shooting at it while being on the fon :)
In MySQL, let's assume you got two tables, dates_prices and all_dates. Then LEFT JOIN them on dates and order them by date.
If you use R and MySQL you can use the RMySQL package to load the resulting table to R.
In R you can convert the dates to POSIX with as.POSIXlt
. You also might want to use the lag
function in R (but I am not sure yet if that helps with lags of varying spans).
Apart from that you could use R's ´sqldf` package if you want to try with "plain" R but want to use SQL functionality. If you post some reproducible code to set up the data.. I could try to the give something more concrete back.
EDIT:
The impute
package might be what you really looking for... see also here
Here is an R solution.
Uncomment the two install.packages
lines if you don't have those packages already installed. Also textConnection(Lines1)
and textConnection(Lines2)
are just to keep the example self contained and in reality would be replaced with something like "myfile1.dat"
and "myfile2.dat"
assuming the data is in those files.
It reads in the data creating zoo object z
and a Date vector dt
. It then merges z
with a zero width zoo object (i.e. it has dates but no data) whose date index is made from dt
. na.locf
(last observation carried forward) fills out the missing values in reverse order since fromLast = TRUE
Lines1 <- "Date Price
1/3/2000 10.00
1/5/2000 10.45
1/7/2000 10.25"
Lines2 <- "Date
1/1/2000
1/2/2000
1/3/2000"
# install.packages("zoo")
# install.packages("chron")
library(zoo)
library(chron)
z <- read.zoo(textConnection(Lines1), header = TRUE, FUN = as.chron)
dt <- as.chron(scan(textConnection(Lines2), skip = 1, what = ""))
na.locf(merge(z, zoo(, dt)), fromLast = TRUE)
The result is:
> na.locf(merge(z, zoo(, dt)), fromLast = TRUE)
01/01/00 01/02/00 01/03/00 01/05/00 01/07/00
10.00 10.00 10.00 10.45 10.25
There are three vignettes (PDF documents) that come with the zoo package and R News 4/1 Help Desk article has info and references on dates.
精彩评论