I have a large dataset(around 50000 points) which has be visualized in a line chart.The size of Canvas may vary according to the size of dataset. Massive amount of points makes the drawing too slow.It also results in cluttering and overlapping of points due to plotting of several points close to each other.So visual representation of data will be unsatisfactory. How can I display subset of available points which will increase the performance? Can anybody suggest me with a suitable algorithm which will help to extract the subset of actual number of points?
16 replies to this topic
#1
Posted 12 July 2010 - 09:43 PM
|
|
|
#2
Posted 12 July 2010 - 11:50 PM
be more specific please ... also post the code so that we can analyse and help you !!
#3
Posted 13 July 2010 - 12:12 AM
I have a collection of points (say around 50,000) which has to be drawn in a line chart.I am using line segment and Path to draw this points in chart.
List<double> src = ... //50000 points.
for(i=0;i<src.count;i++)
{
LineSegment segment = new LineSegment();
//get the point x and y value present in the collection
//get the previous point
//add line to path
}
Since I am using large data set it consumes lot of time to load.So I need to eliminate some unwanted points like duplicates..and also for example if i have points like (1,2)(2,4)(3,6) then (2,4) can be removed.The points which are located very closely to each other also can be removed.I have tried calculating average of consecutive points so that length will reduce to half.But it would not be feasible since it will not show proper values.Can you please provide me how can in achieve filtering of point list?How can I do sampling of data?
List<double> src = ... //50000 points.
for(i=0;i<src.count;i++)
{
LineSegment segment = new LineSegment();
//get the point x and y value present in the collection
//get the previous point
//add line to path
}
Since I am using large data set it consumes lot of time to load.So I need to eliminate some unwanted points like duplicates..and also for example if i have points like (1,2)(2,4)(3,6) then (2,4) can be removed.The points which are located very closely to each other also can be removed.I have tried calculating average of consecutive points so that length will reduce to half.But it would not be feasible since it will not show proper values.Can you please provide me how can in achieve filtering of point list?How can I do sampling of data?
#4
Posted 13 July 2010 - 01:57 AM
Here is the solution for you:
In this example, I've set canvas size 700 pixels, and formula said that 694 groups will be created. This guarantees that line drawn will be absolutely smooth and without overlapping segments.
And here is the picture that would be drawn with these 694 dots, and you can see that it is quite smooth (drawn by Excel):
Sinewave..png 13.12K
17 downloads
using System;
using System.Collections.Generic;
using System.Drawing;
namespace TestApp
{
class Program
{
static void Main(string[] args)
{
int count = 50000;
Size canvasSize = new Size(700, 300);
List<double> src = new List<double>();
for (int i = 0; i < count; i++)
src.Add(1 + Math.Sin(Math.PI * i / 10000));
int groupSize = (int)Math.Ceiling(count / (double)canvasSize.Width);
List<double> drawList = new List<double>();
int curGroupSize = 0;
double curSum = 0;
foreach (double d in src)
{
if (curGroupSize == groupSize)
{
drawList.Add(curSum / curGroupSize);
curSum = 0;
curGroupSize = 0;
}
curSum += d;
curGroupSize++;
}
if (curGroupSize > 0)
drawList.Add(curSum / curGroupSize);
foreach (double d in drawList)
Console.WriteLine(d);
}
}
}
This code groups adjacent points and replaces them with their average value. Number of groups is calculated based on canvas width, so that total number of groups does not exceed canvas width in pixels. As a result, no segment will have equal X coordinates, so line drawn is guaranteed to be smooth.In this example, I've set canvas size 700 pixels, and formula said that 694 groups will be created. This guarantees that line drawn will be absolutely smooth and without overlapping segments.
And here is the picture that would be drawn with these 694 dots, and you can see that it is quite smooth (drawn by Excel):
Sinewave..png 13.12K
17 downloads
#5
Posted 13 July 2010 - 04:39 AM
Thanks :)
#6
Posted 13 July 2010 - 05:09 AM
Hi,
As I see it is calculating the average of two consecutive points.Which will reduce the size to half.But my question is suppose the two points are (1,5) (80,90) then average will be 47 which will not depict the appropriate value.Taking average will be helpful when the points are close to each other.Is there any other way to reduce the size?or taking average of all adjacent points is the solution?
As I see it is calculating the average of two consecutive points.Which will reduce the size to half.But my question is suppose the two points are (1,5) (80,90) then average will be 47 which will not depict the appropriate value.Taking average will be helpful when the points are close to each other.Is there any other way to reduce the size?or taking average of all adjacent points is the solution?
#7
Posted 13 July 2010 - 06:39 AM
I see. If you have non-uniform distribution of X coordinates then colution must be adapted to that condition. Give me a minute and I'll send you the solution.
#8
Posted 13 July 2010 - 06:48 AM
Wait a second, how do you get X coordinate if you start with List<double>? Using list of Y coordinates implies that points are uniformly distributed over X.
Anyway, you can imagine your canvas as an array of vertical rectangles, each rectangle being exactly one pixel in width. Then you just distribute all points into these rectangles, for each rectangle calculating average Y coordinate of all points that fit in it.
Once this process is finished, each rectangle having at least one starting point in it gives one dot: rectangle's X coordinate is the X coordinate of the dot and Y coordinate is the average Y of all points fitting in the rectangle. Then you just draw lines between successive rectangles.
Note that first algorithm supposed that points are sorted in non-descending X order. This algorithm does not make such assumption, but of course you must sort rectangles by X coordinates before you start drawing.
Anyway, you can imagine your canvas as an array of vertical rectangles, each rectangle being exactly one pixel in width. Then you just distribute all points into these rectangles, for each rectangle calculating average Y coordinate of all points that fit in it.
Once this process is finished, each rectangle having at least one starting point in it gives one dot: rectangle's X coordinate is the X coordinate of the dot and Y coordinate is the average Y of all points fitting in the rectangle. Then you just draw lines between successive rectangles.
Note that first algorithm supposed that points are sorted in non-descending X order. This algorithm does not make such assumption, but of course you must sort rectangles by X coordinates before you start drawing.
#9
Posted 13 July 2010 - 08:15 AM
This is the general solution when one should reduce number of points to draw a smooth graph:
Sinewave1..png 22.12K
16 downloads
These points are then smoothed by dividing the canvas into as many buckets as many pixels there are in the width. This is done in the ReduceArray method, which is commented so that you see what it does.
Once reduction to 700 points is complete, the image looks like this:
Sinewave2..png 25.46K
14 downloads
This algorithm does not depend on distribution of X and Y coordinates in source list.
using System;
using System.Drawing;
using System.Collections.Generic;
namespace TestApp
{
class Program
{
static void Main(string[] args)
{
int count = 30000;
List<PointF> src = null;
List<PointF> lst = null;
Size canvasSize = new Size(700, 300);
CreateRandomArray(out src, count);
ReduceArray(src, out lst, canvasSize.Width);
PrintPoints("Raw points:", src);
PrintPoints("Reduced array:", lst);
}
public static void CreateRandomArray(out List<PointF> lst, int count)
{
lst = new List<PointF>();
Random rnd = new Random();
double maxX = 4 * Math.PI;
double maxDistortion = 0.05;
for (int i = 0; i < count; i++)
{
double x = (double)rnd.Next() / (double)Int32.MaxValue * maxX; // Double value between 0.0 and maxX
double relativeDistortion = 2 * ((double)rnd.Next() / (double)Int32.MaxValue - 0.5); // Random value between -1.0 and +1.0
double y = 1 + Math.Sin(x) + relativeDistortion * maxDistortion;
PointF point = new PointF((float)x, (float)y);
lst.Add(point);
}
lst.Sort(new Comparison<PointF>(ComparePoints));
}
public static void ReduceArray(List<PointF> src, out List<PointF> lst, int canvasWidth)
{
int[] bucketSize = new int[canvasWidth];
double[] bucketSum = new double[canvasWidth];
// First step: determine range of X values
float minX = 0;
float maxX = 0;
bool first = true;
foreach (PointF p in src)
if (first)
{
minX = maxX = p.X;
first = false;
}
else if (p.X < minX)
minX = p.X;
else if (p.X > maxX)
maxX = p.X;
// Second step: divide points into buckets of equal width
float bucketWidth = (maxX - minX) / canvasWidth;
foreach (PointF p in src)
{
int bucketIndex = (int)((p.X - minX) / bucketWidth);
if (bucketIndex >= canvasWidth) // Protect yourself from errors caused by
bucketIndex = canvasWidth - 1; // lack of precision when bucketWidth was calculated!
bucketSize[bucketIndex]++;
bucketSum[bucketIndex] += p.Y;
}
// Third step: read average Y coordinates from all non-empty buckets
lst = new List<PointF>();
for (int i = 0; i < canvasWidth; i++)
if (bucketSize[i] > 0)
{
PointF p = new PointF();
p.X = i * bucketWidth + bucketWidth / 2; // Middle of bucket's position is its X coordinate
p.Y = (float)(bucketSum[i] / bucketSize[i]); // Average Y coordinate is bucket's Y coordinate
lst.Add(p);
}
}
public static void PrintPoints(string msg, List<PointF> lst)
{
Console.WriteLine(msg);
foreach (PointF p in lst)
Console.WriteLine("{0}\t{1}", p.X, p.Y);
}
public static int ComparePoints(PointF p1, PointF p2)
{
int result = 0;
if (p1.X < p2.X || (p1.X == p2.X && p1.Y < p2.Y))
result = -1;
else if (p1.X > p2.X || (p1.X == p2.X && p1.Y > p2.Y))
result = 1;
return result;
}
}
}
In this test, I've generated distorted sinewave with 30,000 points, which looks like this:
Sinewave1..png 22.12K
16 downloadsThese points are then smoothed by dividing the canvas into as many buckets as many pixels there are in the width. This is done in the ReduceArray method, which is commented so that you see what it does.
Once reduction to 700 points is complete, the image looks like this:
Sinewave2..png 25.46K
14 downloadsThis algorithm does not depend on distribution of X and Y coordinates in source list.
#10
Posted 13 July 2010 - 11:20 AM
If you don't wish to code your own Graphing utilites fully, I've used Gigasoft Proessentials Graphing utilities at my Job, and they work wonderful (and quickly) allow Zooming / Tons of different graphs & good speed. (They also have a free trial that you can download)
#11
Posted 13 July 2010 - 09:03 PM
Thanks
Edited by sharmilakumari, 14 July 2010 - 03:57 AM.
#12
Posted 13 July 2010 - 09:34 PM
Thanks
Edited by sharmilakumari, 14 July 2010 - 03:56 AM.
1 user(s) are reading this topic
0 members, 1 guests, 0 anonymous users


Sign In
Create Account


Back to top









