Jump to content

Handling large datasets in charts quickely

- - - - -

  • Please log in to reply
16 replies to this topic

#1
sharmilakumari

sharmilakumari

    Newbie

  • Members
  • PipPip
  • 11 posts
I have a large dataset(around 50000 points) which has be visualized in a line chart.The size of Canvas may vary according to the size of dataset. Massive amount of points makes the drawing too slow.It also results in cluttering and overlapping of points due to plotting of several points close to each other.So visual representation of data will be unsatisfactory. How can I display subset of available points which will increase the performance? Can anybody suggest me with a suitable algorithm which will help to extract the subset of actual number of points?

#2
sp3tsnaz

sp3tsnaz

    Learning Programmer

  • Members
  • PipPipPip
  • 41 posts
be more specific please ... also post the code so that we can analyse and help you !!

#3
sharmilakumari

sharmilakumari

    Newbie

  • Members
  • PipPip
  • 11 posts
I have a collection of points (say around 50,000) which has to be drawn in a line chart.I am using line segment and Path to draw this points in chart.
List<double> src = ... //50000 points.
for(i=0;i<src.count;i++)
{
LineSegment segment = new LineSegment();
//get the point x and y value present in the collection
//get the previous point
//add line to path
}
Since I am using large data set it consumes lot of time to load.So I need to eliminate some unwanted points like duplicates..and also for example if i have points like (1,2)(2,4)(3,6) then (2,4) can be removed.The points which are located very closely to each other also can be removed.I have tried calculating average of consecutive points so that length will reduce to half.But it would not be feasible since it will not show proper values.Can you please provide me how can in achieve filtering of point list?How can I do sampling of data?

#4
zoranh

zoranh

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 207 posts
Here is the solution for you:
using System;
using System.Collections.Generic;
using System.Drawing;

namespace TestApp
{
    class Program
    {

        static void Main(string[] args)
        {

            int count = 50000;
            Size canvasSize = new Size(700, 300);

            List<double> src = new List<double>();

            for (int i = 0; i < count; i++)
                src.Add(1 + Math.Sin(Math.PI * i / 10000));

            int groupSize = (int)Math.Ceiling(count / (double)canvasSize.Width);

            List<double> drawList = new List<double>();
            int curGroupSize = 0;
            double curSum = 0;

            foreach (double d in src)
            {
                
                if (curGroupSize == groupSize)
                {
                    drawList.Add(curSum / curGroupSize);
                    curSum = 0;
                    curGroupSize = 0;
                }

                curSum += d;
                curGroupSize++;

            }

            if (curGroupSize > 0)
                drawList.Add(curSum / curGroupSize);

            foreach (double d in drawList)
                Console.WriteLine(d);

        }

    }
}
This code groups adjacent points and replaces them with their average value. Number of groups is calculated based on canvas width, so that total number of groups does not exceed canvas width in pixels. As a result, no segment will have equal X coordinates, so line drawn is guaranteed to be smooth.

In this example, I've set canvas size 700 pixels, and formula said that 694 groups will be created. This guarantees that line drawn will be absolutely smooth and without overlapping segments.

And here is the picture that would be drawn with these 694 dots, and you can see that it is quite smooth (drawn by Excel):
Attached File  &#83;&#105;&#110;&#101;&#119;&#97;&#118;&#101;&#46;.png   13.12K   17 downloads

#5
sharmilakumari

sharmilakumari

    Newbie

  • Members
  • PipPip
  • 11 posts
Thanks :)

#6
sharmilakumari

sharmilakumari

    Newbie

  • Members
  • PipPip
  • 11 posts
Hi,

As I see it is calculating the average of two consecutive points.Which will reduce the size to half.But my question is suppose the two points are (1,5) (80,90) then average will be 47 which will not depict the appropriate value.Taking average will be helpful when the points are close to each other.Is there any other way to reduce the size?or taking average of all adjacent points is the solution?

#7
zoranh

zoranh

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 207 posts
I see. If you have non-uniform distribution of X coordinates then colution must be adapted to that condition. Give me a minute and I'll send you the solution.

#8
zoranh

zoranh

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 207 posts
Wait a second, how do you get X coordinate if you start with List<double>? Using list of Y coordinates implies that points are uniformly distributed over X.

Anyway, you can imagine your canvas as an array of vertical rectangles, each rectangle being exactly one pixel in width. Then you just distribute all points into these rectangles, for each rectangle calculating average Y coordinate of all points that fit in it.

Once this process is finished, each rectangle having at least one starting point in it gives one dot: rectangle's X coordinate is the X coordinate of the dot and Y coordinate is the average Y of all points fitting in the rectangle. Then you just draw lines between successive rectangles.

Note that first algorithm supposed that points are sorted in non-descending X order. This algorithm does not make such assumption, but of course you must sort rectangles by X coordinates before you start drawing.

#9
zoranh

zoranh

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 207 posts
This is the general solution when one should reduce number of points to draw a smooth graph:

using System;
using System.Drawing;
using System.Collections.Generic;

namespace TestApp
{
    class Program
    {

        static void Main(string[] args)
        {

            int count = 30000;
            List<PointF> src = null;
            List<PointF> lst = null;
            Size canvasSize = new Size(700, 300);

            CreateRandomArray(out src, count);

            ReduceArray(src, out lst, canvasSize.Width);

            PrintPoints("Raw points:", src);
            PrintPoints("Reduced array:", lst);

        }

        public static void CreateRandomArray(out List<PointF> lst, int count)
        {
            lst = new List<PointF>();
            Random rnd = new Random();
            double maxX = 4 * Math.PI;
            double maxDistortion = 0.05;

            for (int i = 0; i < count; i++)
            {

                double x = (double)rnd.Next() / (double)Int32.MaxValue * maxX;                             // Double value between 0.0 and maxX
                double relativeDistortion = 2 * ((double)rnd.Next() / (double)Int32.MaxValue - 0.5);    // Random value between -1.0 and +1.0
                double y = 1 + Math.Sin(x) + relativeDistortion * maxDistortion;

                PointF point = new PointF((float)x, (float)y);
                lst.Add(point);

            }

            lst.Sort(new Comparison<PointF>(ComparePoints));

        }

        public static void ReduceArray(List<PointF> src, out List<PointF> lst, int canvasWidth)
        {

            int[] bucketSize = new int[canvasWidth];
            double[] bucketSum = new double[canvasWidth];

            // First step: determine range of X values

            float minX = 0;
            float maxX = 0;
            bool first = true;

            foreach (PointF p in src)
                if (first)
                {
                    minX = maxX = p.X;
                    first = false;
                }
                else if (p.X < minX)
                    minX = p.X;
                else if (p.X > maxX)
                    maxX = p.X;

            // Second step: divide points into buckets of equal width

            float bucketWidth = (maxX - minX) / canvasWidth;

            foreach (PointF p in src)
            {
                
                int bucketIndex = (int)((p.X - minX) / bucketWidth);

                if (bucketIndex >= canvasWidth)     // Protect yourself from errors caused by
                    bucketIndex = canvasWidth - 1;  // lack of precision when bucketWidth was calculated!

                bucketSize[bucketIndex]++;
                bucketSum[bucketIndex] += p.Y;

            }

            // Third step: read average Y coordinates from all non-empty buckets

            lst = new List<PointF>();

            for (int i = 0; i < canvasWidth; i++)
                if (bucketSize[i] > 0)
                {
                    
                    PointF p = new PointF();
                    p.X = i * bucketWidth + bucketWidth / 2;        // Middle of bucket's position is its X coordinate
                    p.Y = (float)(bucketSum[i] / bucketSize[i]);    // Average Y coordinate is bucket's Y coordinate
                    
                    lst.Add(p);

                }

        }

        public static void PrintPoints(string msg, List<PointF> lst)
        {

            Console.WriteLine(msg);

            foreach (PointF p in lst)
                Console.WriteLine("{0}\t{1}", p.X, p.Y);

        }

        public static int ComparePoints(PointF p1, PointF p2)
        {
            
            int result = 0;

            if (p1.X < p2.X || (p1.X == p2.X && p1.Y < p2.Y))
                result = -1;
            else if (p1.X > p2.X || (p1.X == p2.X && p1.Y > p2.Y))
                result = 1;
            
            return result;

        }

    }
}
In this test, I've generated distorted sinewave with 30,000 points, which looks like this:
Attached File  &#83;&#105;&#110;&#101;&#119;&#97;&#118;&#101;&#49;&#46;.png   22.12K   16 downloads

These points are then smoothed by dividing the canvas into as many buckets as many pixels there are in the width. This is done in the ReduceArray method, which is commented so that you see what it does.

Once reduction to 700 points is complete, the image looks like this:
Attached File  &#83;&#105;&#110;&#101;&#119;&#97;&#118;&#101;&#50;&#46;.png   25.46K   14 downloads

This algorithm does not depend on distribution of X and Y coordinates in source list.

#10
PGP_Protector

PGP_Protector

    Programming Professional

  • Members
  • PipPipPipPipPip
  • 253 posts
If you don't wish to code your own Graphing utilites fully, I've used Gigasoft Proessentials Graphing utilities at my Job, and they work wonderful (and quickly) allow Zooming / Tons of different graphs & good speed. (They also have a free trial that you can download)

#11
sharmilakumari

sharmilakumari

    Newbie

  • Members
  • PipPip
  • 11 posts
Thanks

Edited by sharmilakumari, 14 July 2010 - 03:57 AM.


#12
sharmilakumari

sharmilakumari

    Newbie

  • Members
  • PipPip
  • 11 posts
Thanks

Edited by sharmilakumari, 14 July 2010 - 03:56 AM.





1 user(s) are reading this topic

0 members, 1 guests, 0 anonymous users