The method proposed herein - to solve or at least facilitate approximation for the traveling salesman problem - uses vector mathematics, regressions, and optimizations. Large amounts of testing would be required to determine whether it really is a foolproof solution in all cases but it is hypothesized that it will at least give us a new method for approximating a solution in any given plot of points. I will now lay out a proposed algorithm then discuss the method.

Step 1:

Plot all points. (in positive space for simplicity) And find the centroid location.

Step 2:

At each individual point on the plot; calculate a best fit regression line that goes exactly through the given point and is (besides that restriction) a best fit approximation for the entire data set.

Step 3:

For each of the regression lines above (one for each point) calculate the R squared (coefficient of determination) of each line.

Step 4:

Create a vector field so there is a vector situated at the coordinates of each point in the plot that we started with. The direction of the vector should be in the direction of that point's respective regression line and shall be oriented in the direction closest to pointing to the centroid. The magnitude of the vector should be the R squared (or similar method) of that point's regression line multiplied by the RMS distance of that point to the centroid.

Now comes connecting the points:

Step 5:

Start at the point of the largest magnitude vector for simplicity.

Step 6:

In determining which point to travel to next; determine every other points vector sum with the current point. Also determine the distance between every other point and the current point. Maximize the value of the magnitude of the vector sum between the current point with each other point divided by the distance between the points. Depending on what units are used and whether or not they are scaled between 0 and 1, a coefficient may be used to balance the equation.

So what we are doing here is an attempt to assign each point in the plot a value. The desire is that this value would give a sort of "connect the dots" effect where you connect one point to the next based on the order of these values. For example lets say we have 5 points and following an equaiton or algorithm we can assign each point a ranking between 1 and 5. Lets say we start at point 1 and just draw a line to point 2, then continue that line to point 3 and on to point 4 then point 5. If we could achieve a system of ranking the points in some way like this then finding the least possible distance to travel through all points with straight lines would be as simple as connecting the dots. In trying to determine how to rank the points I thought of some factors that would go into whether a path is optimized.

Here is a list of some factors:

1. distance between current point and proposed next point

2. how good a points given regression line is an approximation of the data

3. the "outlier-ness" of each point

It seemed to me that the most obvious factor of course was the distance the next point is from the current point. Obviously you would want to travel to closer points preferably to travelling across the entire plot multiple times.

The other being how far the proposed next point is from the best fit line for the data. If we choose to start with the most outlier point possible then the direction we will want to head towards is towards the rest of the data.

And the last factor I figured was that one should try to go towards the outliers. For example if our points are in a circle we want to go along the outer most edge of the circle, not cutting across it and having to go back.

So to fulfill these factors I stared thinking of how we could possibly rank points based on these factors. Vectors are great ways to assign a type of value to a point that contains a lot of information. The magnitude of a vector could designate the standard deviation (R squared) of the best fit line that goes through it. This means that points whos best fit line that are a good approximation of the plot have a larger magnitude. This can help us determine how many options we have when leaving that point. We want to travel somewhat towards the rest of the points so having a point with a good regression R squared value means we are going in a good direction. We can determine if the next point we pick puts us on that trajectory if the vector addition of the current point with the proposed next point is a large number. The bigger it is the closer we are to that ideal direction towards the rest of the points. In order to take into account the need to "stick to the edges" so we don't miss the outside points is taken into account by the rms value of the point to the centroid of the plot. The larger this number the better, the more of an outlier our point is. We want to prioritize these outlier points so we don't miss them and have to come all the way back to them. Of course when we pass through a point on our journey, that point is now no longer in the pool of possible canidates for future points.

In conclusion we created a method to assign a independent objective value to each point in a 2D plot of point locations. Using these objective values we can "connect the dots" and trace out a hopefully optimized "shortest path" through every point by simply following the locations of the optimized outputs from each point along the journey. This is accomplished by assigning a vector to each point by using 3 factors; 1. The direction of the regression line originating at every given point and approximating the best fit through the rest of the data. 2. The magnitude of the R squared value of that regression line, 3. The magnitude of the RMS value of each point from the centroid. Next we devised a method to traverse these values using the distance from the current point to the proposed next point and using vector addition to find the best candidate for the next point on the journey.

Here is an example plot of 6 points:

Step 1:

Plot all points. (in positive space for simplicity) And find the centroid location.

Step 2:

At each individual point on the plot; calculate a best fit regression line that goes exactly through the given point and is (besides that restriction) a best fit approximation for the entire data set.

Step 3:

For each of the regression lines above (one for each point) calculate the R squared (coefficient of determination) of each line.

Step 4:

Create a vector field so there is a vector situated at the coordinates of each point in the plot that we started with. The direction of the vector should be in the direction of that point's respective regression line and shall be oriented in the direction closest to pointing to the centroid. The magnitude of the vector should be the R squared (or similar method) of that point's regression line multiplied by the RMS distance of that point to the centroid.

Now comes connecting the points:

Step 5:

Start at the point of the largest magnitude vector for simplicity.

Step 6:

In determining which point to travel to next; determine every other points vector sum with the current point. Also determine the distance between every other point and the current point. Maximize the value of the magnitude of the vector sum between the current point with each other point divided by the distance between the points. Depending on what units are used and whether or not they are scaled between 0 and 1, a coefficient may be used to balance the equation.

So what we are doing here is an attempt to assign each point in the plot a value. The desire is that this value would give a sort of "connect the dots" effect where you connect one point to the next based on the order of these values. For example lets say we have 5 points and following an equaiton or algorithm we can assign each point a ranking between 1 and 5. Lets say we start at point 1 and just draw a line to point 2, then continue that line to point 3 and on to point 4 then point 5. If we could achieve a system of ranking the points in some way like this then finding the least possible distance to travel through all points with straight lines would be as simple as connecting the dots. In trying to determine how to rank the points I thought of some factors that would go into whether a path is optimized.

Here is a list of some factors:

1. distance between current point and proposed next point

2. how good a points given regression line is an approximation of the data

3. the "outlier-ness" of each point

It seemed to me that the most obvious factor of course was the distance the next point is from the current point. Obviously you would want to travel to closer points preferably to travelling across the entire plot multiple times.

The other being how far the proposed next point is from the best fit line for the data. If we choose to start with the most outlier point possible then the direction we will want to head towards is towards the rest of the data.

And the last factor I figured was that one should try to go towards the outliers. For example if our points are in a circle we want to go along the outer most edge of the circle, not cutting across it and having to go back.

So to fulfill these factors I stared thinking of how we could possibly rank points based on these factors. Vectors are great ways to assign a type of value to a point that contains a lot of information. The magnitude of a vector could designate the standard deviation (R squared) of the best fit line that goes through it. This means that points whos best fit line that are a good approximation of the plot have a larger magnitude. This can help us determine how many options we have when leaving that point. We want to travel somewhat towards the rest of the points so having a point with a good regression R squared value means we are going in a good direction. We can determine if the next point we pick puts us on that trajectory if the vector addition of the current point with the proposed next point is a large number. The bigger it is the closer we are to that ideal direction towards the rest of the points. In order to take into account the need to "stick to the edges" so we don't miss the outside points is taken into account by the rms value of the point to the centroid of the plot. The larger this number the better, the more of an outlier our point is. We want to prioritize these outlier points so we don't miss them and have to come all the way back to them. Of course when we pass through a point on our journey, that point is now no longer in the pool of possible canidates for future points.

In conclusion we created a method to assign a independent objective value to each point in a 2D plot of point locations. Using these objective values we can "connect the dots" and trace out a hopefully optimized "shortest path" through every point by simply following the locations of the optimized outputs from each point along the journey. This is accomplished by assigning a vector to each point by using 3 factors; 1. The direction of the regression line originating at every given point and approximating the best fit through the rest of the data. 2. The magnitude of the R squared value of that regression line, 3. The magnitude of the RMS value of each point from the centroid. Next we devised a method to traverse these values using the distance from the current point to the proposed next point and using vector addition to find the best candidate for the next point on the journey.

Here is an example plot of 6 points:

Thanks for sharing, please keep an update about this info. custom dissertation writing service love to read it more. i like this site too much.

ReplyDeleteThat's awesome, you've explained everything that I needed for https://languagelearning9.webnode.com/. These algorithms were always a problem for me.

ReplyDelete